Filter predictions based on score and rank: apply_threshold

Helper function that filters predictions, keeping those with a score at or above a given threshold and, optionally, a rank at or below a given limit.

Usage

apply_threshold(threshold, limit = NA_real_, base_compare)

Arguments

threshold: A numeric threshold between 0 and 1.
limit: An integer cutoff >= 1 for rank-based thresholding. Requires a column "rank" in the input base_compare. Defaults to NA_real_.
base_compare: A data.frame as created by create_comparison, containing at least the columns "gold" and "score".

Value

A data.frame containing the observations that satisfy (score >= threshold AND (if applicable) rank <= limit) OR gold == TRUE. A new logical column "suggested" is TRUE if score >= threshold AND (if applicable) rank <= limit, and FALSE for false negatives, that is, gold observations that have no score, a score below the threshold, or a rank above the limit.
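
The filter rule described above can be sketched in base R on a small hand-built table. This only restates the documented condition and is not the package's implementation. The helper name sketch_threshold and the toy data are hypothetical, and treating a missing score or rank as "not suggested" is an assumption drawn from the description of false negatives.

```r
# Hypothetical restatement of the documented rule; not casimir's actual code.
sketch_threshold <- function(threshold, limit = NA_real_, base_compare) {
  # A row is suggested if its score reaches the threshold
  # (assumption: a missing score never counts as suggested).
  suggested <- !is.na(base_compare$score) & base_compare$score >= threshold
  # When a limit is given, the rank must also not exceed it.
  if (!is.na(limit)) {
    stopifnot("rank" %in% names(base_compare))
    suggested <- suggested &
      !is.na(base_compare$rank) & base_compare$rank <= limit
  }
  base_compare$suggested <- suggested
  # Keep suggested rows plus all gold rows (the latter may be false negatives).
  base_compare[suggested | base_compare$gold, , drop = FALSE]
}

# Toy input with the columns named in the Arguments section.
toy <- data.frame(
  label_id = c("a", "b", "c", "d"),
  gold     = c(TRUE, TRUE, FALSE, FALSE),
  score    = c(0.9, NA, 0.7, 0.1),
  rank     = c(1, NA, 2, 3),
  stringsAsFactors = FALSE
)

# Score-only filtering: limit stays at its default NA_real_.
res_score <- sketch_threshold(threshold = 0.3, base_compare = toy)

# Score and rank filtering combined.
res_rank <- sketch_threshold(threshold = 0.3, limit = 1, base_compare = toy)
```

In this sketch, label "b" is kept in both results because it is a gold label, but it is marked suggested = FALSE since it has no score. Label "d" is dropped in both because it is neither gold nor above the threshold.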

Examples


library(casimir)

gold <- tibble::tribble(
  ~doc_id, ~label_id,
  "A", "a",
  "A", "b",
  "A", "c",
  "B", "a",
  "B", "d",
  "C", "a",
  "C", "b",
  "C", "d",
  "C", "f"
)

pred <- tibble::tribble(
  ~doc_id, ~label_id, ~score,
  "A", "a", 0.9,
  "A", "d", 0.7,
  "A", "f", 0.3,
  "A", "c", 0.1,
  "B", "a", 0.8,
  "B", "e", 0.6,
  "B", "d", 0.1,
  "C", "f", 0.1,
  "C", "c", 0.2,
  "C", "e", 0.2
)

base_compare <- create_comparison(gold, pred)

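# Keep predictions with score >= 0.3. Gold rows are retained regardless
# and get suggested = FALSE when they miss the threshold.
res_0 <- apply_threshold(
  threshold = 0.3,
  base_compare = base_compare
)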