Compute intermediate set retrieval results per group, such as the number of gold standard and predicted labels, the number of true positives, false positives and false negatives, precision, R-precision, recall and F1 score.
compute_intermediate_results(
  gold_vs_pred,
  grouping_var,
  propensity_scored = FALSE,
  cost_fp = NULL,
  drop_empty_groups = options::opt("drop_empty_groups"),
  check_group_names = options::opt("check_group_names")
)
compute_intermediate_results_dplyr(
  gold_vs_pred,
  grouping_var,
  propensity_scored = FALSE,
  cost_fp = NULL
)
gold_vs_pred A data.frame with logical columns "suggested" and
"gold", as produced by create_comparison.
grouping_var A character vector of grouping variables that must be
present in gold_vs_pred (the dplyr version requires rlang symbols).
propensity_scored Logical, whether to use propensity scores as weights.
cost_fp A numeric value > 0, defaults to NULL.
drop_empty_groups Should empty levels of factor variables be dropped in grouped set retrieval
computation? (Defaults to TRUE, overwritable using option 'casimir.drop_empty_groups' or environment variable 'R_CASIMIR_DROP_EMPTY_GROUPS')
check_group_names Perform replacement of dots in grouping columns. Disable for faster
computation if you can make sure that all columns used for grouping
("doc_id", "label_id", "doc_groups", "label_groups") do not contain
dots. (Defaults to TRUE, overwritable using option 'casimir.check_group_names' or environment variable 'R_CASIMIR_CHECK_GROUP_NAMES')
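The two option names quoted above can be set with base R's options() for the length of a session. The following is a minimal sketch using only base R; it assumes the package reads these options at call time, which the descriptions above suggest but do not spell out.

```r
# Set both documented options; options() returns the previous values.
old <- options(
  casimir.drop_empty_groups = FALSE,
  casimir.check_group_names = FALSE
)

# Inspect the value that is active while the override is in place.
during <- getOption("casimir.check_group_names")
print(during)

# Restore whatever was set before (unset options come back as NULL).
options(old)
```

Disabling check_group_names is only appropriate when the grouping columns are known to be free of dots, as stated above. The same settings can also be passed per call through the drop_empty_groups and check_group_names arguments.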
A list of two elements:
results_table A data.frame with columns "n_gold",
"n_suggested", "tp", "fp", "fn", "prec", "rprec", "rec", "f1".
grouping_var The input vector grouping_var.
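To make the relation between the count columns and the ratio columns concrete, here is a minimal base R sketch using the textbook definitions. The helper metrics_from_counts is hypothetical and not part of casimir. The R-precision denominator (the smaller of n_gold and n_suggested) is an assumption, and the package may treat zero denominators, propensity scores and cost_fp differently.

```r
# Hypothetical helper, not part of casimir: derive ratio columns from counts.
metrics_from_counts <- function(tp, fp, fn) {
  n_suggested <- tp + fp          # everything that was predicted
  n_gold <- tp + fn               # everything in the gold standard
  prec <- tp / n_suggested        # precision
  rec <- tp / n_gold              # recall
  # Assumption: R-precision divides by the smaller of the two set sizes.
  rprec <- tp / pmin(n_gold, n_suggested)
  f1 <- 2 * prec * rec / (prec + rec)
  data.frame(n_gold, n_suggested, tp, fp, fn, prec, rprec, rec, f1)
}

# Two groups: one with 1 TP, 2 FP, 2 FN and one with 1 TP, 1 FP, 1 FN.
res <- metrics_from_counts(tp = c(1, 1), fp = c(2, 1), fn = c(2, 1))
print(res)
```

In this sketch a group with no suggestions or no gold labels yields NaN, since the division is not guarded.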
compute_intermediate_results_dplyr(): Variant with dplyr-based
internals rather than collapse-based internals.
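The difference in how the two variants take grouping_var could look roughly as follows. This is a sketch under assumptions: it presumes that rlang::syms() produces the "rlang symbols" the dplyr variant expects and that create_comparison accepts plain data frames. The toy data are illustrative only, and the calls are skipped when casimir or rlang is not installed.

```r
grouping_cols <- "doc_id"

if (requireNamespace("casimir", quietly = TRUE) &&
    requireNamespace("rlang", quietly = TRUE)) {
  # Small illustrative inputs (assumption: data.frame input is accepted).
  gold <- data.frame(doc_id = c("A", "A", "B"), label_id = c("a", "b", "a"))
  pred <- data.frame(doc_id = c("A", "B", "B"), label_id = c("a", "a", "c"))
  gold_vs_pred <- casimir::create_comparison(gold, pred)

  # Default variant: grouping columns as a character vector.
  res_collapse <- casimir::compute_intermediate_results(
    gold_vs_pred, grouping_cols
  )

  # dplyr variant: the same columns, converted to a list of symbols
  # (assumption based on the grouping_var description).
  res_dplyr <- casimir::compute_intermediate_results_dplyr(
    gold_vs_pred, rlang::syms(grouping_cols)
  )
}
```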
library(casimir)
gold <- tibble::tribble(
  ~doc_id, ~label_id,
  "A", "a",
  "A", "b",
  "A", "c",
  "B", "a",
  "B", "d",
  "C", "a",
  "C", "b",
  "C", "d",
  "C", "f"
)
pred <- tibble::tribble(
  ~doc_id, ~label_id,
  "A", "a",
  "A", "d",
  "A", "f",
  "B", "a",
  "B", "e",
  "C", "f"
)
gold_vs_pred <- create_comparison(gold, pred)
compute_intermediate_results(gold_vs_pred, "doc_id")
#> $results_table
#> # A tibble: 3 × 12
#>   doc_id n_gold n_suggested    tp    fp    fn delta_relevance rprec_deno  prec
#>   <chr>   <int>       <int> <int> <int> <int>           <dbl>      <dbl> <dbl>
#> 1 A           3           3     1     2     2               0          3 0.333
#> 2 B           2           2     1     1     1               0          2 0.5
#> 3 C           4           1     1     0     3               0          1 1
#> # ℹ 3 more variables: rprec <dbl>, rec <dbl>, f1 <dbl>
#>
#> $grouping_var
#> [1] "doc_id"
#>
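The per-document counts for this example can also be recomputed by hand with base R set operations, which is a useful cross-check when reading the results table. The sketch below does not call casimir. It rebuilds the example data as plain data frames and assumes every document occurs in both gold and pred.

```r
gold <- data.frame(
  doc_id   = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
  label_id = c("a", "b", "c", "a", "d", "a", "b", "d", "f")
)
pred <- data.frame(
  doc_id   = c("A", "A", "A", "B", "B", "C"),
  label_id = c("a", "d", "f", "a", "e", "f")
)

# One label set per document.
gold_sets <- split(gold$label_id, gold$doc_id)
pred_sets <- split(pred$label_id, pred$doc_id)

# Count set sizes and overlaps per document.
counts <- do.call(rbind, lapply(names(gold_sets), function(doc) {
  g <- gold_sets[[doc]]
  p <- pred_sets[[doc]]
  data.frame(
    doc_id = doc,
    n_gold = length(g),
    n_suggested = length(p),
    tp = length(intersect(g, p)),  # suggested and in gold
    fp = length(setdiff(p, g)),    # suggested but not in gold
    fn = length(setdiff(g, p))     # in gold but not suggested
  )
}))
print(counts)
```

For document C, which has four gold labels and a single correct suggestion, this gives n_gold = 4, n_suggested = 1, tp = 1, fp = 0 and fn = 3.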