Compute the mean of intermediate results created by compute_intermediate_results.

summarise_intermediate_results(
  intermediate_results,
  propensity_scored = FALSE,
  label_distribution = NULL,
  set = FALSE,
  replace_zero_division_with = options::opt("replace_zero_division_with")
)

Arguments

intermediate_results

As produced by compute_intermediate_results. This requires a list containing:

  • results_table: A data.frame with columns "prec", "rprec", "rec", "f1".

  • grouping_var: A character vector of variables to group by.
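The expected input shape and the averaging step can be sketched in base R. This is a minimal illustration, not the package's implementation: the values, the "doc_id" column, and the names metric_cols and summary_sketch are made up, and the sketch ignores grouping and propensity weighting.

```r
# Hypothetical input mimicking the documented list structure.
# The "doc_id" column and all values are invented for illustration.
intermediate_results <- list(
  results_table = data.frame(
    doc_id = c("d1", "d2", "d3"),
    prec   = c(0.50, 1.00, 0.75),
    rprec  = c(0.50, 1.00, 0.50),
    rec    = c(0.25, 1.00, 0.75),
    f1     = c(1 / 3, 1.00, 0.75)
  ),
  grouping_var = "doc_id"  # assumed value; not used in this sketch
)

metric_cols <- c("prec", "rprec", "rec", "f1")

# Unweighted mean of each metric column, ignoring missing values.
means <- colMeans(intermediate_results$results_table[metric_cols], na.rm = TRUE)

# Reshape to the documented output layout: columns "metric" and "value".
summary_sketch <- data.frame(
  metric = names(means),
  value  = unname(means)
)
print(summary_sketch)
```

In real use, the list would come from compute_intermediate_results rather than being built by hand.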

propensity_scored

Logical, whether to use propensity scores as weights.

label_distribution

Expects a data.frame with columns "label_id", "label_freq", "n_docs". label_freq is the number of times a label occurs in the gold standard. n_docs is the total number of documents in the gold standard.
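A data.frame of this shape could be derived from a gold standard as in the following sketch. The gold_standard table and its "doc_id" column are assumptions for illustration; only the three output column names come from the description above.

```r
# Hypothetical gold standard: one row per (document, label) assignment.
gold_standard <- data.frame(
  doc_id   = c("d1", "d1", "d2", "d3", "d3"),
  label_id = c("a",  "b",  "a",  "a",  "c")
)

# Count how often each label occurs in the gold standard.
label_counts <- table(gold_standard$label_id)

label_distribution <- data.frame(
  label_id   = names(label_counts),
  label_freq = as.integer(label_counts),
  # Total number of distinct documents, repeated on every row.
  n_docs     = length(unique(gold_standard$doc_id))
)
print(label_distribution)
```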

set

Logical. Allow in-place modification of intermediate_results. Only recommended for internal package usage.

replace_zero_division_with

In macro-averaged results (doc-avg, subj-avg), some instances may have no predictions or no gold standard labels. In these cases, calculating precision and recall can lead to division by zero. By default, CASIMiR removes these missing values from macro averages, which reduces the support (the count of instances that were averaged). Other implementations of macro-averaged precision and recall default to 0 in these cases. This option controls that behaviour: set it to any value between 0 and 1 to use that value in place of the undefined results. (Defaults to NULL, overwritable using option 'casimir.replace_zero_division_with' or environment variable 'R_CASIMIR_REPLACE_ZERO_DIVISION_WITH')
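The effect of the two strategies on a macro average and its support can be illustrated with plain R arithmetic. This is a sketch with invented counts, not the package's internal code; all variable names are hypothetical.

```r
# Hypothetical per-document counts: d2 has no predictions, so its
# precision is 0 / 0 and therefore undefined.
n_correct   <- c(d1 = 2, d2 = 0, d3 = 1)
n_predicted <- c(d1 = 4, d2 = 0, d3 = 1)

prec <- n_correct / n_predicted  # 0 / 0 evaluates to NaN in R

# Strategy 1 (described above as the default): drop undefined values.
mean_dropped    <- mean(prec, na.rm = TRUE)  # 0.75
support_dropped <- sum(!is.na(prec))         # 2

# Strategy 2: replace undefined values with a fixed value, here 0.
replace_with   <- 0
prec_filled    <- ifelse(is.na(prec), replace_with, prec)
mean_filled    <- mean(prec_filled)          # 0.5
support_filled <- length(prec_filled)        # 3
```

Dropping the undefined value averages over two documents, while replacing it with 0 averages over all three and lowers the mean.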

Value

A data.frame with columns "metric", "value".