Internal wrapper that computes the bootstrap results for a single bootstrap sample, combining the functionality of compute_intermediate_results and summarise_intermediate_results.

helper_f(
  sampled_id_list,
  compare_cpy,
  grouping_var,
  label_distribution = NULL,
  ps_flags = list(intermed = FALSE, summarise = FALSE),
  cost_fp = NULL,
  replace_zero_division_with = options::opt("replace_zero_division_with"),
  drop_empty_groups = options::opt("drop_empty_groups")
)

Arguments

sampled_id_list

A list of all doc_ids drawn in this bootstrap sample.

compare_cpy

As created by create_comparison.

grouping_var

A vector of variables to be used for aggregation.

label_distribution

Expects a data.frame with columns "label_id", "label_freq" and "n_docs". label_freq is the number of occurrences of a label in the gold standard. n_docs is the total number of documents in the gold standard.
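A minimal base R sketch of how a data.frame with this shape could be built. It assumes a hypothetical gold standard `gold_standard` with one row per `doc_id`/`label_id` pair; it illustrates the expected columns and is not necessarily how CASIMiR derives the distribution internally.

```r
# Hypothetical gold standard: one row per (doc_id, label_id) pair.
gold_standard <- data.frame(
  doc_id   = c("d1", "d1", "d2", "d3", "d3"),
  label_id = c("A",  "B",  "A",  "A",  "C")
)

# Count the occurrences of each label in the gold standard.
label_distribution <- as.data.frame(
  table(label_id = gold_standard$label_id),
  responseName = "label_freq",
  stringsAsFactors = FALSE
)

# n_docs is identical in every row: the total number of gold-standard documents.
label_distribution$n_docs <- length(unique(gold_standard$doc_id))

print(label_distribution)
```

Here label "A" occurs three times, "B" and "C" once each, and every row carries n_docs = 3.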

ps_flags

A list as returned by set_ps_flags.

cost_fp

A numeric value > 0, defaults to NULL.

replace_zero_division_with

In macro-averaged results (doc-avg, subj-avg), some instances may have no predictions or no gold standard labels. In these cases, calculating precision and recall may lead to division by zero. By default, CASIMiR removes these missing values from macro averages, which leads to a smaller support (the number of instances that were averaged). Other implementations of macro-averaged precision and recall default to 0 in these cases. This option controls the value used instead. Set any value between 0 and 1. (Defaults to NULL, overwritable using option 'casimir.replace_zero_division_with' or environment variable 'R_CASIMIR_REPLACE_ZERO_DIVISION_WITH')
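The base R sketch below illustrates the two behaviours for document-averaged precision. The per-document counts `tp` and `n_pred` are hypothetical, and the code mirrors the description above rather than CASIMiR's internal implementation.

```r
# Hypothetical per-document counts: true positives and number of predictions.
tp     <- c(2, 1, 0)
n_pred <- c(4, 1, 0)   # the third document has no predictions

# 0 / 0 yields NaN for the third document.
precision <- tp / n_pred

# Default behaviour: drop missing values, so the support shrinks to 2.
prec_dropped    <- mean(precision, na.rm = TRUE)
support_dropped <- sum(!is.na(precision))

# With a replacement value (here 0), the support stays at 3.
replace_with       <- 0
precision_replaced <- ifelse(is.na(precision), replace_with, precision)
prec_replaced      <- mean(precision_replaced)
```

Dropping the undefined value gives a macro precision of 0.75 over 2 documents; replacing it with 0 gives 0.5 over 3 documents. Following the description above, the replacement value can be set globally with `options(casimir.replace_zero_division_with = 0)`.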

drop_empty_groups

Should empty levels of factor variables be dropped in grouped set retrieval computation? (Defaults to TRUE, overwritable using option 'casimir.drop_empty_groups' or environment variable 'R_CASIMIR_DROP_EMPTY_GROUPS')
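To illustrate what an empty factor level is, the sketch below uses base R's `split()` as an analogue. The grouping factor `subject_group` and the values `n_correct` are hypothetical, and CASIMiR's grouped computation may handle empty levels through different code.

```r
# Hypothetical grouping factor with a level ("c") that has no observations.
subject_group <- factor(c("a", "a", "b"), levels = c("a", "b", "c"))
n_correct     <- c(1, 0, 1)

# Keeping empty levels: group "c" appears with a zero-length subset.
kept <- split(n_correct, subject_group, drop = FALSE)

# Dropping empty levels: only groups with observations remain.
dropped <- split(n_correct, subject_group, drop = TRUE)

names(kept)     # "a" "b" "c"
names(dropped)  # "a" "b"
```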

Value

A data.frame as returned by summarise_intermediate_results.