Wrapper for computing n_bt bootstrap replicates, combining the functionality of compute_intermediate_results and summarise_intermediate_results.

generate_replicate_results(
  base_compare,
  n_bt,
  grouping_var,
  seed = NULL,
  ps_flags = list(intermed = FALSE, summarise = FALSE),
  label_distribution = NULL,
  cost_fp = NULL,
  replace_zero_division_with = options::opt("replace_zero_division_with"),
  drop_empty_groups = options::opt("drop_empty_groups"),
  progress = options::opt("progress")
)

generate_replicate_results_dplyr(
  base_compare,
  n_bt,
  grouping_var,
  seed = NULL,
  label_distribution = NULL,
  ps_flags = list(intermed = FALSE, summarise = FALSE),
  cost_fp = NULL,
  progress = FALSE
)

Arguments

base_compare

A data.frame as generated by create_comparison.

n_bt

An integer number of resamples to be used for bootstrapping.

grouping_var

A character vector of variables that must be present in base_compare.

seed

A seed passed to the resampling step for reproducibility.
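
The effect of a fixed seed on resampling can be illustrated with a minimal base R sketch. The helper resample_docs and the document identifiers are hypothetical and are not part of the package; the package's own resampling step may work differently.

```r
# Hypothetical helper: draw n_bt bootstrap resamples of document ids.
# This is an illustration only, not the package's internal implementation.
resample_docs <- function(doc_ids, n_bt, seed = NULL) {
  # Fix the random number generator state only if a seed is supplied.
  if (!is.null(seed)) set.seed(seed)
  # Each replicate samples as many documents as the original, with replacement.
  lapply(seq_len(n_bt), function(i) sample(doc_ids, replace = TRUE))
}

docs <- c("A", "B", "C", "D")
r1 <- resample_docs(docs, n_bt = 3, seed = 10)
r2 <- resample_docs(docs, n_bt = 3, seed = 10)

# With the same seed, both runs draw identical resamples.
identical(r1, r2)
```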

ps_flags

A list as returned by set_ps_flags.

label_distribution

Expects a data.frame with columns "label_id", "label_freq", "n_docs". label_freq is the number of occurrences of a label in the gold standard. n_docs is the total number of documents in the gold standard.
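
As a rough sketch, a data.frame of this shape could be derived from a gold standard in long format using base R. The toy gold data and its doc_id column are assumptions made for illustration; only the three output column names come from the description above.

```r
# Hypothetical gold standard: one row per (document, label) pair.
gold <- data.frame(
  doc_id = c("A", "A", "B", "C", "C"),
  label_id = c("a", "b", "a", "b", "c")
)

# Total number of distinct documents in the gold standard.
n_docs <- length(unique(gold$doc_id))

# Number of occurrences of each label.
freq <- table(gold$label_id)

label_distribution <- data.frame(
  label_id = names(freq),
  label_freq = as.integer(freq),
  n_docs = n_docs
)

label_distribution
```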

cost_fp

A numeric value > 0, defaults to NULL.

replace_zero_division_with

In macro-averaged results (doc-avg, subj-avg), some instances may have no predictions or no gold standard labels. In these cases, calculating precision and recall may lead to division by zero. By default, CASIMiR removes these missing values from macro averages, leading to a smaller support (count of instances that were averaged). Other implementations of macro-averaged precision and recall default to 0 in these cases. This option controls which behaviour is used. Set any value between 0 and 1. (Defaults to NULL, overwritable using option 'casimir.replace_zero_division_with' or environment variable 'R_CASIMIR_REPLACE_ZERO_DIVISION_WITH')
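
The difference between the two conventions can be seen in a small base R sketch. The function macro_precision and the counts are hypothetical and only mimic the behaviour described above; they are not the package's implementation.

```r
# Hypothetical per-document counts: true positives and number of predictions.
tp <- c(2, 0, 1)
n_pred <- c(4, 0, 2)

# Per-document precision; 0 / 0 yields NaN for the document without predictions.
precision <- tp / n_pred

# Hypothetical helper mimicking the two conventions for undefined values.
macro_precision <- function(p, replace_zero_division_with = NULL) {
  undefined <- is.nan(p)
  if (is.null(replace_zero_division_with)) {
    # Drop undefined instances, which shrinks the support.
    p <- p[!undefined]
  } else {
    # Substitute a fixed value, which keeps the full support.
    p[undefined] <- replace_zero_division_with
  }
  list(value = mean(p), support = length(p))
}

res_drop <- macro_precision(precision)
res_zero <- macro_precision(precision, replace_zero_division_with = 0)
```

Here res_drop averages the two defined values (0.5 with support 2), while res_zero averages all three instances (1/3 with support 3), so replacing with 0 lowers the macro average.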

drop_empty_groups

Should empty levels of factor variables be dropped in grouped set retrieval computation? (Defaults to TRUE, overwritable using option 'casimir.drop_empty_groups' or environment variable 'R_CASIMIR_DROP_EMPTY_GROUPS')

progress

Display progress bars for iterated computations (such as bootstrap CIs or precision-recall curves). (Defaults to FALSE, overwritable using option 'casimir.progress' or environment variable 'R_CASIMIR_PROGRESS')

Value

A data.frame containing n_bt bootstrap replicates of the results as returned by compute_intermediate_results and summarise_intermediate_results.

Functions

  • generate_replicate_results_dplyr(): Variant with dplyr-based internals rather than collapse-based internals.