Determine the appropriate grouping variables for each aggregation mode.

set_grouping_var(mode, doc_groups, label_groups, var = NULL)

Arguments

mode

One of the following aggregation modes: "doc-avg", "subj-avg", "micro".

doc_groups

A two-column data.frame with a column "doc_id" and a second column defining groups of documents to stratify results by. The grouping column should preferably be a factor, so that levels are not implicitly dropped during bootstrap replications.
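As a minimal sketch of why a factor is recommended, the following base R example builds a hypothetical doc_groups table and draws a resample that happens to miss one group. The column name "collection" and all values are invented for illustration and are not part of the package.

```r
# Hypothetical document groups; "collection" is an illustrative column name.
doc_groups <- data.frame(
  doc_id = c("d1", "d2", "d3"),
  collection = factor(c("news", "science", "news"),
                      levels = c("news", "science"))
)

# A resample (rows drawn with replacement) that contains no "science" documents.
resample <- doc_groups[c(1, 3, 1), ]

# Factor column: the empty level "science" is still reported, with count 0.
counts_factor <- table(resample$collection)

# Character column: the empty group disappears from the result.
counts_character <- table(as.character(resample$collection))
```

With the factor column, every replication reports the same set of groups, which keeps the stratified results aligned across replications.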

label_groups

A two-column data.frame with a column "label_id" and a second column defining groups of labels to stratify results by. Within each stratum, the gold standard and the predictions are restricted to the specified label group, as if the vocabulary consisted of that label group only. All modes ("doc-avg", "subj-avg", "micro") are supported within label strata. However, combining mode = "doc-avg" with fine-grained label strata can produce many missing values in the document-level results. In addition, rank-based thresholding (e.g. top 5) will yield inhomogeneous numbers of labels per document within the defined label strata. In these circumstances, mode = "subj-avg" or mode = "micro" can be more appropriate.
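The missing-value issue with mode = "doc-avg" can be illustrated with a small base R sketch. It is not the package's internal procedure; the gold standard, the column name "domain" and all values are hypothetical.

```r
# Hypothetical gold standard: one row per (document, label) pair.
gold <- data.frame(
  doc_id = c("d1", "d1", "d2", "d3"),
  label_id = c("l1", "l2", "l1", "l3"),
  stringsAsFactors = FALSE
)

# Hypothetical label groups; "domain" is an illustrative column name.
label_groups <- data.frame(
  label_id = c("l1", "l2", "l3"),
  domain = factor(c("physics", "physics", "history")),
  stringsAsFactors = FALSE
)

# Attach the stratum to every gold label.
gold_stratified <- merge(gold, label_groups, by = "label_id")

# Number of gold labels per document and stratum.
coverage <- table(gold_stratified$doc_id, gold_stratified$domain)

# Each zero is a document without gold labels in that stratum, where a
# document-level metric such as recall is undefined (0 / 0).
n_empty <- sum(coverage == 0)
```

The finer the label strata, the more document-stratum combinations end up empty, which is why averaging over labels or pooling all pairs tends to be more robust here.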

var

Additional variables to include in the grouping structure.

Value

A character vector of variables determining the grouping structure.
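The following is a hypothetical sketch of how such a character vector could be assembled. It is not the package's implementation: the function name sketch_grouping_var, the NULL defaults, and the mapping of modes to "doc_id" and "label_id" are assumptions made for illustration.

```r
# Hypothetical re-implementation for illustration only; the real function
# may differ in behavior.
sketch_grouping_var <- function(mode, doc_groups = NULL,
                                label_groups = NULL, var = NULL) {
  mode <- match.arg(mode, c("doc-avg", "subj-avg", "micro"))

  # Assumed base grouping per mode: by document, by label, or none.
  base <- switch(mode,
    "doc-avg" = "doc_id",
    "subj-avg" = "label_id",
    "micro" = character(0)
  )

  # The stratification column is whatever column is not the id column.
  doc_strata <- if (is.null(doc_groups)) {
    character(0)
  } else {
    setdiff(colnames(doc_groups), "doc_id")
  }
  label_strata <- if (is.null(label_groups)) {
    character(0)
  } else {
    setdiff(colnames(label_groups), "label_id")
  }

  unique(c(base, doc_strata, label_strata, var))
}

# Example call with a hypothetical document grouping.
example_groups <- data.frame(doc_id = "d1", collection = "news")
grouping <- sketch_grouping_var("doc-avg", doc_groups = example_groups)
```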