DNB label distribution for computing propensity-scored metrics
A subset of labels used in the catalogue of the DNB along with their
frequencies of occurrence. The label_ids match those in the
dnb_gold_standard and dnb_test_predictions datasets.
dnb_label_distribution
Format
dnb_label_distribution
A data frame with 7,772 rows and 3 columns:
label_id
DNB identifier of a concept in the GND subject
vocabulary.
label_freq
Number of occurrences of the specified label in the
overall catalogue.
n_docs
Overall number of documents in the ground truth
dataset.
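
As a rough illustration of how these columns could feed into propensity-scored metrics, the sketch below computes per-label propensities with the model of Jain et al. (2016). It is a minimal sketch under stated assumptions, not necessarily the method this package uses: it assumes that label_freq and n_docs play the roles of the label count and the total document count in that model, and that the constants A = 0.55 and B = 1.5 (common defaults in the literature) are appropriate. The data frame label_dist and the function propensity() are hypothetical stand-ins, and the counts are invented.

```r
# Hypothetical stand-in with the same three columns as dnb_label_distribution.
# The label_id values and counts are invented for illustration only.
label_dist <- data.frame(
  label_id   = c("a", "b", "c"),
  label_freq = c(1, 50, 5000),
  n_docs     = 10000
)

# Propensity model of Jain et al. (2016):
#   p_l = 1 / (1 + C * exp(-A * log(N_l + B))),  C = (log(N) - 1) * (B + 1)^A
# A and B are assumed defaults, not values taken from this package.
propensity <- function(label_freq, n_docs, A = 0.55, B = 1.5) {
  C <- (log(n_docs) - 1) * (B + 1)^A
  1 / (1 + C * exp(-A * log(label_freq + B)))
}

label_dist$propensity <- propensity(label_dist$label_freq, label_dist$n_docs)

# Inverse propensities can serve as label weights: rare labels weigh more.
label_dist$weight <- 1 / label_dist$propensity
```

Under this model, frequent labels receive propensities close to 1 and rare labels receive small propensities, so correct predictions of rare labels contribute more to a propensity-scored metric.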