A subset of the labels used in the catalogue of the DNB (German National Library), together with their frequencies of occurrence. The label_id values match those in the dnb_gold_standard and dnb_test_predictions datasets.

dnb_label_distribution

Format

dnb_label_distribution

A data frame with 7,772 rows and 3 columns:

label_id

DNB identifier of a concept in the GND subject vocabulary.

label_freq

Number of occurrences of the specified label in the overall catalogue.

n_docs

Overall number of documents in the ground truth dataset.
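
Examples

A minimal sketch of how the three columns might be used together. It does not load the packaged data. Instead it builds a small stand-in data frame with the same column names, and the label_id strings and counts in it are invented for illustration only. Dividing label_freq by n_docs to get a relative frequency is an assumption about how the two columns relate, not something this page states.

```r
# Toy stand-in with the same columns as dnb_label_distribution.
# All values below are made up; real label_id values are DNB identifiers.
label_dist <- data.frame(
  label_id   = c("id-a", "id-b", "id-c"),
  label_freq = c(120L, 15L, 3L),
  n_docs     = c(1000L, 1000L, 1000L),
  stringsAsFactors = FALSE
)

# Relative frequency of each label (assumes n_docs is a suitable denominator).
label_dist$rel_freq <- label_dist$label_freq / label_dist$n_docs

# Sort so that the rarest labels come first.
label_dist <- label_dist[order(label_dist$label_freq), ]

print(label_dist)
```

With the packaged data attached, the same steps should apply to dnb_label_distribution directly, since only the three documented columns are used.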