Introduction
This project provides a toolkit for efficiently processing bibliographic
records encoded in MARC 21, which is a popular file format used
to exchange bibliographic data between libraries. In particular, the
command line tool marc21 allows efficient filtering of records and
extraction of data into a rectangular schema. Since the extracted data
is in tabular form, it can be processed with popular frameworks such as
Polars or Tidyverse.
marc21-rs is developed by the Metadata Department of the German
National Library (DNB). It is used for data analysis and for automating
metadata workflows (data engineering) as part of automatic content
indexing.
The source code is licensed under the European Union Public License 1.2.
Getting Started
Installation
Binaries
Archives with a precompiled marc21 binary are available for Windows,
macOS and Linux.
Two variants are available for Linux: a dynamically linked version and
a fully statically linked version (MUSL). In most cases, the statically
linked version should be preferred, as it is independent of the glibc
version of the host system. The following commands install the binary
into the /usr/local/bin directory:
$ tar xfz marc21-0.4.0-x86_64-unknown-linux-musl.tar.gz
$ sudo install -Dm755 marc21-0.4.0-x86_64-unknown-linux-musl/marc21 \
/usr/local/bin/marc21
From Source
If a Rust toolchain is available, marc21 can also be installed using
the Rust package manager cargo. The project requires a Rust compiler
with a minimum version of 1.93. Use the following command to install
the program with the default features:
$ cargo install marc21-cli
The binary can be built with the following features as needed:
build
    Commands and functions that are only needed during the build process or packaging are activated with the build feature. This includes, for example, the commands for generating man pages (build-man) and shell completions (build-completion).
performant
    This feature activates optimizations aimed at improving performance. This includes, for example, the activation of SIMD or a more aggressive inline strategy. Since the main goal of the project is high performance, the feature is enabled by default.
unstable
    New features that are still in the testing phase can be activated using the unstable feature. Keep in mind that these functions may change at any time.
First Steps
This section provides an overview of working with the marc21 command
line tool. It demonstrates important commands using simple use cases.
An in-depth explanation of the concepts, in particular the structure of
filter expressions, has been omitted for brevity.
The marc21 tool provides various commands for processing MARC 21
records (see marc21 --help for a complete list of available commands).
Concatenate Multiple Files
The concat command can be used to combine multiple files into a
single output. In the following example, the authority data files from
the Integrated Authority Files (GND) are concatenated into the single
file GND.mrc.gz.
$ marc21 concat -ps authorities-gnd-*.mrc.gz -o GND.mrc.gz
10,122,437 records, 0 invalid | 49,035 records/s, elapsed: 00:03:19
The --skip-invalid (-s) option is used to skip invalid records that
could not be decoded. If the option is not specified, processing will
abort at the first invalid record. In addition, the processing progress
can be displayed with the --progress (-p) option.
Filtering Records
The filter command extracts those records that fulfill a specified
condition. For example, all records with status z and at least
one field 100 with indicators 1 and # (space) can be filtered
as follows:
$ marc21 filter -s 'ldr.status == "z" && 100/1#?' DUMP.mrc.gz -o out.mrc
Operators
The comparison operators ==, !=, >=, >, <=, and < can be
used for values in selected leader fields, values in control fields, and
values in subfields. Here are a few examples:
$ marc21 filter -s '100/1#.a == "Lovelace, Ada"' DUMP.mrc.gz -o out.mrc
$ marc21 filter -s '100/*.a != "Curie, Marie"' DUMP.mrc.gz -o out.mrc
$ marc21 filter -s '001 == "119232022"' DUMP.mrc.gz -o out.mrc
$ marc21 filter -s 'ldr.length > 3000' DUMP.mrc.gz -o out.mrc
$ marc21 filter -s 'ldr.status == "z"' DUMP.mrc.gz -o out.mrc
To check whether a value (control field or data field) comes from a
specified list, the in operator is used. In contrast, the not in
operator checks whether a value is not contained in the list. The
following example tests whether a field 100 exists that has a subfield
a with the value “Curie, Marie” or “Lovelace, Ada”:
$ marc21 filter -s '100/*.a in ["Lovelace, Ada", "Curie, Marie"]' \
DUMP.mrc.gz -o out.mrc
The =? operator and, in negated form, !? perform a substring search
on subfield values. These operators allow simultaneous searching for
multiple patterns by using the []-notation:
$ marc21 filter -s '100/*.a =? ["Hate", "Love"]' DUMP.mrc.gz -o out.mrc.gz
$ marc21 filter -s '100/1#.a =? "Love"' DUMP.mrc.gz -o out.mrc.gz
Subfield values can be checked against one or a set of regular
expressions. The filter expression uses the =~ operator or the !~
operator in negated form. The underlying regex engine does not support
all regex features; please refer to the specification to learn
more about the syntax and possible limitations. The following example
searches for all records with a field 533 that contains a subfield n
whose value matches the regular expression for an ISBN.
$ marc21 filter -s \
'533.n =~ "(?i)ISBN(?:-1[03])?(?::?\\s*)?\\s(?:97[89][-\ ]?)?\\d{1,5}[-\\ ]?(?:\\d+[-\\ ]?){2}(?:\\d|X)"' \
DUMP.mrc.gz -o out.mrc.gz
To test whether a subfield value begins with a prefix or not, the =^
operator or, in its negated form, the !^ operator is used:
$ marc21 filter -s '400/1#.a =^ "Love"' DUMP.mrc.gz -o out.mrc.gz
$ marc21 filter -s '400/1#.a =^ ["Hate", "Love"]' DUMP.mrc.gz -o out.mrc.gz
$ marc21 filter -s '400/1#{ [ac] =^ "Count" }' DUMP.mrc.gz -o out.mrc.gz
In contrast, the =$ operator can be used to check whether a subfield
value ends with a specific suffix. Keep in mind that the $ character
often has a special meaning on the command line and may need to be
escaped.
$ marc21 filter -s '548.4 =$ "/gnd#dateOfBirthAndDeath"' DUMP.mrc.gz -o out.mrc.gz
$ marc21 filter -s '401/1#.a !$ "Ada"' DUMP.mrc.gz -o out.mrc.gz
Similarity comparisons between character strings are performed using
the =* operator (in negated form !*). The normalized Levenshtein
similarity is calculated between the subfield value and the comparison
value. If this score is greater than the specified threshold value, the
comparison is considered a match. The default threshold value is 0.8
and can be changed using the command line option --strsim-threshold:
$ marc21 filter -s --strsim-threshold 0.9 '100/1#.a =* "Lovelace, Bda"' \
DUMP.mrc.gz -o out.mrc.gz
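As a rough illustration of such a similarity score, the following sketch computes one minus the Levenshtein distance divided by the length of the longer string. Whether marc21 normalizes in exactly this way is an assumption, and the helper name strsim is hypothetical and not part of the tool. The sketch compares character by character as awk sees them, so it is only meant for ASCII input.

```shell
#!/bin/sh
# Hypothetical helper: normalized Levenshtein similarity of two strings,
# defined here as 1 - distance / max(length(a), length(b)).
strsim() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        la = length(a); lb = length(b)
        if (la == 0 && lb == 0) { print "1.000"; exit }
        # prev[j] holds the distance between the first i-1 characters of a
        # and the first j characters of b.
        for (j = 0; j <= lb; j++) prev[j] = j
        for (i = 1; i <= la; i++) {
            cur[0] = i
            for (j = 1; j <= lb; j++) {
                cost = (substr(a, i, 1) == substr(b, j, 1)) ? 0 : 1
                best = prev[j - 1] + cost                         # substitution
                if (prev[j] + 1 < best) best = prev[j] + 1        # deletion
                if (cur[j - 1] + 1 < best) best = cur[j - 1] + 1  # insertion
                cur[j] = best
            }
            for (j = 0; j <= lb; j++) prev[j] = cur[j]
        }
        max = (la > lb) ? la : lb
        printf "%.3f\n", 1 - prev[lb] / max
    }'
}

# One substitution in 13 characters.
strsim "Lovelace, Ada" "Lovelace, Bda"
```

Under this definition the two names from the example above differ by one substitution in 13 characters, so the score stays above a threshold of 0.9.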
Transforming records into CSV/TSV format
In the fields of data science and data engineering, it is essential that data be organized in a rectangular table schema (similar to a relational database). If the data to be analyzed is available in this format, efficient tools such as Polars can be used to perform data analysis on the underlying data. Using the select command, records can be efficiently transformed into a tabular format. By default, the output is written in CSV format.
The following example demonstrates how to create a table in CSV format,
where the first column (cn) contains the control number of the record,
the second column (label) contains the name of the authority record,
and the third column (gndsys) contains the GND classification. Since
multiple notations from the GND classification can be assigned to a
single authority record, the output generates multiple rows for these
records.
$ marc21 select -ps --header 'cn,label,gndsys' \
'001, 150.a, 065{ a | 2 == "sswd" }' DUMP.mrc.gz -o out.csv
207,505 records, 0 invalid | 102,139 records/s, elapsed: 00:00:01
$ cat out.csv
cn,label,gndsys
040000028,A 302 D,31.9b
040000230,Aargauer,17.1
040000303,Abakus,28
040000443,Abbildung,28
040000540,ABC-Schutz,7.15a
040000540,ABC-Schutz,8.4
040000567,ABC-Waffen,8.4
040000656,Abdichtung,31.3b
040000656,Abdichtung,31.6
...
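Because a record with several classifications yields several rows, it can be useful to check which control numbers occur more than once. The following sketch recreates the rows shown above in a temporary file and lists those control numbers with standard UNIX tools. Splitting on commas with cut is a simplification that assumes no quoted fields containing commas.

```shell
#!/bin/sh
# Recreate the rows shown above in a temporary file (illustrative sample).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
cn,label,gndsys
040000028,A 302 D,31.9b
040000230,Aargauer,17.1
040000303,Abakus,28
040000443,Abbildung,28
040000540,ABC-Schutz,7.15a
040000540,ABC-Schutz,8.4
040000567,ABC-Waffen,8.4
040000656,Abdichtung,31.3b
040000656,Abdichtung,31.6
EOF

# Skip the header, keep the first column and list the control numbers
# that appear in more than one row.
multi=$(tail -n +2 "$tmp" | cut -d, -f1 | sort | uniq -d)
printf '%s\n' "$multi"
rm -f "$tmp"
```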
Summary Statistics
The frequency command (alias freq) is used to calculate frequency
tables based on the values (columns) of a query expression. The output
is in CSV/TSV format and sorted in descending order.
The following example generates a frequency table of the combinations
of gndgen and gndspec values (distinguished by subfield 2) found in
subfield b of field 075:
$ marc21 frequency -ps -H 'gndgen,gndspec,count' \
'075{ b | 2 == "gndgen" }, 075{ b | 2 == "gndspec" }' GND.mrc.gz \
-o out.csv.gz
10,220,897 records, 0 invalid | 495,993 records/s, elapsed: 00:00:20
$ zcat out.csv.gz | head -10
gndgen,gndspec,count
p,piz,6522289
b,kiz,910734
f,vie,807018
b,,375426
u,wim,330810
u,wit,242153
g,gik,180608
s,saz,129500
b,kio,115566
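To make the idea of a frequency table concrete, the following sketch builds one from a handful of values with awk and sort. It is only an approximation of what the command produces, not a description of how marc21 is implemented. The sample values are taken from the gndsys column of the select example, and ties are broken by byte order as an assumption.

```shell
#!/bin/sh
# Illustrative values (gndsys column of the select example).
values='31.9b
17.1
28
28
7.15a
8.4
8.4
31.3b
31.6'

# Count each distinct value, then sort by count (descending) and by value
# (byte order) to break ties.
table=$(printf '%s\n' "$values" \
    | awk '{ n[$0]++ } END { for (v in n) printf "%s,%d\n", v, n[v] }' \
    | LC_ALL=C sort -t, -k2,2nr -k1,1)
printf 'value,count\n%s\n' "$table"
```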
Counting Records
The number of records contained in the input can be determined using the count command:
$ marc21 count GND.mrc.gz
10329438
The --where option can be used to count only those records that match
a certain criterion:
$ marc21 count GND.mrc.gz --where 'ldr.type == "z" && 075{ b == "gik" && 2 == "gndspec" }'
179672
Print Records
The print command outputs records in a human-readable format. The
leader and each control and data field are written on separate lines.
Consecutive records are divided by a blank line. The output of the
command can be used in combination with standard UNIX tools such as
grep, cut or sed. In the following example, a single data record
is printed on the console:
$ marc21 print tests/data/ada.mrc --where '100/*.a =? "Love"'
LDR 03612nz a2200589nc 4500
001 119232022
003 DE-101
005 20250720173911.0
008 950316n||azznnaabn | aaa |c
024/7# $a 119232022 $0 http://d-nb.info/gnd/119232022 $2 gnd
035 $a (DE-101)119232022
035 $a (DE-588)119232022
035 $z (DE-588)172642531
035 $z (DE-588a)172642531 $9 v:zg
035 $z (DE-588a)119232022 $9 v:zg
035 $z (DE-588c)4370325-2 $9 v:zg
040 $a DE-386 $c DE-386 $9 r:DE-576 $b ger $d 1841
042 $a gnd1
043 $c XA-GB
065 $a 28p $2 sswd
065 $a 9.5p $2 sswd
075 $b p $2 gndgen
075 $b piz $2 gndspec
079 $a g $q f $q s $q z $u w $u k $u v
100/1# $a Lovelace, Ada $d 1815-1852
375 $a 2 $2 iso5218
400/1# $a Lovelace, Augusta Ada of $d 1815-1852
400/1# $a Lovelace, Ada Augusta of $d 1815-1852
400/1# $a Byron, Ada $d 1815-1852
400/1# $a Byron King, Augusta Ada $d 1815-1852
400/1# $a King, Augusta Ada $d 1815-1852
400/1# $a King, Ada $d 1815-1852
400/1# $a King-Noel, Augusta Ada $c Countess of Lovelace $d 1815-1852
...
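As an example of combining this output with standard UNIX tools, the following sketch extracts the value of subfield a from all 400 fields with sed. The input lines are copied from the output above. The pattern assumes the layout shown there (tag, indicators, then subfields introduced by $ and a code) and that subfield a comes first in the field.

```shell
#!/bin/sh
# A few lines copied from the print output above.
record='100/1# $a Lovelace, Ada $d 1815-1852
400/1# $a Lovelace, Augusta Ada of $d 1815-1852
400/1# $a Byron, Ada $d 1815-1852
400/1# $a King-Noel, Augusta Ada $c Countess of Lovelace $d 1815-1852'

# Keep only 400 fields and print the text between "$a " and the next
# subfield (or the end of the line), without trailing spaces.
names=$(printf '%s\n' "$record" \
    | sed -n 's|^400/[^ ]* \$a \([^$]*[^$ ]\).*$|\1|p')
printf '%s\n' "$names"
```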
Partitioning
The input can be split into different subsets based on the values of a field or subfield using the partition command. For example, the following command partitions the authority records based on the GND classifications (field 065):
$ marc21 partition -ps '065{ a | 2 == "sswd" }' \
authorities-gnd-sachbegriff_dnbmarc.mrc.gz -o out
207,505 records, 0 invalid | 100,033 records/s, elapsed: 00:00:01
$ tree out
out
├── 00.mrc
├── 10.10.mrc
├── 10.11a.mrc
├── 10.11b.mrc
├── 10.11c.mrc
...
├── 9.5b.mrc
├── 9.5c.mrc
└── 9.5p.mrc
1 directory, 346 files
Since the path expression for a record can produce multiple values, the partitions are generally not disjoint. If a value occurs multiple times for a record, the record is written to the respective partition only once.
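The behaviour described above can be sketched with awk. In this simplified model each input line holds a control number and one value produced by the path expression, and a text file per value stands in for a partition. The last input line is a hypothetical duplicate added to show that a record is written to a partition only once.

```shell
#!/bin/sh
# Each line is "<control number> <value>". The last line repeats a value
# on purpose (hypothetical duplicate).
pairs='040000540 7.15a
040000540 8.4
040000567 8.4
040000540 8.4'

out=$(mktemp -d)
# Write a control number to the partition of each of its values, but only
# once per (record, value) combination.
printf '%s\n' "$pairs" | awk -v dir="$out" '
    !seen[$1, $2]++ { print $1 > (dir "/" $2 ".txt") }'

p84=$(cat "$out/8.4.txt")
p715=$(cat "$out/7.15a.txt")
printf '8.4: %s\n' "$(echo $p84)"
printf '7.15a: %s\n' "$p715"
rm -rf "$out"
```

Record 040000540 ends up in both partitions, which is why the partitions are not disjoint, while the repeated value does not produce a second copy.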
Reference
Commands
The marc21 tool provides the following commands:
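- concat — Concatenate records from multiple inputs (alias cat)
- count — Print the number of records in the input data (alias cnt)
- dedup — Remove duplicate records from the input
- describe — Create a frequency table of all subfield codes
- filter — Filter records that fulfill a specified condition
- frequency — Compute a frequency table of values (alias freq)
- hash — Compute SHA-256 checksum of records
- invalid — Output invalid records that cannot be decoded
- partition — Partition records by values
- print — Print records in human readable format
- sample — Select a random permutation of records
- select — Transform records into CSV or TSV format
- split — Split the input into chunks of a given size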
marc21-concat(1)
NAME
marc21-concat — Concatenate records from multiple inputs
SYNOPSIS
marc21 concat [options] [path]…
marc21 cat [options] [path]…
DESCRIPTION
The concat command is used to combine records from multiple files into
a single file or output (stdout).
OPTIONS
-a, --append
    Append to the given file, do not overwrite. This option is not supported when writing to Gzip compressed output. When writing to stdout this flag is ignored.
--tee <path>
    Write to the output and the file <path> at the same time. This option can be particularly useful when the output is written to stdout for further processing in a pipeline, but the output is also needed for a subsequent processing step.
FILTER OPTIONS
-l, --limit <n>
    Limit the result to the first <n> records (a limit value of 0 means no limit)
-s, --skip-invalid
    Skip invalid records that can’t be decoded
--strsim-threshold
    The minimum score for string similarity comparisons (0 <= score <= 100)
--where
    An expression for filtering records
--filter-normalization <form>
    Transliterate the given filter or query expression into the specified Unicode normal form. Possible values: nfd, nfkd, nfc, nfkc. This option can also be specified by setting the environment variable MARC21_FILTER_NORMALIZATION.
COMMON OPTIONS
-p, --progress
    If set, show a progress bar
-c, --compression
    Specify compression level (0..=9)
EXIT STATUS
0
    Command succeeded.
1
    Command failed.
EXAMPLES
In the following example, the five files dnb_all_dnbmarc.1.mrc.gz
to dnb_all_dnbmarc.5.mrc.gz are concatenated into a single file
DNB.mrc.gz. Invalid data records are skipped (option -s):
$ marc21 concat -s dnb_all_dnbmarc.*.mrc.gz -o DNB.mrc.gz
marc21-count(1)
NAME
marc21-count — Print the number of records in the input data.
SYNOPSIS
marc21 count [options] [path]…
marc21 cnt [options] [path]…
DESCRIPTION
The count command is used to determine the number of records contained
in the input.
OPTIONS
FILTER OPTIONS
-l, --limit <n>
    Limit the result to first <n> records (a limit value 0 means no limit)
-s, --skip-invalid
    Skip invalid records that can’t be decoded
--strsim-threshold
    The minimum score for string similarity comparisons (0 <= score <= 100)
--where
    An expression for filtering records
--filter-normalization <form>
    Transliterate the given filter or query expression into the specified Unicode normal form. Possible values: nfd, nfkd, nfc, nfkc. This option can also be specified by setting the environment variable MARC21_FILTER_NORMALIZATION.
COMMON OPTIONS
-p, --progress
    If set, show a progress bar
-c, --compression
    Specify compression level (0..=9)
EXIT STATUS
0
    Command succeeded.
1
    Command failed.
EXAMPLES
tba
marc21-dedup(1)
NAME
marc21-dedup — Remove duplicate records from the input
SYNOPSIS
marc21 dedup [OPTIONS] [PATH]…
DESCRIPTION
This command deduplicates records that occur multiple times. Duplicates are identified by comparing the control number (field 001) of a record.
OPTIONS
FILTER OPTIONS
-l, --limit <n>
    Limit the result to first <n> records (a limit value 0 means no limit)
-s, --skip-invalid
    Skip invalid records that can’t be decoded
--strsim-threshold
    The minimum score for string similarity comparisons (0 <= score <= 100)
--where
    An expression for filtering records
--filter-normalization <form>
    Transliterate the given filter or query expression into the specified Unicode normal form. Possible values: nfd, nfkd, nfc, nfkc. This option can also be specified by setting the environment variable MARC21_FILTER_NORMALIZATION.
COMMON OPTIONS
-p, --progress
    If set, show a progress bar
-c, --compression
    Specify compression level (0..=9)
EXIT STATUS
0
    Command succeeded.
1
    Command failed.
EXAMPLES
In the following example, duplicate records found in the input
files s1.mrc and s2.mrc are removed and the remaining records are
written to the output file out.mrc:
$ marc21 dedup s1.mrc s2.mrc -o out.mrc
marc21-describe(1)
NAME
marc21-describe — Creates a frequency table of all subfield codes
SYNOPSIS
marc21 describe [OPTIONS] [PATH]…
DESCRIPTION
The describe command creates a table that lists, for each field, how
often a subfield code appears in the input. Since subfields appear only
in the data fields, control fields are not included in the output. The
columns ind1 and ind2 contain the values of the indicators.
OPTIONS
--tsv
    Write output tab-separated (TSV)
-o, --output <path>
    Write output to <path> instead of stdout. If the filename ends in .tsv or .tsv.gz, the output is automatically saved in TSV format. The output is gzip-compressed when the filename ends with .gz.
FILTER OPTIONS
-l, --limit <n>
    Limit the result to first <n> records (a limit value 0 means no limit)
-s, --skip-invalid
    Skip invalid records that can’t be decoded
--strsim-threshold
    The minimum score for string similarity comparisons (0 <= score <= 100)
--where
    An expression for filtering records
--filter-normalization <form>
    Transliterate the given filter or query expression into the specified Unicode normal form. Possible values: nfd, nfkd, nfc, nfkc. This option can also be specified by setting the environment variable MARC21_FILTER_NORMALIZATION.
COMMON OPTIONS
-p, --progress
    If set, show a progress bar
-c, --compression
    Specify compression level (0..=9)
EXIT STATUS
0
    Command succeeded.
1
    Command failed.
EXAMPLES
$ marc21 describe -s GND.mrc -o out.csv
10,220,897 records, 0 invalid | 472,874 records/s, elapsed: 00:00:21
$ head -10 out.csv
field,ind1,ind2,0,2,3,4,5,9,S,a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,z
024,7, ,10220897,10863183,0,0,0,198362,0,10863183,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
034, , ,132557,135686,50,0,0,136105,0,0,0,0,136068,136068,136068,136068,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
035, , ,0,0,0,0,0,5495629,0,20441794,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6637796
040, , ,0,0,0,0,0,10220996,0,10220901,10220897,10220901,10220897,4524432,260659,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
042, , ,0,0,0,0,0,0,0,10220897,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
043, , ,0,0,0,0,0,15,0,0,0,9157636,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
065, , ,0,2617204,0,0,0,0,0,2617204,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
075, , ,0,20051640,0,0,0,0,0,0,20080983,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
079, , ,0,0,0,0,0,0,0,10220897,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,12469759,0,0,0,5126056,0,0,0,0
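Since most cells of this table are zero, a short awk script can condense each row to the subfield codes that actually occur. The sketch below uses the header and two rows copied from the output above. It assumes the column layout shown there, with the subfield codes starting in the fourth column.

```shell
#!/bin/sh
# Header and two rows copied from the output above.
csv='field,ind1,ind2,0,2,3,4,5,9,S,a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,z
065, , ,0,2617204,0,0,0,0,0,2617204,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
075, , ,0,20051640,0,0,0,0,0,0,20080983,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0'

# For every data row, print the field tag followed by the subfield codes
# whose count is greater than zero. Columns 1-3 hold field, ind1 and ind2.
codes=$(printf '%s\n' "$csv" | awk -F, '
    NR == 1 { for (i = 4; i <= NF; i++) code[i] = $i; next }
    {
        line = $1 ":"
        for (i = 4; i <= NF; i++) if ($i + 0 > 0) line = line " " code[i]
        print line
    }')
printf '%s\n' "$codes"
```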
marc21-filter(1)
NAME
marc21-filter — Filter records that fulfill a specified condition
SYNOPSIS
marc21 filter [options] [path]…
DESCRIPTION
tba
OPTIONS
--filter-normalization <form>
    Transliterate the given filter expression into the specified Unicode normal form. Possible values: nfd, nfkd, nfc, nfkc. This option can also be specified by setting the environment variable MARC21_FILTER_NORMALIZATION.
COMMON OPTIONS
-p, --progress
    If set, show a progress bar
-c, --compression
    Specify compression level (0..=9)
EXIT STATUS
0
    Command succeeded.
1
    Command failed.
EXAMPLES
tba
marc21-frequency(1)
NAME
marc21-frequency — Compute a frequency table of values
SYNOPSIS
marc21 frequency [OPTIONS] <QUERY> [PATH]…
marc21 freq [OPTIONS] <QUERY> [PATH]…
DESCRIPTION
This command computes a frequency table over all values (columns) of the given query expression. The resulting frequency table is sorted in descending order (the most frequent value is printed first). If the count of two or more subfield values is equal, these lines are given in lexicographical order. The set of data fields, which are included in the result of a record, can be restricted by an optional predicate.
ARGUMENTS
<QUERY>
    A MARC-21 query expression.
OPTIONS
-H, --header <header>
    Insert a header row before the data. The header should be entered as a comma-separated list. Leading and trailing spaces in each column are automatically removed.
--tsv
    Write output tab-separated (TSV)
-o, --output <path>
    Write output to <path> instead of stdout. If the filename ends in .tsv or .tsv.gz, the output is automatically saved in TSV format. The output is gzip-compressed when the filename ends with .gz.
FILTER OPTIONS
-l, --limit <n>
    Limit the result to first <n> records (a limit value 0 means no limit)
-s, --skip-invalid
    Skip invalid records that can’t be decoded
--strsim-threshold
    The minimum score for string similarity comparisons (0 <= score <= 100)
--where
    An expression for filtering records
--filter-normalization <form>
    Transliterate the given filter or query expression into the specified Unicode normal form. Possible values: nfd, nfkd, nfc, nfkc. This option can also be specified by setting the environment variable MARC21_FILTER_NORMALIZATION.
COMMON OPTIONS
-p, --progress
    If set, show a progress bar
-c, --compression
    Specify compression level (0..=9)
EXIT STATUS
0
    Command succeeded.
1
    Command failed.
EXAMPLES
The following example creates a frequency table based on the year of the last update (field 005/00-03).
$ marc21 frequency -s -H 'year,count' '005[0:4]' GND.mrc
year,count
2025,1193157
2024,1131644
2021,854178
2022,848635
2023,760070
2016,734399
2010,564136
2017,522303
2020,498302
2008,465916
2019,423590
2011,423077
2014,422959
2018,375568
2013,295991
2015,245866
2026,221200
2012,135738
2009,104168
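The range in the query can be pictured as a substring operation on the value of field 005. The following sketch applies it to the 005 value from the print example with cut, on the assumption that [0:4] selects positions 0 through 3, that is, the first four characters.

```shell
#!/bin/sh
# Value of field 005 taken from the print example.
f005='20250720173911.0'

# Characters 1-4 in the 1-based numbering of cut correspond to
# positions 0-3 of the field.
year=$(printf '%s\n' "$f005" | cut -c1-4)
printf '%s\n' "$year"
```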
marc21-hash(1)
NAME
marc21-hash — Compute SHA-256 checksum of records
SYNOPSIS
marc21 hash [options] [path]…
DESCRIPTION
tba
OPTIONS
FILTER OPTIONS
-l, --limit <n>
    Limit the result to first <n> records (a limit value 0 means no limit)
-s, --skip-invalid
    Skip invalid records that can’t be decoded
--strsim-threshold
    The minimum score for string similarity comparisons (0 <= score <= 100)
--where
    An expression for filtering records
--filter-normalization <form>
    Transliterate the given filter or query expression into the specified Unicode normal form. Possible values: nfd, nfkd, nfc, nfkc. This option can also be specified by setting the environment variable MARC21_FILTER_NORMALIZATION.
COMMON OPTIONS
-p, --progress
    If set, show a progress bar
-c, --compression
    Specify compression level (0..=9)
EXIT STATUS
0
    Command succeeded.
1
    Command failed.
EXAMPLES
tba
marc21-invalid(1)
NAME
marc21-invalid — Output invalid records that cannot be decoded
SYNOPSIS
marc21 invalid [options] [path]…
DESCRIPTION
tba
OPTIONS
COMMON OPTIONS
-p, --progress
    If set, show a progress bar
-c, --compression
    Specify compression level (0..=9)
EXIT STATUS
0
    Command succeeded.
1
    Command failed.
EXAMPLES
tba
marc21-partition(1)
NAME
marc21-partition — Partition records by values.
SYNOPSIS
marc21 partition [OPTIONS] [PATH]…
DESCRIPTION
The partitions are written to the <outdir> directory. The filename can
be changed using the --template option. By default, the partitions are
saved with the corresponding value and the .mrc file extension.
If a record doesn’t have the field/subfield, the record won’t be written to a partition. A record with multiple values will be written to each partition; thus the partitions may not be disjoint. In order to prevent duplicate records in a partition, all duplicate values of a record will be removed automatically.
ARGUMENTS
<PATH>
    A MARC-21 Path expression.
OPTIONS
--template <string>
    A template for naming the individual partitions. The placeholder {} is replaced by the value of the path expression. If the template ends with the suffix .gz, the partitions are compressed in Gzip format.
-o, --output <path>
    Write output to <path>; by default all partitions are written to the current working directory.
FILTER OPTIONS
-l, --limit <n>
    Limit the result to first <n> records (a limit value 0 means no limit)
-s, --skip-invalid
    Skip invalid records that can’t be decoded
--strsim-threshold
    The minimum score for string similarity comparisons (0 <= score <= 100)
--where
    An expression for filtering records
--filter-normalization <form>
    Transliterate the given filter or query expression into the specified Unicode normal form. Possible values: nfd, nfkd, nfc, nfkc. This option can also be specified by setting the environment variable MARC21_FILTER_NORMALIZATION.
COMMON OPTIONS
-p, --progress
    If set, show a progress bar
-c, --compression
    Specify compression level (0..=9)
EXIT STATUS
0
    Command succeeded.
1
    Command failed.
EXAMPLES
In the following example, all authority records are partitioned based on the date of the last record transaction (field 005), using only the year (positions 0 through 3) as the values:
$ marc21 partition -ps '005[0:4]' authorities-gnd-sachbegriff_dnbmarc.mrc.gz -o out
207,505 records, 0 invalid | 111,473 records/s, elapsed: 00:00:01
$ tree out
out
├── 2009.mrc
├── 2010.mrc
├── 2011.mrc
├── 2012.mrc
├── 2013.mrc
├── 2014.mrc
├── 2015.mrc
├── 2016.mrc
├── 2017.mrc
├── 2018.mrc
├── 2019.mrc
├── 2020.mrc
├── 2021.mrc
├── 2022.mrc
├── 2023.mrc
├── 2024.mrc
├── 2025.mrc
└── 2026.mrc
1 directory, 18 files
marc21-print(1)
NAME
marc21-print — Print records in human readable format
SYNOPSIS
marc21 print [options] [path]…
DESCRIPTION
This command prints records in human readable format.
OPTIONS
--translit <form>
    Transliterate the output into the specified Unicode normal form. Possible values: nfd, nfkd, nfc, nfkc.
FILTER OPTIONS
-l, --limit <n>
    Limit the result to first <n> records (a limit value 0 means no limit)
-s, --skip-invalid
    Skip invalid records that can’t be decoded
--strsim-threshold
    The minimum score for string similarity comparisons (0 <= score <= 100)
--where
    An expression for filtering records
--filter-normalization <form>
    Transliterate the given filter or query expression into the specified Unicode normal form. Possible values: nfd, nfkd, nfc, nfkc. This option can also be specified by setting the environment variable MARC21_FILTER_NORMALIZATION.
COMMON OPTIONS
-p, --progress
    If set, show a progress bar
-c, --compression
    Specify compression level (0..=9)
EXIT STATUS
0
    Command succeeded.
1
    Command failed.
EXAMPLES
The following command prints the record from the file ada.mrc to the
console:
$ marc21 print tests/data/ada.mrc
LDR 03612nz a2200589nc 4500
001 119232022
003 DE-101
005 20250720173911.0
008 950316n||azznnaabn | aaa |c
024/7# $a 119232022 $0 http://d-nb.info/gnd/119232022 $2 gnd
035 $a (DE-101)119232022
035 $a (DE-588)119232022
035 $z (DE-588)172642531
035 $z (DE-588a)172642531 $9 v:zg
...
marc21-sample(1)
NAME
marc21-sample — Select a random permutation of records
SYNOPSIS
marc21 sample [options] [path]…
DESCRIPTION
tba
OPTIONS
FILTER OPTIONS
-l, --limit <n>
    Limit the result to first <n> records (a limit value 0 means no limit)
-s, --skip-invalid
    Skip invalid records that can’t be decoded
--strsim-threshold
    The minimum score for string similarity comparisons (0 <= score <= 100)
--where
    An expression for filtering records
--filter-normalization <form>
    Transliterate the given filter or query expression into the specified Unicode normal form. Possible values: nfd, nfkd, nfc, nfkc. This option can also be specified by setting the environment variable MARC21_FILTER_NORMALIZATION.
COMMON OPTIONS
-p, --progress
    If set, show a progress bar
-c, --compression
    Specify compression level (0..=9)
EXIT STATUS
0
    Command succeeded.
1
    Command failed.
EXAMPLES
tba
marc21-select(1)
NAME
marc21-select — Transforms records into CSV or TSV format
SYNOPSIS
marc21 select [OPTIONS] <QUERY> [PATH]…
DESCRIPTION
This command allows you to efficiently transform records into a rectangular table schema. By default, the output is in CSV format.
ARGUMENTS
<QUERY>
    A MARC-21 query expression.
OPTIONS
-H, --header <header>
    Insert a header row before the data. The header should be entered as a comma-separated list. Leading and trailing spaces in each column are automatically removed.
--tsv
    Write output tab-separated (TSV)
-o, --output <path>
    Write output to <path> instead of stdout. If the filename ends in .tsv or .tsv.gz, the output is automatically saved in TSV format. The output is gzip-compressed when the filename ends with .gz.
FILTER OPTIONS
-l, --limit <n>
    Limit the result to first <n> records (a limit value 0 means no limit)
-s, --skip-invalid
    Skip invalid records that can’t be decoded
--strsim-threshold
    The minimum score for string similarity comparisons (0 <= score <= 100)
--where
    An expression for filtering records
--filter-normalization <form>
    Transliterate the given filter or query expression into the specified Unicode normal form. Possible values: nfd, nfkd, nfc, nfkc. This option can also be specified by setting the environment variable MARC21_FILTER_NORMALIZATION.
COMMON OPTIONS
-p, --progress
    If set, show a progress bar
-c, --compression
    Specify compression level (0..=9)
EXIT STATUS
0
    Command succeeded.
1
    Command failed.
EXAMPLES
This example demonstrates how to create a table in CSV format, where
the first column (cn) contains the control number of the record,
the second column (label) contains the name of the authority record,
and the third column (gndsys) contains the GND classification. Since
multiple notations from the GND classification system can be assigned
to a single data record, the output generates multiple rows for these
data records.
$ marc21 select -ps --header 'cn,label,gndsys' \
'001, 150.a, 065{ a | 2 == "sswd" }' DUMP.mrc.gz -o out.csv
207,505 records, 0 invalid | 102,139 records/s, elapsed: 00:00:01
$ head -10 out.csv
cn,label,gndsys
040000028,A 302 D,31.9b
040000230,Aargauer,17.1
040000303,Abakus,28
040000443,Abbildung,28
040000540,ABC-Schutz,7.15a
040000540,ABC-Schutz,8.4
040000567,ABC-Waffen,8.4
040000656,Abdichtung,31.3b
040000656,Abdichtung,31.6
marc21-split(1)
NAME
marc21-split — Split the input into chunks of a given size
SYNOPSIS
marc21 split [options] [path]…
DESCRIPTION
tba
OPTIONS
FILTER OPTIONS
-l, --limit <n>
    Limit the result to first <n> records (a limit value 0 means no limit)
-s, --skip-invalid
    Skip invalid records that can’t be decoded
--strsim-threshold
    The minimum score for string similarity comparisons (0 <= score <= 100)
--where
    An expression for filtering records
--filter-normalization <form>
    Transliterate the given filter or query expression into the specified Unicode normal form. Possible values: nfd, nfkd, nfc, nfkc. This option can also be specified by setting the environment variable MARC21_FILTER_NORMALIZATION.
COMMON OPTIONS
-p, --progress
    If set, show a progress bar
-c, --compression
    Specify compression level (0..=9)
EXIT STATUS
0
    Command succeeded.
1
    Command failed.
EXAMPLES
tba