marc21-dedup(1)
NAME
marc21-dedup — Remove duplicate records from the input
SYNOPSIS
marc21 dedup [OPTIONS] [PATH]…
DESCRIPTION
This command removes records that occur more than once in the input. Duplicates are identified by comparing the control number (field 001) of each record.
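The comparison described above can be pictured with a minimal Python sketch. It is not the tool's implementation: the dictionary record layout and the function name dedup_by_control_number are hypothetical, and it assumes that the first record with a given control number is kept while later ones are dropped.

```python
from typing import Iterable, Iterator


def dedup_by_control_number(records: Iterable[dict]) -> Iterator[dict]:
    """Yield each record whose control number has not been seen before."""
    seen = set()
    for record in records:
        # Hypothetical layout: the control number is stored under the key "001".
        control_number = record["001"]
        if control_number in seen:
            # Assumption: later occurrences are the ones that get dropped.
            continue
        seen.add(control_number)
        yield record


records = [
    {"001": "123", "245": "First title"},
    {"001": "456", "245": "Second title"},
    {"001": "123", "245": "First title (duplicate)"},
]
unique = list(dedup_by_control_number(records))
print([r["001"] for r in unique])  # ['123', '456']
```

Only the control number is compared in this sketch, so two records with the same field 001 count as duplicates even if their other fields differ.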
OPTIONS
FILTER OPTIONS
-l, --limit <n>
    Limit the result to the first <n> records (a limit value of 0 means no limit).
-s, --skip-invalid
    Skip invalid records that can’t be decoded.
--strsim-threshold
    The minimum score for string similarity comparisons (0 <= score <= 100).
--where
    An expression for filtering records.
--filter-normalization <form>
    Transliterate the given filter or query expression into the specified Unicode normal form. Possible values: nfd, nfkd, nfc, nfkc. This option can also be specified by setting the environment variable MARC21_FILTER_NORMALIZATION.
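To show why the normal form of a filter expression can matter, the following Python sketch compares two spellings of the same visible text. It uses only the standard unicodedata module and illustrates the Unicode normal forms themselves. How marc21 applies --filter-normalization internally is not shown here, and the sample strings are made up for illustration.

```python
import unicodedata

composed = "Caf\u00e9"     # "é" as a single code point (NFC form)
decomposed = "Cafe\u0301"  # "e" followed by a combining acute accent (NFD form)

# The two strings look the same but differ at the code point level.
print(composed == decomposed)  # False

# After normalizing to the same form, the comparison succeeds.
normalized = unicodedata.normalize("NFD", composed)
print(normalized == decomposed)  # True
```

If the records store text in decomposed form while the filter expression is typed in composed form, a literal comparison like the first one fails. This is the kind of mismatch that normalizing the expression is meant to avoid.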
COMMON OPTIONS
-p, --progress
    If set, show a progress bar.
-c, --compression
    Specify compression level (0..=9).
EXIT STATUS
0
    Command succeeded.
1
    Command failed.
EXAMPLES
In the following example, duplicate records found in the input
files s1.mrc and s2.mrc are removed, and the remaining records are
written to the output file out.mrc:
$ marc21 dedup s1.mrc s2.mrc -o out.mrc