dcmp

SYNOPSIS

dcmp [OPTION] SRC DEST

DESCRIPTION

dcmp is a parallel MPI application that compares two files, or recursively compares files that share the same relative path within two different directories.

dcmp provides functionality similar to a recursive cmp(1). It reports how many files in two different directories are the same or different.

dcmp can be configured to compare a number of different file properties.

OPTIONS

-o, --output EXPR:FILE

Write the list of entries matching expression EXPR to the specified FILE. The expression consists of a set of fields and states described below. More than one -o option is allowed in a single invocation; in that case, each option should name a different output file.

-t, --text

Change --output to write files in text format rather than binary.

-b, --base

Enable base checks and normal stdout results when --output is used.

--bufsize SIZE

Set the I/O buffer to be SIZE bytes. Units like "MB" and "GB" may immediately follow the number without spaces (e.g. 8MB). The default bufsize is 4MB.
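As a rough illustration of the SIZE syntax, the following Python sketch converts a string such as 8MB into a byte count. The function name parse_size is hypothetical, and the use of binary multiples (1 MB = 1024 * 1024 bytes) is an assumption; this page does not say whether units are binary or decimal.

```python
import re

# Assumed multipliers: binary units (1 KB = 1024 bytes). The man page does
# not state whether units are binary or decimal.
_UNITS = {"": 1, "B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_size(text):
    """Convert a string such as '8MB' into a byte count (hypothetical helper)."""
    # The unit must follow the number immediately, with no space in between.
    match = re.fullmatch(r"(\d+)([A-Za-z]*)", text)
    if match is None:
        raise ValueError(f"not a size: {text!r}")
    number, unit = match.groups()
    try:
        return int(number) * _UNITS[unit.upper()]
    except KeyError:
        raise ValueError(f"unknown unit: {unit!r}") from None
```

Under these assumptions, parse_size("8MB") yields 8388608, while "8 MB" (with a space) is rejected.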

--chunksize SIZE

Multiple processes compare a large file in parallel by dividing it into chunks. Set the chunk size to be at least SIZE bytes. Units like "MB" and "GB" can immediately follow the number without spaces (e.g. 64MB). The default chunksize is 4MB.
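To make the chunking idea concrete, here is a minimal Python sketch that divides a file of a given size into (offset, length) ranges. It is not dcmp's implementation: the name chunk_ranges is hypothetical, and the sketch uses SIZE as the exact chunk length, whereas the option describes SIZE only as a minimum.

```python
def chunk_ranges(file_size, chunk_size):
    """Return (offset, length) pairs that together cover file_size bytes.

    Simplification: every chunk is exactly chunk_size bytes except the last,
    which holds whatever remains.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (offset, min(chunk_size, file_size - offset))
        for offset in range(0, file_size, chunk_size)
    ]
```

Each range could then be handed to a different process, which reads and compares only its own slice of the two files.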

--daos-api API

Specify the DAOS API to be used. By default, the API is automatically determined based on the container type, where POSIX containers use the DFS API, and all other containers use the DAOS object API. Values must be in {DFS, DAOS}.

-s, --direct

Use O_DIRECT to avoid caching file data.

--progress N

Print progress message to stdout approximately every N seconds. The number of seconds must be a non-negative integer. A value of 0 disables progress messages.

-v, --verbose

Run in verbose mode. Prints statistics and timing data for the command: files walked, started, completed, seconds, files, bytes read, byte rate, and file rate.

-q, --quiet

Run tool silently. No output is printed.

-l, --lite

Lite mode compares only file modification time and size. If modification time and size are the same, then the contents are assumed to be the same. Similarly, if the modification time or size is different, then the contents are assumed to be different. Lite mode does not compare the file contents.
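The lite-mode rule can be sketched in Python as a comparison of two stat results. This is an illustration only; lite_same and demo are hypothetical names, and the real tool may compare timestamps at a different precision.

```python
import os
import tempfile


def lite_same(src_path, dest_path):
    """Return True when size and modification time match; contents are never read."""
    src = os.stat(src_path)
    dest = os.stat(dest_path)
    return src.st_size == dest.st_size and src.st_mtime_ns == dest.st_mtime_ns


def demo():
    """Show that lite mode looks only at metadata, not at file contents."""
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "a")
        b = os.path.join(tmp, "b")
        # Different contents, same size, same (forced) modification time.
        for path, data in ((a, b"abcd"), (b, b"wxyz")):
            with open(path, "wb") as handle:
                handle.write(data)
            os.utime(path, ns=(10**9, 10**9))
        same_metadata = lite_same(a, b)
        # Grow b by one byte and restore its mtime: only the size differs now.
        with open(b, "ab") as handle:
            handle.write(b"!")
        os.utime(b, ns=(10**9, 10**9))
        after_size_change = lite_same(a, b)
        return same_metadata, after_size_change
```

The first value returned by demo() illustrates the trade-off of lite mode: two files with different contents are reported as the same when their size and modification time happen to agree.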

-h, --help

Print the command usage, and the list of options available.

EXPRESSIONS

An expression is made up of one or more conditions, where each condition specifies a field and a state. A single condition consists of a field name, an '=' sign, and a state name.

Valid fields are listed below, along with the property of the entry that is checked.

Field     Property of entry
EXIST     whether entry exists
TYPE      type of entry, e.g., regular file, directory, symlink
SIZE      size of entry in bytes, if a regular file
UID       user id of entry
GID       group id of entry
ATIME     time of last access
MTIME     time of last modification
CTIME     time of last status change
PERM      permission bits of entry
ACL       ACLs associated with entry, if any
CONTENT   file contents of entry, byte-for-byte comparison, if a regular file

Valid conditions for the EXIST field are:

Condition         Meaning
EXIST=ONLY_SRC    entry exists only in source path
EXIST=ONLY_DEST   entry exists only in destination path
EXIST=DIFFER      entry exists in either source or destination, but not both
EXIST=COMMON      entry exists in both source and destination
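The four EXIST states correspond to set operations on the relative paths found under the source and destination. A minimal Python sketch, assuming the two path lists have already been collected (classify_exist is a hypothetical name):

```python
def classify_exist(src_paths, dest_paths):
    """Group relative paths by EXIST state using set operations."""
    src, dest = set(src_paths), set(dest_paths)
    return {
        "ONLY_SRC": src - dest,   # in source only
        "ONLY_DEST": dest - src,  # in destination only
        "DIFFER": src ^ dest,     # in exactly one of the two
        "COMMON": src & dest,     # in both
    }
```

Note that DIFFER is the union of ONLY_SRC and ONLY_DEST.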

All other fields may only specify the DIFFER and COMMON states.

Conditions can be joined together with AND (@) and OR (,) operators without spaces to build complex expressions. For example, the following expression reports entries that exist in both source and destination paths, but are of different types:

EXIST=COMMON@TYPE=DIFFER

The AND operator binds with higher precedence than the OR operator. For example, the following expression matches on entries which either (exist in both source and destination and whose types differ) or (only exist in the source):

EXIST=COMMON@TYPE=DIFFER,EXIST=ONLY_SRC
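The precedence rule can be sketched in Python by splitting on the OR operator first and on the AND operator second, so that each comma-separated term is a group of conditions that must all hold. This is not dcmp's parser; the function names and the dictionary used to describe an entry (field name mapped to its state) are assumptions made for illustration.

```python
def condition_holds(condition, entry):
    """Check one FIELD=STATE condition against an entry's recorded states."""
    field, _, state = condition.partition("=")
    actual = entry.get(field)
    # EXIST=DIFFER covers entries present on exactly one side.
    if field == "EXIST" and state == "DIFFER":
        return actual in ("ONLY_SRC", "ONLY_DEST")
    return actual == state


def matches(expression, entry):
    """Evaluate an expression in which AND (@) binds tighter than OR (,)."""
    return any(
        all(condition_holds(cond, entry) for cond in term.split("@"))
        for term in expression.split(",")
    )
```

With the expression above, an entry described as {"EXIST": "ONLY_SRC"} matches through the second term, while {"EXIST": "COMMON", "TYPE": "COMMON"} matches neither term.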

Some conditions imply others. For example, for CONTENT to be considered the same, the entry must exist in both source and destination, the types must match, the sizes must match, and finally the contents must match:

SIZE=COMMON    => EXIST=COMMON@TYPE=COMMON@SIZE=COMMON
CONTENT=COMMON => EXIST=COMMON@TYPE=COMMON@SIZE=COMMON@CONTENT=COMMON

A successful check on any other field also implies that EXIST=COMMON.
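The implication rules can be written down as a small lookup table. The Python sketch below expands FIELD=COMMON into the full chain of conditions it implies, following the rules stated here; expand_common is a hypothetical helper, not part of dcmp.

```python
# Fields that must already be COMMON before the keyed field can be COMMON.
_PREREQUISITES = {
    "SIZE": ["EXIST", "TYPE"],
    "CONTENT": ["EXIST", "TYPE", "SIZE"],
}


def expand_common(field):
    """Return the full AND-expression implied by FIELD=COMMON."""
    if field == "EXIST":
        chain = ["EXIST"]
    else:
        # Any other field implies at least EXIST=COMMON.
        chain = _PREREQUISITES.get(field, ["EXIST"]) + [field]
    return "@".join(f"{name}=COMMON" for name in chain)
```

For example, expand_common("UID") gives EXIST=COMMON@UID=COMMON.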

When used with the -o option, one must also specify a file name at the end of the expression, separated with a ':'. The list of entries that match the expression is written to the named file. For example, to list any entries matching the above expression to a file named outfile1, one should use the following option:

-o EXIST=COMMON@TYPE=DIFFER:outfile1
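A minimal Python sketch of how an EXPR:FILE argument could be separated into its two parts. It splits at the first ':' on the assumption that field and state names never contain a colon; how dcmp itself splits the argument is not stated on this page, and split_output_arg is a hypothetical name.

```python
def split_output_arg(arg):
    """Split an EXPR:FILE argument into (expression, filename)."""
    # Assumption: the expression never contains ':', so the first colon
    # marks the boundary between expression and file name.
    expression, sep, filename = arg.partition(":")
    if not sep or not expression or not filename:
        raise ValueError(f"expected EXPR:FILE, got {arg!r}")
    return expression, filename
```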

If the --base option is given or when no output option is specified, the following expressions are checked and numeric results are reported to stdout:

EXIST=COMMON
EXIST=DIFFER
EXIST=COMMON@TYPE=COMMON
EXIST=COMMON@TYPE=DIFFER
EXIST=COMMON@CONTENT=COMMON
EXIST=COMMON@CONTENT=DIFFER

EXAMPLES

  1. Compare two files in different directories:

mpirun -np 128 dcmp /src1/file1 /src2/file2

  2. Compare two directories with verbose output. The verbose output prints timing and number of bytes read:

mpirun -np 128 dcmp -v /src1 /src2

  3. Write list of entries to outfile1 that are only in src1 or whose names exist in both src1 and src2 but whose types differ:

mpirun -np 128 dcmp -o EXIST=COMMON@TYPE=DIFFER,EXIST=ONLY_SRC:outfile1 /src1 /src2

  4. Same as above but also write list of entries to outfile2 that exist in either src1 or src2 but not both:

mpirun -np 128 dcmp -o EXIST=COMMON@TYPE=DIFFER,EXIST=ONLY_SRC:outfile1 -o EXIST=DIFFER:outfile2 /src1 /src2

SEE ALSO

The mpiFileUtils source code and all documentation may be downloaded from <https://github.com/hpc/mpifileutils>.