mpiFileUtils provides both a library called libmfu and a suite of MPI-based tools to manage large datasets, which may vary from large directory trees to large files. High-performance computing users often generate large datasets with parallel applications that run with many processes (millions in some cases). However, those users are then stuck with single-process tools like cp and rm to manage their datasets. This suite provides MPI-based tools to handle typical jobs like copy, remove, and compare for such datasets, providing speedups of up to 50x. The libmfu library simplifies the creation of new tools, and it can be called directly from within HPC applications.
The following principles drive design decisions in the project.
The library and tools should be designed such that running with more processes increases performance, provided there are sufficient data and parallelism available in the underlying file systems. The design of the tool should not impose performance scalability bottlenecks.
While it is tempting to mimic the interface, behavior, and file formats of familiar tools like cp, rm, and tar, when faced with a choice between compatibility and performance, mpiFileUtils chooses performance. For example, if an archive file format requires serialization that inhibits parallel performance, mpiFileUtils will opt to define a new file format that enables parallelism rather than being constrained to existing formats. Similarly, options in the tool command line interface may have different semantics from familiar tools in cases where performance is improved. Thus, one should be careful to learn the options of each tool.
The tools are intended to support common file systems used in HPC centers, like Lustre, GPFS, and NFS. Additionally, methods in the library should be portable and efficient across multiple file systems. Tool and library users can rely on mpiFileUtils to provide portable and performant implementations.
While the tools do not support chaining with Unix pipes, they do support interoperability through input and output files. One tool may process a dataset and generate an output file that another tool can read as input, e.g., to walk a directory tree with one tool, filter the list of file names with another, and perhaps delete a subset of matching files with a third. Additionally, when logic is deemed to be useful across multiple tools or is anticipated to be useful in future tools or applications, it should be provided in the common library.
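The file-based chaining described above can be sketched as a small shell function. This is only an illustration under stated assumptions: the function name `purge_old_files`, the list file names `walk.mfu` and `old.mfu`, the 180-day cutoff, and the default process count are hypothetical, while the dwalk, dfind, and drm options are the ones documented in the sections for those tools. The last step uses --dryrun so that nothing is deleted until the list has been reviewed.

```shell
#!/bin/bash
# Hypothetical helper: walk a tree, filter the list, then preview a delete.
# Each tool hands its result to the next through a list file, not a pipe.
purge_old_files() {
    local target="$1"      # directory tree to scan
    local np="${2:-128}"   # number of MPI processes (illustrative default)

    # 1. Walk the tree and save the full file list.
    mpirun -np "$np" dwalk -o walk.mfu "$target" || return 1

    # 2. Keep only regular files last modified more than 180 days ago.
    mpirun -np "$np" dfind -i walk.mfu -o old.mfu --type f --mtime +180 || return 1

    # 3. Show what drm would remove; drop --dryrun to actually delete.
    mpirun -np "$np" drm --dryrun -i old.mfu
}
```

Calling `purge_old_files /path/to/target` inside a compute allocation would run the three tools in sequence, stopping early if the walk or the filter fails.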
The tools in mpiFileUtils are MPI applications. They must be launched as MPI applications, e.g., within a compute allocation on a cluster using mpirun. The tools do not currently checkpoint, so one must be careful that an invocation of the tool has sufficient time to complete before it is killed.
Experimental utilities are under active development. They are not considered to be production worthy, but they are available in the distribution for those who are interested in developing them further or to provide additional examples.
Functionality that is common to multiple tools is moved to the common library, libmfu. The goal of this library is to make it easy to develop new tools and to provide consistent behavior across tools in the suite. The library can also be useful to end applications, e.g., to efficiently create or remove a large directory tree in a portable way across different parallel file systems.
The mpiFileUtils common library defines data structures, and methods on those data structures, that make it easier to develop new tools. It can also be used within HPC applications to provide portable, performant implementations across file systems common in HPC centers.
#include "mfu.h"
This file includes all other necessary headers.
The key data structure in libmfu is a distributed file list called mfu_flist. This structure represents a list of files, each with stat-like metadata, that is distributed among a set of MPI ranks.
The library contains functions for creating and operating on these lists. For example, one may create a list by recursively walking an existing directory or by inserting new entries one at a time. Given a list as input, functions exist to create corresponding entries (inodes) on the file system or to delete the list of files. One may filter, sort, and remap entries. One can copy a list of entries from one location to another or compare corresponding entries across two different lists. A file list can be serialized and written to or read from a file.
Each MPI rank "owns" a portion of the list, and there are routines to step through the entries owned by that process. This portion is referred to as the "local" list. Functions exist to get and set properties of the items in the local list, for example to get the path name, type, and size of a file. Functions dealing with the local list can be called by the MPI process independently of other MPI processes.
Other functions operate on the global list in a collective fashion, such as deleting all items in a file list. All processes in the MPI job must invoke these functions simultaneously.
For full details, see mfu_flist.h and refer to its usage in existing tools.
mpiFileUtils represents file paths with the mfu_path structure. Functions are available to manipulate paths to prepend and append entries, to slice paths into pieces, and to compute relative paths.
Path names provided by the user on the command line (parameters) are handled through the mfu_param_path structure. Such paths may have to be checked for existence and to determine their type (file or directory). Additionally, the user may specify many such paths through invocations involving shell wildcards, so functions are available to check long lists of paths in parallel.
The mfu_io.h functions provide wrappers for many POSIX-IO functions. This is helpful for checking error codes in a consistent manner and automating retries on failed I/O calls. One should use the wrappers in mfu_io if available, and if not, one should consider adding the missing wrapper.
The mfu_util.h functions provide wrappers for error reporting and memory allocation.
mpiFileUtils and its dependencies can be installed with or without Spack. Several common variations are described here:
To use Spack, it is recommended that one first create a packages.yaml file to list system-provided packages, like MPI. Without doing this, Spack will fetch and install an MPI library that may not work on your system. Make sure that you've set up spack in your shell (see these instructions).
Once Spack has been configured, mpiFileUtils can be installed as:
spack install mpifileutils
or to enable all features:
spack install mpifileutils +lustre +gpfs +experimental
To build directly, mpiFileUtils requires CMake 3.1 or higher. First ensure MPI wrapper scripts like mpicc are loaded in your environment. Then to install the dependencies, run the following commands:
#!/bin/bash
mkdir install
installdir=`pwd`/install
mkdir deps
cd deps
wget https://github.com/hpc/libcircle/releases/download/v0.3/libcircle-0.3.0.tar.gz
wget https://github.com/llnl/lwgrp/releases/download/v1.0.2/lwgrp-1.0.2.tar.gz
wget https://github.com/llnl/dtcmp/releases/download/v1.1.0/dtcmp-1.1.0.tar.gz
tar -zxf libcircle-0.3.0.tar.gz
cd libcircle-0.3.0
./configure --prefix=$installdir
make install
cd ..
tar -zxf lwgrp-1.0.2.tar.gz
cd lwgrp-1.0.2
./configure --prefix=$installdir
make install
cd ..
tar -zxf dtcmp-1.1.0.tar.gz
cd dtcmp-1.1.0
./configure --prefix=$installdir --with-lwgrp=$installdir
make install
cd ..
cd ..
To build on PowerPC, one may need to add --build=powerpc64le-redhat-linux-gnu to the configure commands.
Assuming the dependencies have been placed in an install directory as shown above, build mpiFileUtils from a release like v0.10:
wget https://github.com/hpc/mpifileutils/archive/v0.10.tar.gz
tar -zxf v0.10.tar.gz
mkdir -p build install
cd build
cmake ../mpifileutils-0.10 \
-DWITH_DTCMP_PREFIX=../install \
-DWITH_LibCircle_PREFIX=../install \
-DCMAKE_INSTALL_PREFIX=../install
make install
or to build the latest mpiFileUtils from the master branch:
git clone https://github.com/hpc/mpifileutils
mkdir -p build install
cd build
cmake ../mpifileutils \
-DWITH_DTCMP_PREFIX=../install \
-DWITH_LibCircle_PREFIX=../install \
-DCMAKE_INSTALL_PREFIX=../install
make install
To enable Lustre, GPFS, and experimental tools, add the following flags during CMake:
-DENABLE_LUSTRE=ON
-DENABLE_GPFS=ON
-DENABLE_EXPERIMENTAL=ON
One can use Spack to install mpiFileUtils dependencies using the spack.yaml file distributed with mpiFileUtils. From the root directory of mpiFileUtils, run the command spack find to determine which packages spack will install. Next, run spack concretize to have spack perform dependency analysis. Finally, run spack install to build the dependencies.
There are two ways to tell CMake about the dependencies. First, you can use spack load [depname] to put the installed dependency into your environment paths. Then, at configure time, CMake will automatically detect the location of these dependencies. Thus, the commands to build become:
git clone https://github.com/hpc/mpifileutils
mkdir build install
cd mpifileutils
spack install
spack load dtcmp
spack load libcircle
spack load libarchive
cd ../build
cmake ../mpifileutils
The other way to use spack is to create a "view" to the installed dependencies. Details on this are coming soon.
dbcast [OPTION] SRC DEST
Parallel MPI application to recursively broadcast a single file from a global file system to node-local storage, like ramdisk or an SSD.
The file is logically sliced into chunks and collectively copied from a global file system to node-local storage. The source file SRC must be readable by all MPI processes. The destination file DEST should be the full path of the file in node-local storage. If needed, parent directories for the destination file will be created as part of the broadcast.
In the current implementation, dbcast requires at least two MPI processes per compute node, and all compute nodes must run an equal number of MPI processes.
-s, --size SIZE
The chunk size in bytes used to segment files during the broadcast. Units like "MB" and "GB" should immediately follow the number without spaces (e.g., 2MB). The default size is 1MB. It is recommended to use the stripe size of the file if this is known.
-h, --help
Print the command usage and the list of options available.
mpirun -np 128 dbcast /global/path/to/filename /ssd/filename
mpirun -np 128 dbcast -s 10MB /global/path/to/filename /ssd/filename
lfs getstripe /global/path/to/filename
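Because the recommended chunk size is the stripe size of the source file, the stripe query and the broadcast can be combined. The following sketch assumes a Lustre file system on which `lfs getstripe -S` prints the stripe size in bytes; the function name `dbcast_striped` and the default process count are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: broadcast SRC to DEST using SRC's Lustre stripe size
# as the dbcast chunk size.
dbcast_striped() {
    local src="$1" dest="$2"
    local np="${3:-128}"   # illustrative default process count
    local stripe

    # Assumption: "lfs getstripe -S FILE" prints the stripe size in bytes.
    stripe=$(lfs getstripe -S "$src") || return 1

    # --size takes a byte count, so the raw number can be passed through.
    mpirun -np "$np" dbcast -s "$stripe" "$src" "$dest"
}
```

For example, `dbcast_striped /global/path/to/filename /ssd/filename` would query the stripe size once and then launch dbcast with a matching chunk size.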
The mpiFileUtils source code and all documentation may be downloaded from <https://github.com/hpc/mpifileutils>
dbz2 [OPTIONS] [-z|-d] FILE
Parallel MPI application to compress or decompress a file.
When compressing, a new file will be created with a .dbz2 extension. When decompressing, the .dbz2 extension will be dropped from the file name.
-z, --compress
Compress the file.
-d, --decompress
Decompress the file.
-k, --keep
Keep the input file.
-f, --force
Overwrite the output file, if it exists.
-b, --blocksize SIZE
Set the compression block size, from 1 to 9, where 1=100kB ... and 9=900kB. The default is 9.
-v, --verbose
Verbose output (optional).
-q, --quiet
Quiet output.
-h, --help
Print usage.
mpirun -np 128 dbz2 --compress /path/to/file
mpirun -np 128 dbz2 --force --compress /path/to/file
mpirun -np 128 dbz2 --decompress /path/to/file.dbz2
dchmod [OPTION] PATH ...
Parallel MPI application to recursively change permissions and/or group from a top level directory.
dchmod provides functionality similar to chmod(1), chown(1), and chgrp(1). Like chmod(1), the tool supports the use of octal or symbolic mode to change the permissions.
-i, --input FILE
Read source list from FILE. FILE must be generated by another tool from the mpiFileUtils suite.
-u, --owner USER
Change owner to specified USER name or numeric user id.
-g, --group GROUP
Change group to specified GROUP name or numeric group id.
-m, --mode MODE
The mode to apply to each item. MODE may be octal or symbolic syntax similar to chmod(1). In symbolic notation, "ugoa" are supported as are "rwxX". As with chmod, if no leading letter "ugoa" is provided, mode bits are combined with umask to determine the actual mode.
-f, --force
Attempt to change every item. By default, dchmod avoids unnecessary chown and chmod calls, for example trying to change the group on an item that already has the correct group, or trying to change the group on an item that is not owned by the user running the tool. With --force, dchmod executes chown/chmod calls on every item.
-s, --silent
Suppress EPERM error messages, which is useful when running dchmod on large directories with files owned by other users.
--exclude REGEX
Do not modify items whose full path matches REGEX, processed by regexec(3).
--match REGEX
Only modify items whose full path matches REGEX, processed by regexec(3).
-n, --name
Change --exclude and --match to apply to item name rather than its full path.
--progress N
Print progress message to stdout approximately every N seconds. The number of seconds must be a non-negative integer. A value of 0 disables progress messages.
-v, --verbose
Run in verbose mode. Prints statistics including the number of files walked, the number of levels in the directory tree, the number of files the command operated on, and the files/sec rate for each of those.
-q, --quiet
Run tool silently. No output is printed.
-h, --help
Print the command usage and the list of options available.
mpirun -np 128 dchmod --mode 755 /directory
mpirun -np 128 dchmod --group mygroup --mode u+r,g+rw /directory
mpirun -np 128 dchmod --owner user1 --group mygroup /directory
mpirun -np 128 dchmod --name --exclude 'afilename' --mode u+rw /directory
Note: You can use --match to change file permissions on all of the files/directories that match the regex.
dcmp [OPTION] SRC DEST
Parallel MPI application to compare two files or to recursively compare files with same relative paths within two different directories.
dcmp provides functionality similar to a recursive cmp(1). It reports how many files in two different directories are the same or different.
dcmp can be configured to compare a number of different file properties.
-o, --output EXPR:FILE
Write the list of files matching expression EXPR to the specified FILE. The expression consists of a set of fields and states described below. More than one -o option is allowed in a single invocation, in which case each option should provide a different output file name.
-t, --text
Change --output to write files in text format rather than binary.
-b, --base
Enable base checks and normal stdout results when --output is used.
--progress N
Print progress message to stdout approximately every N seconds. The number of seconds must be a non-negative integer. A value of 0 disables progress messages.
-v, --verbose
Run in verbose mode. Prints statistics and timing data for the command: files walked, started, completed, seconds, files, bytes read, byte rate, and file rate.
-q, --quiet
Run tool silently. No output is printed.
-l, --lite
Lite mode compares only file modification time and size. If both are the same, the contents are assumed to be the same; if either differs, the contents are assumed to be different. Lite mode does not compare the data in the files.
-h, --help
Print the command usage and the list of options available.
An expression is made up of one or more conditions, where each condition specifies a field and a state. A single condition consists of a field name, an '=' sign, and a state name.
Valid fields are listed below, along with the property of the entry that is checked.
Field | Property of entry |
---|---|
EXIST | whether entry exists |
TYPE | type of entry, e.g., regular file, directory, symlink |
SIZE | size of entry in bytes, if a regular file |
UID | user id of entry |
GID | group id of entry |
ATIME | time of last access |
MTIME | time of last modification |
CTIME | time of last status change |
PERM | permission bits of entry |
ACL | ACLs associated with entry, if any |
CONTENT | file contents of entry, byte-for-byte comparison, if a regular file |
Valid conditions for the EXIST field are:
Condition | Meaning |
---|---|
EXIST=ONLY_SRC | entry exists only in source path |
EXIST=ONLY_DEST | entry exists only in destination path |
EXIST=DIFFER | entry exists in either source or destination, but not both |
EXIST=COMMON | entry exists in both source and destination |
All other fields may only specify the DIFFER and COMMON states.
Conditions can be joined together with AND (@) and OR (,) operators without spaces to build complex expressions. For example, the following expression reports entries that exist in both source and destination paths, but are of different types:
EXIST=COMMON@TYPE=DIFFER
The AND operator binds with higher precedence than the OR operator. For example, the following expression matches on entries which either (exist in both source and destination and whose types differ) or (only exist in the source):
EXIST=COMMON@TYPE=DIFFER,EXIST=ONLY_SRC
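The precedence rule can be made concrete with a small evaluator. This is a sketch for illustration, not dcmp's own parser: the function name `expr_matches` is hypothetical, and the caller supplies the conditions that hold for one entry as a space-separated list. Implied conditions (for instance, that EXIST=ONLY_SRC also means EXIST=DIFFER) are deliberately not modeled.

```shell
#!/bin/bash
# Hypothetical evaluator for dcmp-style expressions.
# $1: expression, e.g. 'EXIST=COMMON@TYPE=DIFFER,EXIST=ONLY_SRC'
# $2: space-separated conditions that are true for one entry
expr_matches() {
    local expr="$1"
    local facts=" $2 "   # pad with spaces so whole conditions can be matched
    local -a terms conds
    local term cond ok

    # Split on ',' first: OR has the lowest precedence.
    IFS=, read -ra terms <<< "$expr"
    for term in "${terms[@]}"; do
        # Within one OR-term, split on '@': every condition must hold.
        IFS=@ read -ra conds <<< "$term"
        ok=1
        for cond in "${conds[@]}"; do
            [[ "$facts" == *" $cond "* ]] || ok=0
        done
        (( ok )) && return 0   # one satisfied term is enough
    done
    return 1
}
```

With the expression above, an entry described by `'EXIST=ONLY_SRC'` matches through the second term, and an entry described by `'EXIST=COMMON TYPE=DIFFER'` matches through the first, whereas `'EXIST=COMMON TYPE=COMMON'` satisfies neither term.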
Some conditions imply others. For example, for CONTENT to be considered the same, the entry must exist in both source and destination, the types must match, the sizes must match, and finally the contents must match:
SIZE=COMMON => EXIST=COMMON@TYPE=COMMON@SIZE=COMMON
CONTENT=COMMON => EXIST=COMMON@TYPE=COMMON@SIZE=COMMON@CONTENT=COMMON
A successful check on any other field also implies that EXIST=COMMON.
When used with the -o option, one must also specify a file name at the end of the expression, separated with a ':'. The list of entries that match the expression is written to the named file. For example, to write any entries matching the above expression to a file named outfile1, one should use the following option:
-o EXIST=COMMON@TYPE=DIFFER:outfile1
If the --base option is given or when no output option is specified, the following expressions are checked and numeric results are reported to stdout:
EXIST=COMMON
EXIST=DIFFER
EXIST=COMMON@TYPE=COMMON
EXIST=COMMON@TYPE=DIFFER
EXIST=COMMON@CONTENT=COMMON
EXIST=COMMON@CONTENT=DIFFER
mpirun -np 128 dcmp /src1/file1 /src2/file2
mpirun -np 128 dcmp -v /src1 /src2
mpirun -np 128 dcmp -o EXIST=COMMON@TYPE=DIFFER,EXIST=ONLY_SRC:outfile1 /src1 /src2
mpirun -np 128 dcmp -o EXIST=COMMON@TYPE=DIFFER,EXIST=ONLY_SRC:outfile1 -o EXIST=DIFFER:outfile2 /src1 /src2
dcp [OPTION] SRC DEST
Parallel MPI application to recursively copy files and directories.
dcp is a file copy tool in the spirit of cp(1) that evenly distributes the work of scanning the directory tree, and copying file data across a large cluster without any centralized state. It is designed for copying files that are located on a distributed parallel file system, and it splits large file copies across multiple processes.
-b, --blocksize SIZE
Set the I/O buffer to be SIZE bytes. Units like "MB" and "GB" may immediately follow the number without spaces (e.g., 8MB). The default blocksize is 1MB.
-i, --input FILE
Read source list from FILE. FILE must be generated by another tool from the mpiFileUtils suite.
-k, --chunksize SIZE
Split large files into chunks of SIZE bytes to be processed. Multiple process ranks may copy a large file in parallel. Units like "MB" and "GB" can immediately follow the number without spaces (e.g., 64MB). The default chunksize is 1MB.
-p, --preserve
Preserve permissions, group, timestamps, and extended attributes.
-s, --synchronous
Use synchronous read/write calls (open files with O_DIRECT). This also avoids caching the file data on the client nodes.
-S, --sparse
Create sparse files when possible.
--progress N
Print progress message to stdout approximately every N seconds. The number of seconds must be a non-negative integer. A value of 0 disables progress messages.
-v, --verbose
Run in verbose mode.
-q, --quiet
Run tool silently. No output is printed.
-h, --help
Print a brief message listing the dcp(1) options and usage.
If a long-running copy is interrupted, one should delete the partial copy and run dcp again from the beginning. One may use drm to quickly remove a partial copy of a large directory tree.
To ensure the copy is successful, one should run dcmp after dcp completes to verify the copy, especially if dcp was not run with the -s option.
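The copy-then-verify recommendation can be wrapped in a single function. This is a minimal sketch: the function name `copy_and_verify` and the default process count are hypothetical. How dcp treats an already existing destination directory, and whether dcmp's exit status reflects differences, are not covered here, so one should confirm that SRC and DEST are corresponding roots and read the dcmp summary rather than rely on the return code alone.

```shell
#!/bin/bash
# Hypothetical helper: copy a tree with dcp, then compare it with dcmp.
copy_and_verify() {
    local src="$1" dest="$2"
    local np="${3:-128}"   # illustrative default process count

    # Copy first; skip verification if the copy itself failed.
    mpirun -np "$np" dcp "$src" "$dest" || return 1

    # Compare source and destination; -v prints the summary statistics.
    mpirun -np "$np" dcmp -v "$src" "$dest"
}
```

For example, `copy_and_verify /source/dir1 /dest/dir2` runs the copy and then the comparison within the same allocation.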
mpirun -np 128 dcp /source/dir1 /dest/dir2
mkdir /dest/dir2
mpirun -np 128 dcp /source/dir1/* /dest/dir2
mpirun -np 128 dcp -p /source/dir1/ /dest/dir2
Using the -S option for sparse files does not work yet at LLNL. If you try to use it then dcp will default to a normal copy.
The maximum supported file name length for any file transferred is approximately 4068 characters. This may be less than the number of characters that your operating system supports.
ddup [OPTION] PATH
Parallel MPI application to report files under a directory tree having identical content.
ddup reports path names to files having identical content (duplicate files). A top-level directory is specified, and the path name to any file that is a duplicate of another anywhere under that same directory tree is reported. The path to each file is reported, along with a final hash representing its content. Multiple sets of duplicate files can be matched using this final reported hash.
-d, --debug LEVEL
Set verbosity level. LEVEL can be one of: fatal, err, warn, info, dbg.
-v, --verbose
Run in verbose mode.
-q, --quiet
Run tool silently. No output is printed.
-h, --help
Print the command usage and the list of options available.
mpirun -np 128 ddup /path/to/haystack
dfind [OPTION] [EXPRESSION] PATH ...
Parallel MPI application to filter a list of files according to an expression.
dfind provides functionality similar to find(1).
The file list can be obtained by either walking one or more paths provided on the command line or through an input list.
The filtered list can be written to an output file.
-i, --input FILE
Read source list from FILE. FILE must be generated by another tool from the mpiFileUtils suite.
-o, --output FILE
Write the processed list to a file.
-v, --verbose
Run in verbose mode.
-q, --quiet
Run tool silently. No output is printed.
-h, --help
Print a brief message listing the dfind(1) options and usage.
Numeric arguments can be specified as:
+N | more than N |
-N | less than N |
N | exactly N |
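The sign convention for numeric arguments can be sketched as a comparison function. This only illustrates the three forms listed above; the function name `numeric_match` is hypothetical, and how dfind rounds ages to whole minutes or days before comparing is not modeled.

```shell
#!/bin/bash
# Hypothetical illustration of the +N / -N / N convention.
# $1: measured value (e.g., age in days), $2: the argument as typed
numeric_match() {
    local value="$1" spec="$2"
    case "$spec" in
        +*) (( value > ${spec:1} )) ;;   # +N: more than N
        -*) (( value < ${spec:1} )) ;;   # -N: less than N
        *)  (( value == spec )) ;;       #  N: exactly N
    esac
}
```

Under this reading, `--mtime +180` selects a file whose age is 200 days but not one whose age is exactly 180 days.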
--amin N
File was last accessed N minutes ago.
--anewer FILE
File was last accessed more recently than FILE was modified.
--atime N
File was last accessed N days ago.
--cmin N
File's status was last changed N minutes ago.
--cnewer FILE
File's status was last changed more recently than FILE was modified.
--ctime N
File's status was last changed N days ago.
--mmin N
File's data was last modified N minutes ago.
--newer FILE
File was modified more recently than FILE.
--mtime N
File's data was last modified N days ago.
--gid N
File's numeric group ID is N.
--group NAME
File belongs to group NAME.
--uid N
File's numeric user ID is N.
--user NAME
File is owned by user NAME.
--name PATTERN
Base of file name matches shell pattern PATTERN.
--path PATTERN
Full path to file matches shell pattern PATTERN.
--regex REGEX
Full path to file matches POSIX regular expression REGEX. Regular expressions are processed by regexec(3).
--size N
File size is N bytes. Units like 'KB', 'MB', and 'GB' can be used.
--type C
File is of type C:
b | block device |
c | char device |
d | directory |
f | regular file |
l | symbolic link |
p | pipe |
s | socket |
--print
Print file name to stdout.
--exec CMD ;
Execute command CMD on file. All following arguments are taken as arguments to the command until ';' is encountered. The string '{}' is replaced by the current file name.
mpirun -np 128 dfind -v --user user1 --print /path/to/target
mpirun -np 128 dfind -v -o outfile --size -1GB /path/to/target
mpirun -np 128 dfind -v -i infile -o outfile --type f --mtime +180
dreln [OPTION] OLDPATH NEWPATH PATH ...
Parallel MPI application to recursively update symlinks within a directory.
dreln walks the specified PATH, finds any symlink whose target includes an absolute path to OLDPATH, and replaces it with a new link whose target points to NEWPATH instead.
This is useful to update symlinks after migrating a large directory from one file system to another, whose links specify absolute paths to the original file system.
-i, --input FILE
Read source list from FILE. FILE must be generated by another tool from the mpiFileUtils suite.
-p, --preserve
Preserve existing modification times on links.
-r, --relative
Replace links using target paths that are relative to NEWPATH.
--progress N
Print progress message to stdout approximately every N seconds. The number of seconds must be a non-negative integer. A value of 0 disables progress messages.
-v, --verbose
Run in verbose mode.
-q, --quiet
Run tool silently. No output is printed.
-h, --help
Print a brief message listing the dreln(1) options and usage.
1. To update all links under /walk/path whose targets point to /orig/path and replace them with targets that point to /new/path:
mpirun -np 128 dreln -v /orig/path /new/path /walk/path
2. Same as above, but replace each link target with a relative path from the link to its new target under /new/path:
mpirun -np 128 dreln -v --relative /orig/path /new/path /walk/path
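3. Same as the first example, but preserve the existing modification times on links:
mpirun -np 128 dreln -v --preserve /orig/path /new/path /walk/path
4. To walk more than one path in a single invocation:
mpirun -np 128 dreln -v /orig/path /new/path /walk/path1 /walk/path2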
drm [OPTION] PATH...
Parallel MPI application to recursively delete a directory and its contents.
drm is a tool for removing files recursively in parallel. drm behaves like rm -rf, but it is faster.
Note
DO NOT USE SHELL REGEX!!! The --match and --exclude options use POSIX regex syntax. Because of this make sure that the shell does not try to interpret your regex before it gets passed to the program. You can generally use quotes around your regex to prevent the shell from expanding. An example of this using the --match option with --dryrun would be:
mpirun -np 128 drm --dryrun -v --name --match 'file_.*' /path/to/dir/*
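The effect of quoting can be checked without running drm at all. The sketch below is a generic shell demonstration, not part of mpiFileUtils: the helper `count_args` and the temporary file names are hypothetical. It counts how many arguments a program would receive for an unquoted glob compared with a quoted regex.

```shell
#!/bin/bash
# Show how the shell treats a pattern before a program ever sees it.
count_args() { echo "$#"; }

demo_dir=$(mktemp -d)
touch "$demo_dir/file_1" "$demo_dir/file_2" "$demo_dir/file_3"

# Unquoted: the shell expands the glob and passes one argument per match.
unquoted=$(cd "$demo_dir" && count_args file_*)

# Quoted: the pattern reaches the program untouched as a single argument.
quoted=$(cd "$demo_dir" && count_args 'file_.*')

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
rm -r "$demo_dir"
```

Only the quoted form hands the regex itself to the tool, which is what --match and --exclude expect.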
-i, --input FILE
Read source list from FILE. FILE must be generated by another tool from the mpiFileUtils suite.
-o, --output FILE
Write the list of items drm attempts to delete to FILE in mpiFileUtils format. The format can be changed with the --text option.
-t, --text
Must be used with the --output option. Write the list of items drm attempts to delete to FILE in ASCII text format.
-l, --lite
Walk file system without stat.
--stat
Walk file system with stat.
--exclude REGEX
Do not remove items whose full path matches REGEX, processed by regexec(3).
--match REGEX
Only remove items whose full path matches REGEX, processed by regexec(3).
--name
Change --exclude and --match to apply to item name rather than its full path.
--dryrun
Print a list of files that would be deleted without deleting them. This is useful to check the list of items satisfying --exclude or --match options before actually deleting anything.
--aggressive
Delete files during the walk phase, and then delete directories by level after the walk. This option cannot be used with --dryrun.
-T, --traceless
Delete child items without updating the mtime on their parent directory.
--progress N
Print progress message to stdout approximately every N seconds. The number of seconds must be a non-negative integer. A value of 0 disables progress messages.
-v, --verbose
Run in verbose mode.
-q, --quiet
Run tool silently. No output is printed.
-h, --help
Print a brief message listing the drm(1) options and usage.
mpirun -np 128 drm -v /dir/to/delete
mpirun -np 128 drm --match '.core$' /dir/to/delete/from
mpirun -np 128 drm --dryrun --match '.core$' /dir/to/delete/from
mpirun -np 128 drm --name --match '^foo$' /dir/to/delete/from
dstripe [OPTION] PATH...
Parallel MPI application to restripe files.
This tool is in active development. It currently only works on Lustre.
dstripe enables one to restripe file(s) across the underlying storage devices. One must specify a list of paths. All files in those paths can be restriped. By default, stripe size is 1MB and stripe count is -1 allowing dstripe to use all available stripes.
-c, --count STRIPE_COUNT
The number of stripes to use during file restriping. If STRIPE_COUNT is -1, then all available stripes are used. If STRIPE_COUNT is 0, the Lustre file system default is used. The default stripe count is -1.
-s, --size STRIPE_SIZE
The stripe size to use during file restriping. Units like "MB" and "GB" can immediately follow the number without spaces (e.g., 2MB). The default stripe size is 1MB.
-m, --minsize SIZE
The minimum size a file must be to be a candidate for restriping. Files smaller than SIZE will not be restriped. Units like "MB" and "GB" can immediately follow the number without spaces (e.g., 2MB). The default minimum file size is 0MB.
-r, --report
Display the file size, stripe count, and stripe size of all files found in PATH. No restriping is performed when using this option.
--progress N
Print progress message to stdout approximately every N seconds. The number of seconds must be a non-negative integer. A value of 0 disables progress messages.
-v, --verbose
Run in verbose mode.
-q, --quiet
Run tool silently. No output is printed.
-h, --help
Print the command usage and the list of options available.
mpirun -np 128 dstripe -s 1MB /path/to/file
mpirun -np 128 dstripe -c 20 -s 1GB /path/to/file
mpirun -np 128 dstripe -m 1GB /path/to/files/
mpirun -np 128 dstripe -c 10 -s 2MB /path/to/files/
mpirun -np 128 dstripe -r /path/to/files/
dsync [OPTION] SRC DEST
Parallel MPI application to synchronize two files or two directory trees.
dsync makes DEST match SRC, adding entries that are missing from DEST and updating existing entries in DEST as necessary so that SRC and DEST have identical content, ownership, timestamps, and permissions.
--dryrun
¶Show differences without changing anything.
-b
,
--batch-files
N
¶Batch files into groups of up to size N during copy operation.
-c
,
--contents
¶Compare files byte-by-byte rather than checking size and mtime to determine whether file contents are different.
-D
,
--delete
¶Delete extraneous files from destination.
--link-dest
DIR
¶Create hardlink in DEST to files in DIR when file is unchanged rather than create a new file. One can use this option to conserve storage space during an incremental backup.
For example in the following, any file that would be copied from /src to /src.bak.inc that is the same as the file already existing in /src.bak will instead be hardlinked to the file in /src.bak:
# initial backup of /src dsync /src /src.bak
# incremental backup of /src dsync --link-dest /src.bak /src /src.bak.inc
-S, --sparse
Create sparse files when possible.
--progress N
Print progress message to stdout approximately every N seconds. The number of seconds must be a non-negative integer. A value of 0 disables progress messages.
-v, --verbose
Run in verbose mode. Prints statistics and timing data for the command: files walked, started, completed, seconds, files, bytes read, byte rate, and file rate.
-q, --quiet
Run tool silently. No output is printed.
-h, --help
Print the command usage and the list of options available.
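The way the default size-and-mtime check, --contents, and --link-dest interact can be sketched for a single file in Python. This is only an illustration under stated assumptions, not dsync's implementation: the function names is_unchanged and backup_file are hypothetical, the sketch runs in one process, and it ignores directories, ownership, and permissions.

```python
import filecmp
import os
import shutil


def is_unchanged(a, b, contents=False):
    """Return True if regular files a and b appear to hold the same data."""
    stat_a, stat_b = os.stat(a), os.stat(b)
    if stat_a.st_size != stat_b.st_size:
        return False
    if contents:
        # Analogous to --contents: read and compare both files byte-by-byte.
        return filecmp.cmp(a, b, shallow=False)
    # Default quick check: equal size and equal modification time.
    return stat_a.st_mtime_ns == stat_b.st_mtime_ns


def backup_file(src, prev, dest, contents=False):
    """Create dest from src, hardlinking to prev when src is unchanged.

    prev plays the role of the file under the --link-dest directory.
    """
    if os.path.exists(prev) and is_unchanged(src, prev, contents):
        os.link(prev, dest)  # no new data blocks are allocated
        return "linked"
    shutil.copy2(src, dest)  # copy2 also carries over timestamps
    return "copied"
```

Because the copy preserves the modification time, a later incremental run with the quick check sees the unchanged file as identical and links it instead of copying it again.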
dwalk [OPTION] PATH ...
Parallel MPI application to recursively walk and list contents in a directory.
dwalk provides functionality similar to ls(1) and du(1). Like du(1), the tool reports a summary of the total number of files and bytes. Like ls(1), the tool sorts and prints information about individual files.
The output can be sorted on different fields (e.g., name, user, group, size). A histogram of file sizes can be computed, listing the number of files that fall into user-defined bins.
-i, --input FILE
Read source list from FILE. FILE must be generated by another tool from the mpiFileUtils suite.
-o, --output FILE
Write the processed list to FILE in binary format. The format can be changed with the --text option.
-t, --text
Must be used with the --output option. Write the processed list of files to FILE in ASCII text format.
-l, --lite
Walk file system without stat.
-s, --sort FIELD
Sort output by comma-delimited fields (see below).
-d, --distribution size:SEPARATORS
Print the distribution of file sizes. For example, specifying size:0,80,100 will report the number of files that have size 0 bytes, between 1-80 bytes, between 81-99 bytes, and 100 bytes or greater.
-f, --file-histogram
Create a file histogram without requiring the user to provide the bin sizes. The bins are created dynamically based on the maximum file size. The first bin holds only zero-byte files, and the remaining bins increase by orders of magnitude in powers of two until the maximum file size falls within the last bin. For example, if the maximum file size is between 2^20 and 2^30 bytes, the bin separators would be 0, 2^10, 2^20, 2^30. The histogram includes both files and directories.
-p, --print
Print files to the screen.
--progress N
Print progress message to stdout approximately every N seconds. The number of seconds must be a non-negative integer. A value of 0 disables progress messages.
-v, --verbose
Run in verbose mode.
-q, --quiet
Run tool silently. No output is printed.
-h, --help
Print usage.
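The dynamic bins described for --file-histogram can be sketched as follows. This is a rough illustration rather than dwalk's code: the function name histogram_separators is hypothetical, and the choice to stop at the first power of two that is greater than or equal to the maximum size is an assumption about boundary handling.

```python
def histogram_separators(max_size):
    """Return separators 0, 2^10, 2^20, ... until max_size is covered."""
    separators = [0]  # the first bin is reserved for zero-byte files
    exponent = 10
    while True:
        separators.append(2 ** exponent)
        if max_size <= 2 ** exponent:
            break  # the largest file falls into the last bin
        exponent += 10
    return separators
```

For a maximum file size between 2^20 and 2^30 bytes, this yields the separators 0, 2^10, 2^20, 2^30 given in the option description.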
By default, the list of files dwalk captures is not sorted. To sort the list, one or more fields can be specified in a comma-delimited list:
name,user,group,uid,gid,atime,mtime,ctime,size
A field name can be preceded with ‘-’ to sort by that field in reverse order.
A lexicographic sort is executed if more than one field is given.
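The multi-field ordering, including the ‘-’ prefix for reverse order, can be illustrated with a small Python sketch. It assumes each file is represented as a dictionary keyed by field name; the function name sort_entries is hypothetical, and dwalk itself performs the sort in parallel across processes.

```python
def sort_entries(entries, spec):
    """Sort dicts by a comma-delimited field list such as '-size,name'."""
    fields = [f.strip() for f in spec.split(",") if f.strip()]
    result = list(entries)
    # Python's sort is stable, so sorting by the least significant field
    # first and the most significant field last gives a lexicographic order.
    for field in reversed(fields):
        reverse = field.startswith("-")
        key = field[1:] if reverse else field
        result.sort(key=lambda entry, k=key: entry[k], reverse=reverse)
    return result
```

With the specification -size,name, entries are ordered from largest to smallest, and entries of equal size are ordered by name.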
mpirun -np 128 dwalk -v /dir/to/walk
mpirun -np 128 dwalk --print --sort size,name /dir/to/walk
mpirun -np 128 dwalk --output out.dwalk /dir/to/walk
mpirun -np 128 dwalk -v --print -d size:0,20,1G src/
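The binning behind --distribution can be sketched in Python by generalizing the size:0,80,100 case from the option description. The boundary treatment here is an assumption taken from that one example, so the real tool may differ at the edges. The function names bin_index and size_distribution are hypothetical, the separators are assumed to be plain byte counts in ascending order with at least two values, and unit suffixes such as 1G are not parsed.

```python
from collections import Counter


def bin_index(size, separators):
    """Return the bin number for size, following the 0,80,100 example."""
    last = len(separators) - 1
    if size >= separators[last]:
        return last + 1  # "100 bytes or greater"
    for i, sep in enumerate(separators[:last]):
        if size <= sep:
            return i  # "0 bytes", then "1-80 bytes"
    return last  # strictly between the last two separators: "81-99 bytes"


def size_distribution(sizes, separators):
    """Count how many sizes fall into each bin."""
    counts = Counter(bin_index(s, separators) for s in sizes)
    return [counts.get(i, 0) for i in range(len(separators) + 1)]
```

For N separators the sketch produces N + 1 bins, matching the four ranges reported for the three separators 0, 80, and 100.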
dgrep ...
-h, --help
Print a brief message listing the dgrep(1) options and usage.
-v, --version
Print version information and exit.
dparallel ...
-h, --help
Print a brief message listing the dparallel(1) options and usage.
-v, --version
Print version information and exit.
dtar ...
-h, --help
Print a brief message listing the dtar(1) options and usage.
-v, --version
Print version information and exit.