|
Flag
|
Description
|
Notes
|
-p[rbugpca] |
Preserve file attributes. r: replication number, b: block
size, u: user, g: group, p: permission, c: checksum-type, a: ACL |
Modification times are not preserved.
Also, when -update is specified, changes to file status are not synchronized unless the file sizes also
differ (i.e. unless the file is recreated). If -pa is
specified, DistCp also preserves permissions, because ACLs are a
superset of permissions. |
-i |
Ignore failures |
This option will keep more accurate
statistics about the copy than the default case. It also preserves logs from
failed copies, which can be valuable for debugging. Finally, a failing map
will not cause the job to fail before all splits are attempted. |
-log <logdir>
|
Write logs to
<logdir> |
DistCp keeps logs of each file it
attempts to copy as map output. If a map fails and is re-executed, its log
output is not retained. |
-m <num_maps> |
Maximum number of simultaneous copies |
Specify the number of maps to copy
data. Note that more maps may not necessarily improve throughput. |
-overwrite |
Overwrite destination |
If a map fails and -i is
not specified, all the files in the split, not only those that failed, will
be recopied. As discussed in the Usage documentation, it also changes the
semantics for generating destination paths, so users should use this
carefully. |
-update |
Overwrite if src size different from
dst size |
As noted earlier, this is not
a “sync” operation. The only criterion examined is the source and
destination file sizes; if they differ, the source file replaces the
destination file. As discussed in the Usage documentation, it also changes
the semantics for generating destination paths, so users should use this
carefully. |
-f <urilist_uri> |
Use list at
<urilist_uri> as src list |
This is equivalent to listing each
source on the command line. The urilist_uri should be a
fully qualified URI. |
-filelimit <n> |
Limit the total number of files to be
<= n |
Deprecated! Ignored in DistCp v2. |
-sizelimit <n> |
Limit the total size to be <= n
bytes |
Deprecated! Ignored in DistCp v2. |
-delete |
Delete the files existing in the dst
but not in src |
The deletion is done by FS Shell, so
the trash is used if it is enabled. |
-strategy {dynamic|uniformsize}
|
Choose the copy-strategy to be used in
DistCp. |
By default, uniformsize is
used (i.e. maps are balanced on the total size of files copied by each map,
similar to legacy DistCp). If dynamic is
specified, DynamicInputFormat is used instead (described in the
Architecture section, under InputFormats). |
-bandwidth
|
Specify bandwidth per map, in
MB/second. |
Each map will be restricted to consume
only the specified bandwidth. This is not always exact. The map throttles
back its bandwidth consumption during a copy, such that the net bandwidth used tends towards the specified
value. |
-atomic {-tmp
<tmp_dir>}
|
Specify atomic commit, with optional
tmp directory. |
-atomic instructs DistCp
to copy the source data to a temporary target location, and then move the
temporary target to the final location atomically. Data will either be
available at final target in a complete and consistent form, or not at all.
Optionally, -tmp may be used to specify the location of the
tmp-target. If not specified, a default is chosen. Note: tmp_dir must be on the final target
cluster. |
-mapredSslConf <ssl_conf_file> |
Specify SSL Config file, to be used
with HSFTP source |
When using the hsftp protocol with a
source, the security-related properties may be specified in a config file
and passed to DistCp. <ssl_conf_file> needs to be in the
classpath. |
-async
|
Run DistCp asynchronously. Quits as
soon as the Hadoop Job is launched. |
The Hadoop Job-id is logged, for
tracking. |
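
The interaction between -update, -overwrite and -delete described in the table can be easier to follow as a small model. The sketch below is not DistCp code; it is a simplified illustration in Python that treats each side as a mapping from relative path to file size. The function name plan_copy and its parameters are hypothetical. It assumes that, with neither flag set, files already present at the destination are skipped (this default is not stated in the table), and it applies only the size comparison the table describes for -update.

```python
from typing import Dict, List, Tuple


def plan_copy(
    src: Dict[str, int],
    dst: Dict[str, int],
    update: bool = False,
    overwrite: bool = False,
    delete: bool = False,
) -> Tuple[List[str], List[str]]:
    """Return (paths to copy, paths to delete) for a simplified copy plan.

    src and dst map a relative path to a file size in bytes.
    This is an illustrative model, not DistCp's implementation.
    """
    to_copy: List[str] = []
    for path, size in sorted(src.items()):
        if path not in dst:
            # Missing at the destination: always copied.
            to_copy.append(path)
        elif overwrite:
            # -overwrite: replace the destination file unconditionally.
            to_copy.append(path)
        elif update and dst[path] != size:
            # -update: replace only when the sizes differ.
            to_copy.append(path)
        # Otherwise the existing destination file is skipped
        # (assumed default behaviour).

    # -delete: remove files that exist in dst but not in src.
    to_delete = sorted(p for p in dst if p not in src) if delete else []
    return to_copy, to_delete


if __name__ == "__main__":
    source = {"a": 10, "b": 20, "c": 30}
    target = {"a": 10, "b": 25, "d": 5}
    print(plan_copy(source, target))
    print(plan_copy(source, target, update=True, delete=True))
    print(plan_copy(source, target, overwrite=True))
```

In this model, a file whose content changed but whose size did not is never recopied under -update, which is why the table stresses that -update is not a "sync" operation.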