org.apache.hadoop.tools
Class DistCpOptions

java.lang.Object
  extended by org.apache.hadoop.tools.DistCpOptions

public class DistCpOptions
extends Object

The Options class encapsulates all DistCp options. These may be set from the command line (via the OptionsParser) or set manually.


Nested Class Summary
static class DistCpOptions.FileAttribute
           
 
Constructor Summary
DistCpOptions(List<org.apache.hadoop.fs.Path> sourcePaths, org.apache.hadoop.fs.Path targetPath)
          Constructor, to initialize source/target paths.
DistCpOptions(org.apache.hadoop.fs.Path sourceFileListing, org.apache.hadoop.fs.Path targetPath)
          Constructor, to initialize source/target paths.
 
Method Summary
 void appendToConf(org.apache.hadoop.conf.Configuration conf)
          Add options to configuration.
protected  DistCpOptions clone()
           
 org.apache.hadoop.fs.Path getAtomicWorkPath()
          Get work path for atomic commit.
 String getCopyStrategy()
          Get the copy strategy to use.
 org.apache.hadoop.fs.Path getLogPath()
          Get output directory for writing distcp logs.
 int getMapBandwidth()
          Get the map bandwidth in KB
 int getMaxMaps()
          Get the max number of maps to use for this copy
 org.apache.hadoop.fs.Path getOutPutDirectory()
           
 org.apache.hadoop.fs.Path getSourceFileListing()
          File path (hdfs:// or file://) that contains the list of actual files to copy
 List<org.apache.hadoop.fs.Path> getSourcePaths()
          Getter for sourcePaths.
 String getSslConfigurationFile()
          Get path where the ssl configuration file is present to use for hftps://
 org.apache.hadoop.fs.Path getTargetPath()
          Getter for the targetPath.
 boolean isSkipPathValidation()
           
 boolean isUseSimpleFileListing()
           
 void preserve(DistCpOptions.FileAttribute fileAttribute)
          Add file attributes that need to be preserved.
 Iterator<DistCpOptions.FileAttribute> preserveAttributes()
          Returns an iterator with the list of file attributes to preserve
 void setAtomicCommit(boolean atomicCommit)
          Set if data needs to be committed atomically
 void setAtomicWorkPath(org.apache.hadoop.fs.Path atomicWorkPath)
          Set the work path for atomic commit
 void setBlocking(boolean blocking)
          Set if DistCp should run blocking or non-blocking
 void setCopyStrategy(String copyStrategy)
          Set the copy strategy to use.
 void setDeleteMissing(boolean deleteMissing)
          Set if files only present in target should be deleted
 void setIgnoreFailures(boolean ignoreFailures)
          Set if failures during copy should be ignored
 void setLogPath(org.apache.hadoop.fs.Path logPath)
          Set the log path where distcp output logs are stored. Uses JobStagingDir/_logs by default
 void setMapBandwidth(int mapBandwidth)
          Set per map bandwidth (MB)
 void setMapBandwidthKB(int mapBandwidth)
          Set per map bandwidth (KB)
 void setMaxMaps(int maxMaps)
          Set the max number of maps to use for copy
 void setOutPutDirectory(org.apache.hadoop.fs.Path outputDir)
           
 void setOverwrite(boolean overwrite)
          Set if files should always be overwritten on target
 void setPreserveSrcPath(boolean preserveSrcPath)
          Set whether the source path should be preserved
 void setSkipCRC(boolean skipCRC)
          Set if checksum comparison should be skipped while determining if source and destination files are identical
 void setSkipPathValidation(boolean skipPathValidation)
           
 void setSslConfigurationFile(String sslConfigurationFile)
          Set the SSL configuration file path to use with hftps:// (local path)
 void setSyncFolder(boolean syncFolder)
          Set if source and target folder contents should be sync'ed up
 void setUseSimpleFileListing(boolean useSimpleFileListing)
           
 boolean shouldAtomicCommit()
          Should the data be committed atomically?
 boolean shouldBlock()
          Should DistCp be running in blocking mode
 boolean shouldDeleteMissing()
          Should target files missing in source be deleted?
 boolean shouldIgnoreFailures()
          Should failures be logged and ignored during copy?
 boolean shouldOverwrite()
          Should files be overwritten always?
 boolean shouldPreserve(DistCpOptions.FileAttribute attribute)
          Checks if the input attribute should be preserved or not
 boolean shouldPreserveSrcPath()
          Should the source path be preserved?
 boolean shouldSkipCRC()
          Should CRC/checksum check be skipped while checking files are identical
 boolean shouldSyncFolder()
          Should the data be sync'ed between source and target paths?
 String toString()
          Utility to easily string-ify Options, for logging.
 void validate(DistCpOptionSwitch option, boolean value)
           
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

DistCpOptions

public DistCpOptions(List<org.apache.hadoop.fs.Path> sourcePaths,
                     org.apache.hadoop.fs.Path targetPath)
Constructor, to initialize source/target paths.

Parameters:
sourcePaths - List of source paths (including wildcards) to be copied to target.
targetPath - Destination path for the dist-copy.

DistCpOptions

public DistCpOptions(org.apache.hadoop.fs.Path sourceFileListing,
                     org.apache.hadoop.fs.Path targetPath)
Constructor, to initialize source/target paths.

Parameters:
sourceFileListing - File containing list of source paths
targetPath - Destination path for the dist-copy.
Method Detail

isUseSimpleFileListing

public boolean isUseSimpleFileListing()

setUseSimpleFileListing

public void setUseSimpleFileListing(boolean useSimpleFileListing)

isSkipPathValidation

public boolean isSkipPathValidation()

setSkipPathValidation

public void setSkipPathValidation(boolean skipPathValidation)

shouldAtomicCommit

public boolean shouldAtomicCommit()
Should the data be committed atomically?

Returns:
true if data should be committed atomically. false otherwise

setAtomicCommit

public void setAtomicCommit(boolean atomicCommit)
Set if data needs to be committed atomically

Parameters:
atomicCommit - boolean switch

shouldSyncFolder

public boolean shouldSyncFolder()
Should the data be sync'ed between source and target paths?

Returns:
true if data should be sync'ed up. false otherwise

setSyncFolder

public void setSyncFolder(boolean syncFolder)
Set if source and target folder contents should be sync'ed up

Parameters:
syncFolder - boolean switch

shouldDeleteMissing

public boolean shouldDeleteMissing()
Should target files missing in source be deleted?

Returns:
true if zombie target files are to be removed. false otherwise

setDeleteMissing

public void setDeleteMissing(boolean deleteMissing)
Set if files only present in target should be deleted

Parameters:
deleteMissing - boolean switch
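
To illustrate what "files only present in target" means, the sketch below computes that set with plain Java collections. The class name, the method name, and the use of relative path strings are assumptions made for illustration; this is not DistCp's actual implementation.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

final class DeleteMissingSketch {
    /** Returns the relative paths that exist in the target but not in the source. */
    static Set<String> onlyInTarget(Set<String> sourceFiles, Set<String> targetFiles) {
        Set<String> extra = new TreeSet<>(targetFiles); // copy, so the inputs stay untouched
        extra.removeAll(sourceFiles);                   // set difference: target minus source
        return extra;
    }

    public static void main(String[] args) {
        Set<String> source = new TreeSet<>(Arrays.asList("a.txt", "b.txt"));
        Set<String> target = new TreeSet<>(Arrays.asList("a.txt", "b.txt", "stale.txt"));
        System.out.println(onlyInTarget(source, target)); // prints [stale.txt]
    }
}
```

With deleteMissing enabled, the entries returned here would be the candidates for removal from the target.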

shouldPreserveSrcPath

public boolean shouldPreserveSrcPath()
Should the source path be preserved?

Returns:
true if path has to be preserved. false otherwise

setPreserveSrcPath

public void setPreserveSrcPath(boolean preserveSrcPath)
Set whether the source path should be preserved

Parameters:
preserveSrcPath - boolean switch

shouldIgnoreFailures

public boolean shouldIgnoreFailures()
Should failures be logged and ignored during copy?

Returns:
true if failures are to be logged and ignored. false otherwise

setIgnoreFailures

public void setIgnoreFailures(boolean ignoreFailures)
Set if failures during copy should be ignored

Parameters:
ignoreFailures - boolean switch
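
The "log and ignore" behaviour described above can be pictured with a small loop. This is a minimal sketch, not DistCp code: CopyStep is a hypothetical callback standing in for a single-file copy, and the choice to wrap the first failure in a RuntimeException is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class IgnoreFailuresSketch {
    /** Hypothetical single-file copy step; may throw on failure. */
    interface CopyStep {
        void copy(String file) throws Exception;
    }

    /**
     * Runs the copy step for every file and returns the files that failed.
     * When ignoreFailures is false, the first failure aborts the whole run.
     */
    static List<String> copyAll(List<String> files, CopyStep step, boolean ignoreFailures) {
        List<String> failed = new ArrayList<>();
        for (String file : files) {
            try {
                step.copy(file);
            } catch (Exception e) {
                if (!ignoreFailures) {
                    throw new RuntimeException("Copy failed for " + file, e);
                }
                // Log the failure and carry on with the remaining files.
                System.err.println("Ignoring failure for " + file + ": " + e.getMessage());
                failed.add(file);
            }
        }
        return failed;
    }
}
```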

shouldBlock

public boolean shouldBlock()
Should DistCp be running in blocking mode

Returns:
true if should run in blocking, false otherwise

setBlocking

public void setBlocking(boolean blocking)
Set if DistCp should run blocking or non-blocking

Parameters:
blocking - boolean switch
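
The difference between blocking and non-blocking execution can be sketched with a CompletableFuture. This is an analogy under stated assumptions rather than DistCp's job-submission code: the Runnable stands in for the copy job, and the class and method names are hypothetical.

```java
import java.util.concurrent.CompletableFuture;

final class BlockingSketch {
    /**
     * Starts the job asynchronously. In blocking mode the caller waits for
     * completion before this method returns; otherwise it returns immediately
     * with a handle that can be polled or joined later.
     */
    static CompletableFuture<Void> run(Runnable job, boolean blocking) {
        CompletableFuture<Void> handle = CompletableFuture.runAsync(job);
        if (blocking) {
            handle.join(); // wait here until the job has finished
        }
        return handle;
    }
}
```

In non-blocking mode the caller is responsible for checking the returned handle, for example with isDone() or a later join().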

shouldOverwrite

public boolean shouldOverwrite()
Should files be overwritten always?

Returns:
true if files that already exist in the target should always be overwritten. false otherwise

setOverwrite

public void setOverwrite(boolean overwrite)
Set if files should always be overwritten on target

Parameters:
overwrite - boolean switch

shouldSkipCRC

public boolean shouldSkipCRC()
Should CRC/checksum check be skipped while checking files are identical

Returns:
true if checksum check should be skipped while checking files are identical. false otherwise

setSkipCRC

public void setSkipCRC(boolean skipCRC)
Set if checksum comparison should be skipped while determining if source and destination files are identical

Parameters:
skipCRC - boolean switch
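
A minimal sketch of how a skip-checksum switch might affect the "are these files identical?" decision. It assumes the comparison first checks file length and then the checksum; FileInfo is a stand-in type invented for this example, not a Hadoop class.

```java
final class SkipCrcSketch {
    /** Minimal stand-in for file metadata; not a Hadoop type. */
    static final class FileInfo {
        final long length;
        final String checksum;

        FileInfo(long length, String checksum) {
            this.length = length;
            this.checksum = checksum;
        }
    }

    /**
     * Treats two files as identical when their lengths match and, unless
     * skipCrc is set, their checksums match as well.
     */
    static boolean looksIdentical(FileInfo source, FileInfo target, boolean skipCrc) {
        if (source.length != target.length) {
            return false;
        }
        return skipCrc || source.checksum.equals(target.checksum);
    }
}
```

Skipping the checksum makes the comparison cheaper but means same-length files with different contents are treated as identical.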

getMaxMaps

public int getMaxMaps()
Get the max number of maps to use for this copy

Returns:
Max number of maps

setMaxMaps

public void setMaxMaps(int maxMaps)
Set the max number of maps to use for copy

Parameters:
maxMaps - Number of maps

getMapBandwidth

public int getMapBandwidth()
Get the map bandwidth in KB

Returns:
Bandwidth in KB

setMapBandwidth

public void setMapBandwidth(int mapBandwidth)
Set per map bandwidth (MB)

Parameters:
mapBandwidth - per map bandwidth

setMapBandwidthKB

public void setMapBandwidthKB(int mapBandwidth)
Set per map bandwidth (KB)

Parameters:
mapBandwidth - per map bandwidth
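
getMapBandwidth reports its value in KB while setMapBandwidth is documented in MB, so a unit conversion presumably happens somewhere. The sketch below shows one way the three methods could fit together, assuming the value is stored in KB and 1 MB = 1024 KB. It is a hypothetical model, not the actual DistCpOptions source.

```java
final class MapBandwidthSketch {
    private int bandwidthKB; // assumed internal unit: kilobytes

    /** Takes a per-map bandwidth in MB and stores it as KB (assumption). */
    void setMapBandwidth(int mapBandwidthMB) {
        // multiplyExact throws ArithmeticException instead of silently overflowing
        this.bandwidthKB = Math.multiplyExact(mapBandwidthMB, 1024);
    }

    /** Takes a per-map bandwidth already expressed in KB. */
    void setMapBandwidthKB(int mapBandwidthKB) {
        this.bandwidthKB = mapBandwidthKB;
    }

    /** Returns the per-map bandwidth in KB. */
    int getMapBandwidth() {
        return bandwidthKB;
    }
}
```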

getSslConfigurationFile

public String getSslConfigurationFile()
Get path where the ssl configuration file is present to use for hftps://

Returns:
Path on local file system

setSslConfigurationFile

public void setSslConfigurationFile(String sslConfigurationFile)
Set the SSL configuration file path to use with hftps:// (local path)

Parameters:
sslConfigurationFile - Local ssl config file path

preserveAttributes

public Iterator<DistCpOptions.FileAttribute> preserveAttributes()
Returns an iterator with the list of file attributes to preserve

Returns:
iterator of file attributes to preserve

shouldPreserve

public boolean shouldPreserve(DistCpOptions.FileAttribute attribute)
Checks if the input attribute should be preserved or not

Parameters:
attribute - Attribute to check
Returns:
True if attribute should be preserved, false otherwise

preserve

public void preserve(DistCpOptions.FileAttribute fileAttribute)
Add file attributes that need to be preserved. This method may be called multiple times to add attributes.

Parameters:
fileAttribute - Attribute to add, one at a time
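
The three attribute methods (preserve, shouldPreserve, preserveAttributes) behave like operations on a set. A minimal sketch of that idea using EnumSet follows. The Attr constants are hypothetical placeholders, since the actual DistCpOptions.FileAttribute constants are not listed on this page, and the real class may store attributes differently.

```java
import java.util.EnumSet;
import java.util.Iterator;

final class PreserveSketch {
    /** Hypothetical attribute names, for illustration only. */
    enum Attr { REPLICATION, BLOCK_SIZE, USER, GROUP, PERMISSION }

    private final EnumSet<Attr> preserved = EnumSet.noneOf(Attr.class);

    /** Adds one attribute; calling it again with the same attribute has no further effect. */
    void preserve(Attr attribute) {
        preserved.add(attribute);
    }

    /** Checks whether the given attribute was added earlier. */
    boolean shouldPreserve(Attr attribute) {
        return preserved.contains(attribute);
    }

    /** Iterates over the attributes added so far. */
    Iterator<Attr> preserveAttributes() {
        return preserved.iterator();
    }
}
```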

getAtomicWorkPath

public org.apache.hadoop.fs.Path getAtomicWorkPath()
Get work path for atomic commit. If null, the work path would be parentOf(targetPath) + "/._WIP_" + nameOf(targetPath)

Returns:
Atomic work path on the target cluster. Null if not set
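
The default described above, parentOf(targetPath) + "/._WIP_" + nameOf(targetPath), can be worked through with plain strings. The helper below is a hypothetical sketch that assumes a path containing at least one "/" and no trailing slash; the real code operates on org.apache.hadoop.fs.Path objects.

```java
final class AtomicWorkPathSketch {
    /** Derives the default work path from a target path given as a string. */
    static String defaultWorkPath(String targetPath) {
        int slash = targetPath.lastIndexOf('/');
        if (slash < 0) {
            throw new IllegalArgumentException("Expected a path containing '/': " + targetPath);
        }
        String parent = targetPath.substring(0, slash); // parentOf(targetPath)
        String name = targetPath.substring(slash + 1);  // nameOf(targetPath)
        return parent + "/._WIP_" + name;
    }

    public static void main(String[] args) {
        System.out.println(defaultWorkPath("/data/out")); // prints /data/._WIP_out
    }
}
```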

setAtomicWorkPath

public void setAtomicWorkPath(org.apache.hadoop.fs.Path atomicWorkPath)
Set the work path for atomic commit

Parameters:
atomicWorkPath - Path on the target cluster

getLogPath

public org.apache.hadoop.fs.Path getLogPath()
Get output directory for writing distcp logs. Otherwise logs are temporarily written to JobStagingDir/_logs and deleted upon job completion

Returns:
Log output path on the cluster where distcp job is run

setLogPath

public void setLogPath(org.apache.hadoop.fs.Path logPath)
Set the log path where distcp output logs are stored. Uses JobStagingDir/_logs by default

Parameters:
logPath - Path where logs will be saved
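
The fallback to JobStagingDir/_logs amounts to a null check. A minimal sketch, assuming paths are handled as strings and that an unset log path is represented by null; the class and method names are hypothetical.

```java
final class LogPathSketch {
    /** Returns the configured log path, or jobStagingDir + "/_logs" when none was set. */
    static String effectiveLogPath(String logPath, String jobStagingDir) {
        return logPath != null ? logPath : jobStagingDir + "/_logs";
    }
}
```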

getCopyStrategy

public String getCopyStrategy()
Get the copy strategy to use. Uses appropriate input format

Returns:
copy strategy to use

setCopyStrategy

public void setCopyStrategy(String copyStrategy)
Set the copy strategy to use. Should map to a strategy implementation in distcp-default.xml

Parameters:
copyStrategy - copy strategy to use
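
Mapping a strategy name to an implementation is essentially a configuration lookup. The sketch below models the configuration as a Map. The key pattern "distcp.<name>.strategy.impl" and the class names in the usage example are assumptions for illustration; check the distcp-default.xml shipped with your build for the real keys and values.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class CopyStrategySketch {
    /** Looks up the implementation class name configured for a strategy name. */
    static String resolve(Map<String, String> conf, String copyStrategy) {
        // Assumed key pattern; the real configuration may use a different one.
        String key = "distcp." + copyStrategy.toLowerCase(Locale.ROOT) + ".strategy.impl";
        String impl = conf.get(key);
        if (impl == null) {
            throw new IllegalArgumentException(
                "No implementation configured for copy strategy '" + copyStrategy + "' (key " + key + ")");
        }
        return impl;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Hypothetical entry standing in for a line of distcp-default.xml.
        conf.put("distcp.dynamic.strategy.impl", "com.example.DynamicInputFormat");
        System.out.println(resolve(conf, "dynamic")); // prints com.example.DynamicInputFormat
    }
}
```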

getSourceFileListing

public org.apache.hadoop.fs.Path getSourceFileListing()
File path (hdfs:// or file://) that contains the list of actual files to copy

Returns:
Source listing file path
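
As an illustration of a listing file, the sketch below reads source paths from a local text file. It assumes one path per line with blank lines ignored, and it uses java.nio.file rather than the Hadoop FileSystem API, so it covers only the local file case. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class SourceListingSketch {
    /** Reads source paths from a listing file, one per line, skipping blank lines. */
    static List<String> readSources(Path listingFile) throws IOException {
        List<String> sources = new ArrayList<>();
        for (String line : Files.readAllLines(listingFile)) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                sources.add(trimmed);
            }
        }
        return sources;
    }
}
```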

getSourcePaths

public List<org.apache.hadoop.fs.Path> getSourcePaths()
Getter for sourcePaths.

Returns:
List of source-paths.

getTargetPath

public org.apache.hadoop.fs.Path getTargetPath()
Getter for the targetPath.

Returns:
The target-path.

setOutPutDirectory

public void setOutPutDirectory(org.apache.hadoop.fs.Path outputDir)

getOutPutDirectory

public org.apache.hadoop.fs.Path getOutPutDirectory()

validate

public void validate(DistCpOptionSwitch option,
                     boolean value)

appendToConf

public void appendToConf(org.apache.hadoop.conf.Configuration conf)
Add options to configuration. These will be used in the Mapper/committer

Parameters:
conf - Configuration object to which the options need to be added

toString

public String toString()
Utility to easily string-ify Options, for logging.

Overrides:
toString in class Object
Returns:
String representation of the Options.

clone

protected DistCpOptions clone()
                       throws CloneNotSupportedException
Overrides:
clone in class Object
Throws:
CloneNotSupportedException


Copyright © 2014 InMobi. All rights reserved.