org.apache.hadoop.tools.util
Class DistCpUtils

java.lang.Object
  extended by org.apache.hadoop.tools.util.DistCpUtils

public class DistCpUtils
extends Object

Utility functions used in DistCp.


Constructor Summary
DistCpUtils()
           
 
Method Summary
static boolean checksumsAreEqual(org.apache.hadoop.fs.FileSystem sourceFS, org.apache.hadoop.fs.Path source, org.apache.hadoop.fs.FileSystem targetFS, org.apache.hadoop.fs.Path target)
          Utility to compare checksums for the paths specified.
static boolean compareFs(org.apache.hadoop.fs.FileSystem srcFs, org.apache.hadoop.fs.FileSystem destFs)
           
static long getFileSize(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration configuration)
          Retrieves size of the file at the specified path.
static DecimalFormat getFormatter()
           
static int getInt(org.apache.hadoop.conf.Configuration configuration, String label)
          Utility to retrieve a specified key from a Configuration.
static long getLong(org.apache.hadoop.conf.Configuration configuration, String label)
          Utility to retrieve a specified key from a Configuration.
static String getRelativePath(org.apache.hadoop.fs.Path sourceRootPath, org.apache.hadoop.fs.Path childPath)
          Gets the relative path of a child path with respect to a root path.
static Class<? extends org.apache.hadoop.mapreduce.InputFormat> getStrategy(org.apache.hadoop.conf.Configuration conf, DistCpOptions options)
          Returns the class that implements a copy strategy.
static String getStringDescriptionFor(long nBytes)
           
static String packAttributes(EnumSet<DistCpOptions.FileAttribute> attributes)
          Packs file preservation attributes into a string containing just the first character of each attribute.
static void preserve(org.apache.hadoop.fs.FileSystem targetFS, org.apache.hadoop.fs.Path path, org.apache.hadoop.fs.FileStatus srcFileStatus, EnumSet<DistCpOptions.FileAttribute> attributes)
          Preserves attributes on a file so that they match those of the file status passed as an argument.
static <T> void publish(org.apache.hadoop.conf.Configuration configuration, String label, T value)
          Utility to publish a value to a configuration.
static org.apache.hadoop.fs.Path sortListing(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path sourceListing)
          Sorts a sequence file containing FileStatus and Text as key and value respectively.
static EnumSet<DistCpOptions.FileAttribute> unpackAttributes(String attributes)
          Unpacks a preservation attribute string, containing the first character of each preservation attribute, back into the set of attributes to preserve.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DistCpUtils

public DistCpUtils()
Method Detail

getFileSize

public static long getFileSize(org.apache.hadoop.fs.Path path,
                               org.apache.hadoop.conf.Configuration configuration)
                        throws IOException
Retrieves size of the file at the specified path.

Parameters:
path - The path of the file whose size is sought.
configuration - Configuration, used to retrieve the appropriate FileSystem.
Returns:
The file-size, in number of bytes.
Throws:
IOException - on failure.

publish

public static <T> void publish(org.apache.hadoop.conf.Configuration configuration,
                               String label,
                               T value)
Utility to publish a value to a configuration.

Type Parameters:
T - The type of the value.
Parameters:
configuration - The Configuration to which the value must be written.
label - The label for the value being published.
value - The value being published.

getInt

public static int getInt(org.apache.hadoop.conf.Configuration configuration,
                         String label)
Utility to retrieve a specified key from a Configuration. Throws an exception if the key is not found.

Parameters:
configuration - The Configuration in which the key is sought.
label - The key being sought.
Returns:
Integer value of the key.

getLong

public static long getLong(org.apache.hadoop.conf.Configuration configuration,
                           String label)
Utility to retrieve a specified key from a Configuration. Throws an exception if the key is not found.

Parameters:
configuration - The Configuration in which the key is sought.
label - The key being sought.
Returns:
Long value of the key.
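
As a rough illustration of the publish, getInt and getLong contract described above, the sketch below writes a value under a label and reads it back, failing when the key is absent. It is not the DistCpUtils implementation: java.util.Properties stands in for org.apache.hadoop.conf.Configuration, the class name ConfigLookupSketch and the key names are hypothetical, and the choice of IllegalStateException is an assumption, since the documentation only says that an exception is thrown.

```java
import java.util.Properties;

// Illustrative sketch only: Properties stands in for Hadoop's Configuration.
class ConfigLookupSketch {

    // Stores the value under the label, in string form.
    static <T> void publish(Properties configuration, String label, T value) {
        configuration.setProperty(label, String.valueOf(value));
    }

    // Returns the raw value, or throws when the key is absent.
    // The exception type is an assumption made for this sketch.
    private static String require(Properties configuration, String label) {
        String raw = configuration.getProperty(label);
        if (raw == null) {
            throw new IllegalStateException("Couldn't find " + label);
        }
        return raw.trim();
    }

    static int getInt(Properties configuration, String label) {
        return Integer.parseInt(require(configuration, label));
    }

    static long getLong(Properties configuration, String label) {
        return Long.parseLong(require(configuration, label));
    }

    public static void main(String[] args) {
        Properties configuration = new Properties();
        publish(configuration, "example.total.bytes", 4096L); // hypothetical key
        System.out.println(getLong(configuration, "example.total.bytes"));
    }
}
```

Failing fast on a missing key, rather than returning a default, makes a value that was never published visible at the point of lookup.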

getStrategy

public static Class<? extends org.apache.hadoop.mapreduce.InputFormat> getStrategy(org.apache.hadoop.conf.Configuration conf,
                                                                                   DistCpOptions options)
Returns the class that implements a copy strategy. Looks up the implementation for a particular strategy from distcp-default.xml.

Parameters:
conf - Configuration object
options - Handle to input options
Returns:
Class implementing the strategy specified in options.

getRelativePath

public static String getRelativePath(org.apache.hadoop.fs.Path sourceRootPath,
                                     org.apache.hadoop.fs.Path childPath)
Gets the relative path of a child path with respect to a root path. For example:
If childPath = /tmp/abc/xyz/file and sourceRootPath = /tmp/abc, the relative path would be /xyz/file.
If childPath = /file and sourceRootPath = /, the relative path would be /file.

Parameters:
sourceRootPath - Source root path
childPath - Path for which the relative path is required
Returns:
Relative portion of the child path (always prefixed with / unless it is empty).
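
The two documented cases can be reproduced with a small sketch. Plain strings stand in for org.apache.hadoop.fs.Path here, and the class RelativePathSketch and its relativePath helper are hypothetical names introduced only for illustration; the real method may be implemented differently.

```java
// Illustrative sketch only: plain strings stand in for org.apache.hadoop.fs.Path.
class RelativePathSketch {

    // Hypothetical helper mirroring the documented behaviour of getRelativePath.
    static String relativePath(String sourceRootPath, String childPath) {
        if (!childPath.startsWith(sourceRootPath)) {
            throw new IllegalArgumentException(childPath + " is not under " + sourceRootPath);
        }
        String remainder = childPath.substring(sourceRootPath.length());
        if (remainder.isEmpty()) {
            return ""; // the child is the root itself
        }
        if (remainder.startsWith("/")) {
            return remainder; // e.g. root /tmp/abc, child /tmp/abc/xyz/file
        }
        if (sourceRootPath.endsWith("/")) {
            return "/" + remainder; // e.g. root /, child /file
        }
        // A shared string prefix such as /tmp/abc and /tmp/abcd is not a parent-child relation.
        throw new IllegalArgumentException(childPath + " is not under " + sourceRootPath);
    }

    public static void main(String[] args) {
        System.out.println(relativePath("/tmp/abc", "/tmp/abc/xyz/file")); // prints /xyz/file
        System.out.println(relativePath("/", "/file"));                    // prints /file
    }
}
```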

packAttributes

public static String packAttributes(EnumSet<DistCpOptions.FileAttribute> attributes)
Packs file preservation attributes into a string containing just the first character of each attribute.

Parameters:
attributes - Attribute set to preserve
Returns:
String containing the first letter of each attribute to preserve

unpackAttributes

public static EnumSet<DistCpOptions.FileAttribute> unpackAttributes(String attributes)
Unpacks a preservation attribute string, containing the first character of each preservation attribute, back into the set of attributes to preserve.

Parameters:
attributes - Attribute string
Returns:
Attribute set
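
A minimal sketch of the pack and unpack round trip follows. The nested FileAttribute enum is a hypothetical stand-in for DistCpOptions.FileAttribute, and its constant names are assumed for illustration; the class AttributePackingSketch is likewise not part of the library.

```java
import java.util.EnumSet;

// Illustrative sketch only; not the DistCpUtils implementation.
class AttributePackingSketch {

    // Hypothetical stand-in for DistCpOptions.FileAttribute (constant names assumed).
    enum FileAttribute { REPLICATION, BLOCKSIZE, USER, GROUP, PERMISSION }

    // Packs each attribute as the first character of its name.
    static String pack(EnumSet<FileAttribute> attributes) {
        StringBuilder packed = new StringBuilder();
        for (FileAttribute attribute : attributes) {
            packed.append(attribute.name().charAt(0));
        }
        return packed.toString();
    }

    // Maps each character back to the attribute whose name starts with it.
    static EnumSet<FileAttribute> unpack(String packed) {
        EnumSet<FileAttribute> attributes = EnumSet.noneOf(FileAttribute.class);
        for (char symbol : packed.toCharArray()) {
            boolean matched = false;
            for (FileAttribute candidate : FileAttribute.values()) {
                if (candidate.name().charAt(0) == symbol) {
                    attributes.add(candidate);
                    matched = true;
                }
            }
            if (!matched) {
                throw new IllegalArgumentException("Unknown attribute symbol: " + symbol);
            }
        }
        return attributes;
    }

    public static void main(String[] args) {
        EnumSet<FileAttribute> toPreserve = EnumSet.of(FileAttribute.USER, FileAttribute.PERMISSION);
        String packed = pack(toPreserve);
        System.out.println(packed);         // prints UP
        System.out.println(unpack(packed)); // prints [USER, PERMISSION]
    }
}
```

The scheme round-trips cleanly only while every attribute name starts with a distinct character, which holds for the constants assumed here.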

preserve

public static void preserve(org.apache.hadoop.fs.FileSystem targetFS,
                            org.apache.hadoop.fs.Path path,
                            org.apache.hadoop.fs.FileStatus srcFileStatus,
                            EnumSet<DistCpOptions.FileAttribute> attributes)
                     throws IOException
Preserves attributes on a file so that they match those of the file status passed as an argument. All attributes except the block size are preserved by this function.

Parameters:
targetFS - File system
path - Path on which the original file status is to be preserved
srcFileStatus - Original file status
attributes - Set of attributes that need to be preserved
Throws:
IOException - Exception, if any (particularly relating to a group/owner change or any transient error)

sortListing

public static org.apache.hadoop.fs.Path sortListing(org.apache.hadoop.fs.FileSystem fs,
                                                    org.apache.hadoop.conf.Configuration conf,
                                                    org.apache.hadoop.fs.Path sourceListing)
                                             throws IOException
Sorts a sequence file containing FileStatus and Text as key and value respectively.

Parameters:
fs - File system
conf - Configuration
sourceListing - Source listing file
Returns:
Path of the sorted file: the source file name with _sorted appended.
Throws:
IOException - Any exception during the sort.

getFormatter

public static DecimalFormat getFormatter()

getStringDescriptionFor

public static String getStringDescriptionFor(long nBytes)

checksumsAreEqual

public static boolean checksumsAreEqual(org.apache.hadoop.fs.FileSystem sourceFS,
                                        org.apache.hadoop.fs.Path source,
                                        org.apache.hadoop.fs.FileSystem targetFS,
                                        org.apache.hadoop.fs.Path target)
                                 throws IOException
Utility to compare checksums for the paths specified. If the checksums can't be retrieved, the comparison does not fail. The comparison fails only when both checksums are available and they don't match.

Parameters:
sourceFS - FileSystem for the source path.
source - The source path.
targetFS - FileSystem for the target path.
target - The target path.
Returns:
If either checksum couldn't be retrieved, the function returns true. If both checksums are retrieved, the function returns true if they match, and false otherwise.
Throws:
IOException - if there's an exception while retrieving checksums.
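
The lenient comparison in the method description (a checksum that cannot be retrieved does not fail the comparison) can be sketched as follows. This is an assumption-laden illustration rather than the library code: a byte array stands in for a file checksum, null models a checksum that could not be retrieved, and the class name ChecksumComparisonSketch is hypothetical.

```java
import java.util.Arrays;

// Illustrative sketch only; byte[] stands in for a file checksum.
class ChecksumComparisonSketch {

    // null models a checksum that could not be retrieved.
    static boolean checksumsAreEqual(byte[] sourceChecksum, byte[] targetChecksum) {
        if (sourceChecksum == null || targetChecksum == null) {
            return true; // unavailable checksum: do not fail the comparison
        }
        // Both checksums are available, so the result depends on their contents.
        return Arrays.equals(sourceChecksum, targetChecksum);
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3};
        byte[] b = {1, 2, 4};
        System.out.println(checksumsAreEqual(a, a.clone())); // prints true
        System.out.println(checksumsAreEqual(a, b));         // prints false
        System.out.println(checksumsAreEqual(a, null));      // prints true
    }
}
```

Under this reading, a true result means "no mismatch was detected", which is weaker than "the contents were verified to be identical".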

compareFs

public static boolean compareFs(org.apache.hadoop.fs.FileSystem srcFs,
                                org.apache.hadoop.fs.FileSystem destFs)


Copyright © 2014 InMobi. All rights reserved.