org.apache.hadoop.tools.mapred
Class UniformSizeInputFormat

java.lang.Object
  extended by org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.fs.FileStatus>
      extended by org.apache.hadoop.tools.mapred.UniformSizeInputFormat

public class UniformSizeInputFormat
extends org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.fs.FileStatus>

UniformSizeInputFormat extends the InputFormat class to produce input splits for DistCp. It reads the copy-listing and groups its entries into input splits such that the total number of bytes to be copied is approximately the same for each split.
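
The grouping described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the DistCp implementation: UniformSplitSketch, Split, and groupBySize are hypothetical names, file sizes are passed in as an array instead of being read from the copy-listing, and the greedy cut rule (close a split before adding a file that would push it past the per-split target) is an assumption about how such a grouping might work.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: groups consecutive listing entries into splits of
// roughly equal byte totals. Not part of the Hadoop or DistCp API.
final class UniformSplitSketch {

    /** Half-open range [start, end) of listing entries and its byte total. */
    static final class Split {
        final int start;
        final int end;
        final long bytes;

        Split(int start, int end, long bytes) {
            this.start = start;
            this.end = end;
            this.bytes = bytes;
        }
    }

    static List<Split> groupBySize(long[] fileSizes, int numSplits) {
        if (numSplits <= 0) {
            throw new IllegalArgumentException("numSplits must be positive");
        }
        long total = 0;
        for (long size : fileSizes) {
            total += size;
        }
        // Target bytes per split, rounded up (assumed rule); at least 1.
        long target = Math.max(1, (total + numSplits - 1) / numSplits);

        List<Split> splits = new ArrayList<>();
        int start = 0;
        long current = 0;
        for (int i = 0; i < fileSizes.length; i++) {
            // Close the current split if this file would exceed the target,
            // but never emit an empty split.
            if (current + fileSizes[i] > target && i > start) {
                splits.add(new Split(start, i, current));
                start = i;
                current = 0;
            }
            current += fileSizes[i];
        }
        // Emit whatever remains as the final split.
        if (start < fileSizes.length) {
            splits.add(new Split(start, fileSizes.length, current));
        }
        return splits;
    }

    public static void main(String[] args) {
        long[] sizes = {40, 10, 30, 20, 50, 50};
        for (Split s : groupBySize(sizes, 3)) {
            System.out.println("[" + s.start + ", " + s.end + ") bytes=" + s.bytes);
        }
    }
}
```

Because files are never divided, a greedy rule like this one can return a different number of splits than requested. With the sizes in main and three requested splits, the sketch yields four splits of 50 bytes each.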


Constructor Summary
UniformSizeInputFormat()
           
 
Method Summary
 org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.Text,org.apache.hadoop.fs.FileStatus> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context)
          Implementation of InputFormat::createRecordReader().
 List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext context)
          Implementation of InputFormat::getSplits().
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

UniformSizeInputFormat

public UniformSizeInputFormat()
Method Detail

getSplits

public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext context)
                                                       throws IOException,
                                                              InterruptedException
Implementation of InputFormat::getSplits(). Returns a list of InputSplits such that the number of bytes to be copied is approximately equal across all splits.

Specified by:
getSplits in class org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.fs.FileStatus>
Parameters:
context - JobContext for the job.
Returns:
The list of uniformly-distributed input-splits.
Throws:
IOException - On failure.
InterruptedException

createRecordReader

public org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.Text,org.apache.hadoop.fs.FileStatus> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split,
                                                                                                                              org.apache.hadoop.mapreduce.TaskAttemptContext context)
                                                                                                                       throws IOException,
                                                                                                                              InterruptedException
Implementation of InputFormat::createRecordReader().

Specified by:
createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.fs.FileStatus>
Parameters:
split - The split for which the RecordReader is sought.
context - The context of the current task attempt.
Returns:
A SequenceFileRecordReader instance (since the copy-listing is a simple sequence file).
Throws:
IOException
InterruptedException


Copyright © 2014 InMobi. All rights reserved.