org.apache.hadoop.tools.mapred.lib
Class DynamicInputFormat<K,V>

java.lang.Object
  extended by org.apache.hadoop.mapreduce.InputFormat<K,V>
      extended by org.apache.hadoop.tools.mapred.lib.DynamicInputFormat<K,V>

public class DynamicInputFormat<K,V>
extends org.apache.hadoop.mapreduce.InputFormat<K,V>

DynamicInputFormat implements the "Worker pattern" for DistCp. Rather than splitting the copy-list into a set of static splits, DynamicInputFormat does the following:
1. Splits the copy-list into small chunks on the DFS.
2. Creates a set of empty "dynamic" splits, each of which consumes as many chunks as it can.
This arrangement ensures that a single slow mapper won't slow down the entire job, since the slack is picked up by other mappers, which consume more chunks. Varying the split-ratio changes the chunk size, which allows tuning for different performance characteristics.
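The worker pattern described above can be illustrated without Hadoop. The following is a minimal sketch, not DistCp's implementation: the class DynamicChunkSketch, its methods chunk and run, and the in-memory queue standing in for chunk files on the DFS are all hypothetical names chosen for illustration. Each worker thread plays the role of a mapper that keeps pulling chunks until none remain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not part of the DistCp API.
class DynamicChunkSketch {

    /** Splits the copy-list into chunks of at most chunkSize entries. */
    static List<List<String>> chunk(List<String> copyList, int chunkSize) {
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < copyList.size(); i += chunkSize) {
            int end = Math.min(i + chunkSize, copyList.size());
            chunks.add(new ArrayList<>(copyList.subList(i, end)));
        }
        return chunks;
    }

    /**
     * Starts one worker per entry in delayMillisPerChunk. Each worker pulls
     * chunks from a shared queue until it is empty, sleeping for its own
     * delay per chunk to simulate fast and slow mappers.
     * Returns the number of copy-list entries handled by each worker.
     */
    static Map<Integer, Integer> run(List<List<String>> chunks,
                                     long[] delayMillisPerChunk)
            throws InterruptedException {
        // The shared queue stands in for the chunk files on the DFS.
        ConcurrentLinkedQueue<List<String>> pending =
                new ConcurrentLinkedQueue<>(chunks);
        Map<Integer, Integer> handled = new ConcurrentHashMap<>();
        ExecutorService pool =
                Executors.newFixedThreadPool(delayMillisPerChunk.length);

        for (int w = 0; w < delayMillisPerChunk.length; w++) {
            final int worker = w;
            handled.put(worker, 0);
            pool.submit(() -> {
                List<String> next;
                // poll() hands each chunk to exactly one worker.
                while ((next = pending.poll()) != null) {
                    try {
                        Thread.sleep(delayMillisPerChunk[worker]);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    handled.merge(worker, next.size(), Integer::sum);
                }
            });
        }

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return handled;
    }
}
```

Calling DynamicChunkSketch.run(DynamicChunkSketch.chunk(files, 4), new long[] {1, 5}) starts one fast and one slow worker. The fast worker will typically handle more entries, but the per-worker totals always add up to the size of the copy-list, because no chunk is assigned to a worker ahead of time.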


Constructor Summary
DynamicInputFormat()
           
 
Method Summary
 org.apache.hadoop.mapreduce.RecordReader<K,V> createRecordReader(org.apache.hadoop.mapreduce.InputSplit inputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext taskAttemptContext)
          Implementation of InputFormat::createRecordReader().
 List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext jobContext)
          Implementation of InputFormat::getSplits().
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DynamicInputFormat

public DynamicInputFormat()
Method Detail

getSplits

public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext jobContext)
                                                       throws IOException,
                                                              InterruptedException
Implementation of InputFormat::getSplits(). This method splits up the copy-listing file into chunks, and assigns the first batch to different tasks.

Specified by:
getSplits in class org.apache.hadoop.mapreduce.InputFormat<K,V>
Parameters:
jobContext - JobContext for the map job.
Returns:
The list of (empty) dynamic input-splits.
Throws:
IOException - on failure.
InterruptedException
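To make the effect of the split-ratio on chunking concrete, here is a small arithmetic sketch. It rests on an assumption this page does not state: that the job aims for roughly (number of maps × split-ratio) chunks, so that the number of entries per chunk is the record count divided by that target, rounded up. The class ChunkSizeSketch and its methods are hypothetical and are not part of DynamicInputFormat.

```java
// Hypothetical illustration only; the sizing rule is an assumption.
class ChunkSizeSketch {

    /**
     * Entries per chunk, assuming a target of numMaps * splitRatio chunks.
     * Uses ceiling division and never returns less than 1.
     */
    static int entriesPerChunk(long numRecords, int numMaps, int splitRatio) {
        if (numMaps <= 0 || splitRatio <= 0) {
            throw new IllegalArgumentException(
                    "numMaps and splitRatio must be positive");
        }
        long targetChunks = (long) numMaps * splitRatio;
        long perChunk = (numRecords + targetChunks - 1) / targetChunks;
        return (int) Math.max(1, perChunk);
    }

    /** Number of chunks needed to hold numRecords at the given chunk size. */
    static long chunkCount(long numRecords, int entriesPerChunk) {
        return (numRecords + entriesPerChunk - 1) / entriesPerChunk;
    }
}
```

Under this assumption, 1000 records with 10 maps and a split-ratio of 2 give 50 entries per chunk (20 chunks), while a split-ratio of 4 gives 25 entries per chunk (40 chunks). A higher ratio means smaller chunks and finer-grained load balancing, at the cost of more chunk files to create and claim.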

createRecordReader

public org.apache.hadoop.mapreduce.RecordReader<K,V> createRecordReader(org.apache.hadoop.mapreduce.InputSplit inputSplit,
                                                                        org.apache.hadoop.mapreduce.TaskAttemptContext taskAttemptContext)
                                                                 throws IOException,
                                                                        InterruptedException
Implementation of InputFormat::createRecordReader().

Specified by:
createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<K,V>
Parameters:
inputSplit - The split for which the RecordReader is required.
taskAttemptContext - TaskAttemptContext for the current attempt.
Returns:
DynamicRecordReader instance.
Throws:
IOException - on failure.
InterruptedException


Copyright © 2014 InMobi. All rights reserved.