org.apache.hadoop.tools.mapred.lib
Class DynamicInputFormat<K,V>
java.lang.Object
org.apache.hadoop.mapreduce.InputFormat<K,V>
org.apache.hadoop.tools.mapred.lib.DynamicInputFormat<K,V>
public class DynamicInputFormat<K,V>
extends org.apache.hadoop.mapreduce.InputFormat<K,V>
DynamicInputFormat implements the "Worker pattern" for DistCp.
Rather than splitting the copy-list into a set of static splits,
the DynamicInputFormat does the following:
1. Splits the copy-list into small chunks on the DFS.
2. Creates a set of empty "dynamic" splits, each of which consumes as many chunks
as it can.
This arrangement ensures that a single slow mapper won't slow down the entire
job, since the slack is picked up by the other mappers, which consume more
chunks.
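The load-balancing effect described above can be illustrated with a small in-memory simulation. This is a sketch under stated assumptions, not the DistCp implementation: chunks are modelled as lists of path strings in a shared `ConcurrentLinkedQueue` (standing in for chunk files on the DFS), each "mapper" is a plain thread, and the names `WorkerPatternDemo`, `Result`, and `run` are hypothetical. One worker is artificially slowed down; the others keep pulling chunks from the pool.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical in-memory model of the worker pattern: chunk lists stand in for
// chunk files on the DFS, and one thread stands in for each mapper.
final class WorkerPatternDemo {

    /** Outcome of one simulated run. */
    static final class Result {
        final List<String> copied;    // every path "copied", in completion order
        final int[] chunksPerWorker;  // number of chunks claimed by each worker

        Result(List<String> copied, int[] chunksPerWorker) {
            this.copied = copied;
            this.chunksPerWorker = chunksPerWorker;
        }
    }

    static Result run(List<List<String>> chunks, int numWorkers, int slowWorker,
                      long slowDelayMillis) throws InterruptedException {
        // Shared pool of unclaimed chunks; poll() hands each chunk to exactly one worker.
        ConcurrentLinkedQueue<List<String>> pool = new ConcurrentLinkedQueue<>(chunks);
        List<String> copied = Collections.synchronizedList(new ArrayList<>());
        int[] chunksPerWorker = new int[numWorkers];
        List<Thread> threads = new ArrayList<>();

        for (int w = 0; w < numWorkers; w++) {
            final int id = w;
            Thread worker = new Thread(() -> {
                List<String> chunk;
                while ((chunk = pool.poll()) != null) {  // keep claiming until the pool is empty
                    chunksPerWorker[id]++;                // this slot is written only by this thread
                    for (String path : chunk) {
                        if (id == slowWorker) {
                            try {
                                Thread.sleep(slowDelayMillis);  // simulate a slow mapper
                            } catch (InterruptedException e) {
                                Thread.currentThread().interrupt();
                                return;
                            }
                        }
                        copied.add(path);  // stand-in for copying the file
                    }
                }
            });
            threads.add(worker);
            worker.start();
        }
        for (Thread worker : threads) {
            worker.join();  // join() also makes the workers' writes visible to this thread
        }
        return new Result(new ArrayList<>(copied), chunksPerWorker);
    }

    public static void main(String[] args) throws InterruptedException {
        // 20 hypothetical chunks of 2 paths each; worker 0 is the slow one.
        List<List<String>> chunks = new ArrayList<>();
        for (int c = 0; c < 20; c++) {
            chunks.add(List.of("/src/file-" + (2 * c), "/src/file-" + (2 * c + 1)));
        }
        Result result = run(chunks, 4, 0, 50);
        System.out.println("copied " + result.copied.size() + " files");
        for (int w = 0; w < result.chunksPerWorker.length; w++) {
            System.out.println("worker " + w + " claimed " + result.chunksPerWorker[w] + " chunks");
        }
    }
}
```

Because every worker pulls chunks until the pool is empty, the fast workers typically claim most of the chunks while the slow one is still sleeping; the exact distribution varies from run to run.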
By varying the split ratio, one can vary the chunk size (and therefore the number
of chunks) to achieve different performance characteristics.
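As a rough illustration of how a split ratio trades chunk size against chunk count, the sketch below assumes that the copy-list is cut into `splitRatio × numMaps` chunks and that records are spread evenly across them, rounding up. Treat this formula and the names `ChunkSizing`, `numChunks`, and `entriesPerChunk` as assumptions for illustration; the real DistCp code may apply further limits.

```java
// Hypothetical helper: derives chunk count and chunk size from a split ratio.
// Assumption: numChunks = numMaps * splitRatio, records spread evenly (rounded up).
final class ChunkSizing {

    /** Assumed: the copy-list is cut into splitRatio chunks per map task. */
    static int numChunks(int numMaps, int splitRatio) {
        if (numMaps <= 0 || splitRatio <= 0) {
            throw new IllegalArgumentException("numMaps and splitRatio must be positive");
        }
        return numMaps * splitRatio;
    }

    /** Records per chunk, rounded up so that every record lands in some chunk. */
    static int entriesPerChunk(long numRecords, int numMaps, int splitRatio) {
        if (numRecords < 0) {
            throw new IllegalArgumentException("numRecords must be non-negative");
        }
        long chunks = numChunks(numMaps, splitRatio);
        return (int) ((numRecords + chunks - 1) / chunks);  // ceiling division
    }

    public static void main(String[] args) {
        long numRecords = 10_000;
        int numMaps = 20;
        for (int ratio : new int[] {1, 2, 5}) {
            System.out.printf("ratio=%d -> %d chunks of up to %d records%n",
                    ratio, numChunks(numMaps, ratio),
                    entriesPerChunk(numRecords, numMaps, ratio));
        }
    }
}
```

With 10,000 records and 20 maps, ratios of 1, 2, and 5 give 20 chunks of 500, 40 chunks of 250, and 100 chunks of 100 records. A higher ratio means smaller chunks and finer-grained balancing, at the cost of more chunk files to create and claim.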
Method Summary
Modifier and Type | Method | Description
org.apache.hadoop.mapreduce.RecordReader<K,V> | createRecordReader(org.apache.hadoop.mapreduce.InputSplit inputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext taskAttemptContext) | Implementation of InputFormat::createRecordReader().
List<org.apache.hadoop.mapreduce.InputSplit> | getSplits(org.apache.hadoop.mapreduce.JobContext jobContext) | Implementation of InputFormat::getSplits().
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
DynamicInputFormat
public DynamicInputFormat()
getSplits
public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext jobContext)
throws IOException,
InterruptedException
- Implementation of InputFormat::getSplits(). This method splits up the
copy-listing file into chunks and assigns the first batch of chunks to
different tasks.
- Specified by:
getSplits
in class org.apache.hadoop.mapreduce.InputFormat<K,V>
- Parameters:
jobContext:
- JobContext for the map job.
- Returns:
- The list of (empty) dynamic input-splits.
- Throws:
IOException
- on failure.
InterruptedException
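The two steps this method performs (cut the copy-listing into chunks, then hand the first batch out to different tasks) can be modelled in memory as follows. This is a sketch under assumptions: the copy-listing is a list of path strings, `entriesPerChunk` is supplied by the caller, the first batch is taken to be one chunk per task, and `SplitPlanner`, `Plan`, `chunk`, and `plan` are hypothetical names. How the real method records the assignment on the DFS is not described here and is not modelled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical model of getSplits(): (1) cut the copy-listing into chunks,
// (2) give the first batch to the tasks (assumed: one chunk per task).
final class SplitPlanner {

    static final class Plan {
        final List<List<String>> firstBatch;  // index i = chunk initially given to task i
        final Deque<List<String>> pool;       // remaining chunks, claimed later by any task

        Plan(List<List<String>> firstBatch, Deque<List<String>> pool) {
            this.firstBatch = firstBatch;
            this.pool = pool;
        }
    }

    /** Cuts the listing into consecutive chunks of at most entriesPerChunk paths. */
    static List<List<String>> chunk(List<String> copyListing, int entriesPerChunk) {
        if (entriesPerChunk <= 0) {
            throw new IllegalArgumentException("entriesPerChunk must be positive");
        }
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < copyListing.size(); i += entriesPerChunk) {
            int end = Math.min(i + entriesPerChunk, copyListing.size());
            chunks.add(new ArrayList<>(copyListing.subList(i, end)));
        }
        return chunks;
    }

    static Plan plan(List<String> copyListing, int numTasks, int entriesPerChunk) {
        if (numTasks <= 0) {
            throw new IllegalArgumentException("numTasks must be positive");
        }
        List<List<String>> chunks = chunk(copyListing, entriesPerChunk);
        int firstBatchSize = Math.min(numTasks, chunks.size());  // never more than the chunk count
        List<List<String>> firstBatch = new ArrayList<>(chunks.subList(0, firstBatchSize));
        Deque<List<String>> pool = new ArrayDeque<>(chunks.subList(firstBatchSize, chunks.size()));
        return new Plan(firstBatch, pool);
    }

    public static void main(String[] args) {
        List<String> listing = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            listing.add("/src/file-" + i);
        }
        Plan p = plan(listing, 3, 2);
        System.out.println("first batch: " + p.firstBatch);
        System.out.println("pool: " + p.pool);
    }
}
```

With 10 paths, chunks of 2, and 3 tasks, this yields 5 chunks: 3 in the first batch and 2 left in the pool for whichever task finishes first.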
createRecordReader
public org.apache.hadoop.mapreduce.RecordReader<K,V> createRecordReader(org.apache.hadoop.mapreduce.InputSplit inputSplit,
org.apache.hadoop.mapreduce.TaskAttemptContext taskAttemptContext)
throws IOException,
InterruptedException
- Implementation of InputFormat::createRecordReader().
- Specified by:
createRecordReader
in class org.apache.hadoop.mapreduce.InputFormat<K,V>
- Parameters:
inputSplit:
- The split for which the RecordReader is required.
taskAttemptContext:
- TaskAttemptContext for the current attempt.
- Returns:
- DynamicRecordReader instance.
- Throws:
IOException
- on failure.
InterruptedException
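Given that each dynamic split consumes as many chunks as it can, the reader returned for it presumably has to keep claiming chunks until none remain. The sketch below models that loop under stated assumptions: chunks are in-memory lists of path strings in a shared `Queue`, values are plain strings, and `ChunkRecordReader` is a hypothetical class that borrows the `nextKeyValue()`/`getCurrentValue()` naming of `org.apache.hadoop.mapreduce.RecordReader` without extending it. It is not the actual DynamicRecordReader.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical stand-in for a dynamic record reader. It mirrors the
// nextKeyValue()/getCurrentValue() style of a Hadoop RecordReader but is NOT
// an org.apache.hadoop.mapreduce.RecordReader subclass.
final class ChunkRecordReader {
    private final Queue<List<String>> pool;  // shared pool of unclaimed chunks
    private Iterator<String> current;        // records of the chunk being read
    private String value;                    // current record, or null when done
    private int chunksClaimed;

    ChunkRecordReader(List<String> firstChunk, Queue<List<String>> pool) {
        this.current = firstChunk.iterator();
        this.pool = pool;
        this.chunksClaimed = 1;
    }

    /** Advances to the next record, claiming a new chunk whenever the current one is exhausted. */
    boolean nextKeyValue() {
        while (!current.hasNext()) {
            List<String> next = pool.poll();  // atomic claim; null means no chunks are left
            if (next == null) {
                value = null;
                return false;
            }
            current = next.iterator();
            chunksClaimed++;
        }
        value = current.next();
        return true;
    }

    String getCurrentValue() {
        return value;
    }

    int chunksClaimed() {
        return chunksClaimed;
    }

    public static void main(String[] args) {
        Queue<List<String>> pool = new ConcurrentLinkedQueue<>(List.of(
                List.of("/src/c", "/src/d"), List.of("/src/e")));
        ChunkRecordReader reader = new ChunkRecordReader(List.of("/src/a", "/src/b"), pool);
        List<String> seen = new ArrayList<>();
        while (reader.nextKeyValue()) {
            seen.add(reader.getCurrentValue());
        }
        // prints "[/src/a, /src/b, /src/c, /src/d, /src/e] from 3 chunks"
        System.out.println(seen + " from " + reader.chunksClaimed() + " chunks");
    }
}
```

Using a thread-safe queue with `poll()` is one simple way to make each claim atomic so that two readers sharing the pool never read the same chunk.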
Copyright © 2014 InMobi. All rights reserved.