org.apache.pig
Interface OrderedLoadFunc
- All Known Implementing Classes:
- AllLoader, AvroStorage, BinStorage, CSVExcelStorage, CSVLoader, FileInputLoadFunc, HBaseStorage, HiveColumnarLoader, InterStorage, PigStorage, PigStorageSchema, SequenceFileLoader, TableLoader, TFileStorage
@InterfaceAudience.Public
@InterfaceStability.Evolving
public interface OrderedLoadFunc
Implementing this interface indicates to Pig that a given loader
can be used for MergeJoin. It does not mean the data itself is ordered,
but rather that the splits returned by the underlying InputFormat
can be ordered to match the order of the data they are loading. For
example, file splits have a natural order (that of the file they are
from), while the splits of an RDBMS do not (since tables have no inherent order).
The position of each split, as represented by the
WritableComparable object, is stored in the index created by
a MergeJoin sampling MapReduce job to get an ordered sequence of splits.
Splits must be read in order during a merge join to ensure that
data is read according to the sort order.
- Since:
- Pig 0.7
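The idea that file splits carry a natural order can be illustrated with a minimal, self-contained sketch. It uses plain Java only: FilePosition is a hypothetical stand-in for the WritableComparable that a file-based loader might return from getSplitComparable, and the (path, offset) ordering is an assumption made for illustration, not the ordering used by any particular Pig loader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class SplitOrderSketch {

    // Hypothetical stand-in for a split position: a split is located by the
    // file it comes from and its starting byte offset within that file.
    static final class FilePosition implements Comparable<FilePosition> {
        final String path;
        final long offset;

        FilePosition(String path, long offset) {
            this.path = path;
            this.offset = offset;
        }

        // Assumed ordering: by file name first, then by offset within the file.
        @Override
        public int compareTo(FilePosition other) {
            int byPath = path.compareTo(other.path);
            return byPath != 0 ? byPath : Long.compare(offset, other.offset);
        }

        @Override
        public String toString() {
            return path + "@" + offset;
        }
    }

    // Returns a new list with the positions sorted into read order;
    // the input list is left unchanged.
    static List<FilePosition> inReadOrder(List<FilePosition> positions) {
        List<FilePosition> sorted = new ArrayList<>(positions);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        // Splits may arrive in any order; their positions restore the data order.
        List<FilePosition> unordered = Arrays.asList(
                new FilePosition("part-00001", 0L),
                new FilePosition("part-00000", 134217728L),
                new FilePosition("part-00000", 0L));
        System.out.println(inReadOrder(unordered));
    }
}
```

In a real loader the returned object would have to implement org.apache.hadoop.io.WritableComparable so that it can be serialized into the index; the sketch leaves serialization out to stay free of Hadoop dependencies.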
Method Summary
org.apache.hadoop.io.WritableComparable<?> getSplitComparable(org.apache.hadoop.mapreduce.InputSplit split)
The WritableComparable object returned will be used to compare
the position of different splits in an ordered stream.
getSplitComparable
org.apache.hadoop.io.WritableComparable<?> getSplitComparable(org.apache.hadoop.mapreduce.InputSplit split)
throws IOException
- The WritableComparable object returned will be used to compare
the position of different splits in an ordered stream.
- Parameters:
split - An InputSplit from the InputFormat underlying this loader.
- Returns:
- A WritableComparable representing the position of the split in the input.
- Throws:
IOException
Copyright © 2012 The Apache Software Foundation