OrderedLoadFunc (Pig 0.9.3-SNAPSHOT API)

org.apache.pig
Interface OrderedLoadFunc

All Known Implementing Classes: AllLoader, AvroStorage, BinStorage, CSVExcelStorage, CSVLoader, FileInputLoadFunc, HBaseStorage, HiveColumnarLoader, InterStorage, PigStorage, PigStorageSchema, SequenceFileLoader, TableLoader, TFileStorage

@InterfaceAudience.Public @InterfaceStability.Evolving public interface OrderedLoadFunc

Implementing this interface indicates to Pig that a given loader can be used for MergeJoin. It does not mean the data itself is ordered, but rather that the splits returned by the underlying InputFormat can be ordered to match the order of the data they are loading. For example, file splits have a natural order (that of the file they come from), while splits from an RDBMS do not (since tables have no inherent order). The position, represented by the WritableComparable object, is stored in the index created by the MergeJoin sampling MapReduce job so that an ordered sequence of splits can be obtained. Splits must be read in order during a merge join to ensure that data is read according to its sort order.

Since: Pig 0.7

Method Summary
`org.apache.hadoop.io.WritableComparable<?>`	`getSplitComparable(org.apache.hadoop.mapreduce.InputSplit split)` The returned WritableComparable is used to compare the positions of different splits in an ordered stream.

Method Detail

getSplitComparable

org.apache.hadoop.io.WritableComparable<?> getSplitComparable(org.apache.hadoop.mapreduce.InputSplit split)
                                                              throws IOException

The returned WritableComparable is used to compare the positions of different splits in an ordered stream.

Parameters: split - An InputSplit from the InputFormat underlying this loader.
Returns: WritableComparable representing the position of the split in the input
Throws: IOException
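
To illustrate the contract of getSplitComparable, the following is a minimal sketch in plain Java. It deliberately uses no Hadoop or Pig classes: FileRangeSplit and SplitPosition are hypothetical stand-ins for an InputSplit and for the WritableComparable the method returns, and SplitOrderSketch and orderedPositions are illustrative names only. It also assumes that file names sort in the same order as the data they hold, which a real loader would have to guarantee by other means.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Plain-Java sketch: stand-in types only, no Hadoop or Pig classes are used.
class SplitOrderSketch {

    // Hypothetical stand-in for an InputSplit covering a byte range of a file.
    static final class FileRangeSplit {
        final String path;
        final long start;

        FileRangeSplit(String path, long start) {
            this.path = path;
            this.start = start;
        }
    }

    // Hypothetical stand-in for the WritableComparable position key.
    // Orders by file path first, then by starting offset within the file.
    static final class SplitPosition implements Comparable<SplitPosition> {
        final String path;
        final long offset;

        SplitPosition(String path, long offset) {
            this.path = path;
            this.offset = offset;
        }

        @Override
        public int compareTo(SplitPosition other) {
            int byPath = path.compareTo(other.path);
            return byPath != 0 ? byPath : Long.compare(offset, other.offset);
        }

        @Override
        public String toString() {
            return path + "@" + offset;
        }
    }

    // Plays the role of getSplitComparable: maps a split to its position key.
    static SplitPosition getSplitComparable(FileRangeSplit split) {
        return new SplitPosition(split.path, split.start);
    }

    // Returns the position keys sorted into the order the splits should be read.
    static List<SplitPosition> orderedPositions(List<FileRangeSplit> splits) {
        List<SplitPosition> positions = new ArrayList<>();
        for (FileRangeSplit split : splits) {
            positions.add(getSplitComparable(split));
        }
        Collections.sort(positions);
        return positions;
    }

    public static void main(String[] args) {
        // Splits arrive in arbitrary order, as an InputFormat might return them.
        List<FileRangeSplit> splits = Arrays.asList(
                new FileRangeSplit("part-00001", 0L),
                new FileRangeSplit("part-00000", 128L),
                new FileRangeSplit("part-00000", 0L));
        System.out.println(orderedPositions(splits));
    }
}
```

The key carries only what is needed to rank one split against another, so it can be stored and compared without touching the data itself. A source with no inherent order, such as an RDBMS table, has no meaningful key of this kind to offer.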