org.apache.pig.impl.io
Class ReadToEndLoader
java.lang.Object
  org.apache.pig.LoadFunc
      org.apache.pig.impl.io.ReadToEndLoader

public class ReadToEndLoader
extends LoadFunc
This is a wrapper loader that wraps a real LoadFunc underneath and allows a
file to be read completely, starting from a given split (indicated by a split
index, which is used to look up the split in the List returned by the
underlying InputFormat's getSplits() method). So if the supplied split index
is 0, this loader reads the entire file. If it is non-zero, it reads the
partial file from that split through the last split.
The call sequence to use this class is:
1) Construct an object using one of the constructors.
2) Call getNext() in a loop until it returns null (see the sketch below).
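A minimal usage sketch of that sequence, assuming PigStorage as the wrapped
loader and a hypothetical input path /data/input:

import org.apache.hadoop.conf.Configuration;
import org.apache.pig.builtin.PigStorage;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.io.ReadToEndLoader;

public class ReadToEndExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Split index 0: read the whole file, from the first split to the last.
        ReadToEndLoader loader =
            new ReadToEndLoader(new PigStorage(), conf, "/data/input", 0);
        Tuple t;
        while ((t = loader.getNext()) != null) {
            System.out.println(t);
        }
    }
}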
Constructor Summary
ReadToEndLoader(LoadFunc wrappedLoadFunc,
                org.apache.hadoop.conf.Configuration conf,
                String inputLocation,
                int splitIndex)
ReadToEndLoader(LoadFunc wrappedLoadFunc,
                org.apache.hadoop.conf.Configuration conf,
                String inputLocation,
                int[] toReadSplitIdxs)
          This constructor takes an array of indexes (toReadSplitIdxs) of the splits to be read.
Method Summary
org.apache.hadoop.mapreduce.InputFormat getInputFormat()
          This will be called during planning on the front end.
LoadCaster getLoadCaster()
          This will be called on the front end during planning and not on the back end during execution.
Tuple getNext()
          Retrieves the next tuple to be processed.
void prepareToRead(org.apache.hadoop.mapreduce.RecordReader reader, PigSplit split)
          Initializes LoadFunc for reading data.
void setLocation(String location, org.apache.hadoop.mapreduce.Job job)
          Communicate to the loader the location of the object(s) being loaded.
Methods inherited from class java.lang.Object
          clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
ReadToEndLoader
public ReadToEndLoader(LoadFunc wrappedLoadFunc,
org.apache.hadoop.conf.Configuration conf,
String inputLocation,
int splitIndex)
throws IOException
Parameters:
    wrappedLoadFunc - the real LoadFunc to wrap
    conf - the Configuration used to obtain the input splits
    inputLocation - the location of the input
    splitIndex - index of the split from which to start reading
Throws:
    IOException
    InterruptedException
ReadToEndLoader
public ReadToEndLoader(LoadFunc wrappedLoadFunc,
org.apache.hadoop.conf.Configuration conf,
String inputLocation,
int[] toReadSplitIdxs)
throws IOException
This constructor takes an array of indexes (toReadSplitIdxs) of the splits to be read.
Parameters:
    wrappedLoadFunc - the real LoadFunc to wrap
    conf - the Configuration used to obtain the input splits
    inputLocation - the location of the input
    toReadSplitIdxs - indexes of the splits to read
Throws:
    IOException
    InterruptedException
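A brief sketch of this multi-split form, reusing the setup from the earlier
example (the input path and the split indexes 2 and 5 are illustrative):

// Read only the splits at indexes 2 and 5; the getNext() loop is the same as above.
ReadToEndLoader loader = new ReadToEndLoader(
        new PigStorage(), new Configuration(), "/data/input", new int[] {2, 5});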
getNext
public Tuple getNext()
              throws IOException
Description copied from class: LoadFunc
Retrieves the next tuple to be processed. Implementations should NOT reuse
tuple objects (or inner member objects) they return across calls and should
return a different tuple object in each call.
Specified by:
    getNext in class LoadFunc
Returns:
    the next tuple to be processed, or null if there are no more tuples to be processed
Throws:
    IOException - if there is an exception while retrieving the next tuple
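A hedged sketch of how a custom LoadFunc could honor the no-reuse contract.
The class name LineLoader, the stored reader field, and the tab-delimited
parsing are assumptions for illustration, not ReadToEndLoader's own code.
Assumed imports: java.io.IOException, java.util.Arrays,
org.apache.pig.data.Tuple, org.apache.pig.data.TupleFactory.

// Fragment of a hypothetical LoadFunc subclass (LineLoader).
// Assumes a RecordReader field named `reader` saved in prepareToRead().
@Override
public Tuple getNext() throws IOException {
    try {
        if (!reader.nextKeyValue()) {
            return null;                       // no more tuples to process
        }
        String[] fields = reader.getCurrentValue().toString().split("\t");
        // Build a brand-new Tuple on every call instead of mutating a cached one.
        return TupleFactory.getInstance().newTuple(Arrays.asList((Object[]) fields));
    } catch (InterruptedException e) {
        throw new IOException(e);
    }
}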
getInputFormat
public org.apache.hadoop.mapreduce.InputFormat getInputFormat()
                                               throws IOException
Description copied from class: LoadFunc
This will be called during planning on the front end. This is the instance of
InputFormat (rather than the class name) because the load function may need to
instantiate the InputFormat in order to control how it is constructed.
Specified by:
    getInputFormat in class LoadFunc
Returns:
    the InputFormat associated with this loader
Throws:
    IOException - if there is an exception during InputFormat construction
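For illustration, a typical LoadFunc override (not ReadToEndLoader's own
implementation) returns a ready-made instance, e.g. Hadoop's TextInputFormat:

// Fragment of a hypothetical LoadFunc subclass: hand back an InputFormat
// *instance* so the loader stays in control of how it is constructed.
@Override
public org.apache.hadoop.mapreduce.InputFormat getInputFormat() throws IOException {
    return new org.apache.hadoop.mapreduce.lib.input.TextInputFormat();
}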
getLoadCaster
public LoadCaster getLoadCaster()
                  throws IOException
Description copied from class: LoadFunc
This will be called on the front end during planning and not on the back end
during execution.
Overrides:
    getLoadCaster in class LoadFunc
Returns:
    the LoadCaster associated with this loader. Returning null indicates that
    casts from byte array are not supported for this loader.
Throws:
    IOException - if there is an exception during LoadCaster construction
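As a sketch, a custom LoadFunc might return Pig's UTF-8 caster, or null to
disable casts from bytearray; this illustrates the contract described above
rather than ReadToEndLoader's own behavior.

// Fragment of a hypothetical LoadFunc subclass.
@Override
public LoadCaster getLoadCaster() throws IOException {
    // Utf8StorageConverter handles casts from UTF-8 byte arrays to Pig types;
    // returning null would signal that such casts are unsupported.
    return new org.apache.pig.builtin.Utf8StorageConverter();
}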
prepareToRead
public void prepareToRead(org.apache.hadoop.mapreduce.RecordReader reader,
                          PigSplit split)
Description copied from class: LoadFunc
Initializes LoadFunc for reading data. This will be called during execution
before any calls to getNext. The RecordReader needs to be passed here because
it has been instantiated for a particular InputSplit.
Specified by:
    prepareToRead in class LoadFunc
Parameters:
    reader - RecordReader to be used by this instance of the LoadFunc
    split - the input PigSplit to process
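A minimal sketch of the intended pattern in a custom LoadFunc: keep the
already-initialized RecordReader for later getNext() calls. The reader field
is an assumption for illustration.

// Fragment of a hypothetical LoadFunc subclass.
private org.apache.hadoop.mapreduce.RecordReader reader;

@Override
public void prepareToRead(org.apache.hadoop.mapreduce.RecordReader reader,
                          PigSplit split) throws IOException {
    // The reader is already bound to the particular InputSplit being processed.
    this.reader = reader;
}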
setLocation
public void setLocation(String location,
                        org.apache.hadoop.mapreduce.Job job)
            throws IOException
Description copied from class: LoadFunc
Communicate to the loader the location of the object(s) being loaded. The
location string passed to the LoadFunc here is the return value of
LoadFunc.relativeToAbsolutePath(String, Path). Implementations should use this
method to communicate the location (and any other information) to their
underlying InputFormat through the Job object.
This method will be called in the backend multiple times. Implementations
should bear in mind that this method is called multiple times and should
ensure there are no inconsistent side effects due to the multiple calls.
Specified by:
    setLocation in class LoadFunc
Parameters:
    location - location as returned by LoadFunc.relativeToAbsolutePath(String, Path)
    job - the Job object; implementations may also store information in, or
          retrieve earlier stored information from, the UDFContext
Throws:
    IOException - if the location is not valid
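For example, a file-based LoadFunc typically forwards the (already absolute)
location to its InputFormat via the Job. The sketch below assumes Hadoop's
FileInputFormat and is not ReadToEndLoader's own code.

// Fragment of a hypothetical file-based LoadFunc subclass.
@Override
public void setLocation(String location, org.apache.hadoop.mapreduce.Job job)
        throws IOException {
    // Safe under repeated calls: setInputPaths overwrites the input paths
    // rather than accumulating them.
    org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(job, location);
}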
Copyright © 2012 The Apache Software Foundation