org.apache.pig.piggybank.storage
Class SequenceFileLoader
java.lang.Object
org.apache.pig.LoadFunc
org.apache.pig.FileInputLoadFunc
org.apache.pig.piggybank.storage.SequenceFileLoader
- All Implemented Interfaces:
- OrderedLoadFunc
public class SequenceFileLoader
- extends FileInputLoadFunc
A loader for standard Hadoop SequenceFiles. It is able to work with the following types as keys or values:
Text, IntWritable, LongWritable, FloatWritable, DoubleWritable, BooleanWritable, ByteWritable
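In a Pig Latin script, the loader is typically used after registering the piggybank jar. The jar path, input path, and schema below are illustrative assumptions, not part of this API:

```pig
-- Illustrative only: the jar location, input path, and field schema are assumptions.
REGISTER /path/to/piggybank.jar;

A = LOAD '/data/events.seq'
    USING org.apache.pig.piggybank.storage.SequenceFileLoader()
    AS (key:long, value:chararray);
DUMP A;
```

Each record in the SequenceFile becomes a two-field tuple (key, value), with the Writable key and value translated to the corresponding Pig data types listed above.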
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
LOG
protected static final org.apache.commons.logging.Log LOG
mTupleFactory
protected TupleFactory mTupleFactory
serializationFactory
protected org.apache.hadoop.io.serializer.SerializationFactory serializationFactory
keyType
protected byte keyType
valType
protected byte valType
SequenceFileLoader
public SequenceFileLoader()
setKeyType
protected void setKeyType(Class<?> keyClass)
throws BackendException
- Throws:
BackendException
setValueType
protected void setValueType(Class<?> valueClass)
throws BackendException
- Throws:
BackendException
inferPigDataType
protected byte inferPigDataType(Type t)
translateWritableToPigDataType
protected Object translateWritableToPigDataType(org.apache.hadoop.io.Writable w,
byte dataType)
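The two protected helpers above map a Writable key/value class to a Pig type code and convert each Writable to the matching Java object. The sketch below is a hypothetical, dependency-free illustration of that mapping; the byte constants mimic org.apache.pig.data.DataType but are defined locally so it compiles without Pig on the classpath:

```java
// Hypothetical sketch of the Writable -> Pig type inference performed by
// inferPigDataType(). The constants stand in for org.apache.pig.data.DataType
// values and are local assumptions, not the real Pig definitions.
public class TypeMapSketch {
    // Local stand-ins for DataType.CHARARRAY, INTEGER, LONG, etc.
    static final byte CHARARRAY = 55, INTEGER = 10, LONG = 15,
                      FLOAT = 20, DOUBLE = 25, BOOLEAN = 5, BYTE = 6, ERROR = -1;

    // Maps a Writable's simple class name to a Pig type code; anything outside
    // the supported set yields ERROR, analogous to an unsupported key/value type.
    static byte inferPigDataType(String writableSimpleName) {
        switch (writableSimpleName) {
            case "Text":            return CHARARRAY;
            case "IntWritable":     return INTEGER;
            case "LongWritable":    return LONG;
            case "FloatWritable":   return FLOAT;
            case "DoubleWritable":  return DOUBLE;
            case "BooleanWritable": return BOOLEAN;
            case "ByteWritable":    return BYTE;
            default:                return ERROR;
        }
    }

    public static void main(String[] args) {
        System.out.println(inferPigDataType("Text"));        // 55
        System.out.println(inferPigDataType("IntWritable")); // 10
    }
}
```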
getNext
public Tuple getNext()
throws IOException
- Description copied from class:
LoadFunc
- Retrieves the next tuple to be processed. Implementations should NOT reuse
tuple objects (or inner member objects) they return across calls and
should return a different tuple object in each call.
- Specified by:
getNext
in class LoadFunc
- Returns:
- the next tuple to be processed or null if there are no more tuples
to be processed.
- Throws:
IOException
- if there is an exception while retrieving the next
tuple
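The no-reuse contract of getNext() can be shown with a dependency-free sketch. The Tuple stand-in and the in-memory records are assumptions made so the example compiles without Pig or Hadoop; only the shape of the contract (fresh tuple per call, null at end-of-input) comes from the description above:

```java
// Hypothetical sketch of the getNext() contract: return a freshly allocated
// tuple on every call (never mutate and re-return the same instance), and
// return null once the input is exhausted.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GetNextSketch {
    // Stand-in for org.apache.pig.data.Tuple; the real interface lives in Pig.
    static final class Tuple {
        final List<Object> fields = new ArrayList<>();
    }

    private final Object[][] records = { {"a", 1}, {"b", 2} };
    private int pos = 0;

    public Tuple getNext() {
        if (pos >= records.length) return null;   // no more tuples to process
        Tuple t = new Tuple();                    // fresh object each call
        t.fields.addAll(Arrays.asList(records[pos]));
        pos++;
        return t;
    }

    public static void main(String[] args) {
        GetNextSketch s = new GetNextSketch();
        Tuple first = s.getNext();
        Tuple second = s.getNext();
        System.out.println(first != second);      // distinct tuple objects
        System.out.println(s.getNext() == null);  // input exhausted
    }
}
```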
getInputFormat
public org.apache.hadoop.mapreduce.InputFormat getInputFormat()
throws IOException
- Description copied from class:
LoadFunc
- This will be called during planning on the front end. This is the
instance of InputFormat (rather than the class name) because the
load function may need to instantiate the InputFormat in order
to control how it is constructed.
- Specified by:
getInputFormat
in class LoadFunc
- Returns:
- the InputFormat associated with this loader.
- Throws:
IOException
- if there is an exception during InputFormat
construction
prepareToRead
public void prepareToRead(org.apache.hadoop.mapreduce.RecordReader reader,
PigSplit split)
throws IOException
- Description copied from class:
LoadFunc
- Initializes LoadFunc for reading data. This will be called during execution
before any calls to getNext. The RecordReader needs to be passed here because
it has been instantiated for a particular InputSplit.
- Specified by:
prepareToRead
in class LoadFunc
- Parameters:
reader
- the RecordReader to be used by this instance of the LoadFunc
split
- the input PigSplit to process
- Throws:
IOException
- if there is an exception during initialization
setLocation
public void setLocation(String location,
org.apache.hadoop.mapreduce.Job job)
throws IOException
- Description copied from class:
LoadFunc
- Communicate to the loader the location of the object(s) being loaded.
The location string passed to the LoadFunc here is the return value of
LoadFunc.relativeToAbsolutePath(String, Path)
. Implementations
should use this method to communicate the location (and any other information)
to its underlying InputFormat through the Job object.
This method will be called in the backend multiple times. Implementations
should bear in mind that this method is called multiple times and should
ensure there are no inconsistent side effects due to the multiple calls.
- Specified by:
setLocation
in class LoadFunc
- Parameters:
location
- location as returned by
LoadFunc.relativeToAbsolutePath(String, Path)
job
- the Job object; implementations can use it to store or retrieve earlier stored information from the UDFContext
- Throws:
IOException
- if the location is not valid.
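Because setLocation may be invoked repeatedly on the backend, it must not accumulate side effects across calls. The sketch below is a hypothetical, dependency-free illustration of that idempotency requirement; the field standing in for Job state is an assumption:

```java
// Hypothetical sketch of the multiple-call contract for setLocation(): the
// method may run several times, so it must overwrite state rather than
// accumulate it -- repeated calls with the same location converge to the
// same configuration.
public class SetLocationSketch {
    private String inputPath;   // stands in for state pushed into the Job

    public void setLocation(String location) {
        // Assign, never append: no side effect grows across repeated calls.
        this.inputPath = location;
    }

    public String getInputPath() {
        return inputPath;
    }

    public static void main(String[] args) {
        SetLocationSketch s = new SetLocationSketch();
        s.setLocation("/data/part-00000");
        s.setLocation("/data/part-00000");  // second call must be harmless
        System.out.println(s.getInputPath());
    }
}
```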
Copyright © 2012 The Apache Software Foundation