org.apache.pig.impl.builtin
Class RandomSampleLoader
java.lang.Object
  org.apache.pig.LoadFunc
    org.apache.pig.impl.builtin.SampleLoader
      org.apache.pig.impl.builtin.RandomSampleLoader
public class RandomSampleLoader extends SampleLoader
A loader that samples the data.
It randomly samples tuples from its input. The number of tuples to sample
must be set before the first call to getNext().
See the documentation of getNext() for how the sample is drawn.
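The class comment states a usage contract: the sample size must be fixed before the first getNext() call. The sketch below is a hypothetical illustration of how such a contract could be enforced; SampleSizeContract, setNumSamples and onGetNext are names invented for this example and are not taken from Pig's source.

```java
/**
 * Hypothetical sketch (not Pig code): enforces that the sample size is
 * configured before the first getNext() call.
 */
class SampleSizeContract {
    private Integer numSamples; // null until configured
    private boolean reading;    // becomes true once getNext() has started

    /** Must be called before the first getNext(); rejects late or invalid values. */
    void setNumSamples(int n) {
        if (reading) {
            throw new IllegalStateException("numSamples must be set before the first getNext()");
        }
        // Assumption: a non-positive sample size is treated as a caller error.
        if (n <= 0) {
            throw new IllegalArgumentException("numSamples must be positive, got " + n);
        }
        numSamples = n;
    }

    /** Intended to run at the start of every getNext(); returns the configured size. */
    int onGetNext() {
        if (numSamples == null) {
            throw new IllegalStateException("numSamples was not set before getNext()");
        }
        reading = true;
        return numSamples;
    }
}
```

Once the first read has begun, the size is frozen, so every call sees the same buffer capacity.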
Method Summary
| Modifier and Type | Method and Description |
| Tuple | getNext(): Allocate a buffer for numSamples elements, populate it with the first numSamples tuples, and continue scanning the rest of the input. |
| void | prepareToRead(org.apache.hadoop.mapreduce.RecordReader reader, PigSplit split): Initializes LoadFunc for reading data. |
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
nextSampleIdx
protected int nextSampleIdx
RandomSampleLoader
public RandomSampleLoader(String funcSpec,
String ns)
Constructs a sampler that reads its input through the loader given by funcSpec.
Parameters:
funcSpec - func spec of the loader to use.
ns - number of samples to collect per map.
Both arguments are passed as strings instead of their actual types (FuncSpec, int)
because FuncSpec only supports string arguments to UDF constructors.
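Because every constructor argument arrives as a String, the numeric sample count has to be parsed inside the constructor. The following is a minimal sketch of that pattern under stated assumptions: StringArgSampleLoader is a hypothetical class, funcSpec is kept as plain text rather than resolved into an actual loader, and the validation rules are assumptions, not documented behavior of RandomSampleLoader.

```java
/**
 * Hypothetical loader whose constructor mirrors the (String funcSpec, String ns)
 * shape described above: all arguments are Strings, and ns is parsed here.
 */
class StringArgSampleLoader {
    private final String funcSpec; // kept as text in this sketch, never instantiated
    private final int numSamples;

    StringArgSampleLoader(String funcSpec, String ns) {
        // Assumption: an empty func spec is rejected up front.
        if (funcSpec == null || funcSpec.isEmpty()) {
            throw new IllegalArgumentException("funcSpec must name a loader");
        }
        int n;
        try {
            n = Integer.parseInt(ns); // also throws NumberFormatException for null
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("ns must be an integer, got: " + ns, e);
        }
        // Assumption: the sample count must be positive.
        if (n <= 0) {
            throw new IllegalArgumentException("ns must be positive, got: " + n);
        }
        this.funcSpec = funcSpec;
        this.numSamples = n;
    }

    String funcSpec() { return funcSpec; }

    int numSamples() { return numSamples; }
}
```

Parsing in the constructor means a malformed count fails when the loader is created rather than later, in the middle of a read.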
getNext
public Tuple getNext()
throws IOException
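Allocate a buffer for numSamples elements, populate it with the
first numSamples tuples, and continue scanning the rest of the input.
For the ith tuple read, generate a random number r such that 0 <= r < i;
if r < numSamples, the new tuple replaces the buffer entry at position r.
This gives a random sample of the tuples in the partition.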
Specified by:
getNext in class LoadFunc
Returns:
the next tuple to be processed, or null if there are no more tuples to be processed.
Throws:
IOException - if there is an exception while retrieving the next tuple
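The procedure described for getNext() is reservoir sampling (Algorithm R). A minimal, self-contained sketch of it follows. It assumes a plain Iterator<T> in place of the wrapped Pig loader and a caller-supplied java.util.Random. ReservoirSampleReader is a hypothetical name, and using nextSampleIdx as a cursor over the collected samples is an assumption based only on the field's name. In this sketch the first call scans the entire input; later calls hand out the buffered samples one at a time and then return null.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of reservoir sampling over an iterator (not Pig's code). */
class ReservoirSampleReader<T> {
    private final Iterator<T> input; // stands in for the wrapped loader
    private final int numSamples;    // reservoir capacity
    private final Random rand;
    private List<T> samples;         // null until the first getNext() call
    private int nextSampleIdx;       // index of the next sample to return

    ReservoirSampleReader(Iterator<T> input, int numSamples, Random rand) {
        if (numSamples <= 0) {
            throw new IllegalArgumentException("numSamples must be positive");
        }
        this.input = input;
        this.numSamples = numSamples;
        this.rand = rand;
    }

    /** Returns the next sampled element, or null once all samples were returned. */
    T getNext() {
        if (samples == null) {
            samples = collect();
        }
        return nextSampleIdx < samples.size() ? samples.get(nextSampleIdx++) : null;
    }

    private List<T> collect() {
        List<T> buf = new ArrayList<>(numSamples);
        // 1. Fill the buffer with the first numSamples elements.
        while (buf.size() < numSamples && input.hasNext()) {
            buf.add(input.next());
        }
        // 2. For the ith element (counted from 1), draw r with 0 <= r < i;
        //    if r < numSamples, the element replaces buf[r].
        int seen = buf.size(); // an int counter; overflows past 2^31 - 1 elements
        while (input.hasNext()) {
            T t = input.next();
            seen++;
            int r = rand.nextInt(seen);
            if (r < numSamples) {
                buf.set(r, t);
            }
        }
        return buf;
    }
}
```

Drawing r from [0, i) gives the ith element an inclusion probability of numSamples / i. That is what keeps every element equally likely to end up in the final sample, while memory stays bounded by numSamples.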
prepareToRead
public void prepareToRead(org.apache.hadoop.mapreduce.RecordReader reader,
PigSplit split)
throws IOException
Description copied from class: LoadFunc
Initializes LoadFunc for reading data. This will be called during execution
before any calls to getNext. The RecordReader needs to be passed here because
it has been instantiated for a particular InputSplit.
Overrides:
prepareToRead in class SampleLoader
Parameters:
reader - RecordReader to be used by this instance of the LoadFunc
split - The input PigSplit to process
Throws:
IOException - if there is an exception during initialization
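The call order described for prepareToRead (bind the reader first, then call getNext repeatedly until it returns null) can be illustrated with a small in-memory model. SimpleRecordReader, ListRecordReader and PreparedLoader are hypothetical. The reader interface is only loosely modeled on the nextKeyValue()/getCurrentValue() pair of Hadoop's RecordReader, and the PigSplit is reduced to a string identifier.

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical, simplified reader bound to one split. */
interface SimpleRecordReader {
    boolean nextKeyValue();
    String getCurrentValue();
}

/** Hypothetical in-memory reader used to drive the example. */
class ListRecordReader implements SimpleRecordReader {
    private final Iterator<String> it;
    private String current;

    ListRecordReader(List<String> values) {
        this.it = values.iterator();
    }

    public boolean nextKeyValue() {
        if (!it.hasNext()) {
            return false;
        }
        current = it.next();
        return true;
    }

    public String getCurrentValue() {
        return current;
    }
}

/** Hypothetical loader: prepareToRead binds the reader, getNext consumes it. */
class PreparedLoader {
    private SimpleRecordReader reader; // set by prepareToRead, read by getNext
    private String splitId;            // stands in for the PigSplit

    void prepareToRead(SimpleRecordReader reader, String splitId) {
        this.reader = reader;
        this.splitId = splitId;
    }

    String splitId() {
        return splitId;
    }

    /** Returns the next record, or null when the split is exhausted. */
    String getNext() {
        if (reader == null) {
            throw new IllegalStateException("prepareToRead must be called before getNext");
        }
        return reader.nextKeyValue() ? reader.getCurrentValue() : null;
    }
}
```

Passing the reader in, rather than having the loader create one, mirrors the reason given above: the reader already belongs to a specific split.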
Copyright © 2012 The Apache Software Foundation