org.apache.pig.impl.builtin
Class PartitionSkewedKeys
java.lang.Object
org.apache.pig.EvalFunc<Map<String,Object>>
org.apache.pig.impl.builtin.PartitionSkewedKeys
public class PartitionSkewedKeys
- extends EvalFunc<Map<String,Object>>
Partition reducers for skewed keys. This is used in skewed join during
sampling process. It figures out how many reducers required to process a
skewed key without causing spill and allocate this number of reducers to this
key. This UDF outputs a map which contains 2 keys:
"totalreducers": the value is an integer wich indicates the
number of total reducers for this join job
"partition.list": the value is a bag which contains a
list of tuples with each tuple representing partitions for a skewed key.
The tuple has format of <join key>,<min index of reducer>,
<max index of reducer>
For example, a join job configures 10 reducers, and the sampling process
finds out 2 skewed keys, "swpv" needs 4 reducers and "swps"
needs 2 reducers. The output file would be like following:
{totalreducers=10, partition.list={(swpv,0,3), (swps,4,5)}}
The name of this file is set into next MR job which does the actual join.
That job uses this information to partition skewed keys properly
Method Summary |
Map<String,Object> |
exec(Tuple in)
first field in the input tuple is the number of reducers
second field is the *sorted* bag of samples
this should be called only once |
Methods inherited from class org.apache.pig.EvalFunc |
finish, getArgToFuncMapping, getCacheFiles, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, isAsynchronous, outputSchema, progress, setPigLogger, setReporter, warn |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
PARTITION_LIST
public static final String PARTITION_LIST
- See Also:
- Constant Field Values
TOTAL_REDUCERS
public static final String TOTAL_REDUCERS
- See Also:
- Constant Field Values
DEFAULT_PERCENT_MEMUSAGE
public static final float DEFAULT_PERCENT_MEMUSAGE
- See Also:
- Constant Field Values
PartitionSkewedKeys
public PartitionSkewedKeys()
PartitionSkewedKeys
public PartitionSkewedKeys(String[] args)
exec
public Map<String,Object> exec(Tuple in)
throws IOException
- first field in the input tuple is the number of reducers
second field is the *sorted* bag of samples
this should be called only once
- Specified by:
exec
in class EvalFunc<Map<String,Object>>
- Parameters:
in
- the Tuple to be processed.
- Returns:
- result, of type T.
- Throws:
IOException
Copyright © 2012 The Apache Software Foundation