java.lang.Object
  org.apache.pig.EvalFunc<Tuple>
    org.apache.pig.piggybank.evaluation.ExtremalTupleByNthField
public class ExtremalTupleByNthField extends EvalFunc<Tuple> implements Algebraic, Accumulator<Tuple>
This class is similar to MaxTupleBy1stField, except that it lets you specify which field to use (instead of always using the 1st field) and whether to order ascending or descending. The first constructor parameter specifies which field to use, and the second specifies which extremal to retrieve. Strings prefixed by "min", "least", "desc", "small" or "-", irrespective of capitalization and leading whitespace, select the minimum; all other strings select the maximum. For example:
define myMin ExtremalTupleByNthField( '4', 'min' );
T = group G ALL;
R = foreach T generate myMin(G);
is equivalent to:
T = order G by $3 asc;
R = limit T 1;
Note that the '4' above indicates the field with index 3 ($3) in the tuple,
because the constructor argument is 1-based. The 4th field can be any
comparable type, so you can use float, int, string, or even tuples.
With the default constructor, this UDF behaves like MaxTupleBy1stField: it
chooses the max tuple by the comparable in the first field.
When only the field is given, the maximum is selected:
define myMax ExtremalTupleByNthField( '3' );
T = group G ALL;
R = foreach T generate myMax(G);
is equivalent to:
T = order G by $2 desc;
R = limit T 1;
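The argument conventions described above (a 1-based field position, and an ordering string matched by prefix) can be sketched in plain Java. This is only an illustration, not the piggybank source: the class name, the method names, the -1/+1 sign convention, the trimming of the field position, and the error handling are all assumptions made for this sketch.

```java
import java.util.Locale;

// Hypothetical illustration only; not the piggybank implementation.
class ExtremalArgsSketch {
    // Prefixes that the documentation lists as selecting the minimum.
    private static final String[] MIN_PREFIXES = {"min", "least", "desc", "small", "-"};

    // Converts the 1-based field position from the constructor ('4')
    // into the 0-based tuple index ($3).
    static int toZeroBasedIndex(String fieldPosition) {
        int position = Integer.parseInt(fieldPosition.trim());
        if (position < 1) {
            // Assumption: positions below 1 are rejected.
            throw new IllegalArgumentException("field position must be >= 1: " + fieldPosition);
        }
        return position - 1;
    }

    // Returns -1 when the string selects the minimum and +1 otherwise.
    // Capitalization and leading whitespace are ignored, as documented.
    static int orderingSign(String order) {
        String normalized = order.trim().toLowerCase(Locale.ROOT);
        for (String prefix : MIN_PREFIXES) {
            if (normalized.startsWith(prefix)) {
                return -1;
            }
        }
        return 1;
    }
}
```

Under these assumptions, `toZeroBasedIndex("4")` yields 3, `orderingSign("  MIN")` yields -1, and any string without one of the listed prefixes, such as "max", yields +1.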
define biggestBag ExtremalTupleByNthField('1', 'max');
R = group TABLE by (key1, key2);
G = cogroup L by key1, R by group.key1;
V = foreach G generate L, biggestBag(R);
This results in each L(eft) bag being associated with only the largest bag
from the R(ight) table. If all bags in R are of equal size, the comparator
goes on to perform an element-wise comparison. In case of a complete tie in
the comparison, which result is returned is nondeterministic. Because this
class can compare any comparable, we can also specify a secondary key:
define biggestBag ExtremalTupleByNthField('1', 'max');
G = cogroup L by key1, M by key1, R by key1;
V = foreach G generate FLATTEN(L),
biggestBag(R.($0, $1, $2, $5)) as best_result_by_0,
biggestBag(R.($3, $1, $2, $5)) as best_result_by_3,
biggestBag(M.($0, $2)) as best_misc_data;
This generates two sets of results, plus misc data, based on two separate
criteria. Since all tuples in each bag have the same size (4, 4 and 2
respectively), the tuple comparator goes on to compare the members of the
tuples until it finds a difference. best_result_by_0 and best_result_by_3
are ordered by the 1st and 4th fields of R, respectively. Within each group,
ties are broken by the second and third fields.
Finally, note that the UDF implements both Algebraic and Accumulator, so it
is relatively efficient: it is a one-pass algorithm.
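The one-pass behaviour can be illustrated with a small running-extreme accumulator in plain Java. This is a sketch under stated assumptions, not the piggybank code: the class `RunningExtremeSketch` is hypothetical, tuples are modelled as `List<T>` with all fields of one comparable type, and the tie-breaking order (chosen field, then tuple size, then fields left to right) is an assumption based on the description above. The method names mirror accumulate, getValue and cleanup from the Accumulator interface, but the signatures are simplified.

```java
import java.util.List;

// Hypothetical illustration only; not the piggybank implementation.
class RunningExtremeSketch<T extends Comparable<T>> {
    private final int fieldIndex; // 0-based index of the field to compare
    private final int sign;       // +1 keeps the maximum, -1 keeps the minimum
    private List<T> best;         // current extremal tuple, null if none seen

    RunningExtremeSketch(int fieldIndex, int sign) {
        this.fieldIndex = fieldIndex;
        this.sign = sign;
    }

    // Assumed ordering: chosen field first, then tuple size,
    // then the remaining fields from left to right.
    private int compare(List<T> a, List<T> b) {
        int c = a.get(fieldIndex).compareTo(b.get(fieldIndex));
        if (c != 0) {
            return c;
        }
        if (a.size() != b.size()) {
            return Integer.compare(a.size(), b.size());
        }
        for (int i = 0; i < a.size(); i++) {
            c = a.get(i).compareTo(b.get(i));
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }

    // Visits each tuple of the batch exactly once and keeps only the best so far.
    void accumulate(List<List<T>> bag) {
        for (List<T> tuple : bag) {
            if (best == null || sign * Integer.signum(compare(tuple, best)) > 0) {
                best = tuple;
            }
        }
    }

    // Returns the extremal tuple seen since the last cleanup, or null.
    List<T> getValue() {
        return best;
    }

    // Resets the state so the next key starts fresh.
    void cleanup() {
        best = null;
    }
}
```

Because only the current best tuple is retained between batches, memory use does not grow with the size of the bag, which is what makes an accumulator-style evaluation attractive here.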
Nested Class Summary | |
---|---|
static class | ExtremalTupleByNthField.HelperClass: Utility classes and methods |
Field Summary |
---|
Fields inherited from class org.apache.pig.EvalFunc |
---|
log, pigLogger, reporter, returnType |
Constructor Summary | |
---|---|
ExtremalTupleByNthField() | Constructors |
ExtremalTupleByNthField(String fieldIndexString) | |
ExtremalTupleByNthField(String fieldIndexString, String order) | |
Method Summary | |
---|---|
void | accumulate(Tuple b): Pass tuples to the UDF. |
void | cleanup(): Called after getValue() to prepare processing for next key. |
Tuple | exec(Tuple input): The EvalFunc interface |
protected static Tuple | extreme(int pind, int psign, Tuple input, PigProgressable reporter) |
String | getFinal(): Get the final function. |
String | getInitial(): Algebraic interface |
String | getIntermed(): Get the intermediate function. |
Type | getReturnType(): Get the Type that this EvalFunc returns. |
Tuple | getValue(): Called when all tuples from current key have been passed to accumulate. |
Schema | outputSchema(Schema input): Report the schema of the output of this UDF. |
protected static int | parseFieldIndex(String inputFieldIndex) |
protected static int | parseOrdering(String order) |
Methods inherited from class org.apache.pig.EvalFunc |
---|
finish, getArgToFuncMapping, getCacheFiles, getLogger, getPigLogger, getReporter, getSchemaName, isAsynchronous, progress, setPigLogger, setReporter, warn |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public ExtremalTupleByNthField() throws ExecException
Throws: ExecException

public ExtremalTupleByNthField(String fieldIndexString) throws ExecException
Throws: ExecException

public ExtremalTupleByNthField(String fieldIndexString, String order) throws ExecException
Throws: ExecException
Method Detail |
---|
public Tuple exec(Tuple input) throws IOException
Specified by: exec in class EvalFunc<Tuple>
Parameters: input - the Tuple to be processed.
Throws: IOException

public Type getReturnType()
Description copied from class: EvalFunc
Overrides: getReturnType in class EvalFunc<Tuple>

public Schema outputSchema(Schema input)
Description copied from class: EvalFunc
Overrides: outputSchema in class EvalFunc<Tuple>
Parameters: input - Schema of the input

public String getInitial()
Specified by: getInitial in interface Algebraic

public String getIntermed()
Description copied from interface: Algebraic
Specified by: getIntermed in interface Algebraic

public String getFinal()
Description copied from interface: Algebraic
Specified by: getFinal in interface Algebraic

public void accumulate(Tuple b) throws IOException
Description copied from interface: Accumulator
Specified by: accumulate in interface Accumulator<Tuple>
Parameters: b - A tuple containing a single field, which is a bag. The bag will contain the set of tuples being passed to the UDF in this iteration.
Throws: IOException

public void cleanup()
Description copied from interface: Accumulator
Specified by: cleanup in interface Accumulator<Tuple>

public Tuple getValue()
Description copied from interface: Accumulator
Specified by: getValue in interface Accumulator<Tuple>

protected static final Tuple extreme(int pind, int psign, Tuple input, PigProgressable reporter) throws ExecException
Throws: ExecException

protected static int parseFieldIndex(String inputFieldIndex) throws ExecException
Throws: ExecException

protected static int parseOrdering(String order)