java.lang.Object
  org.apache.pig.EvalFunc<DataBag>
    org.apache.pig.builtin.TOP
public class TOP extends EvalFunc<DataBag> implements Algebraic
The TOP UDF accepts a bag of tuples and returns the top-n tuples, ranked by the value of a tuple field of type long. Both n and the field number must be passed to the UDF.

The UDF iterates through the input bag and retains only the top-n tuples by storing them in a priority queue of size n+1, where the priority is the long field. This is efficient because a priority queue gives constant-time, O(1), access to the least element and needs only O(log n) time to restructure the heap after each insertion or removal.

The UDF is especially helpful for turning a nested grouping operation inside out and retaining the top-n tuples in a nested group. It assumes that all tuples in the bag contain an element of the same type in the compared column.

Sample usage:

    A = LOAD 'test.tsv' as (first: chararray, second: chararray);
    B = GROUP A BY (first, second);
    C = FOREACH B generate FLATTEN(group), COUNT(A) as count;
    D = GROUP C BY first; -- again group by first
    topResults = FOREACH D {
        result = TOP(10, 2, C); -- and retain top 10 occurrences of 'second' in first
        GENERATE FLATTEN(result);
    }
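The bounded priority queue described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the Pig implementation: the class `TopNSketch`, the method `topN`, and the use of `long[]` rows in place of Pig `Tuple` objects are all hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-alone sketch; rows are long[] instead of Pig Tuples.
class TopNSketch {

    // Returns up to n rows with the largest value in the given column,
    // ordered from largest to smallest.
    static List<long[]> topN(int n, int column, List<long[]> rows) {
        // Min-heap keyed on the compared column: the head is always the
        // smallest row currently retained.
        PriorityQueue<long[]> heap = new PriorityQueue<>(
                n + 1, Comparator.comparingLong((long[] r) -> r[column]));
        for (long[] row : rows) {
            heap.add(row);          // O(log n) insertion
            if (heap.size() > n) {
                heap.poll();        // evict the current minimum, O(log n)
            }
        }
        // Copy the survivors out and sort them for a predictable order.
        List<long[]> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingLong((long[] r) -> r[column]).reversed());
        return result;
    }

    public static void main(String[] args) {
        List<long[]> rows = Arrays.asList(
                new long[] {1, 40}, new long[] {2, 10},
                new long[] {3, 30}, new long[] {4, 20});
        for (long[] row : topN(2, 1, rows)) {
            System.out.println(row[0] + "\t" + row[1]);
        }
    }
}
```

The queue never holds more than n+1 rows, so memory stays proportional to n regardless of the size of the input bag.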
Nested Class Summary | |
---|---|
static class | TOP.Final |
static class | TOP.Initial |
static class | TOP.Intermed |
Field Summary |
---|
Fields inherited from class org.apache.pig.EvalFunc |
---|
pigLogger, reporter, returnType |
Constructor Summary |
---|
TOP() |
Method Summary | |
---|---|
DataBag | exec(Tuple tuple): This callback method must be implemented by all subclasses. |
List<FuncSpec> | getArgToFuncMapping(): Allow a UDF to specify type specific implementations of itself. |
String | getFinal(): Get the final function. |
String | getInitial(): Get the initial function. |
String | getIntermed(): Get the intermediate function. |
Schema | outputSchema(Schema input): Report the schema of the output of this UDF. |
protected static void | updateTop(PriorityQueue<Tuple> store, int limit, DataBag inputBag) |
Methods inherited from class org.apache.pig.EvalFunc |
---|
finish, getCacheFiles, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, isAsynchronous, progress, setPigLogger, setReporter, warn |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public TOP()
Method Detail |
---|
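public DataBag exec(Tuple tuple) throws IOException
Description copied from class EvalFunc: This callback method must be implemented by all subclasses.
Specified by: exec in class EvalFunc<DataBag>
Parameters: tuple - the Tuple to be processed.
Throws: IOException

protected static void updateTop(PriorityQueue<Tuple> store, int limit, DataBag inputBag)

public List<FuncSpec> getArgToFuncMapping() throws FrontendException
Description copied from class EvalFunc: Allow a UDF to specify type specific implementations of itself.
Overrides: getArgToFuncMapping in class EvalFunc<DataBag>
Throws: FrontendException

public Schema outputSchema(Schema input)
Description copied from class EvalFunc: Report the schema of the output of this UDF.
Overrides: outputSchema in class EvalFunc<DataBag>
Parameters: input - Schema of the input

public String getInitial()
Description copied from interface Algebraic: Get the initial function.
Specified by: getInitial in interface Algebraic

public String getIntermed()
Description copied from interface Algebraic: Get the intermediate function.
Specified by: getIntermed in interface Algebraic

public String getFinal()
Description copied from interface Algebraic: Get the final function.
Specified by: getFinal in interface Algebraic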
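
The three Algebraic accessors suggest that TOP can be evaluated in stages, with partial results computed on chunks of the input and then combined. The sketch below illustrates why that works for a top-n computation: the top-n of the merged partial results equals the top-n of the whole input. It is an assumption-laden illustration, not the code of TOP.Initial, TOP.Intermed, or TOP.Final; the class `AlgebraicTopSketch` and the methods `partialTop` and `mergeTop` are hypothetical, and plain `Long` values stand in for tuples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of staged top-n; not the Pig implementation.
class AlgebraicTopSketch {

    // Keeps the n largest values of one chunk, largest first.
    static List<Long> partialTop(int n, List<Long> values) {
        PriorityQueue<Long> heap = new PriorityQueue<>(n + 1); // natural order: min-heap
        for (Long v : values) {
            heap.add(v);
            if (heap.size() > n) {
                heap.poll(); // drop the smallest retained value
            }
        }
        List<Long> kept = new ArrayList<>(heap);
        kept.sort(Collections.reverseOrder());
        return kept;
    }

    // Combines partial results by applying the same reduction to their union.
    static List<Long> mergeTop(int n, List<List<Long>> partials) {
        List<Long> all = new ArrayList<>();
        for (List<Long> partial : partials) {
            all.addAll(partial);
        }
        return partialTop(n, all);
    }

    public static void main(String[] args) {
        List<Long> chunkA = Arrays.asList(5L, 1L, 9L);
        List<Long> chunkB = Arrays.asList(7L, 3L, 8L);
        List<List<Long>> partials = new ArrayList<>();
        partials.add(partialTop(2, chunkA));
        partials.add(partialTop(2, chunkB));
        System.out.println(mergeTop(2, partials));
    }
}
```

Because each chunk forwards at most n values, the amount of data passed between stages is bounded by n per chunk rather than by the chunk size.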