@InterfaceAudience.Public @InterfaceStability.Stable public interface Algebraic
An interface to declare that an EvalFunc's calculation can be decomposed into initial, intermediate, and final steps.

More formally, suppose we have to compute a function f over a bag X. In general, we need to know the entire X before we can make any progress on f. However, some functions are algebraic, e.g. SUM. In these cases, you can apply an initial function f_init on subsets of X to get partial results. You can then combine partial results from different subsets of X using an intermediate function f_intermed. To get the final answer, several partial results can be combined by invoking a final function f_final. For the function SUM, f_init, f_intermed, and f_final are all SUM. See the code for builtin AVG to get a better idea of how algebraic works.

When eval functions implement this interface, Pig will attempt to use MapReduce's combiner. The initial function will be called in the map phase and be passed a single tuple. The intermediate function will be called 0 or more times in the combiner phase. The final function will be called once in the reduce phase. It is important that the results be the same whether the intermediate function is called 0, 1, or more times, because Hadoop makes no guarantees about how many times the combiner will be called in a job.
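The decomposition described above can be illustrated without any Pig classes. The following is a minimal sketch for an average, assuming a partial result is a (sum, count) pair; the class `AlgebraicAvgSketch`, the type `Partial`, and the methods `fInit`, `fIntermed`, `fFinal`, and `average` are hypothetical names for illustration and are not part of Pig or of the builtin AVG. The `combinerChunk` parameter simulates how many partials each combiner call sees, with 0 meaning the combiner never runs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AlgebraicAvgSketch {

    /** Partial result carried between stages: a running sum and a count. */
    public static final class Partial {
        final double sum;
        final long count;
        Partial(double sum, long count) { this.sum = sum; this.count = count; }
    }

    /** f_init: turn one input value into a partial result (map phase). */
    public static Partial fInit(double value) {
        return new Partial(value, 1);
    }

    /** f_intermed: merge any number of partials into one partial (combiner phase). */
    public static Partial fIntermed(List<Partial> partials) {
        double sum = 0;
        long count = 0;
        for (Partial p : partials) {
            sum += p.sum;
            count += p.count;
        }
        return new Partial(sum, count);
    }

    /** f_final: merge the remaining partials and produce the answer (reduce phase). */
    public static double fFinal(List<Partial> partials) {
        Partial total = fIntermed(partials);
        return total.sum / total.count; // NaN for empty input in this sketch
    }

    /** Runs all three stages; combinerChunk == 0 skips the intermediate stage entirely. */
    public static double average(List<Double> values, int combinerChunk) {
        List<Partial> partials = new ArrayList<>();
        for (double v : values) {
            partials.add(fInit(v));
        }
        if (combinerChunk > 0) {
            List<Partial> combined = new ArrayList<>();
            for (int i = 0; i < partials.size(); i += combinerChunk) {
                int end = Math.min(i + combinerChunk, partials.size());
                combined.add(fIntermed(partials.subList(i, end)));
            }
            partials = combined;
        }
        return fFinal(partials);
    }

    public static void main(String[] args) {
        List<Double> x = Arrays.asList(1.0, 2.0, 3.0, 4.0);
        System.out.println(average(x, 0)); // combiner never called
        System.out.println(average(x, 2)); // combiner called twice
        System.out.println(average(x, 4)); // combiner called once
    }
}
```

All three calls in `main` yield the same value, which is the property the interface requires. Carrying (sum, count) rather than a running average is what makes this work: averaging partial averages would give different answers depending on how the input was split.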
Method Summary | |
---|---|
String | getFinal() Get the final function. |
String | getInitial() Get the initial function. |
String | getIntermed() Get the intermediate function. |
Method Detail |
---|
String getInitial()
String getIntermed()
String getFinal()
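Each of the three methods returns a String that identifies a function. A common way to satisfy this, and the assumption made in this sketch, is to return the class name of a nested class that implements the corresponding stage. To stay self-contained, the example declares a local stand-in `Algebraic` interface with the same three methods instead of importing the real one; `AlgebraicNamesSketch`, `Count`, and the nested `Initial`, `Intermediate`, and `Final` classes are hypothetical, and the stage classes are left empty because only the naming pattern is shown.

```java
public class AlgebraicNamesSketch {

    /** Local stand-in with the same three methods as the documented interface; not the real Pig type. */
    public interface Algebraic {
        String getInitial();
        String getIntermed();
        String getFinal();
    }

    /** Hypothetical COUNT-like function whose three stages live in nested classes. */
    public static class Count implements Algebraic {

        // Placeholders for the stage implementations; in real code each would be an eval function.
        public static class Initial {}
        public static class Intermediate {}
        public static class Final {}

        @Override
        public String getInitial() {
            return Initial.class.getName();
        }

        @Override
        public String getIntermed() {
            return Intermediate.class.getName();
        }

        @Override
        public String getFinal() {
            return Final.class.getName();
        }
    }

    public static void main(String[] args) {
        Algebraic f = new Count();
        System.out.println(f.getInitial());
        System.out.println(f.getIntermed());
        System.out.println(f.getFinal());
    }
}
```

Using `Class.getName()` rather than a hand-written string keeps the returned names correct if the classes are renamed or moved to another package.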