java.lang.Object
  org.apache.pig.EvalFunc<DataBag>
    org.apache.pig.builtin.COR
public class COR extends EvalFunc<DataBag> implements Algebraic
Computes the correlation between sets of data. The returned value
is a bag containing one tuple for each combination of input schemas.
Each tuple holds the two schema names and the correlation between
those two schemas.
A = load 'input.xml' using PigStorage(':');
B = group A all;
D = foreach B generate group,COR(A.$0,A.$1,A.$2);
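To make the shape of the result concrete, the sketch below enumerates the column combinations for a call such as COR(A.$0,A.$1,A.$2). It assumes that "combination" means every unordered pair of input columns; the class name CorPairsSketch and the method columnPairs are hypothetical and are not part of Pig.

```java
import java.util.ArrayList;
import java.util.List;

public class CorPairsSketch {
    /** Returns every unordered pair (i, j), i < j, of n input columns (assumed pairing rule). */
    static List<int[]> columnPairs(int n) {
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                pairs.add(new int[] {i, j});
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Three input columns give three pairs, so the bag would hold three tuples.
        for (int[] p : columnPairs(3)) {
            System.out.println("$" + p[0] + " vs $" + p[1]);
        }
    }
}
```

Under that assumption, n input columns yield n(n-1)/2 tuples in the returned bag.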
Nested Class Summary | |
---|---|
static class | COR.Final |
static class | COR.Initial |
static class | COR.Intermed |
Field Summary | |
---|---|
protected Vector<String> | schemaName |
Fields inherited from class org.apache.pig.EvalFunc |
---|
log, pigLogger, reporter, returnType |
Constructor Summary |
---|
COR() |
COR(String... schemaName) |
Method Summary | |
---|---|
protected static Tuple | combine(DataBag values): Combines the results of different data chunks. |
protected static Tuple | computeAll(DataBag first, DataBag second): Computes sum(XY), sum(X), sum(Y), sum(XX), sum(YY) from the given data sets. |
DataBag | exec(Tuple input): Computes the correlation between data sets. |
String | getFinal(): Get the final function. |
String | getInitial(): Get the initial function. |
String | getIntermed(): Get the intermediate function. |
Schema | outputSchema(Schema input): Report the schema of the output of this UDF. |
String | toString(): Returns the constructor arguments as a string. |
Methods inherited from class org.apache.pig.EvalFunc |
---|
finish, getArgToFuncMapping, getCacheFiles, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, isAsynchronous, progress, setPigLogger, setReporter, warn |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
protected Vector<String> schemaName
Constructor Detail |
---|
public COR()
public COR(String... schemaName)
Method Detail |
---|
public DataBag exec(Tuple input) throws IOException
Function to compute correlation between data sets.
Specified by: exec in class EvalFunc<DataBag>
Parameters: input - input tuple which contains data sets.
Throws: IOException
public String toString()
Function to return argument of constructor as string.
Overrides: toString in class Object
public String getInitial()
Description copied from interface: Algebraic
Get the initial function.
Specified by: getInitial in interface Algebraic
public String getIntermed()
Description copied from interface: Algebraic
Get the intermediate function.
Specified by: getIntermed in interface Algebraic
public String getFinal()
Description copied from interface: Algebraic
Get the final function.
Specified by: getFinal in interface Algebraic
protected static Tuple combine(DataBag values) throws IOException
Combines the results of different data chunks.
Parameters: values - DataBag containing partial results computed on different data chunks
Throws: IOException
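Combining partial results works because each of the tracked quantities is a plain sum, so the totals for the whole data set are the element-wise sums of the per-chunk values. The following is a minimal sketch of that idea, not Pig's implementation: it uses a double[] in place of a Tuple, and the layout {sum(XY), sum(X), sum(Y), sum(XX), sum(YY), count} (including the trailing count slot) is an assumption made for illustration. The class name PartialSumsSketch is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class PartialSumsSketch {
    /**
     * Adds up per-chunk partial results element by element.
     * Assumed layout: {sum(XY), sum(X), sum(Y), sum(XX), sum(YY), count}.
     */
    static double[] combine(List<double[]> partials) {
        double[] total = new double[6];
        for (double[] p : partials) {
            for (int k = 0; k < total.length; k++) {
                total[k] += p[k];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Chunk A holds the pairs (1,2) and (2,4); chunk B holds the pair (3,6).
        double[] chunkA = {10, 3, 6, 5, 20, 2};
        double[] chunkB = {18, 3, 6, 9, 36, 1};
        System.out.println(Arrays.toString(combine(Arrays.asList(chunkA, chunkB))));
    }
}
```

Because addition is associative and commutative, the chunks can be combined in any order and in any number of intermediate rounds.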
protected static Tuple computeAll(DataBag first, DataBag second) throws IOException
Computes sum(XY), sum(X), sum(Y), sum(XX), sum(YY) from the given data sets.
Parameters: first - DataBag containing first data set
second - DataBag containing second data set
Throws: IOException
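The five sums listed for computeAll, together with the number of pairs, are enough to evaluate a correlation coefficient. The sketch below accumulates those sums over plain arrays and then applies the textbook Pearson formula; whether COR evaluates exactly this expression is an assumption, and the names PearsonSketch, sums and pearson are hypothetical.

```java
public class PearsonSketch {
    /** Returns {sum(XY), sum(X), sum(Y), sum(XX), sum(YY)} for paired samples. */
    static double[] sums(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("data sets must have the same length");
        }
        double[] s = new double[5];
        for (int i = 0; i < x.length; i++) {
            s[0] += x[i] * y[i];
            s[1] += x[i];
            s[2] += y[i];
            s[3] += x[i] * x[i];
            s[4] += y[i] * y[i];
        }
        return s;
    }

    /** Textbook Pearson correlation from the five sums and the pair count n. */
    static double pearson(double[] s, long n) {
        double numerator = n * s[0] - s[1] * s[2];
        double denominator = Math.sqrt((n * s[3] - s[1] * s[1]) * (n * s[4] - s[2] * s[2]));
        return numerator / denominator; // NaN if either data set is constant
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3};
        double[] y = {2, 4, 6};
        System.out.println(pearson(sums(x, y), x.length));
    }
}
```

Keeping only running sums means the data never has to be held in memory at once, which is what makes a chunked, algebraic evaluation possible.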
public Schema outputSchema(Schema input)
Description copied from class: EvalFunc
Report the schema of the output of this UDF.
Overrides: outputSchema in class EvalFunc<DataBag>
Parameters: input - Schema of the input