org.apache.pig
Class EvalFunc<T>
java.lang.Object
org.apache.pig.EvalFunc<T>
- Direct Known Subclasses:
- ABS, ABS, ARITY, AVG, AVG.Final, AVG.Initial, AVG.Intermediate, BagSize, Base, Base, Bin, BinCond, CONCAT, ConstantSize, copySign, COR, COR, COR.Final, COR.Final, COR.Initial, COR.Initial, COR.Intermed, COR.Intermed, COUNT, COUNT_STAR, COUNT_STAR.Final, COUNT_STAR.Initial, COUNT_STAR.Intermediate, COUNT.Final, COUNT.Initial, COUNT.Intermediate, COV, COV, COV.Final, COV.Final, COV.Initial, COV.Initial, COV.Intermed, COV.Intermed, CustomFormatToISO, DateExtractor, Decode, DIFF, DiffDate, Distinct, Distinct.Final, Distinct.Initial, Distinct.Intermediate, DoubleAbs, DoubleAbs, DoubleAvg, DoubleAvg.Final, DoubleAvg.Initial, DoubleAvg.Intermediate, DoubleCopySign, DoubleGetExponent, DoubleMax, DoubleMax, DoubleMax.Final, DoubleMax.Initial, DoubleMax.Intermediate, DoubleMin, DoubleMin, DoubleMin.Final, DoubleMin.Initial, DoubleMin.Intermediate, DoubleNextAfter, DoubleNextup, DoubleRound, DoubleRound, DoubleSignum, DoubleSum, DoubleSum.Final, DoubleSum.Initial, DoubleSum.Intermediate, DoubleUlp, ExtremalTupleByNthField, ExtremalTupleByNthField.HelperClass, FilterFunc, FindQuantiles, FloatAbs, FloatAbs, FloatAvg, FloatAvg.Final, FloatAvg.Initial, FloatAvg.Intermediate, FloatCopySign, FloatGetExponent, FloatMax, FloatMax, FloatMax.Final, FloatMax.Initial, FloatMax.Intermediate, FloatMin, FloatMin, FloatMin.Final, FloatMin.Initial, FloatMin.Intermediate, FloatNextAfter, FloatNextup, FloatRound, FloatRound, FloatSignum, FloatSum, FloatSum.Final, FloatSum.Initial, FloatSum.Intermediate, FloatUlp, GenericInvoker, getExponent, GetMemNumRows, GFAny, GFCross, GFReplicate, HashFNV, HostExtractor, IdentityColumn, INDEXOF, INDEXOF, IntAbs, IntAbs, IntAvg, IntAvg.Final, IntAvg.Initial, IntAvg.Intermediate, IntMax, IntMax, IntMax.Final, IntMax.Initial, IntMax.Intermediate, IntMin, IntMin, IntMin.Final, IntMin.Initial, IntMin.Intermediate, IntSum, IntSum.Final, IntSum.Initial, IntSum.Intermediate, ISODaysBetween, ISOHoursBetween, ISOMinutesBetween, ISOMonthsBetween, ISOSecondsBetween, ISOToDay, ISOToHour, ISOToMinute, ISOToMonth, ISOToSecond, ISOToUnix, ISOToWeek, ISOToYear, ISOYearsBetween, JsFunction, JythonFunction, LAST_INDEX_OF, LASTINDEXOF, LcFirst, LCFIRST, LENGTH, LongAbs, LongAbs, LongAvg, LongAvg.Final, LongAvg.Initial, LongAvg.Intermediate, LongMax, LongMax, LongMax.Final, LongMax.Initial, LongMax.Intermediate, LongMin, LongMin, LongMin.Final, LongMin.Initial, LongMin.Intermediate, LongSum, LongSum.Final, LongSum.Initial, LongSum.Intermediate, LookupInFiles, LOWER, LOWER, MapSize, MAX, MAX, MAX.Final, MAX.Initial, MAX.Intermediate, MaxTupleBy1stField, MaxTupleBy1stField.Final, MaxTupleBy1stField.Initial, MaxTupleBy1stField.Intermediate, MIN, MIN, MIN.Final, MIN.Initial, MIN.Intermediate, nextAfter, NEXTUP, PartitionSkewedKeys, RANDOM, RANDOM, ReadScalars, REGEX_EXTRACT, REGEX_EXTRACT_ALL, RegexExtract, RegexExtractAll, RegexMatch, REPLACE, REPLACE, Reverse, ROUND, ROUND, SCALB, SearchEngineExtractor, SearchQuery, SearchTermExtractor, SIGNUM, SIZE, Split, StringConcat, StringMax, StringMax.Final, StringMax.Initial, StringMax.Intermediate, StringMin, StringMin.Final, StringMin.Initial, StringMin.Intermediate, StringSize, STRSPLIT, SUBSTRING, SUBSTRING, SUM, SUM.Final, SUM.Initial, SUM.Intermediate, ToBag, TOBAG, TOKENIZE, TOMAP, Top, TOP, Top.Final, TOP.Final, Top.Initial, TOP.Initial, Top.Intermed, TOP.Intermed, ToTuple, TOTUPLE, Trim, TRIM, TupleSize, UcFirst, UCFIRST, ULP, UnixToISO, UPPER, UPPER
@InterfaceAudience.Public
@InterfaceStability.Stable
public abstract class EvalFunc<T>
- extends Object
The class is used to implement functions to be applied to
fields in a dataset. The function is applied to each Tuple in the set.
The programmer should not make assumptions about state maintained
between invocations of the exec() method since the Pig runtime
will schedule and localize invocations based on information provided
at runtime. The programmer also should not make assumptions about when or
how many times the class will be instantiated, since it may be instantiated
multiple times in both the front and back end.
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
reporter
protected PigProgressable reporter
- Reporter to send heartbeats to Hadoop. If exec will take more than a
a few seconds
PigProgressable.progress()
should be called
occasionally to avoid timeouts. Default Hadoop timeout is 600 seconds.
log
protected org.apache.commons.logging.Log log
- Logging object. Log calls made on the front end will be sent to
pig's log on the client. Log calls made on the backend will be
sent to stdout and can be seen in the Hadoop logs.
pigLogger
protected PigLogger pigLogger
- Logger for aggregating warnings. Any warnings to be sent to the user
should be logged to this via
PigLogger.warn(java.lang.Object, java.lang.String, java.lang.Enum)
.
returnType
protected Type returnType
- Return type of this instance of EvalFunc.
EvalFunc
public EvalFunc()
getSchemaName
protected String getSchemaName(String name,
Schema input)
getReturnType
public Type getReturnType()
- Get the Type that this EvalFunc returns.
- Returns:
- Type
progress
public final void progress()
- Utility method to allow UDF to report progress. If exec will take more than a
a few seconds
PigProgressable.progress()
should be called
occasionally to avoid timeouts. Default Hadoop timeout is 600 seconds.
warn
public final void warn(String msg,
Enum warningEnum)
- Issue a warning. Warning messages are aggregated and reported to
the user.
- Parameters:
msg
- String message of the warningwarningEnum
- type of warning
finish
public void finish()
- Placeholder for cleanup to be performed at the end. User defined functions can override.
Default implementation is a no-op.
exec
public abstract T exec(Tuple input)
throws IOException
- This callback method must be implemented by all subclasses. This
is the method that will be invoked on every Tuple of a given dataset.
Since the dataset may be divided up in a variety of ways the programmer
should not make assumptions about state that is maintained between
invocations of this method.
- Parameters:
input
- the Tuple to be processed.
- Returns:
- result, of type T.
- Throws:
IOException
outputSchema
public Schema outputSchema(Schema input)
- Report the schema of the output of this UDF. Pig will make use of
this in error checking, optimization, and planning. The schema
of input data to this UDF is provided.
- Parameters:
input
- Schema of the input
- Returns:
- Schema of the output
isAsynchronous
@Deprecated
public boolean isAsynchronous()
- Deprecated.
- This function should be overriden to return true for functions that return their values
asynchronously. Currently pig never attempts to execute a function
asynchronously.
- Returns:
- true if the function can be executed asynchronously.
getReporter
public PigProgressable getReporter()
setReporter
public final void setReporter(PigProgressable reporter)
- Set the reporter. Called by Pig to provide a reference of
the reporter to the UDF.
- Parameters:
reporter
- Hadoop reporter
getArgToFuncMapping
public List<FuncSpec> getArgToFuncMapping()
throws FrontendException
- Allow a UDF to specify type specific implementations of itself. For example,
an implementation of arithmetic sum might have int and float implementations,
since integer arithmetic performs much better than floating point arithmetic. Pig's
typechecker will call this method and using the returned list plus the schema
of the function's input data, decide which implementation of the UDF to use.
- Returns:
- A List containing FuncSpec objects representing the EvalFunc class
which can handle the inputs corresponding to the schema in the objects. Each
FuncSpec should be constructed with a schema that describes the input for that
implementation. For example, the sum function above would return two elements in its
list:
- FuncSpec(this.getClass().getName(), new Schema(new Schema.FieldSchema(null, DataType.DOUBLE)))
- FuncSpec(IntSum.getClass().getName(), new Schema(new Schema.FieldSchema(null, DataType.INTEGER)))
This would indicate that the main implementation is used for doubles, and the special
implementation IntSum is used for ints.
- Throws:
FrontendException
getCacheFiles
public List<String> getCacheFiles()
- Allow a UDF to specify a list of files it would like placed in the distributed
cache. These files will be put in the cache for every job the UDF is used in.
The default implementation returns null.
- Returns:
- A list of files
getPigLogger
public PigLogger getPigLogger()
setPigLogger
public final void setPigLogger(PigLogger pigLogger)
- Set the PigLogger object. Called by Pig to provide a reference
to the UDF.
- Parameters:
pigLogger
- PigLogger object.
getLogger
public org.apache.commons.logging.Log getLogger()
Copyright © 2012 The Apache Software Foundation