org.apache.pig.piggybank.evaluation.util
Class SearchQuery
java.lang.Object
org.apache.pig.EvalFunc<String>
org.apache.pig.piggybank.evaluation.util.SearchQuery
public class SearchQuery
- extends EvalFunc<String>
This small UDF takes a search engine URL (Google/Yahoo/AOL/Live) and extracts
the search query it contains. The URL is assumed to be URL-encoded. The
extracted query is normalized: it is converted to lower case, punctuation is
removed, and extra spaces are removed.
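The extraction and normalization described above can be sketched in plain Java without any Pig dependency. This is a minimal illustration under stated assumptions, not the piggybank implementation: the class name SearchQuerySketch, the methods extract and normalize, and the set of recognized query-parameter names (q, p, query) are all hypothetical, and the real UDF may recognize different parameters or normalize differently.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Set;

final class SearchQuerySketch {
    // Assumption: parameter names that carry the search terms.
    // The actual UDF may use a different set.
    private static final Set<String> QUERY_PARAMS = Set.of("q", "p", "query");

    /** Returns the normalized search query, or null if none is found. */
    static String extract(String url) {
        if (url == null) {
            return null;
        }
        int questionMark = url.indexOf('?');
        if (questionMark < 0) {
            return null; // no query string at all
        }
        String queryString = url.substring(questionMark + 1);
        int hash = queryString.indexOf('#');
        if (hash >= 0) {
            queryString = queryString.substring(0, hash); // drop the fragment
        }
        for (String pair : queryString.split("&")) {
            int eq = pair.indexOf('=');
            if (eq <= 0 || !QUERY_PARAMS.contains(pair.substring(0, eq))) {
                continue;
            }
            try {
                // Decodes %XX escapes and turns '+' into a space.
                String decoded = URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
                return normalize(decoded);
            } catch (IllegalArgumentException e) {
                return null; // malformed percent-encoding
            }
        }
        return null;
    }

    /** Lower-cases, removes punctuation, and collapses runs of whitespace. */
    static String normalize(String query) {
        return query.toLowerCase(Locale.ROOT)
                .replaceAll("\\p{Punct}", "")
                .replaceAll("\\s+", " ")
                .trim();
    }
}
```

In this sketch, punctuation is deleted rather than replaced with a space, so a hyphenated term such as "map-reduce" becomes "mapreduce"; whether the real UDF behaves the same way is not stated in this page.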
Methods inherited from class org.apache.pig.EvalFunc
finish, getCacheFiles, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, isAsynchronous, progress, setPigLogger, setReporter, warn
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
SearchQuery
public SearchQuery()
exec
public String exec(Tuple tuple)
throws IOException
- Description copied from class:
EvalFunc
- This callback method must be implemented by all subclasses. This
is the method that will be invoked on every Tuple of a given dataset.
Since the dataset may be divided up in a variety of ways the programmer
should not make assumptions about state that is maintained between
invocations of this method.
- Specified by:
exec
in class EvalFunc<String>
- Parameters:
tuple
- the Tuple to be processed.
- Returns:
- result, of type T (String for this class).
- Throws:
IOException
getArgToFuncMapping
public List<FuncSpec> getArgToFuncMapping()
throws FrontendException
- Description copied from class:
EvalFunc
- Allow a UDF to specify type-specific implementations of itself. For example,
an implementation of arithmetic sum might have int and float implementations,
since integer arithmetic performs much better than floating point arithmetic. Pig's
typechecker will call this method and, using the returned list plus the schema
of the function's input data, decide which implementation of the UDF to use.
- Overrides:
getArgToFuncMapping
in class EvalFunc<String>
- Returns:
- A List of FuncSpec objects, each naming an EvalFunc class that can handle
inputs matching the schema in that FuncSpec. Each
FuncSpec should be constructed with a schema that describes the input for that
implementation. For example, the sum function above would return two elements in its
list:
- new FuncSpec(this.getClass().getName(), new Schema(new Schema.FieldSchema(null, DataType.DOUBLE)))
- new FuncSpec(IntSum.class.getName(), new Schema(new Schema.FieldSchema(null, DataType.INTEGER)))
This would indicate that the main implementation is used for doubles, and the special
implementation IntSum is used for ints.
- Throws:
FrontendException
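The selection step described for getArgToFuncMapping can be modeled in a few lines of plain Java. This is a deliberately simplified sketch, not Pig's typechecker: the types Mapping and InputType and the method choose are hypothetical stand-ins for FuncSpec, DataType, and the real matching logic, which is more involved than an exact type comparison.

```java
import java.util.List;
import java.util.Optional;

final class FuncMappingSketch {
    // Hypothetical stand-in for the Pig data types relevant to the example.
    enum InputType { INTEGER, DOUBLE, CHARARRAY }

    // Hypothetical stand-in for FuncSpec: an implementation class name
    // paired with the input type that implementation accepts.
    static final class Mapping {
        final String className;
        final InputType inputType;

        Mapping(String className, InputType inputType) {
            this.className = className;
            this.inputType = inputType;
        }
    }

    // Mirrors the sum example: the main implementation handles doubles,
    // the specialized IntSum handles ints.
    static List<Mapping> argToFuncMapping() {
        return List.of(
                new Mapping("Sum", InputType.DOUBLE),
                new Mapping("IntSum", InputType.INTEGER));
    }

    // Picks the first implementation whose declared input type equals
    // the actual input type; empty if none matches.
    static Optional<String> choose(List<Mapping> mappings, InputType actual) {
        return mappings.stream()
                .filter(m -> m.inputType == actual)
                .map(m -> m.className)
                .findFirst();
    }
}
```

Under this model, an INTEGER input selects IntSum, a DOUBLE input selects Sum, and an input type with no entry in the list yields no match.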
outputSchema
public Schema outputSchema(Schema input)
- Description copied from class:
EvalFunc
- Report the schema of the output of this UDF. Pig will make use of
this in error checking, optimization, and planning. The schema
of input data to this UDF is provided.
- Overrides:
outputSchema
in class EvalFunc<String>
- Parameters:
input
- Schema of the input
- Returns:
- Schema of the output
Copyright © 2012 The Apache Software Foundation