org.apache.hadoop.zebra.mapred
Class TableInputFormat

java.lang.Object
  org.apache.hadoop.zebra.mapred.TableInputFormat

@Deprecated
public class TableInputFormat
extends Object
implements org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.BytesWritable,Tuple>

InputFormat class for reading one or more BasicTables.
Usage Example:
In the main program, add the following code.

```java
jobConf.setInputFormat(TableInputFormat.class);
TableInputFormat.setInputPaths(jobConf, new Path("path/to/table1"), new Path("path/to/table2"));
TableInputFormat.setProjection(jobConf, "Name, Salary, BonusPct");
```

The above code does the following things:

- Let the framework know to use TableInputFormat as the InputFormat for this job.
- Set the paths of the two BasicTables to be read; the InputFormat will produce splits on the union of these tables.
- Set the projection so that only the "Name", "Salary" and "BonusPct" columns are loaded into each input Tuple.
In the Mapper class, add code along the following lines.

```java
static class MyMapClass<K, V> implements Mapper<BytesWritable, Tuple, K, V> {
  // indices of various fields in the input Tuple.
  int idxName, idxSalary, idxBonusPct;

  @Override
  public void configure(JobConf job) {
    Schema projection;
    try {
      projection = TableInputFormat.getProjection(job);
    } catch (Exception e) {
      // getProjection declares IOException and ParseException, which configure() cannot re-throw.
      throw new RuntimeException(e);
    }
    // determine the field indices.
    idxName = projection.getColumnIndex("Name");
    idxSalary = projection.getColumnIndex("Salary");
    idxBonusPct = projection.getColumnIndex("BonusPct");
  }

  @Override
  public void map(BytesWritable key, Tuple value, OutputCollector<K, V> output,
      Reporter reporter) throws IOException {
    try {
      String name = (String) value.get(idxName);
      int salary = (Integer) value.get(idxSalary);
      double bonusPct = (Double) value.get(idxBonusPct);
      // do something with the input data
    } catch (ExecException e) {
      e.printStackTrace();
    }
  }

  @Override
  public void close() throws IOException {
    // no-op
  }
}
```

A little more explanation on the PIG Tuple objects: a Tuple is an ordered list of PIG datum objects. The permitted PIG datum types can be categorized as scalar types and composite types.
Supported scalar types include seven native Java types: Boolean, Byte, Integer, Long, Float, Double, and String, as well as one PIG class called DataByteArray that represents a type-less byte array.

Supported composite types include:

- Map: the same as the Java Map class, with the additional restriction that the key type must be one of the scalar types PIG recognizes, and the value type any of the scalar or composite types PIG understands.
- DataBag: a collection of Tuples.
- Tuple: yes, a Tuple itself can be a datum in another Tuple.
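To make the composite types concrete, here is a small sketch (not from the original documentation) of how a Mapper body might unpack a bag, a map, and a nested record from an input Tuple. The column indices, the map key, and the field positions are hypothetical and would come from the actual table schema.

```java
import java.io.IOException;
import java.util.Map;

import org.apache.pig.backend.executionengine.ExecException;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;

public class CompositeFieldExample {
  // Hypothetical field indices; in a real Mapper they would be resolved via
  // TableInputFormat.getProjection(job).getColumnIndex(...) in configure().
  private static final int IDX_PROJECTS = 0;   // assumed bag-typed column
  private static final int IDX_DEPENDENTS = 1; // assumed map-typed column
  private static final int IDX_ADDRESS = 2;    // assumed nested-record column

  public static void unpack(Tuple row) throws IOException {
    try {
      // A DataBag is a collection of Tuples and can be iterated directly.
      DataBag projects = (DataBag) row.get(IDX_PROJECTS);
      for (Tuple project : projects) {
        String projectName = (String) project.get(0);
        System.out.println("project: " + projectName);
      }

      // A PIG map restricts keys to scalar types; values may be any scalar or composite datum.
      @SuppressWarnings("unchecked")
      Map<String, Object> dependents = (Map<String, Object>) row.get(IDX_DEPENDENTS);
      Object spouse = dependents.get("spouse"); // hypothetical key

      // A Tuple can itself be a datum inside another Tuple (a nested record).
      Tuple address = (Tuple) row.get(IDX_ADDRESS);
      String city = (String) address.get(1);
      System.out.println("spouse: " + spouse + ", city: " + city);
    } catch (ExecException e) {
      // Tuple.get(int) reports field-access errors as ExecException.
      throw new IOException("failed to read composite fields", e);
    }
  }
}
```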
Field Summary

| Modifier and Type | Field and Description |
|---|---|
| static String | INPUT_DELETED_CGS Deprecated. |
| static String | INPUT_EXPR Deprecated. |
| static String | INPUT_FE Deprecated. |
| static String | INPUT_PROJ Deprecated. |
| static String | INPUT_SORT Deprecated. |
Constructor Summary

| Constructor and Description |
|---|
| TableInputFormat() Deprecated. |
Method Summary

| Modifier and Type | Method and Description |
|---|---|
| static String | getProjection(org.apache.hadoop.mapred.JobConf conf) Deprecated. Get the projection from the JobConf. |
| org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.BytesWritable,Tuple> | getRecordReader(org.apache.hadoop.mapred.InputSplit split, org.apache.hadoop.mapred.JobConf conf, org.apache.hadoop.mapred.Reporter reporter) Deprecated. |
| static Schema | getSchema(org.apache.hadoop.mapred.JobConf conf) Deprecated. Get the schema of a table expr. |
| static SortInfo | getSortInfo(org.apache.hadoop.mapred.JobConf conf) Deprecated. Get the SortInfo object regarding a Zebra table. |
| org.apache.hadoop.mapred.InputSplit[] | getSplits(org.apache.hadoop.mapred.JobConf conf, int numSplits) Deprecated. |
| static TableRecordReader | getTableRecordReader(org.apache.hadoop.mapred.JobConf conf, String projection) Deprecated. Get a TableRecordReader on a single split. |
| static void | requireSortedTable(org.apache.hadoop.mapred.JobConf conf, ZebraSortInfo sortInfo) Deprecated. Requires a sorted table or table union. |
| static void | setInputPaths(org.apache.hadoop.mapred.JobConf conf, org.apache.hadoop.fs.Path... paths) Deprecated. Set the paths to the input table. |
| static void | setMinSplitSize(org.apache.hadoop.mapred.JobConf conf, long minSize) Deprecated. Set the minimum split size. |
| static void | setProjection(org.apache.hadoop.mapred.JobConf conf, String projection) Deprecated. Use setProjection(JobConf, ZebraProjection) instead. |
| static void | setProjection(org.apache.hadoop.mapred.JobConf conf, ZebraProjection projection) Deprecated. Set the input projection in the JobConf object. |
| void | validateInput(org.apache.hadoop.mapred.JobConf conf) Deprecated. |
Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
public static final String INPUT_EXPR
public static final String INPUT_PROJ
public static final String INPUT_SORT
public static final String INPUT_FE
public static final String INPUT_DELETED_CGS
Constructor Detail
public TableInputFormat()
Method Detail

public static void setInputPaths(org.apache.hadoop.mapred.JobConf conf, org.apache.hadoop.fs.Path... paths)

Set the paths to the input table.

Parameters:
conf - JobConf object.
paths - one or more paths to BasicTables. The InputFormat class will produce splits on the "union" of these BasicTables.

public static Schema getSchema(org.apache.hadoop.mapred.JobConf conf) throws IOException

Get the schema of a table expr.

Parameters:
conf - JobConf object.
Throws:
IOException
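As an illustrative driver-side sketch (not part of the original docs), getSchema can be used to sanity-check the input before setting a projection. The table path and column names are hypothetical, and the Schema import path plus the negative-index convention for missing columns are assumptions noted in the comments.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.zebra.mapred.TableInputFormat;
import org.apache.hadoop.zebra.schema.Schema; // package for the Zebra Schema class assumed

public class SchemaCheck {
  public static void main(String[] args) throws Exception {
    JobConf jobConf = new JobConf(SchemaCheck.class);
    jobConf.setInputFormat(TableInputFormat.class);
    TableInputFormat.setInputPaths(jobConf, new Path("path/to/table1")); // hypothetical table path

    // Inspect the schema of the configured table (or table union) before projecting.
    Schema schema = TableInputFormat.getSchema(jobConf);
    // Assumption: getColumnIndex returns a negative index for a column that is not present.
    if (schema.getColumnIndex("Salary") < 0) { // hypothetical column name
      throw new IllegalStateException("input table has no Salary column");
    }
    TableInputFormat.setProjection(jobConf, "Name, Salary"); // String form shown in the class usage example
  }
}
```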
public static void setProjection(org.apache.hadoop.mapred.JobConf conf, String projection) throws ParseException

Deprecated. Use setProjection(JobConf, ZebraProjection) instead.

Set the input projection in the JobConf object.

Parameters:
conf - JobConf object.
projection - A comma-separated list of column names. To select all columns, pass projection == null. The syntax of the projection conforms to the Schema string.
Throws:
ParseException
public static void setProjection(org.apache.hadoop.mapred.JobConf conf, ZebraProjection projection) throws ParseException

Set the input projection in the JobConf object.

Parameters:
conf - JobConf object.
projection - A comma-separated list of column names. To select all columns, pass projection == null. The syntax of the projection conforms to the Schema string.
Throws:
ParseException
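A minimal sketch of the ZebraProjection overload from the driver side. It assumes ZebraProjection provides a static createZebraProjection(String) factory; the column names are the same hypothetical ones used in the class usage example.

```java
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.zebra.mapred.TableInputFormat;
import org.apache.hadoop.zebra.mapred.ZebraProjection;

public class ProjectionSetup {
  public static void configureProjection(JobConf jobConf) throws Exception {
    // Assumption: ZebraProjection.createZebraProjection(String) builds a projection object
    // from the same comma-separated column syntax accepted by the String overload.
    ZebraProjection projection = ZebraProjection.createZebraProjection("Name, Salary, BonusPct");
    TableInputFormat.setProjection(jobConf, projection);
  }
}
```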
public static String getProjection(org.apache.hadoop.mapred.JobConf conf) throws IOException, ParseException

Get the projection from the JobConf.

Parameters:
conf - The JobConf object.
Throws:
IOException
ParseException
public static SortInfo getSortInfo(org.apache.hadoop.mapred.JobConf conf) throws IOException

Get the SortInfo object regarding a Zebra table.

Parameters:
conf - JobConf object.
Throws:
IOException
public static void requireSortedTable(org.apache.hadoop.mapred.JobConf conf, ZebraSortInfo sortInfo) throws IOException

Requires a sorted table or table union.

Parameters:
conf - JobConf object.
sortInfo - ZebraSortInfo object containing sorting information.
Throws:
IOException
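The sketch below shows one way a driver might declare a sorted-input requirement. The ZebraSortInfo factory method, its (sort columns, comparator class) parameter order, and the meaning of a null comparator are all assumptions; the sort column name is hypothetical.

```java
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.zebra.mapred.TableInputFormat;
import org.apache.hadoop.zebra.mapred.ZebraSortInfo;

public class SortedInput {
  public static void requireSortedInput(JobConf jobConf) throws Exception {
    // Assumption: ZebraSortInfo exposes a createZebraSortInfo(sortColumns, comparatorClass)
    // factory and accepts null for the comparator to mean "use the default ordering".
    ZebraSortInfo sortInfo = ZebraSortInfo.createZebraSortInfo("Name", null);
    // Require that the input table (or table union) is sorted accordingly.
    TableInputFormat.requireSortedTable(jobConf, sortInfo);
  }
}
```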
public org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.BytesWritable,Tuple> getRecordReader(org.apache.hadoop.mapred.InputSplit split, org.apache.hadoop.mapred.JobConf conf, org.apache.hadoop.mapred.Reporter reporter) throws IOException

Specified by:
getRecordReader in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.BytesWritable,Tuple>
Throws:
IOException
See Also:
InputFormat.getRecordReader(InputSplit, JobConf, Reporter)
public static TableRecordReader getTableRecordReader(org.apache.hadoop.mapred.JobConf conf, String projection) throws IOException, ParseException

Get a TableRecordReader on a single split.

Parameters:
conf - JobConf object.
projection - comma-separated column names in the projection; null means all columns.
Throws:
IOException
ParseException
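For use outside a regular map task (for example a small local scan), the sketch below drives a TableRecordReader through the generic RecordReader protocol. It assumes the input configured on the JobConf resolves to a single split, as the method description requires, and the projection string is hypothetical.

```java
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.zebra.mapred.TableInputFormat;
import org.apache.hadoop.zebra.mapred.TableRecordReader;
import org.apache.pig.data.Tuple;

public class LocalScan {
  public static void scan(JobConf jobConf) throws Exception {
    // Assumes the configured input amounts to a single split (see the method description above).
    TableRecordReader reader = TableInputFormat.getTableRecordReader(jobConf, "Name, Salary");
    try {
      // TableRecordReader follows the RecordReader<BytesWritable, Tuple> protocol.
      BytesWritable key = reader.createKey();
      Tuple row = reader.createValue();
      while (reader.next(key, row)) {
        String name = (String) row.get(0); // index 0 corresponds to "Name" in the projection
        System.out.println(name);
      }
    } finally {
      reader.close();
    }
  }
}
```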
public static void setMinSplitSize(org.apache.hadoop.mapred.JobConf conf, long minSize)

Set the minimum split size.

Parameters:
conf - The JobConf object.
minSize - Minimum split size.

public org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf conf, int numSplits) throws IOException

Specified by:
getSplits in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.BytesWritable,Tuple>
Throws:
IOException
See Also:
InputFormat.getSplits(JobConf, int)
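A short driver-side sketch of split tuning (not from the original docs). The table path is hypothetical and the 128 MB figure is an arbitrary illustration; getSplits itself is invoked by the MapReduce framework at job submission, so it is not called directly here.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.zebra.mapred.TableInputFormat;

public class SplitTuning {
  public static void configureSplits(JobConf jobConf) {
    jobConf.setInputFormat(TableInputFormat.class);
    TableInputFormat.setInputPaths(jobConf, new Path("path/to/table1")); // hypothetical path
    // Hint that splits should not be smaller than roughly 128 MB; the framework
    // later calls getSplits(conf, numSplits) to compute the actual splits.
    TableInputFormat.setMinSplitSize(jobConf, 128L * 1024 * 1024);
  }
}
```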
@Deprecated
public void validateInput(org.apache.hadoop.mapred.JobConf conf) throws IOException

Deprecated.

Throws:
IOException