org.apache.hadoop.zebra.mapred
Class TableInputFormat

java.lang.Object
  org.apache.hadoop.zebra.mapred.TableInputFormat

@Deprecated
public class TableInputFormat
extends Object
implements org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.BytesWritable,Tuple>

InputFormat class for reading one or more BasicTables.
Usage Example:
In the main program, add the following code.

```java
jobConf.setInputFormat(TableInputFormat.class);
TableInputFormat.setInputPaths(jobConf, new Path("path/to/table1"), new Path("path/to/table2"));
TableInputFormat.setProjection(jobConf, "Name, Salary, BonusPct");
```

The above code does the following things:

- Let the framework know to use TableInputFormat as the InputFormat for this job.
- Set the paths of the two BasicTables to be read; the InputFormat will produce splits on the union of these tables.
- Set the projection so that only the "Name", "Salary" and "BonusPct" columns are loaded into each input Tuple.
In the Mapper class, add code along the following lines.

```java
static class MyMapClass<K, V> implements Mapper<BytesWritable, Tuple, K, V> {
  // indices of various fields in the input Tuple.
  int idxName, idxSalary, idxBonusPct;

  @Override
  public void configure(JobConf job) {
    Schema projection;
    try {
      projection = TableInputFormat.getProjection(job);
    } catch (Exception e) {
      // getProjection declares IOException and ParseException, which configure() cannot re-throw.
      throw new RuntimeException(e);
    }
    // determine the field indices.
    idxName = projection.getColumnIndex("Name");
    idxSalary = projection.getColumnIndex("Salary");
    idxBonusPct = projection.getColumnIndex("BonusPct");
  }

  @Override
  public void map(BytesWritable key, Tuple value, OutputCollector<K, V> output,
      Reporter reporter) throws IOException {
    try {
      String name = (String) value.get(idxName);
      int salary = (Integer) value.get(idxSalary);
      double bonusPct = (Double) value.get(idxBonusPct);
      // do something with the input data
    } catch (ExecException e) {
      e.printStackTrace();
    }
  }

  @Override
  public void close() throws IOException {
    // no-op
  }
}
```

A little more explanation on the PIG Tuple objects: a Tuple is an ordered list of PIG datum objects. The permitted PIG datum types can be categorized as scalar types and composite types.
Supported scalar types include seven native Java types: Boolean, Byte, Integer, Long, Float, Double, and String, as well as one PIG class called DataByteArray that represents a type-less byte array.

Supported composite types include:

- Map: the same as the Java Map class, with the additional restriction that the key type must be one of the scalar types PIG recognizes, and the value type any of the scalar or composite types PIG understands.
- DataBag: a collection of Tuples.
- Tuple: yes, a Tuple itself can be a datum in another Tuple.
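To make the composite types concrete, here is a small sketch (not from the original documentation) of how a Mapper body might unpack a bag, a map, and a nested record from an input Tuple. The column indices, the map key, and the field positions are hypothetical and would come from the actual table schema.

```java
import java.io.IOException;
import java.util.Map;

import org.apache.pig.backend.executionengine.ExecException;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;

public class CompositeFieldExample {
  // Hypothetical field indices; in a real Mapper they would be resolved via
  // TableInputFormat.getProjection(job).getColumnIndex(...) in configure().
  private static final int IDX_PROJECTS = 0;   // assumed bag-typed column
  private static final int IDX_DEPENDENTS = 1; // assumed map-typed column
  private static final int IDX_ADDRESS = 2;    // assumed nested-record column

  public static void unpack(Tuple row) throws IOException {
    try {
      // A DataBag is a collection of Tuples and can be iterated directly.
      DataBag projects = (DataBag) row.get(IDX_PROJECTS);
      for (Tuple project : projects) {
        String projectName = (String) project.get(0);
        System.out.println("project: " + projectName);
      }

      // A PIG map restricts keys to scalar types; values may be any scalar or composite datum.
      @SuppressWarnings("unchecked")
      Map<String, Object> dependents = (Map<String, Object>) row.get(IDX_DEPENDENTS);
      Object spouse = dependents.get("spouse"); // hypothetical key

      // A Tuple can itself be a datum inside another Tuple (a nested record).
      Tuple address = (Tuple) row.get(IDX_ADDRESS);
      String city = (String) address.get(1);
      System.out.println("spouse: " + spouse + ", city: " + city);
    } catch (ExecException e) {
      // Tuple.get(int) reports field-access errors as ExecException.
      throw new IOException("failed to read composite fields", e);
    }
  }
}
```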
Field Summary

| Modifier and Type | Field and Description |
|---|---|
| static String | INPUT_DELETED_CGS Deprecated. |
| static String | INPUT_EXPR Deprecated. |
| static String | INPUT_FE Deprecated. |
| static String | INPUT_PROJ Deprecated. |
| static String | INPUT_SORT Deprecated. |
Constructor Summary

| Constructor and Description |
|---|
| TableInputFormat() Deprecated. |
Method Summary

| Modifier and Type | Method and Description |
|---|---|
| static String | getProjection(org.apache.hadoop.mapred.JobConf conf) Deprecated. Get the projection from the JobConf. |
| org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.BytesWritable,Tuple> | getRecordReader(org.apache.hadoop.mapred.InputSplit split, org.apache.hadoop.mapred.JobConf conf, org.apache.hadoop.mapred.Reporter reporter) Deprecated. |
| static Schema | getSchema(org.apache.hadoop.mapred.JobConf conf) Deprecated. Get the schema of a table expr. |
| static SortInfo | getSortInfo(org.apache.hadoop.mapred.JobConf conf) Deprecated. Get the SortInfo object regarding a Zebra table. |
| org.apache.hadoop.mapred.InputSplit[] | getSplits(org.apache.hadoop.mapred.JobConf conf, int numSplits) Deprecated. |
| static TableRecordReader | getTableRecordReader(org.apache.hadoop.mapred.JobConf conf, String projection) Deprecated. Get a TableRecordReader on a single split. |
| static void | requireSortedTable(org.apache.hadoop.mapred.JobConf conf, ZebraSortInfo sortInfo) Deprecated. Requires a sorted table or table union. |
| static void | setInputPaths(org.apache.hadoop.mapred.JobConf conf, org.apache.hadoop.fs.Path... paths) Deprecated. Set the paths to the input table. |
| static void | setMinSplitSize(org.apache.hadoop.mapred.JobConf conf, long minSize) Deprecated. Set the minimum split size. |
| static void | setProjection(org.apache.hadoop.mapred.JobConf conf, String projection) Deprecated. Use setProjection(JobConf, ZebraProjection) instead. |
| static void | setProjection(org.apache.hadoop.mapred.JobConf conf, ZebraProjection projection) Deprecated. Set the input projection in the JobConf object. |
| void | validateInput(org.apache.hadoop.mapred.JobConf conf) Deprecated. |
Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
public static final String INPUT_EXPR
public static final String INPUT_PROJ
public static final String INPUT_SORT
public static final String INPUT_FE
public static final String INPUT_DELETED_CGS
Constructor Detail
public TableInputFormat()
Method Detail

public static void setInputPaths(org.apache.hadoop.mapred.JobConf conf, org.apache.hadoop.fs.Path... paths)

Set the paths to the input table.

Parameters:
conf - JobConf object.
paths - one or more paths to BasicTables. The InputFormat class will produce splits on the "union" of these BasicTables.

public static Schema getSchema(org.apache.hadoop.mapred.JobConf conf) throws IOException

Get the schema of a table expr.

Parameters:
conf - JobConf object.
Throws:
IOException
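As an illustrative driver-side sketch (not part of the original docs), getSchema can be used to sanity-check the input before setting a projection. The table path and column names are hypothetical, and the Schema import path plus the negative-index convention for missing columns are assumptions noted in the comments.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.zebra.mapred.TableInputFormat;
import org.apache.hadoop.zebra.schema.Schema; // package for the Zebra Schema class assumed

public class SchemaCheck {
  public static void main(String[] args) throws Exception {
    JobConf jobConf = new JobConf(SchemaCheck.class);
    jobConf.setInputFormat(TableInputFormat.class);
    TableInputFormat.setInputPaths(jobConf, new Path("path/to/table1")); // hypothetical table path

    // Inspect the schema of the configured table (or table union) before projecting.
    Schema schema = TableInputFormat.getSchema(jobConf);
    // Assumption: getColumnIndex returns a negative index for a column that is not present.
    if (schema.getColumnIndex("Salary") < 0) { // hypothetical column name
      throw new IllegalStateException("input table has no Salary column");
    }
    TableInputFormat.setProjection(jobConf, "Name, Salary"); // String form shown in the class usage example
  }
}
```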
public static void setProjection(org.apache.hadoop.mapred.JobConf conf, String projection) throws ParseException

Deprecated. Use setProjection(JobConf, ZebraProjection) instead.

Set the input projection in the JobConf object.

Parameters:
conf - JobConf object.
projection - A comma-separated list of column names. To select all columns, pass projection == null. The syntax of the projection conforms to the Schema string.
Throws:
ParseException
public static void setProjection(org.apache.hadoop.mapred.JobConf conf, ZebraProjection projection) throws ParseException

Set the input projection in the JobConf object.

Parameters:
conf - JobConf object.
projection - A comma-separated list of column names. To select all columns, pass projection == null. The syntax of the projection conforms to the Schema string.
Throws:
ParseException
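A minimal sketch of the ZebraProjection overload from the driver side. It assumes ZebraProjection provides a static createZebraProjection(String) factory; the column names are the same hypothetical ones used in the class usage example.

```java
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.zebra.mapred.TableInputFormat;
import org.apache.hadoop.zebra.mapred.ZebraProjection;

public class ProjectionSetup {
  public static void configureProjection(JobConf jobConf) throws Exception {
    // Assumption: ZebraProjection.createZebraProjection(String) builds a projection object
    // from the same comma-separated column syntax accepted by the String overload.
    ZebraProjection projection = ZebraProjection.createZebraProjection("Name, Salary, BonusPct");
    TableInputFormat.setProjection(jobConf, projection);
  }
}
```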
public static String getProjection(org.apache.hadoop.mapred.JobConf conf) throws IOException, ParseException

Get the projection from the JobConf.

Parameters:
conf - The JobConf object.
Throws:
IOException
ParseException
public static SortInfo getSortInfo(org.apache.hadoop.mapred.JobConf conf) throws IOException

Get the SortInfo object regarding a Zebra table.

Parameters:
conf - JobConf object.
Throws:
IOException
public static void requireSortedTable(org.apache.hadoop.mapred.JobConf conf, ZebraSortInfo sortInfo) throws IOException

Requires a sorted table or table union.

Parameters:
conf - JobConf object.
sortInfo - ZebraSortInfo object containing sorting information.
Throws:
IOException
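The sketch below shows one way a driver might declare a sorted-input requirement. The ZebraSortInfo factory method, its (sort columns, comparator class) parameter order, and the meaning of a null comparator are all assumptions; the sort column name is hypothetical.

```java
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.zebra.mapred.TableInputFormat;
import org.apache.hadoop.zebra.mapred.ZebraSortInfo;

public class SortedInput {
  public static void requireSortedInput(JobConf jobConf) throws Exception {
    // Assumption: ZebraSortInfo exposes a createZebraSortInfo(sortColumns, comparatorClass)
    // factory and accepts null for the comparator to mean "use the default ordering".
    ZebraSortInfo sortInfo = ZebraSortInfo.createZebraSortInfo("Name", null);
    // Require that the input table (or table union) is sorted accordingly.
    TableInputFormat.requireSortedTable(jobConf, sortInfo);
  }
}
```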
public org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.BytesWritable,Tuple> getRecordReader(org.apache.hadoop.mapred.InputSplit split, org.apache.hadoop.mapred.JobConf conf, org.apache.hadoop.mapred.Reporter reporter) throws IOException

Specified by:
getRecordReader in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.BytesWritable,Tuple>
Throws:
IOException
See Also:
InputFormat.getRecordReader(InputSplit, JobConf, Reporter)
public static TableRecordReader getTableRecordReader(org.apache.hadoop.mapred.JobConf conf, String projection) throws IOException, ParseException

Get a TableRecordReader on a single split.

Parameters:
conf - JobConf object.
projection - comma-separated column names in the projection; null means all columns.
Throws:
IOException
ParseException
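For use outside a regular map task (for example a small local scan), the sketch below drives a TableRecordReader through the generic RecordReader protocol. It assumes the input configured on the JobConf resolves to a single split, as the method description requires, and the projection string is hypothetical.

```java
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.zebra.mapred.TableInputFormat;
import org.apache.hadoop.zebra.mapred.TableRecordReader;
import org.apache.pig.data.Tuple;

public class LocalScan {
  public static void scan(JobConf jobConf) throws Exception {
    // Assumes the configured input amounts to a single split (see the method description above).
    TableRecordReader reader = TableInputFormat.getTableRecordReader(jobConf, "Name, Salary");
    try {
      // TableRecordReader follows the RecordReader<BytesWritable, Tuple> protocol.
      BytesWritable key = reader.createKey();
      Tuple row = reader.createValue();
      while (reader.next(key, row)) {
        String name = (String) row.get(0); // index 0 corresponds to "Name" in the projection
        System.out.println(name);
      }
    } finally {
      reader.close();
    }
  }
}
```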
public static void setMinSplitSize(org.apache.hadoop.mapred.JobConf conf, long minSize)

Set the minimum split size.

Parameters:
conf - The JobConf object.
minSize - Minimum split size.

public org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf conf, int numSplits) throws IOException

Specified by:
getSplits in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.BytesWritable,Tuple>
Throws:
IOException
See Also:
InputFormat.getSplits(JobConf, int)
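A short driver-side sketch of split tuning (not from the original docs). The table path is hypothetical and the 128 MB figure is an arbitrary illustration; getSplits itself is invoked by the MapReduce framework at job submission, so it is not called directly here.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.zebra.mapred.TableInputFormat;

public class SplitTuning {
  public static void configureSplits(JobConf jobConf) {
    jobConf.setInputFormat(TableInputFormat.class);
    TableInputFormat.setInputPaths(jobConf, new Path("path/to/table1")); // hypothetical path
    // Hint that splits should not be smaller than roughly 128 MB; the framework
    // later calls getSplits(conf, numSplits) to compute the actual splits.
    TableInputFormat.setMinSplitSize(jobConf, 128L * 1024 * 1024);
  }
}
```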
@Deprecated
public void validateInput(org.apache.hadoop.mapred.JobConf conf) throws IOException

Deprecated.

Throws:
IOException