Hadoop Table - tabular data storage for Hadoop MapReduce and PIG.
Hadoop Table provides tabular data storage for the Hadoop MapReduce framework. Close integration of Table with PIG is also planned.
For this release, the basic construct of Hadoop Table is called BasicTable. A BasicTable is a create-once, read-only persistent data storage entity. A BasicTable contains zero or more keyed rows.
The API uses Hadoop BytesWritable objects to represent row keys, and PIG Tuple objects to represent rows.
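The create-once, read-only, keyed-row model can be illustrated in plain Java. The following is only a sketch under stated assumptions, not the BasicTable implementation: ReadOnlyKeyedRows and its Writer are hypothetical names, a ByteBuffer stands in for a BytesWritable key, a List<Object> stands in for a PIG Tuple, and the sketch assumes one row per key, which the description above does not state.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a create-once, read-only collection of keyed rows.
final class ReadOnlyKeyedRows {
    private final Map<ByteBuffer, List<Object>> rows;

    private ReadOnlyKeyedRows(Map<ByteBuffer, List<Object>> rows) {
        this.rows = Collections.unmodifiableMap(rows);
    }

    // Number of rows; zero is a valid size.
    int size() {
        return rows.size();
    }

    // Looks up a row by its byte-string key; returns null if the key is absent.
    List<Object> get(byte[] key) {
        return rows.get(ByteBuffer.wrap(key));
    }

    // Rows can be added only while writing; the finished table has no mutators.
    static final class Writer {
        private Map<ByteBuffer, List<Object>> pending = new LinkedHashMap<>();

        Writer append(byte[] key, Object... fields) {
            if (pending == null) {
                throw new IllegalStateException("Table already closed");
            }
            // Copy the key and fields so later changes by the caller cannot leak in.
            pending.put(ByteBuffer.wrap(key.clone()),
                    Collections.unmodifiableList(Arrays.asList(fields.clone())));
            return this;
        }

        ReadOnlyKeyedRows close() {
            if (pending == null) {
                throw new IllegalStateException("Table already closed");
            }
            ReadOnlyKeyedRows table = new ReadOnlyKeyedRows(pending);
            pending = null; // Further appends are rejected.
            return table;
        }
    }
}

final class ReadOnlyKeyedRowsDemo {
    public static void main(String[] args) {
        ReadOnlyKeyedRows table = new ReadOnlyKeyedRows.Writer()
                .append("k1".getBytes(StandardCharsets.UTF_8), "alice", 30)
                .append("k2".getBytes(StandardCharsets.UTF_8), "bob", 25)
                .close();
        System.out.println(table.get("k2".getBytes(StandardCharsets.UTF_8)));
    }
}
```

Wrapping the key bytes in a ByteBuffer gives content-based equality, which a raw byte[] used as a map key would not provide.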
Each BasicTable maintains a Schema, which, for this release, is simply a collection of column names. Given a schema, we can deduce the integer index of a particular column and use it to extract (get) the desired datum from a PIG Tuple object (which allows only index-based access).
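The name-to-index lookup described above can be sketched in plain Java. This is a minimal illustration rather than the Schema class itself: ColumnIndex is a hypothetical stand-in for a schema, and a List<Object> stands in for a PIG Tuple, since both support only positional access to their fields.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a schema: an ordered collection of column names.
final class ColumnIndex {
    private final Map<String, Integer> indexByName = new HashMap<>();

    ColumnIndex(String... columnNames) {
        for (int i = 0; i < columnNames.length; i++) {
            if (indexByName.put(columnNames[i], i) != null) {
                throw new IllegalArgumentException("Duplicate column: " + columnNames[i]);
            }
        }
    }

    // Returns the integer position of the named column, or throws if it is unknown.
    int indexOf(String columnName) {
        Integer index = indexByName.get(columnName);
        if (index == null) {
            throw new IllegalArgumentException("Unknown column: " + columnName);
        }
        return index;
    }

    // Extracts a datum by column name from a row that allows only index-based access.
    Object get(List<Object> row, String columnName) {
        return row.get(indexOf(columnName));
    }
}

final class ColumnIndexDemo {
    public static void main(String[] args) {
        ColumnIndex schema = new ColumnIndex("name", "age", "city");
        List<Object> row = Arrays.<Object>asList("alice", 30, "paris");
        System.out.println(schema.get(row, "age"));
    }
}
```

Resolving the index once and reusing it for every row avoids a name lookup per record, which matters when the same column is read from many tuples.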
Typically, applications use BasicTableOutputFormat (which implements the Hadoop OutputFormat interface) to create BasicTables through MapReduce, and TableInputFormat (which implements the Hadoop InputFormat interface) to feed the table data to MapReduce jobs as input.
The API is structured in three packages:
org.apache.hadoop.zebra.mapreduce: The MapReduce layer. It contains two classes: BasicTableOutputFormat for creating BasicTables, and TableInputFormat for reading tables.
org.apache.hadoop.zebra.types: Miscellaneous facilities that handle column types and tuple serialization. Currently, it is a placeholder that redirects to PIG serialization. Table does not manage type information for individual columns.