contrib: Zebra

| Package | Description |
|---|---|
| org.apache.hadoop.zebra | Hadoop Table: tabular data storage for Hadoop MapReduce and Pig. |
| org.apache.hadoop.zebra.io | Physical I/O management of Hadoop Zebra tables. |
| org.apache.hadoop.zebra.mapred | InputFormat and OutputFormat adaptor classes (old mapred API) for Hadoop Zebra tables. |
| org.apache.hadoop.zebra.mapreduce | InputFormat and OutputFormat adaptor classes (new mapreduce API) for Hadoop Zebra tables. |
| org.apache.hadoop.zebra.pig | Implementation of the Pig Storer/Loader interfaces. |
| org.apache.hadoop.zebra.pig.comparator | Utilities that allow the Pig Storer to generate keys for sorted Zebra tables. |
| org.apache.hadoop.zebra.schema | Zebra schema. |
| org.apache.hadoop.zebra.tfile | |
| org.apache.hadoop.zebra.types | Data types shared between the io and mapred packages. |
Pig is a platform for data flow programming on large data sets in a parallel environment. It consists of a language for specifying these programs, Pig Latin; a compiler for this language; and an execution engine to execute the programs.
Pig runs on Hadoop MapReduce, reading data from and writing data to HDFS, and doing processing via one or more MapReduce jobs.
Pig's design is guided by our Pig philosophy.
Pig shares many similarities with a traditional RDBMS design: it has a parser, type checker, optimizer, and operators that perform the data processing. However, there are some significant differences. Pig does not have a data catalog, there are no transactions, Pig does not directly manage data storage, nor does it implement the execution framework.
The front end compiles a Pig Latin script into a PhysicalPlan. This physical plan contains the operators that will be applied to the data. It is then divided into a set of MapReduce jobs by the MRCompiler, producing an MROperPlan. This MROperPlan (aka the map reduce plan) is then optimized (for example, the combiner is used where possible, jobs that scan the same input data are combined where possible, etc.). Finally, a set of MapReduce jobs is generated by the JobControlCompiler. These are submitted to Hadoop and monitored by the MapReduceLauncher.
On the backend, each PigGenericMapReduce.Map, PigCombiner.Combine, and PigGenericMapReduce.Reduce uses the pipeline of physical operators constructed in the front end to load, process, and store data.
In addition to the command line and Grunt interfaces, users can connect to PigServer from a Java program.
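As a minimal sketch of embedding Pig in Java via PigServer (the script, aliases, and file paths here are illustrative, and local mode is assumed so no cluster is required):

```java
import java.io.IOException;

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class EmbeddedPigExample {
    public static void main(String[] args) throws IOException {
        // Local mode runs without a Hadoop cluster; use ExecType.MAPREDUCE
        // to run against a real cluster instead.
        PigServer pigServer = new PigServer(ExecType.LOCAL);

        // Register Pig Latin statements; nothing executes until an
        // output is requested. Paths and aliases are illustrative.
        pigServer.registerQuery("A = LOAD 'input.txt' AS (line:chararray);");
        pigServer.registerQuery("B = FILTER A BY line IS NOT NULL;");

        // store() triggers compilation into the physical and map reduce
        // plans described above and launches the resulting job(s).
        pigServer.store("B", "output");
    }
}
```

The same PigServer instance can register further queries and open iterators over intermediate aliases, which is convenient for driving Pig from application code.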
Pig makes it easy for users to extend its functionality by implementing User Defined Functions (UDFs). There are interfaces for defining functions to load data (LoadFunc), store data (StoreFunc), do evaluations on fields (including collections of data, so user defined aggregates are possible) (EvalFunc), and filter data (FilterFunc).
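For example, a small EvalFunc sketch in the style of Pig's built-in string functions (the class name UPPER and its null handling are illustrative choices, not part of the interface contract):

```java
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// A simple UDF that upper-cases its single chararray argument.
// Invoked from Pig Latin as, e.g.: B = FOREACH A GENERATE UPPER(name);
public class UPPER extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        // Returning null for null/empty input is a common UDF convention.
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        try {
            return ((String) input.get(0)).toUpperCase();
        } catch (Exception e) {
            throw new IOException("Error processing input row", e);
        }
    }
}
```

After packaging the class into a jar, it is made visible to a script with REGISTER and called like any built-in function.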