org.apache.hadoop.zebra.mapreduce
Class BasicTableOutputFormat

java.lang.Object
  org.apache.hadoop.mapreduce.OutputFormat<org.apache.hadoop.io.BytesWritable,Tuple>
      org.apache.hadoop.zebra.mapreduce.BasicTableOutputFormat

public class BasicTableOutputFormat
extends org.apache.hadoop.mapreduce.OutputFormat<org.apache.hadoop.io.BytesWritable,Tuple>

OutputFormat class for creating a BasicTable.
Usage Example:
In the main program, add the following code.
job.setOutputFormatClass(BasicTableOutputFormat.class);
Path outPath = new Path("path/to/the/BasicTable");
BasicTableOutputFormat.setOutputPath(job, outPath);
BasicTableOutputFormat.setSchema(job, "Name, Age, Salary, BonusPct");
The above code declares BasicTableOutputFormat as the OutputFormat of the MapReduce job, sets the output path of the BasicTable to be created, and sets its schema.

To write rows to multiple output tables, configure the job as follows instead:

String multiLocs = "commaSeparatedPaths";
job.setOutputFormatClass(BasicTableOutputFormat.class);
BasicTableOutputFormat.setMultipleOutputs(job, multiLocs,
    MultipleOutputsTest.OutputPartitionerClass.class);
BasicTableOutputFormat.setSchema(job, "Name, Age, Salary, BonusPct");
The user's ZebraOutputPartition class should look like this:

static class OutputPartitionerClass implements ZebraOutputPartition {
  @Override
  public int getOutputPartition(BytesWritable key, Tuple value) {
    // Return the index, within the list of registered output paths,
    // of the table this row should be written to.
    return someIndexInOutputPartitionList;
  }
}
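For example, a partitioner that routes each row to one of two registered output tables based on the Age column might look like the following sketch. The two-table layout and the field index are illustrative assumptions, not part of the API:

static class AgePartitionerClass implements ZebraOutputPartition {
  @Override
  public int getOutputPartition(BytesWritable key, Tuple value) {
    try {
      // Assumption: "Age" is the second column (index 1) of the output schema,
      // and exactly two output paths were registered via setMultipleOutputs().
      int age = (Integer) value.get(1);
      return age >= 65 ? 1 : 0; // index 1 for retirees, index 0 for everyone else
    } catch (ExecException e) {
      return 0; // fall back to the first table if the field cannot be read
    }
  }
}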
The user's Reducer code (or, similarly, Mapper code if it is a Map-only job) should look like the following:

static class MyReduceClass implements Reducer<K, V, BytesWritable, Tuple> {
  // Keep the tuple object for reuse.
  Tuple outRow;
  // Indices of the various fields in the output Tuple.
  int idxName, idxAge, idxSalary, idxBonusPct;

  @Override
  public void configure(Job job) {
    Schema outSchema = BasicTableOutputFormat.getSchema(job);
    // Create a tuple that conforms to the output schema.
    outRow = TypesUtils.createTuple(outSchema);
    // Determine the field indices.
    idxName = outSchema.getColumnIndex("Name");
    idxAge = outSchema.getColumnIndex("Age");
    idxSalary = outSchema.getColumnIndex("Salary");
    idxBonusPct = outSchema.getColumnIndex("BonusPct");
  }

  @Override
  public void reduce(K key, Iterator<V> values,
      OutputCollector<BytesWritable, Tuple> output, Reporter reporter)
      throws IOException {
    String name;
    int age;
    int salary;
    double bonusPct;
    // ... Determine the values of the individual fields of the row to be inserted.
    try {
      outRow.set(idxName, name);
      outRow.set(idxAge, new Integer(age));
      outRow.set(idxSalary, new Integer(salary));
      outRow.set(idxBonusPct, new Double(bonusPct));
      output.collect(new BytesWritable(name.getBytes()), outRow);
    } catch (ExecException e) {
      // Should never happen.
    }
  }

  @Override
  public void close() throws IOException {
    // No-op.
  }
}
| Constructor Summary |
|---|
| BasicTableOutputFormat() |
| Method Summary | |
|---|---|
| void | checkOutputSpecs(org.apache.hadoop.mapreduce.JobContext jobContext) Note: we perform the initialization of the table here. |
| static void | close(org.apache.hadoop.mapreduce.JobContext jobContext) Close the output BasicTable; no more rows can be added into the table. |
| org.apache.hadoop.mapreduce.OutputCommitter | getOutputCommitter(org.apache.hadoop.mapreduce.TaskAttemptContext taContext) |
| static String | getOutputPartitionClassArguments(org.apache.hadoop.conf.Configuration conf) Get the output partition class arguments string from the job configuration. |
| static org.apache.hadoop.fs.Path | getOutputPath(org.apache.hadoop.mapreduce.JobContext jobContext) Get the output path of the BasicTable from the JobContext. |
| static org.apache.hadoop.fs.Path[] | getOutputPaths(org.apache.hadoop.mapreduce.JobContext jobContext) Get the multiple output paths of the BasicTable from the JobContext. |
| org.apache.hadoop.mapreduce.RecordWriter<org.apache.hadoop.io.BytesWritable,Tuple> | getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext taContext) |
| static Schema | getSchema(org.apache.hadoop.mapreduce.JobContext jobContext) Get the table schema from the JobContext. |
| static SortInfo | getSortInfo(org.apache.hadoop.mapreduce.JobContext jobContext) Get the SortInfo object. |
| static org.apache.hadoop.io.BytesWritable | getSortKey(Object builder, Tuple t) Generates a BytesWritable sort key for the input tuple using the key generator provided. |
| static Object | getSortKeyGenerator(org.apache.hadoop.mapreduce.JobContext jobContext) Generates a Zebra-specific sort key generator, which is used to generate BytesWritable sort keys; the table's sort key(s) are used to create this object. |
| static String | getStorageHint(org.apache.hadoop.mapreduce.JobContext jobContext) Get the table storage hint from the JobContext. |
| static Class<? extends ZebraOutputPartition> | getZebraOutputPartitionClass(org.apache.hadoop.mapreduce.JobContext jobContext) |
| static void | setMultipleOutputs(org.apache.hadoop.mapreduce.JobContext jobContext, Class<? extends ZebraOutputPartition> theClass, org.apache.hadoop.fs.Path... paths) Set the multiple output paths of the BasicTable in the JobContext. |
| static void | setMultipleOutputs(org.apache.hadoop.mapreduce.JobContext jobContext, Class<? extends ZebraOutputPartition> theClass, String arguments, org.apache.hadoop.fs.Path... paths) Set the multiple output paths of the BasicTable in the JobContext. |
| static void | setMultipleOutputs(org.apache.hadoop.mapreduce.JobContext jobContext, String commaSeparatedLocations, Class<? extends ZebraOutputPartition> theClass) Deprecated. Use setMultipleOutputs(JobContext, Class<? extends ZebraOutputPartition>, Path...) instead. |
| static void | setOutputPath(org.apache.hadoop.mapreduce.JobContext jobContext, org.apache.hadoop.fs.Path path) Set the output path of the BasicTable in the JobContext. |
| static void | setSchema(org.apache.hadoop.mapreduce.JobContext jobContext, String schema) Deprecated. Use setStorageInfo(JobContext, ZebraSchema, ZebraStorageHint, ZebraSortInfo) instead. |
| static void | setSortInfo(org.apache.hadoop.mapreduce.JobContext jobContext, String sortColumns) Deprecated. Use setStorageInfo(JobContext, ZebraSchema, ZebraStorageHint, ZebraSortInfo) instead. |
| static void | setSortInfo(org.apache.hadoop.mapreduce.JobContext jobContext, String sortColumns, Class<? extends org.apache.hadoop.io.RawComparator<Object>> comparatorClass) Deprecated. Use setStorageInfo(JobContext, ZebraSchema, ZebraStorageHint, ZebraSortInfo) instead. |
| static void | setStorageHint(org.apache.hadoop.mapreduce.JobContext jobContext, String storehint) Deprecated. Use setStorageInfo(JobContext, ZebraSchema, ZebraStorageHint, ZebraSortInfo) instead. |
| static void | setStorageInfo(org.apache.hadoop.mapreduce.JobContext jobContext, ZebraSchema zSchema, ZebraStorageHint zStorageHint, ZebraSortInfo zSortInfo) Set the table storage info, including ZebraSchema, ZebraStorageHint and ZebraSortInfo. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public BasicTableOutputFormat()
| Method Detail |
|---|
public static void setMultipleOutputs(org.apache.hadoop.mapreduce.JobContext jobContext,
                                      String commaSeparatedLocations,
                                      Class<? extends ZebraOutputPartition> theClass)
                               throws IOException

Deprecated. Use setMultipleOutputs(JobContext, Class<? extends ZebraOutputPartition>, Path...) instead.

Parameters:
jobContext - The JobContext object.
commaSeparatedLocations - The comma-separated output paths to the tables. Each path must either not exist, or must be an empty directory.
theClass - Zebra output partitioner class.
Throws:
IOException
public static void setMultipleOutputs(org.apache.hadoop.mapreduce.JobContext jobContext,
                                      Class<? extends ZebraOutputPartition> theClass,
                                      org.apache.hadoop.fs.Path... paths)
                               throws IOException

Parameters:
jobContext - The JobContext object.
theClass - Zebra output partitioner class.
paths - The list of paths. Each path must either not exist, or must be an empty directory.
Throws:
IOException
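A minimal driver-side sketch of this overload, where the paths and the partitioner class are placeholders:

Path[] outPaths = new Path[] { new Path("path/to/tableA"), new Path("path/to/tableB") };
job.setOutputFormatClass(BasicTableOutputFormat.class);
// Register the output tables and the partitioner that routes rows between them.
BasicTableOutputFormat.setMultipleOutputs(job, OutputPartitionerClass.class, outPaths);
BasicTableOutputFormat.setSchema(job, "Name, Age, Salary, BonusPct");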
public static void setMultipleOutputs(org.apache.hadoop.mapreduce.JobContext jobContext,
                                      Class<? extends ZebraOutputPartition> theClass,
                                      String arguments,
                                      org.apache.hadoop.fs.Path... paths)
                               throws IOException

Parameters:
jobContext - The JobContext object.
theClass - Zebra output partitioner class.
arguments - Arguments string to the partitioner class.
paths - The list of paths. Each path must either not exist, or must be an empty directory.
Throws:
IOException

public static String getOutputPartitionClassArguments(org.apache.hadoop.conf.Configuration conf)

Parameters:
conf - The job configuration object.
public static org.apache.hadoop.fs.Path[] getOutputPaths(org.apache.hadoop.mapreduce.JobContext jobContext)
                                                  throws IOException

Parameters:
jobContext - The JobContext object.
Throws:
IOException
public static Class<? extends ZebraOutputPartition> getZebraOutputPartitionClass(org.apache.hadoop.mapreduce.JobContext jobContext)
                                                                          throws IOException

Throws:
IOException
public static void setOutputPath(org.apache.hadoop.mapreduce.JobContext jobContext,
                                 org.apache.hadoop.fs.Path path)

Parameters:
jobContext - The JobContext object.
path - The output path to the table. The path must either not exist, or must be an empty directory.

public static org.apache.hadoop.fs.Path getOutputPath(org.apache.hadoop.mapreduce.JobContext jobContext)

Parameters:
jobContext - The JobContext object.
public static void setSchema(org.apache.hadoop.mapreduce.JobContext jobContext,
                             String schema)

Deprecated. Use setStorageInfo(JobContext, ZebraSchema, ZebraStorageHint, ZebraSortInfo) instead.

Parameters:
jobContext - The JobContext object.
schema - The schema of the BasicTable to be created. For the initial implementation, the schema string is simply a comma-separated list of column names, such as "Col1, Col2, Col3".
public static Schema getSchema(org.apache.hadoop.mapreduce.JobContext jobContext)
                        throws ParseException

Parameters:
jobContext - The JobContext object.
Throws:
ParseException
public static Object getSortKeyGenerator(org.apache.hadoop.mapreduce.JobContext jobContext)
                                  throws IOException,
                                         ParseException

Parameters:
jobContext - The JobContext object.
Throws:
IOException
ParseException
public static org.apache.hadoop.io.BytesWritable getSortKey(Object builder,
                                                            Tuple t)
                                                     throws Exception

Parameters:
builder - Opaque key generator created by the getSortKeyGenerator() method.
t - Tuple to create the sort key from.
Throws:
Exception
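Taken together with getSortKeyGenerator(JobContext), this method might be used as follows when writing rows to a sorted table. This is a sketch that reuses the outRow and output variables from the Reducer example above:

// Build the key generator once per task, then derive a sort key for each row.
Object keyGen = BasicTableOutputFormat.getSortKeyGenerator(jobContext);
BytesWritable sortKey = BasicTableOutputFormat.getSortKey(keyGen, outRow);
output.collect(sortKey, outRow);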
public static void setStorageHint(org.apache.hadoop.mapreduce.JobContext jobContext,
                                  String storehint)
                           throws ParseException,
                                  IOException

Deprecated. Use setStorageInfo(JobContext, ZebraSchema, ZebraStorageHint, ZebraSortInfo) instead.

Parameters:
jobContext - The JobContext object.
storehint - The storage hint of the BasicTable to be created. The format would be like "[f1, f2.subfld]; [f3, f4]".
Throws:
ParseException
IOException

public static String getStorageHint(org.apache.hadoop.mapreduce.JobContext jobContext)

Parameters:
jobContext - The JobContext object.
public static void setSortInfo(org.apache.hadoop.mapreduce.JobContext jobContext,
                               String sortColumns,
                               Class<? extends org.apache.hadoop.io.RawComparator<Object>> comparatorClass)

Deprecated. Use setStorageInfo(JobContext, ZebraSchema, ZebraStorageHint, ZebraSortInfo) instead.

Parameters:
jobContext - The JobContext object.
sortColumns - Comma-separated sort column names.
comparatorClass - Comparator class name; null for default.
public static void setSortInfo(org.apache.hadoop.mapreduce.JobContext jobContext,
                               String sortColumns)

Deprecated. Use setStorageInfo(JobContext, ZebraSchema, ZebraStorageHint, ZebraSortInfo) instead.

Parameters:
jobContext - The JobContext object.
sortColumns - Comma-separated sort column names.
public static void setStorageInfo(org.apache.hadoop.mapreduce.JobContext jobContext,
                                  ZebraSchema zSchema,
                                  ZebraStorageHint zStorageHint,
                                  ZebraSortInfo zSortInfo)
                           throws ParseException,
                                  IOException

Parameters:
jobContext - The JobContext object.
zSchema - The ZebraSchema object containing schema information.
zStorageHint - The ZebraStorageHint object containing storage hint information.
zSortInfo - The ZebraSortInfo object containing sorting information.
Throws:
ParseException
IOException
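As a sketch, a single setStorageInfo call can replace the deprecated setSchema, setStorageHint and setSortInfo calls. The createZebraSchema, createZebraStorageHint and createZebraSortInfo factory methods below are assumptions about the Zebra types API; consult their javadocs:

// Assumed factory methods on ZebraSchema/ZebraStorageHint/ZebraSortInfo.
ZebraSchema zSchema = ZebraSchema.createZebraSchema("Name, Age, Salary, BonusPct");
ZebraStorageHint zStorageHint = ZebraStorageHint.createZebraStorageHint("[Name, Age]; [Salary, BonusPct]");
ZebraSortInfo zSortInfo = ZebraSortInfo.createZebraSortInfo("Name", null); // null = default comparator
BasicTableOutputFormat.setStorageInfo(job, zSchema, zStorageHint, zSortInfo);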
public static SortInfo getSortInfo(org.apache.hadoop.mapreduce.JobContext jobContext)
                            throws IOException

Parameters:
jobContext - The JobContext object.
Throws:
IOException
public void checkOutputSpecs(org.apache.hadoop.mapreduce.JobContext jobContext)
                      throws IOException

Note: we perform the initialization of the table here.

Specified by:
checkOutputSpecs in class org.apache.hadoop.mapreduce.OutputFormat<org.apache.hadoop.io.BytesWritable,Tuple>
Throws:
IOException
See Also:
getRecordWriter(TaskAttemptContext), OutputFormat.checkOutputSpecs(JobContext)
public org.apache.hadoop.mapreduce.RecordWriter<org.apache.hadoop.io.BytesWritable,Tuple> getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext taContext)
                                                                                                    throws IOException

Specified by:
getRecordWriter in class org.apache.hadoop.mapreduce.OutputFormat<org.apache.hadoop.io.BytesWritable,Tuple>
Throws:
IOException
See Also:
OutputFormat.getRecordWriter(TaskAttemptContext)
public static void close(org.apache.hadoop.mapreduce.JobContext jobContext)
                  throws IOException

Close the output BasicTable; no more rows can be added into the table.

Parameters:
jobContext - The JobContext object.
Throws:
IOException
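In the driver, this is typically called once the job has completed, e.g.:

if (job.waitForCompletion(true)) {
  // Finalize the output BasicTable; no rows can be added after this point.
  BasicTableOutputFormat.close(job);
}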
public org.apache.hadoop.mapreduce.OutputCommitter getOutputCommitter(org.apache.hadoop.mapreduce.TaskAttemptContext taContext)
                                                               throws IOException,
                                                                      InterruptedException

Specified by:
getOutputCommitter in class org.apache.hadoop.mapreduce.OutputFormat<org.apache.hadoop.io.BytesWritable,Tuple>
Throws:
IOException
InterruptedException