JsonMetadata (Pig 0.9.3-SNAPSHOT API)

org.apache.pig.piggybank.storage
Class JsonMetadata

java.lang.Object
  org.apache.pig.piggybank.storage.JsonMetadata

All Implemented Interfaces:: LoadMetadata, StoreMetadata

public class JsonMetadata
extends Object
implements LoadMetadata, StoreMetadata

Reads and writes metadata as JSON in metafiles stored next to the data.

Constructor Summary
`JsonMetadata()`

Method Summary
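`protected Set<ElementDescriptor>`	`findMetaFile(String path, String prefix, org.apache.hadoop.conf.Configuration conf)` Finds the set of metadata files relevant to a path, which may represent a glob pattern, a directory, or a file.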
`String[]`	`getPartitionKeys(String location, org.apache.hadoop.mapreduce.Job job)` Find what columns are partition keys for this input.
`ResourceSchema`	`getSchema(String location, org.apache.hadoop.mapreduce.Job job)` For JsonMetadata, the schema is considered optional. This method suppresses (and logs) errors if they are encountered.
`ResourceStatistics`	`getStatistics(String location, org.apache.hadoop.mapreduce.Job job)` For JsonMetadata, stats are considered optional. This method suppresses (and logs) errors if they are encountered.
`void`	`setFieldDel(byte fieldDel)`
`void`	`setPartitionFilter(Expression partitionFilter)` Set the filter for partitioning.
`void`	`setRecordDel(byte recordDel)`
`void`	`storeSchema(ResourceSchema schema, String location, org.apache.hadoop.mapreduce.Job job)` Store the schema of the data being written.
`void`	`storeStatistics(ResourceStatistics stats, String location, org.apache.hadoop.mapreduce.Job job)` Store statistics about the data being written.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

JsonMetadata

public JsonMetadata()

Method Detail

findMetaFile

protected Set<ElementDescriptor> findMetaFile(String path,
                                              String prefix,
                                              org.apache.hadoop.conf.Configuration conf)
                                       throws IOException

Given a path, which may represent a glob pattern, a directory, or a file, this method finds the set of relevant metadata files on the storage system. The algorithm for finding the metadata file is as follows:

For each file represented by the path (either directly, or via a glob):
  If parentPath/prefix.fileName exists, use that as the metadata file.
  Else if parentPath/prefix exists, use that as the metadata file.

Resolving conflicts, merging the metadata, etc., is not handled by this method and should be taken care of by downstream code. This method could move into a util package if metadata files are considered a general enough pattern.

Parameters:: path - Path, as passed in to a LoadFunc (may be a Hadoop glob); prefix - Metadata file designation, such as .pig_schema or .pig_stats; conf - configuration object
Returns:: Set of element descriptors for all metadata files associated with the files on the path.
Throws:: IOException
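The lookup rule described for findMetaFile can be sketched on a local file system. This is a minimal illustration only, not the piggybank implementation: it uses java.nio.file instead of the Hadoop FileSystem API, takes an already-expanded list of data files rather than a glob, and reads "prefix.fileName" as the prefix, a dot, and the data file name. The class and method names (MetaFileLookupSketch, findMetaFiles, demo) are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class MetaFileLookupSketch {

    /**
     * For each data file, prefer parent/prefix.fileName; otherwise fall back
     * to parent/prefix. A file with neither candidate contributes nothing.
     * Conflicts between the results are deliberately left to the caller.
     */
    public static Set<Path> findMetaFiles(List<Path> dataFiles, String prefix) {
        Set<Path> found = new LinkedHashSet<>();
        for (Path dataFile : dataFiles) {
            Path parent = dataFile.toAbsolutePath().getParent();
            Path perFile = parent.resolve(prefix + "." + dataFile.getFileName());
            Path perDir = parent.resolve(prefix);
            if (Files.exists(perFile)) {
                found.add(perFile);
            } else if (Files.exists(perDir)) {
                found.add(perDir);
            }
        }
        return found;
    }

    /** Builds a throwaway directory and returns the names of the metafiles found. */
    public static List<String> demo() {
        try {
            Path dir = Files.createTempDirectory("meta-sketch");
            Path a = Files.createFile(dir.resolve("part-00000"));
            Path b = Files.createFile(dir.resolve("part-00001"));
            Files.createFile(dir.resolve(".pig_schema.part-00000")); // per-file metafile
            Files.createFile(dir.resolve(".pig_schema"));            // directory-level metafile

            List<String> names = new ArrayList<>();
            for (Path p : findMetaFiles(List.of(a, b), ".pig_schema")) {
                names.add(p.getFileName().toString());
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // part-00000 resolves to its own metafile, part-00001 falls back to the shared one.
        System.out.println(demo());
    }
}
```

Because the result is a set, several data files that fall back to the same directory-level metafile yield a single entry for it.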

getPartitionKeys

public String[] getPartitionKeys(String location,
                                 org.apache.hadoop.mapreduce.Job job)

Description copied from interface: LoadMetadata

Find what columns are partition keys for this input.

Specified by:: getPartitionKeys in interface LoadMetadata

Parameters:: location - Location as returned by LoadFunc.relativeToAbsolutePath(String, org.apache.hadoop.fs.Path); job - The Job object - this should be used only to obtain cluster properties through JobContext.getConfiguration() and not to set/query any runtime job information.
Returns:: array of field names of the partition keys. Implementations should return null to indicate that there are no partition keys.

setPartitionFilter

public void setPartitionFilter(Expression partitionFilter)
                        throws IOException

Description copied from interface: LoadMetadata

Set the filter for partitioning. It is assumed that this filter will only contain references to fields given as partition keys in getPartitionKeys. So if the implementation returns null in LoadMetadata.getPartitionKeys(String, Job), then this method is not called by the Pig runtime. This method is also not called by the Pig runtime if there are no partition filter conditions.

Specified by:: setPartitionFilter in interface LoadMetadata

Parameters:: partitionFilter - expression that describes the filter for partitioning
Throws:: IOException - if the filter is not compatible with the storage mechanism or contains non-partition fields.
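The calling contract described for setPartitionFilter can be modeled as a small decision function. This is a hedged sketch of the documented conditions, not Pig's planner code: the class PartitionFilterContractSketch and the method shouldPushFilter are hypothetical, and the filter is reduced to the set of field names it references.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class PartitionFilterContractSketch {

    /**
     * Returns true only when a caller following the documented contract
     * would hand a partition filter to the loader.
     */
    public static boolean shouldPushFilter(String[] partitionKeys, Set<String> filterFields) {
        if (partitionKeys == null) {
            return false; // loader reported no partition keys
        }
        if (filterFields.isEmpty()) {
            return false; // no partition filter conditions
        }
        // The filter is assumed to reference partition key fields only.
        return new HashSet<>(Arrays.asList(partitionKeys)).containsAll(filterFields);
    }

    public static void main(String[] args) {
        String[] keys = {"year", "month"};
        System.out.println(shouldPushFilter(null, Set.of("year")));  // false
        System.out.println(shouldPushFilter(keys, Set.of()));        // false
        System.out.println(shouldPushFilter(keys, Set.of("year")));  // true
        System.out.println(shouldPushFilter(keys, Set.of("price"))); // false
    }
}
```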

getSchema

public ResourceSchema getSchema(String location,
                                org.apache.hadoop.mapreduce.Job job)
                         throws IOException

For JsonMetadata, the schema is considered optional. This method suppresses (and logs) errors if they are encountered.

Specified by:: getSchema in interface LoadMetadata

Parameters:: location - Location as returned by LoadFunc.relativeToAbsolutePath(String, org.apache.hadoop.fs.Path); job - The Job object - this should be used only to obtain cluster properties through JobContext.getConfiguration() and not to set/query any runtime job information.
Returns:: schema for the data to be loaded. This schema should represent all tuples of the returned data. If the schema is unknown or it is not possible to return a schema that represents all returned data, then null should be returned. The schema should not be affected by pushProjection, i.e., getSchema should always return the original schema even after pushProjection.
Throws:: IOException - if an exception occurs while determining the schema
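The "optional metadata" behavior described for getSchema (suppress and log errors) follows a common pattern that can be sketched in isolation. This is an illustration under stated assumptions, not the JsonMetadata source: the MetadataReader interface and readOptional method are hypothetical, java.util.logging stands in for whatever logger the class uses, and returning null on failure is an assumption based on the documented "unknown schema" return value.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

public class OptionalMetadataSketch {

    private static final Logger LOG =
            Logger.getLogger(OptionalMetadataSketch.class.getName());

    /** Hypothetical stand-in for code that locates and parses a metafile. */
    public interface MetadataReader<T> {
        T read(String location) throws IOException;
    }

    /**
     * Treats metadata as optional: a read failure is logged and reported
     * to the caller as null instead of being propagated.
     */
    public static <T> T readOptional(String location, MetadataReader<T> reader) {
        try {
            return reader.read(location);
        } catch (IOException e) {
            LOG.log(Level.WARNING, "Could not read metadata for " + location, e);
            return null;
        }
    }

    public static void main(String[] args) {
        String ok = readOptional("/data/good", loc -> "schema for " + loc);
        String missing = readOptional("/data/bad", loc -> {
            throw new IOException("no metafile");
        });
        System.out.println(ok);      // schema for /data/good
        System.out.println(missing); // null
    }
}
```

A caller of such a method has to handle null, since a missing or unreadable metafile and a genuinely unknown schema look the same from the outside.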

getStatistics

public ResourceStatistics getStatistics(String location,
                                        org.apache.hadoop.mapreduce.Job job)
                                 throws IOException

For JsonMetadata, stats are considered optional. This method suppresses (and logs) errors if they are encountered.

Specified by:: getStatistics in interface LoadMetadata

Parameters:: location - Location as returned by LoadFunc.relativeToAbsolutePath(String, org.apache.hadoop.fs.Path); job - The Job object - this should be used only to obtain cluster properties through JobContext.getConfiguration() and not to set/query any runtime job information.
Returns:: statistics about the data to be loaded. If no statistics are available, then null should be returned.
Throws:: IOException - if an exception occurs while retrieving statistics
See Also:: org.apache.pig.LoadMetadata#getStatistics(String, Job)

storeStatistics

public void storeStatistics(ResourceStatistics stats,
                            String location,
                            org.apache.hadoop.mapreduce.Job job)
                     throws IOException

Description copied from interface: StoreMetadata

Store statistics about the data being written.

Specified by:: storeStatistics in interface StoreMetadata

Parameters:: stats - statistics to be recorded; location - Location as returned by LoadFunc.relativeToAbsolutePath(String, org.apache.hadoop.fs.Path); job - The Job object - this should be used only to obtain cluster properties through JobContext.getConfiguration() and not to set/query any runtime job information.
Throws:: IOException

storeSchema

public void storeSchema(ResourceSchema schema,
                        String location,
                        org.apache.hadoop.mapreduce.Job job)
                 throws IOException