java.lang.Object
  org.apache.pig.piggybank.storage.JsonMetadata
public class JsonMetadata
extends Object
implements LoadMetadata, StoreMetadata
Reads and writes metadata as JSON, in metadata files stored next to the data.
Constructor Summary
Constructor | Description
---|---
JsonMetadata() | Creates a new JsonMetadata instance.
Method Summary
Modifier and Type | Method | Description
---|---|---
protected Set<ElementDescriptor> | findMetaFile(String path, String prefix, org.apache.hadoop.conf.Configuration conf) | Find the metadata files for each file represented by a path, either directly or via a glob.
String[] | getPartitionKeys(String location, org.apache.hadoop.mapreduce.Job job) | Find what columns are partition keys for this input.
ResourceSchema | getSchema(String location, org.apache.hadoop.mapreduce.Job job) | For JsonMetadata, the schema is considered optional. This method suppresses (and logs) errors if they are encountered.
ResourceStatistics | getStatistics(String location, org.apache.hadoop.mapreduce.Job job) | For JsonMetadata, statistics are considered optional. This method suppresses (and logs) errors if they are encountered.
void | setFieldDel(byte fieldDel) | Set the field delimiter.
void | setPartitionFilter(Expression partitionFilter) | Set the filter for partitioning.
void | setRecordDel(byte recordDel) | Set the record delimiter.
void | storeSchema(ResourceSchema schema, String location, org.apache.hadoop.mapreduce.Job job) | Store the schema of the data being written.
void | storeStatistics(ResourceStatistics stats, String location, org.apache.hadoop.mapreduce.Job job) | Store statistics about the data being written.
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public JsonMetadata()
Method Detail |
---|
protected Set<ElementDescriptor> findMetaFile(String path, String prefix, org.apache.hadoop.conf.Configuration conf) throws IOException
For each file represented by the path (either directly or via a glob): if parentPath/prefix.fileName exists, use that as the metadata file; otherwise, if parentPath/prefix exists, use that as the metadata file.
Resolving conflicts, merging the metadata, etc., is not handled by this method and should be taken care of by downstream code. This method could move into a util package if metadata files prove to be a general enough pattern.
Parameters:
path - Path, as passed in to a LoadFunc (may be a Hadoop glob)
prefix - Metadata file designation, such as .pig_schema or .pig_stats
conf - configuration object
Throws:
IOException
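The lookup rule above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the piggybank implementation: it assumes the glob has already been expanded into a list of data files, models file existence with a Predicate<Path> instead of querying Hadoop's FileSystem, and returns Path objects rather than ElementDescriptor instances. MetaFileLookup and findMetaFiles are hypothetical names.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch of the per-file / per-directory metadata lookup rule.
public class MetaFileLookup {

    /**
     * For each data file, prefer parentPath/prefix.fileName; otherwise fall back
     * to parentPath/prefix. Files with neither are skipped. Conflict resolution
     * and merging are left to the caller.
     */
    public static Set<Path> findMetaFiles(List<Path> dataFiles, String prefix, Predicate<Path> exists) {
        Set<Path> result = new LinkedHashSet<>();
        for (Path file : dataFiles) {
            // A bare relative file name has no parent; resolve against the current directory.
            Path parent = file.getParent() != null ? file.getParent() : Paths.get("");
            // Per-file metadata, e.g. /data/.pig_schema.part-00000
            Path perFile = parent.resolve(prefix + "." + file.getFileName());
            // Shared metadata for the whole directory, e.g. /data/.pig_schema
            Path shared = parent.resolve(prefix);
            if (exists.test(perFile)) {
                result.add(perFile);
            } else if (exists.test(shared)) {
                result.add(shared);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Simulated filesystem contents (an assumption for the example).
        Set<Path> onDisk = Set.of(
                Paths.get("/data/.pig_schema"),
                Paths.get("/data/.pig_schema.part-00001"));
        List<Path> files = List.of(
                Paths.get("/data/part-00000"),
                Paths.get("/data/part-00001"));
        System.out.println(findMetaFiles(files, ".pig_schema", onDisk::contains));
    }
}
```

Here part-00000 has no per-file metadata, so it falls back to the shared /data/.pig_schema, while part-00001 picks up its own /data/.pig_schema.part-00001. Both end up in the returned set, and deciding how to combine them is left to downstream code.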
public String[] getPartitionKeys(String location, org.apache.hadoop.mapreduce.Job job)
Description copied from interface: LoadMetadata
Find what columns are partition keys for this input.
Specified by:
getPartitionKeys in interface LoadMetadata
Parameters:
location - Location as returned by LoadFunc.relativeToAbsolutePath(String, org.apache.hadoop.fs.Path)
job - The Job object; it should be used only to obtain cluster properties through JobContext.getConfiguration(), not to set or query any runtime job information.
public void setPartitionFilter(Expression partitionFilter) throws IOException
Description copied from interface: LoadMetadata
Set the filter for partitioning. If the implementation returns null from LoadMetadata.getPartitionKeys(String, Job), then this method is not called by the Pig runtime. This method is also not called by the Pig runtime if there are no partition filter conditions.
Specified by:
setPartitionFilter in interface LoadMetadata
Parameters:
partitionFilter - the Expression that describes the filter for partitioning
Throws:
IOException - if the filter is not compatible with the storage mechanism or contains non-partition fields.
public ResourceSchema getSchema(String location, org.apache.hadoop.mapreduce.Job job) throws IOException
For JsonMetadata, the schema is considered optional. This method suppresses (and logs) errors if they are encountered.
Specified by:
getSchema in interface LoadMetadata
Parameters:
location - Location as returned by LoadFunc.relativeToAbsolutePath(String, org.apache.hadoop.fs.Path)
job - The Job object; it should be used only to obtain cluster properties through JobContext.getConfiguration(), not to set or query any runtime job information.
Throws:
IOException - if an exception occurs while determining the schema
public ResourceStatistics getStatistics(String location, org.apache.hadoop.mapreduce.Job job) throws IOException
For JsonMetadata, statistics are considered optional. This method suppresses (and logs) errors if they are encountered.
Specified by:
getStatistics in interface LoadMetadata
Parameters:
location - Location as returned by LoadFunc.relativeToAbsolutePath(String, org.apache.hadoop.fs.Path)
job - The Job object; it should be used only to obtain cluster properties through JobContext.getConfiguration(), not to set or query any runtime job information.
Throws:
IOException - if an exception occurs while retrieving statistics
See Also:
org.apache.pig.LoadMetadata.getStatistics(String, Job)
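Because getSchema and getStatistics treat their metadata as optional and suppress (and log) errors, the pattern can be sketched as a loader that returns null instead of failing. This is an illustration under assumptions, not the piggybank code: it reads from the local filesystem with java.nio rather than Hadoop's FileSystem, takes a caller-supplied parser in place of real JSON deserialization into ResourceSchema or ResourceStatistics, and logs with java.util.logging. OptionalMetadataReader and loadOptional are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Function;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: optional metadata that never fails the load.
public class OptionalMetadataReader {
    private static final Logger LOG = Logger.getLogger(OptionalMetadataReader.class.getName());

    /**
     * Reads and parses a metadata file. Returns null if the file is absent,
     * unreadable, or unparsable; read and parse errors are logged, not thrown.
     */
    public static <T> T loadOptional(Path metaFile, Function<String, T> parser) {
        try {
            if (!Files.exists(metaFile)) {
                return null; // absent metadata is expected, so nothing is logged
            }
            String json = new String(Files.readAllBytes(metaFile), StandardCharsets.UTF_8);
            return parser.apply(json); // stand-in for real JSON deserialization
        } catch (IOException | RuntimeException e) {
            LOG.log(Level.WARNING, "Ignoring unusable metadata file " + metaFile, e);
            return null;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("jsonmeta");
        Path stats = dir.resolve(".pig_stats");
        Files.write(stats, "42".getBytes(StandardCharsets.UTF_8));
        // Integer::parseInt plays the role of a parser for this example.
        System.out.println(loadOptional(stats, Integer::parseInt));
        System.out.println(loadOptional(dir.resolve(".pig_schema"), Integer::parseInt));
    }
}
```

A missing file yields null silently, while a file that cannot be read or parsed is logged and also yields null, so the load can proceed without a schema or statistics.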
public void storeStatistics(ResourceStatistics stats, String location, org.apache.hadoop.mapreduce.Job job) throws IOException
Description copied from interface: StoreMetadata
Store statistics about the data being written.
Specified by:
storeStatistics in interface StoreMetadata
Parameters:
stats - statistics to be recorded
location - Location as returned by LoadFunc.relativeToAbsolutePath(String, org.apache.hadoop.fs.Path)
job - The Job object; it should be used only to obtain cluster properties through JobContext.getConfiguration(), not to set or query any runtime job information.
Throws:
IOException
public void storeSchema(ResourceSchema schema, String location, org.apache.hadoop.mapreduce.Job job) throws IOException
Description copied from interface: StoreMetadata
Store the schema of the data being written.
Specified by:
storeSchema in interface StoreMetadata
Parameters:
schema - Schema to be recorded
location - Location as returned by LoadFunc.relativeToAbsolutePath(String, org.apache.hadoop.fs.Path)
job - The Job object; it should be used only to obtain cluster properties through JobContext.getConfiguration(), not to set or query any runtime job information.
Throws:
IOException
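Writing schema metadata as JSON next to the data, as storeSchema does, might look like the sketch below. It is hypothetical throughout: the layout {"fields":[{"name":...,"type":...}]} is a stand-in and not the format Pig actually serializes for ResourceSchema, the schema is modeled as an ordered map from field names to type names, and the file is written with java.nio to location/.pig_schema on the local filesystem rather than through Hadoop. SchemaMetaWriter is a hypothetical name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: serialize a simple schema to a JSON metafile beside the data.
public class SchemaMetaWriter {

    // Minimal JSON string quoting: escapes quotes, backslashes and control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /** Writes the fields, in map order, as JSON to location/.pig_schema and returns that path. */
    public static Path storeSchema(Map<String, String> fields, Path location) throws IOException {
        StringBuilder json = new StringBuilder("{\"fields\":[");
        boolean first = true;
        for (Map.Entry<String, String> f : fields.entrySet()) {
            if (!first) {
                json.append(',');
            }
            first = false;
            json.append("{\"name\":").append(quote(f.getKey()))
                .append(",\"type\":").append(quote(f.getValue())).append('}');
        }
        json.append("]}");
        Files.createDirectories(location); // the data directory may not exist yet
        Path metaFile = location.resolve(".pig_schema");
        Files.write(metaFile, json.toString().getBytes(StandardCharsets.UTF_8));
        return metaFile;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", "int");
        fields.put("name", "chararray");
        Path meta = storeSchema(fields, Files.createTempDirectory("jsonmeta-out"));
        System.out.println(new String(Files.readAllBytes(meta), StandardCharsets.UTF_8));
    }
}
```

A LinkedHashMap keeps the field order stable in the output. A real implementation would more likely delegate serialization to a JSON library than hand-build the string.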
public void setFieldDel(byte fieldDel)
public void setRecordDel(byte recordDel)