org.apache.hadoop.hive.serde2
Class RegexSerDe

java.lang.Object
  extended by org.apache.hadoop.hive.serde2.AbstractSerDe
      extended by org.apache.hadoop.hive.serde2.RegexSerDe
All Implemented Interfaces:
Deserializer, SerDe, Serializer

public class RegexSerDe
extends AbstractSerDe

RegexSerDe uses a regular expression (regex) to deserialize data; it does not support serialization. Each row is matched against the regex, and the regex's capturing groups are extracted as columns. During deserialization, if a row does not match the regex, all columns in the row are NULL. If a row matches the regex but has fewer groups than expected, the missing columns are NULL. If a row matches the regex but has more groups than expected, the additional groups are ignored.

NOTE: RegexSerDe supports primitive column types such as TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, STRING, BOOLEAN and DECIMAL.

NOTE: This implementation uses javaStringObjectInspector for STRING. A more efficient implementation would use UTF-8 encoded Text and writableStringObjectInspector; it should switch to that once a UTF-8-based regex library is available.
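The row semantics above can be sketched in Java with java.util.regex alone. This is a hypothetical illustration, not Hive's implementation: the class RegexRowSketch and its methods are invented here, column types are assumed to arrive as lowercase Hive type names such as "int", the whole row is assumed to have to match (Matcher.matches()), and turning an unparsable value (for example "abc" in an INT column) into null is this sketch's own choice, because the description above does not cover that case.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the row semantics described for RegexSerDe;
// this is not Hive's implementation.
final class RegexRowSketch {

  // Converts one captured group to a Java value for a lowercase Hive
  // primitive type name. Assumption: an unparsable number becomes null.
  static Object convert(String value, String hiveType) {
    if (value == null) {
      return null; // missing or non-participating group
    }
    try {
      switch (hiveType) {
        case "tinyint":  return Byte.valueOf(value);
        case "smallint": return Short.valueOf(value);
        case "int":      return Integer.valueOf(value);
        case "bigint":   return Long.valueOf(value);
        case "float":    return Float.valueOf(value);
        case "double":   return Double.valueOf(value);
        case "boolean":  return Boolean.valueOf(value);
        case "decimal":  return new BigDecimal(value);
        case "string":   return value;
        default:
          throw new IllegalArgumentException("unsupported type: " + hiveType);
      }
    } catch (NumberFormatException e) {
      return null;
    }
  }

  // Returns exactly one value per declared column.
  static List<Object> parse(Pattern regex, String line, List<String> columnTypes) {
    int numColumns = columnTypes.size();
    // Start with every column set to null.
    List<Object> row = new ArrayList<>(Arrays.asList(new Object[numColumns]));
    Matcher m = regex.matcher(line);
    if (!m.matches()) {
      return row; // no match: all columns stay null
    }
    // Only the first numColumns groups are read, so extra groups are ignored.
    for (int c = 0; c < numColumns; c++) {
      // Fewer groups than columns: the remaining columns stay null.
      String group = c < m.groupCount() ? m.group(c + 1) : null;
      row.set(c, convert(group, columnTypes.get(c)));
    }
    return row;
  }
}
```

For example, with the pattern (\d+) (\w+) and column types int and string, the row "42 alice" yields [42, alice], a row that does not match yields [null, null], and declaring a third int column for the same pattern yields [42, alice, null].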


Field Summary
static org.apache.commons.logging.Log LOG
           
 
Constructor Summary
RegexSerDe()
           
 
Method Summary
 Object deserialize(org.apache.hadoop.io.Writable blob)
          Deserialize an object out of a Writable blob.
 ObjectInspector getObjectInspector()
          Get the object inspector that can be used to navigate through the internal structure of the Object returned from deserialize(...).
 SerDeStats getSerDeStats()
          Returns statistics collected when serializing.
 Class<? extends org.apache.hadoop.io.Writable> getSerializedClass()
          Returns the Writable class that would be returned by the serialize method.
 void initialize(org.apache.hadoop.conf.Configuration conf, Properties tbl)
          Initialize the HiveSerializer.
 org.apache.hadoop.io.Writable serialize(Object obj, ObjectInspector objInspector)
          Serialize an object by navigating inside the Object with the ObjectInspector.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LOG

public static final org.apache.commons.logging.Log LOG
Constructor Detail

RegexSerDe

public RegexSerDe()
Method Detail

initialize

public void initialize(org.apache.hadoop.conf.Configuration conf,
                       Properties tbl)
                throws SerDeException
Description copied from class: AbstractSerDe
Initialize the HiveSerializer.

Specified by:
initialize in interface Deserializer
Specified by:
initialize in interface Serializer
Specified by:
initialize in class AbstractSerDe
Parameters:
conf - System properties
tbl - table properties
Throws:
SerDeException
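The tbl parameter carries the table definition. As a hedged sketch of what an initialize step for a regex-based SerDe could check, the class below reads the regex and column metadata from a java.util.Properties object and rejects column types outside the primitive set listed in the class description. The property names input.regex, input.regex.case.insensitive, columns (comma-separated) and columns.types (colon-separated) follow Hive's table-property conventions but are assumptions here, since this page does not list them. RegexTableConfig is a hypothetical name, IllegalArgumentException stands in for SerDeException so the sketch has no Hive dependency, and parameterized types such as decimal(10,2) are not handled.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Properties;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical sketch of table-property checks for a regex-based SerDe.
// The property names below are assumptions (Hive conventions), not taken
// from this page; IllegalArgumentException stands in for SerDeException.
final class RegexTableConfig {
  // The primitive types named in the class description.
  static final Set<String> SUPPORTED_TYPES = new HashSet<>(Arrays.asList(
      "tinyint", "smallint", "int", "bigint", "float", "double",
      "string", "boolean", "decimal"));

  final Pattern regex;
  final List<String> columnNames;
  final List<String> columnTypes;

  RegexTableConfig(Properties tbl) {
    String regexText = tbl.getProperty("input.regex");
    if (regexText == null) {
      throw new IllegalArgumentException("input.regex is not set");
    }
    boolean caseInsensitive =
        Boolean.parseBoolean(tbl.getProperty("input.regex.case.insensitive", "false"));
    regex = Pattern.compile(regexText, caseInsensitive ? Pattern.CASE_INSENSITIVE : 0);

    // "columns" is comma-separated; "columns.types" is colon-separated.
    columnNames = Arrays.asList(tbl.getProperty("columns", "").split(","));
    columnTypes = Arrays.asList(tbl.getProperty("columns.types", "").split(":"));
    if (columnNames.size() != columnTypes.size()) {
      throw new IllegalArgumentException(
          columnNames.size() + " columns but " + columnTypes.size() + " types");
    }
    // Reject anything outside the supported primitive types.
    for (String type : columnTypes) {
      if (!SUPPORTED_TYPES.contains(type.toLowerCase(Locale.ROOT))) {
        throw new IllegalArgumentException("unsupported column type: " + type);
      }
    }
  }
}
```

Failing fast at initialization, rather than on the first row, surfaces a bad table definition (such as a MAP column) before any data is read.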

getObjectInspector

public ObjectInspector getObjectInspector()
                                   throws SerDeException
Description copied from class: AbstractSerDe
Get the object inspector that can be used to navigate through the internal structure of the Object returned from deserialize(...).

Specified by:
getObjectInspector in interface Deserializer
Specified by:
getObjectInspector in class AbstractSerDe
Throws:
SerDeException

getSerializedClass

public Class<? extends org.apache.hadoop.io.Writable> getSerializedClass()
Description copied from class: AbstractSerDe
Returns the Writable class that would be returned by the serialize method. This is used to initialize the SequenceFile header.

Specified by:
getSerializedClass in interface Serializer
Specified by:
getSerializedClass in class AbstractSerDe

deserialize

public Object deserialize(org.apache.hadoop.io.Writable blob)
                   throws SerDeException
Description copied from class: AbstractSerDe
Deserialize an object out of a Writable blob. In most cases, this function returns the same object on every call, because it reuses that object between calls. A client that wants to keep a copy must clone the returned value by calling ObjectInspectorUtils.getStandardObject().

Specified by:
deserialize in interface Deserializer
Specified by:
deserialize in class AbstractSerDe
Parameters:
blob - The Writable object containing a serialized object
Returns:
A Java object representing the contents in the blob.
Throws:
SerDeException
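The reuse contract described above means a caller that keeps rows across calls must copy them first. The hypothetical ReusingRowReader below is not RegexSerDe; it is a minimal stand-in that parses comma-separated records into a single List that it overwrites on every call, which is enough to show the aliasing problem without a Hive dependency.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical deserializer that, like the contract described above, reuses
// one row object across calls instead of allocating a new one per record.
final class ReusingRowReader {
  private final List<Object> row;

  ReusingRowReader(int numColumns) {
    row = new ArrayList<>(Arrays.asList(new Object[numColumns]));
  }

  // Overwrites the shared row with the fields of one comma-separated record;
  // columns without a matching field are set to null.
  List<Object> deserialize(String record) {
    String[] fields = record.split(",", -1);
    for (int c = 0; c < row.size(); c++) {
      row.set(c, c < fields.length ? fields[c] : null);
    }
    return row;
  }
}
```

Calling deserialize("a,1") and then deserialize("b,2") returns the same List instance twice, so a reference saved from the first call afterwards reads [b, 2]; a copy taken with new ArrayList<>(row) before the second call still reads [a, 1]. A shallow copy suffices here only because the fields are immutable Strings; for real Hive rows, whose fields may themselves be reused objects, this page points to ObjectInspectorUtils.getStandardObject().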

serialize

public org.apache.hadoop.io.Writable serialize(Object obj,
                                               ObjectInspector objInspector)
                                        throws SerDeException
Description copied from class: AbstractSerDe
Serialize an object by navigating inside the Object with the ObjectInspector. In most cases, this function returns the same Writable on every call, because it reuses that Writable between calls. A client that wants to keep a copy must clone the returned value.

Note that RegexSerDe itself does not support serialization (see the class description).

Specified by:
serialize in interface Serializer
Specified by:
serialize in class AbstractSerDe
Throws:
SerDeException

getSerDeStats

public SerDeStats getSerDeStats()
Description copied from class: AbstractSerDe
Returns statistics collected when serializing.

Specified by:
getSerDeStats in interface Deserializer
Specified by:
getSerDeStats in interface Serializer
Specified by:
getSerDeStats in class AbstractSerDe


Copyright © 2014 The Apache Software Foundation. All rights reserved.