General Purpose Parsers
General-purpose parsers are primarily designed for lower-velocity topologies or for quickly setting up a temporary parser for a new telemetry source.
General-purpose parsers are defined using a configuration file, so you do not need to recompile the topology to change them. HCP supports three general-purpose parsers: Grok, CSV, and JSON Map.
Grok Parser
The Grok parser class name (parserClassName) is org.apache.metron.parsers.GrokParser.
Grok has the following entries and predefined patterns for parserConfig:
- grokPath - The path in HDFS (or in the JAR) to the Grok statement.
- patternLabel - The pattern label to use from the Grok statement.
- timestampField - The field to use for the timestamp.
- timeFields - A list of fields to be treated as time.
- dateFormat - The date format to use to parse the time fields.
- timezone - The timezone to use. UTC is the default.
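As an illustration, the entries above might be combined as in the following sketch. The path, pattern label, field names, and date format are hypothetical placeholders, not values shipped with HCP. The date format string assumes Java-style date patterns, which this section does not confirm. Other top-level settings that a complete parser configuration may need are omitted.

```json
{
  "parserClassName": "org.apache.metron.parsers.GrokParser",
  "parserConfig": {
    "grokPath": "/patterns/example_sensor",
    "patternLabel": "EXAMPLE_SENSOR",
    "timestampField": "timestamp",
    "timeFields": ["timestamp"],
    "dateFormat": "yyyy-MM-dd HH:mm:ss",
    "timezone": "UTC"
  }
}
```

Because the parser is driven entirely by this file, pointing grokPath at a different statement or switching patternLabel should not require recompiling the topology.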
CSV Parser
The CSV parser class name (parserClassName) is org.apache.metron.parsers.csv.CSVParser.
CSV has the following entries and predefined patterns for parserConfig:
- timestampFormat - The date format of the timestamp to use. If unspecified, the parser assumes the timestamp is expressed as time since the UNIX epoch.
- columns - A map of the column names you want to extract from the CSV to their offsets. Offsets are zero-based. For example, { 'name' : 1, 'profession' : 3 } would be a column map for extracting the 2nd and 4th columns from a CSV.
- separator - The column separator. The default value is ",".
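A minimal sketch of a CSV parser configuration, reusing the column map from the example above. Only entries described in this section are shown. Timestamp handling and any other top-level settings a complete configuration may need are left out.

```json
{
  "parserClassName": "org.apache.metron.parsers.csv.CSVParser",
  "parserConfig": {
    "columns": {
      "name": 1,
      "profession": 3
    },
    "separator": ","
  }
}
```

With this map, a hypothetical input line such as `42,Ada,London,mathematician` would be expected to yield name = Ada and profession = mathematician, because the offsets count from zero.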
JSON Map Parser
The JSON Map parser class name (parserClassName) is org.apache.metron.parsers.json.JSONMapParser.
JSON Map has the following entries and predefined patterns for parserConfig:
- mapStrategy - A strategy to indicate how to handle multi-dimensional maps. This is one of:
  - DROP - Drop fields that contain maps.
  - UNFOLD - Unfold inner maps. So { "foo" : { "bar" : 1 } } would turn into { "foo.bar" : 1 }.
  - ALLOW - Allow multi-dimensional maps.
  - ERROR - Throw an error when a multi-dimensional map is encountered.
- timestamp - This field is expected to exist. If it does not, the current time is inserted.
- jsonQuery - If this JSON query string is present, the result of the query will be a list of messages. This is useful if you have a JSON document that contains a list or array of messages embedded in it, and you do not have another means of splitting the message.
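To make the splitting and unfolding behavior concrete, consider the following hypothetical input document. The field names (source, messages, name, details, code) are invented for illustration only.

```json
{
  "source": "sensor-a",
  "messages": [
    { "name": "first", "details": { "code": 1 } },
    { "name": "second", "details": { "code": 2 } }
  ]
}
```

A query that selects the messages array would be expected to produce two separate messages from this one document. The query syntax is not specified in this section. A JSONPath-style expression such as `$.messages` is an assumption to verify against your version. With mapStrategy set to UNFOLD, the first message would then carry the fields name and details.code. With DROP, the details field would be removed.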

