Create a Parser for Your New Data Source by Using the CLI
As an alternative to using the HCP Management module to parse your new data source, you can use the CLI.
- Determine the format of the new data source’s log entries so that you can parse them:
  - Use ssh to access the host for the new data source.
  - Look at the different log files and determine which to parse:
        sudo su -
        cd /var/log/$NEW_DATASOURCE
        ls
    The file you want is typically access.log, but your data source might use a different name.
  - Generate entries for the log that needs to be parsed so that you can see the format of the entries:
        timestamp | time elapsed | remotehost | code/status | bytes | method | URL rfc931 peerstatus/peerhost | type
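
To make the field layout above concrete, the following Python sketch splits one log entry into named fields. It only illustrates the format and is not a Grok pattern. The sample line is hypothetical, and the sketch assumes that URL, rfc931, and peerstatus/peerhost are separate whitespace-delimited fields.

```python
import re

# Hypothetical sample entry in the layout described above (not from a real log).
SAMPLE = ("1461576382.642 161 127.0.0.1 TCP_MISS/200 103701 GET "
          "http://www.example.com/ - DIRECT/192.0.2.10 text/html")

# One named group per field, separated by runs of whitespace.
LOG_LINE = re.compile(
    r"(?P<timestamp>\d+(?:\.\d+)?)\s+"
    r"(?P<elapsed>\d+)\s+"
    r"(?P<remotehost>\S+)\s+"
    r"(?P<code>[A-Z_]+)/(?P<status>\d+)\s+"
    r"(?P<bytes>\d+)\s+"
    r"(?P<method>\S+)\s+"
    r"(?P<url>\S+)\s+"
    r"(?P<rfc931>\S+)\s+"
    r"(?P<peerstatus>[^/\s]+)/(?P<peerhost>\S+)\s+"
    r"(?P<type>\S+)"
)


def parse_entry(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    for name, value in parse_entry(SAMPLE).items():
        print(f"{name:12} {value}")
```

Seeing which fields a real entry yields makes it easier to decide which ones the Grok expression needs to capture, in particular the timestamp.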
 
- Create a Kafka topic for the new data source:
  - Log in to $KAFKA_HOST as root.
  - Create a Kafka topic with the same name as the new data source:
        /usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper $ZOOKEEPER_HOST:2181 --create --topic $NEW_DATASOURCE --partitions 1 --replication-factor 1
  - Verify your new topic by listing the Kafka topics:
        /usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper $ZOOKEEPER_HOST:2181 --list
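
If you prefer to script the verification rather than read the list by eye, a small Python helper along these lines could check the `--list` output for the topic name. This is a sketch under assumptions: the helper name is hypothetical, and it assumes the command prints one topic per line with the topic name as the first token.

```python
import subprocess
import sys

# Path taken from the step above.
KAFKA_TOPICS = "/usr/hdp/current/kafka-broker/bin/kafka-topics.sh"


def topic_listed(list_output, topic):
    """Return True if `topic` appears as a topic name in the --list output."""
    for line in list_output.splitlines():
        parts = line.split()
        # The first token on each non-empty line is taken to be the topic name.
        if parts and parts[0] == topic:
            return True
    return False


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_topic.py $ZOOKEEPER_HOST:2181 $NEW_DATASOURCE
    zookeeper, topic = sys.argv[1], sys.argv[2]
    result = subprocess.run(
        [KAFKA_TOPICS, "--zookeeper", zookeeper, "--list"],
        capture_output=True, text=True, check=True,
    )
    print("found" if topic_listed(result.stdout, topic) else "missing")
```

Matching on the whole first token, rather than a substring, avoids a false positive when another topic name merely contains the new data source name.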
 
- Create a Grok statement file that defines the Grok expression for the log type you identified in the first step.
  Important: You must include timestamp to ensure that the system uses the event time rather than the system time.
  Refer to the Grok documentation for additional details.
- Save the Grok pattern and load it into Hadoop Distributed File System (HDFS) in a named location:
  - Create a local file for the new data source:
        touch /tmp/$DATASOURCE
  - Open $DATASOURCE and add the Grok pattern that you defined in the previous step:
        vi /tmp/$DATASOURCE
  - Put the $DATASOURCE file into the HDFS directory where Metron stores its Grok parsers. Existing Grok parsers that ship with HCP are staged under /apps/metron/patterns:
        su - hdfs
        hadoop fs -rmr /apps/metron/patterns/$DATASOURCE
        hdfs dfs -put /tmp/$DATASOURCE /apps/metron/patterns/
 
- Define a parser configuration for the Metron Parsing Topology.
  - As root, log in to the host with HCP installed:
        ssh $HCP_HOST
  - Create a $DATASOURCE parser configuration file at $METRON_HOME/config/zookeeper/parsers/$DATASOURCE.json:
        {
          "parserClassName": "org.apache.metron.parsers.GrokParser",
          "sensorTopic": "$DATASOURCE",
          "readMetadata": true,
          "mergeMetadata": true,
          "metron.metadata.topic": "topic",
          "metron.metadata.customer_id": "my_customer_id",
          "filterClassName": "STELLAR",
          "parserConfig": {
            "filter.query": "exists(field1)",
            "grokPath": "/apps/metron/patterns/$DATASOURCE",
            "patternLabel": "$DATASOURCE_DELIMITED",
            "timestampField": "timestamp"
          },
          "fieldTransformations": [
            {
              "transformation": "STELLAR",
              "output": [ "full_hostname", "domain_without_subdomains" ],
              "config": {
                "full_hostname": "URL_TO_HOST(url)",
                "domain_without_subdomains": "DOMAIN_REMOVE_SUBDOMAINS(full_hostname)"
              }
            }
          ]
        }
    The configuration uses the following properties:
- parserClassName
- The name of the parser's class in the .jar file. 
- filterClassName
- The filter to use. This can be the fully qualified name of a class that implements the org.apache.metron.parsers.interfaces.MessageFilter<JSONObject> interface. Message filters enable you to ignore a set of messages by using custom logic. The existing implementation is:
  - STELLAR: Enables you to apply a Stellar statement that returns a Boolean. The filter passes every message for which the statement returns true. The Stellar statement is specified by the filter.query property in the parserConfig. For example, the following Stellar filter includes messages that contain a field1 field:
        {
          "filterClassName": "STELLAR",
          "parserConfig": {
            "filter.query": "exists(field1)"
          }
        }
 
 
- sensorTopic
- The Kafka topic on which the telemetry is being streamed. 
- readMetadata
- A Boolean indicating whether to read metadata and make it available to field transformations (false by default). There are two types of metadata supported in HCP:
  - Environmental metadata about the whole system. For example, if you have multiple Kafka topics being processed by one parser, you might want to tag the messages with the Kafka topic.
  - Custom metadata from an individual telemetry source that you might want to use within Metron.
 
- mergeMetadata
- A Boolean indicating whether to merge metadata with the message (false by default). If this property is set to true, then every metadata field becomes part of the message and, consequently, is also available for field transformations.
- parserConfig
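- A map of configuration properties specific to the parser implementation, such as grokPath, patternLabel, and filter.query in this example.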
- grokPath
- The path for the Grok statement. 
- patternLabel
- The top-level pattern of the Grok file. 
- fieldTransformations
- An array of complex objects representing the transformations to be performed on the message generated from the parser before writing to the Kafka topic.
  In this example, the Grok parser is designed to extract the URL, but the only information that you need is the domain (or even the domain without subdomains). To obtain this, you can use the Stellar Field Transformation (under the fieldTransformations element). The Stellar Field Transformation enables you to use the Stellar DSL (Domain Specific Language) to define extra transformations to be performed on the messages flowing through the topology. For more information about using the fieldTransformations element in the parser configuration, see Understanding Parsers.
- spoutParallelism
- The Kafka spout parallelism (defaults to 1). You can override the default on the command line.
- spoutNumTasks
- The number of tasks for the spout (defaults to 1). You can override the default on the command line.
- parserParallelism
- The parser bolt parallelism (defaults to 1). You can override the default on the command line.
- parserNumTasks
- The number of tasks for the parser bolt (defaults to 1). You can override the default on the command line.
- errorWriterParallelism
- The error writer bolt parallelism (defaults to 1). You can override the default on the command line.
- errorWriterNumTasks
- The number of tasks for the error writer bolt (defaults to 1). You can override the default on the command line.
- numWorkers
- The number of workers to use in the topology (defaults to the Storm default of 1).
- numAckers
- The number of acker executors to use in the topology (defaults to the Storm default of 1).
- spoutConfig
- A map representing a custom spout configuration. You can override the default on the command line.
- securityProtocol
- A string naming the security protocol to use for reading from Kafka. You can override this on the command line and also specify a value in the spout configuration through the security.protocol key. If both are specified, they are merged and the CLI value takes precedence.
- stormConfig
- A map representing the Storm configuration to use. You can override this on the command line. If both are specified, they are merged, with CLI properties taking precedence.
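
The precedence rule described for securityProtocol and stormConfig can be pictured as a map merge in which command-line values are applied last. The following Python sketch illustrates only that rule. The function name and the sample settings are hypothetical, and the sketch does not reproduce how HCP performs the merge internally.

```python
def merge_config(file_config, cli_config):
    """Merge two config maps; keys given on the command line win."""
    merged = dict(file_config)   # start from the values in the parser config
    merged.update(cli_config)    # CLI values overwrite duplicate keys
    return merged


if __name__ == "__main__":
    # Hypothetical settings, used only to show the precedence.
    from_file = {"topology.workers": 1, "topology.debug": False}
    from_cli = {"topology.workers": 2}
    print(merge_config(from_file, from_cli))
```

Keys that appear only in the parser configuration survive the merge, so a command-line override changes just the properties it names.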
 
- Use the following script to upload configurations to Apache ZooKeeper:
      $METRON_HOME/bin/zk_load_configs.sh --mode PUSH -i $METRON_HOME/config/zookeeper -z $ZOOKEEPER_HOST:2181
  You can safely ignore any resulting warning messages.
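
Before pushing, it can help to confirm that the parser file is well-formed JSON, because a missing comma or quote is easy to introduce when editing by hand. The following Python sketch is one way to do that. The function name is hypothetical, and the required-key list is an assumption based on the properties used in this procedure, not an official schema.

```python
import json
import sys

# Assumed minimum for the Grok parser config in this procedure (not an official schema).
REQUIRED_KEYS = ("parserClassName", "sensorTopic", "parserConfig")


def check_parser_config(text):
    """Return a list of problems found in a parser config; empty means none found."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err}"]
    if not isinstance(config, dict):
        return ["top-level value must be a JSON object"]
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]
    if not isinstance(config.get("parserConfig", {}), dict):
        problems.append("parserConfig must be a JSON object")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_parser_config.py /path/to/parser-config.json
    with open(sys.argv[1]) as handle:
        found = check_parser_config(handle.read())
    for problem in found:
        print(problem)
    sys.exit(1 if found else 0)
```

The script exits with a nonzero status when it finds a problem, so it can gate the upload in a shell script.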
 
- Deploy the new parser topology to the cluster:
  - As root, log in to the host that has Metron installed.
  - Deploy the new parser topology:
        $METRON_HOME/bin/start_parser_topology.sh -k $KAFKA_HOST:6667 -z $ZOOKEEPER_HOST:2181 -s $DATASOURCE
  - Use the Apache Storm UI to verify that the new topology is listed and that it has no errors.
 
This new data source processor topology ingests from the $DATASOURCE Kafka topic that you created earlier and then parses the event with the HCP Grok framework using the Grok pattern defined earlier.

