Configuring an Extractor Configuration File
You use the extractor configuration file to bulk load the enrichment store into HBase. Complete the following steps to configure the extractor configuration file:
On the host on which Metron is installed, log in as root.
Determine the schema of the enrichment source.
Create an extractor configuration file called
extractor_config_temp.jsonand populate it with the enrichment source schema:{ "config" : { "columns" : { "domain" : 0 ,"owner" : 1 ,"home_country" : 2 ,"registrar": 3 ,"domain_created_timestamp": 4 } ,"indicator_column" : "domain" ,"type" : "whois" ,"separator" : "," } ,"extractor" : "CSV" }Transform and filter the enrichment data as it is loaded into HBase by using Stellar extractor properties in the extractor configuration file.
HCP supports the following Stellar extractor properties:
value_transformTransforms fields defined in the
columnsmapping with Stellar transformations. New keys introduced in the transform are added to the key metadata:"value_transform" : { "domain" : "DOMAIN_REMOVE_TLD(domain)"value_filterAllows additional filtering with Stellar predicates based on results from the value transformations. In the following example, records whose domain property is empty after removing the TLD are omitted:
"value_filter" : "LENGTH(domain) > 0", "indicator_column" : "domain",
indicator_transformTransforms the
indicatorcolumn independent of the value transformations. You can refer to the original indicator value by usingindicatoras the variable name, as shown in the following example:"indicator_transform" : { "indicator" : "DOMAIN_REMOVE_TLD(indicator)"In addition, if you prefer to piggyback your transformations, you can refer to the variable
domain, which allows your indicator transforms to inherit transformations to this value.indicator_filterAllows additional filtering with Stellar predicates based on results from the value transformations. In the following example, records with empty indicator values after removing the TLD are omitted:
"indicator_filter" : "LENGTH(indicator) > 0", "type" : "top_domains",
Including all of the supported Stellar extractor properties in the extractor configuration file, looks similar to the following:
{ "config" : { "zk_quorum" : "$ZOOKEEPER_HOST:2181", "columns" : { "rank" : 0, "domain" : 1 }, "value_transform" : { "domain" : "DOMAIN_REMOVE_TLD(domain)" }, "value_filter" : "LENGTH(domain) > 0", "indicator_column" : "domain", "indicator_transform" : { "indicator" : "DOMAIN_REMOVE_TLD(indicator)" }, "indicator_filter" : "LENGTH(indicator) > 0", "type" : "top_domains", "separator" : "," }, "extractor" : "CSV" }If you run a file import with this data and extractor configuration, you get the following two extracted data records:
Indicator Type Value google top_domains { "rank" : "1", "domain" : "google" } yahoo top_domains { "rank" : "2", "domain" : "yahoo" } To access properties that reside in the global configuration file, provide a ZooKeeper quorum by using the
zk_quorumproperty.If the global configuration looks like
"global_property" : "metron-ftw", enter the following to expand thevalue_transform:"value_transform" : { "domain" : "DOMAIN_REMOVE_TLD(domain)", "a-new-prop" : "global_property" },The resulting value data looks like the following:
Indicator Type Value google top_domains { "rank" : "1", "domain" : "google", "a-new-prop" : "metron-ftw" } yahoo top_domains { "rank" : "2", "domain" : "yahoo", "a-new-prop" : "metron-ftw" } Remove any non-ASCII invisible characters included when you cut and pasted the value_transform information:
iconv -c -f utf-8 -t ascii extractor_config_temp.json -o extractor_config.json
The extractor_config.json file is not stored by the loader. If you want to reuse it, you must store it yourself.

