Indexing (HDFS) Tuning Example
The indexing topic is set to 48 partitions, carried over from the enrichment exercise above, since the enrichment output is the input to indexing.
These are the batch size settings for the Bro index.
cat $METRON_HOME/config/zookeeper/indexing/bro.json
{
  "hdfs" : {
    "index": "bro",
    "batchSize": 50,
    "enabled" : true
  }...
}
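To confirm which batch size each writer picks up, the sensor's indexing config can be inspected programmatically. The following is a minimal Python sketch, not part of Metron: the `batch_sizes` helper is hypothetical, the embedded JSON is a stand-in that contains only the `hdfs` section shown above (the elided remainder of the file is not reproduced), and treating a missing `enabled` key as enabled is an assumption.

```python
import json

# Hypothetical stand-in for bro.json: only the "hdfs" section shown above.
# The rest of the real file is elided in the source and is not guessed here.
BRO_INDEXING_CONFIG = """
{
  "hdfs": {
    "index": "bro",
    "batchSize": 50,
    "enabled": true
  }
}
"""


def batch_sizes(config_text):
    """Return {writer_name: batchSize} for each enabled writer section."""
    config = json.loads(config_text)
    return {
        writer: settings["batchSize"]
        for writer, settings in config.items()
        if isinstance(settings, dict)
        and settings.get("enabled", True)  # assumption: absent means enabled
        and "batchSize" in settings
    }


sizes = batch_sizes(BRO_INDEXING_CONFIG)
print(sizes)
```

In practice the text would come from the output of the `cat` command above rather than from an embedded string.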
And here are the settings we used for the indexing topology:
General Storm settings
  topology.workers: 4
  topology.acker.executors: 24
  topology.max.spout.pending: 2000
Spout and Bolt Settings
hdfsSyncPolicy
  org.apache.storm.hdfs.bolt.sync.CountSyncPolicy
  constructor arg=100000
hdfsRotationPolicy
  bolt.hdfs.rotation.policy.units=DAYS
  bolt.hdfs.rotation.policy.count=1
kafkaSpout
  parallelism: 24
  session.timeout.ms=29999
  enable.auto.commit=false
  setPollTimeoutMs=200
  setMaxUncommittedOffsets=10000000
  setOffsetCommitPeriodMs=30000
hdfsIndexingBolt
  parallelism: 24
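The settings above imply a few ratios that are worth checking when adapting them to another cluster. The Python sketch below works through that arithmetic using only the numbers listed in this section. It is an illustration, not a Metron tool: the `per_unit` helper is hypothetical, and the sketch assumes that Storm spreads executors evenly across workers and that each spout executor runs a single task, so that `topology.max.spout.pending` applies once per spout executor.

```python
# Values copied from the settings listed in this section.
KAFKA_PARTITIONS = 48
WORKERS = 4
ACKER_EXECUTORS = 24
MAX_SPOUT_PENDING = 2000
SPOUT_PARALLELISM = 24
HDFS_BOLT_PARALLELISM = 24


def per_unit(total, units):
    """Split total evenly across units, failing loudly on an uneven split."""
    if total % units != 0:
        raise ValueError(f"{total} does not divide evenly across {units}")
    return total // units


# Each Kafka spout executor reads this many partitions.
partitions_per_spout = per_unit(KAFKA_PARTITIONS, SPOUT_PARALLELISM)

# Executors of each kind per worker, assuming an even spread.
spouts_per_worker = per_unit(SPOUT_PARALLELISM, WORKERS)
bolts_per_worker = per_unit(HDFS_BOLT_PARALLELISM, WORKERS)
ackers_per_worker = per_unit(ACKER_EXECUTORS, WORKERS)

# Upper bound on un-acked tuples in the topology,
# assuming one spout task per executor.
max_in_flight = MAX_SPOUT_PENDING * SPOUT_PARALLELISM

print(partitions_per_spout, spouts_per_worker, bolts_per_worker,
      ackers_per_worker, max_in_flight)
```

With these values, every spout executor owns two partitions, each worker hosts six spout, six bolt, and six acker executors, and at most 48000 tuples are pending at once. An uneven split raises an error, which flags a combination of partitions and parallelism that would leave some executors with more work than others.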

