5. Configure Hive and HiveServer2 for Tez

 5.1. Hive-on-Tez Configuration Parameters

In addition to the configurations generally recommended for Hive and HiveServer2 in a multi-tenant use case, only the following parameters need to be added to the hive-site.xml configuration file to configure Hive for use with Tez.

 

Table 10.2. Hive-Related Configuration Parameters

Configuration Parameter | Description | Default Value
hive.tez.container.size | The memory (in MB) to be used for Tez tasks. If this is not specified (-1), the memory settings from the MapReduce configurations (mapreduce.map.memory.mb) will be used by default for map tasks. | -1 (not specified). If this is not specified, the memory settings from the MapReduce configurations (mapreduce.map.memory.mb) will be used by default.
hive.tez.java.opts | Java command line options for Tez. If this is not specified, the MapReduce java opts settings (mapreduce.map.java.opts) will be used by default for map tasks. | If this is not specified, the MapReduce java opts settings (mapreduce.map.java.opts) will be used by default.
hive.server2.tez.default.queues | A comma-separated list of queues configured for the cluster. | The default value is an empty string, which prevents execution of all queries. To enable query execution with Tez for HiveServer2, this parameter must be configured.
hive.server2.tez.sessions.per.default.queue | The number of sessions for each queue named in hive.server2.tez.default.queues. | 1. Larger clusters may improve performance of HiveServer2 by increasing this number.
hive.server2.tez.initialize.default.sessions | Enables a user to use HiveServer2 without enabling Tez for HiveServer2. Users may want to run queries with Tez without a pool of sessions. | false
hive.server2.enable.doAs | Required when the queue-related configurations above are used. | false

Examples of Hive-Related Configuration Properties:

  <property>
    <name>hive.tez.container.size</name>
    <value>-1</value>
    <description>Memory in mb to be used for Tez tasks. If this is not specified (-1) then the memory settings for map tasks will be used from mapreduce configuration</description>
  </property>
 
  <property>
    <name>hive.tez.java.opts</name>
    <value></value>
    <description>Java opts to be specified for Tez tasks. If this is not specified then java opts for map tasks will be used from mapreduce configuration</description>
  </property>
  
  <property>
    <name>hive.server2.tez.default.queues</name>
    <value>default</value>
  </property>
  
  <property>
    <name>hive.server2.tez.sessions.per.default.queue</name>
    <value>1</value>
  </property>
  
  <property>
    <name>hive.server2.tez.initialize.default.sessions</name>
    <value>false</value>
  </property>
  
  <property>
    <name>hive.server2.enable.doAs</name>
    <value>false</value>
  </property>
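
As a further illustration, the queue-related parameters above might be combined to keep a small pool of Tez sessions ready on more than one queue. This is a minimal sketch, not a recommended setting: the queue names etl and adhoc are hypothetical, and the session count is an arbitrary choice for illustration.

```xml
  <!-- Sketch only: "etl" and "adhoc" are hypothetical queue names. -->
  <property>
    <name>hive.server2.tez.default.queues</name>
    <value>etl,adhoc</value>
  </property>

  <!-- Two sessions per listed queue, four in total (illustrative value). -->
  <property>
    <name>hive.server2.tez.sessions.per.default.queue</name>
    <value>2</value>
  </property>

  <!-- Ask HiveServer2 to start the session pool for the queues above. -->
  <property>
    <name>hive.server2.tez.initialize.default.sessions</name>
    <value>true</value>
  </property>

  <property>
    <name>hive.server2.enable.doAs</name>
    <value>false</value>
  </property>
```

Each pooled session occupies resources on its queue, so the per-queue count would normally be sized against the capacity actually assigned to those queues.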

Note

Users running HiveServer2 in data analytic tools such as Tableau must reconnect to HiveServer2 after switching between the Tez and MapReduce execution engines.

Tip

You can retrieve a list of queues by executing the following command: hadoop queue -list.

 5.2. Using Hive-on-Tez with Capacity Scheduler

You can use the tez.queue.name property to specify which queue will be used for Hive-on-Tez jobs. You can also set this property in the Hive shell, or in a Hive script. For more details, see Configuring Tez with the Capacity Scheduler.
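
A minimal sketch of this setting as a property entry is shown here. The queue name engineering is hypothetical and must match a queue that actually exists in your Capacity Scheduler configuration; the closing comment shows what the per-session form in the Hive shell or a Hive script might look like.

```xml
  <!-- Sketch only: "engineering" is a hypothetical Capacity Scheduler queue. -->
  <property>
    <name>tez.queue.name</name>
    <value>engineering</value>
  </property>
  <!-- Per-session alternative in the Hive shell or a Hive script:
       set tez.queue.name=engineering; -->
```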

 5.3. Example hive-site.xml File for Hive-on-Tez

The following is an example of a Hive-on-Tez hive-site.xml file that has been configured using MySQL as the Hive Metastore database.

To use this file, replace the hive.tez.container.size and hive.tez.java.opts placeholder values (indicated with a $ symbol) with actual values, as described previously. Also change the values of the javax.jdo.option.ConnectionURL, javax.jdo.option.ConnectionUserName, and javax.jdo.option.ConnectionPassword properties to reflect the Hive Metastore database installed on your cluster.

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/mysql?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>dbuser</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>dbuser</value>
    <description>Enter your MySQL credentials.</description>
  </property>
  <property>
    <name>fs.file.impl.disable.cache</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.auto.convert.join</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.auto.convert.join.noconditionaltask</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.auto.convert.sortmerge.join.noconditionaltask</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.auto.convert.join.noconditionaltask.size</name>
    <value>1200000000</value>
  </property>
  <property>
    <name>hive.auto.convert.sortmerge.join</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.compactor.abortedtxn.threshold</name>
    <value>1000</value>
  </property>
  <property>
    <name>hive.compactor.check.interval</name>
    <value>300</value>
  </property>
  <property>
    <name>hive.compactor.delta.num.threshold</name>
    <value>10</value>
  </property>
  <property>
    <name>hive.compactor.delta.pct.threshold</name>
    <value>0.1</value>
  </property>
  <property>
    <name>hive.compactor.initiator.on</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.compactor.worker.threads</name>
    <value>0</value>
  </property>
  <property>
    <name>hive.compactor.worker.timeout</name>
    <value>86400</value>
  </property>
  <property>
    <name>hive.compute.query.using.stats</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.enforce.bucketing</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.enforce.sorting</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.enforce.sortmergebucketmapjoin</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.exec.failure.hooks</name>
    <value>org.apache.hadoop.hive.ql.hooks.ATSHook</value>
  </property>
  <property>
    <name>hive.exec.post.hooks</name>
    <value>org.apache.hadoop.hive.ql.hooks.ATSHook</value>
  </property>
  <property>
    <name>hive.exec.pre.hooks</name>
    <value>org.apache.hadoop.hive.ql.hooks.ATSHook</value>
  </property>
  <property>
    <name>hive.execution.engine</name>
    <value>tez</value>
  </property>
  <property>
    <name>hive.limit.pushdown.memory.usage</name>
    <value>0.04</value>
  </property>
  <property>
    <name>hive.map.aggr</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.mapjoin.bucket.cache.size</name>
    <value>10000</value>
  </property>
  <property>
    <name>hive.mapred.reduce.tasks.speculative.execution</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.metastore.cache.pinobjtypes</name>
    <value>Table,Database,Type,FieldSchema,Order</value>
  </property>
  <property>
    <name>hive.metastore.client.socket.timeout</name>
    <value>60</value>
  </property>
  <property>
    <name>hive.metastore.execute.setugi</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/apps/hive/warehouse</value>
  </property>
  <property>
    <name>hive.optimize.bucketmapjoin.sortedmerge</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.optimize.bucketmapjoin</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.optimize.index.filter</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.optimize.mapjoin.mapreduce</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.optimize.reducededuplication.min.reducer</name>
    <value>4</value>
  </property>
  <property>
    <name>hive.optimize.reducededuplication</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.orc.splits.include.file.footer</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.security.authenticator.manager</name>
    <value>org.apache.hadoop.hive.ql.security.ProxyUserAuthenticator</value>
  </property>
  <property>
    <name>hive.security.authorization.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.security.authorization.manager</name>
    <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
  </property>
  <property>
    <name>hive.security.metastore.authorization.manager</name>
    <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
  </property>
  <property>
    <name>hive.semantic.analyzer.factory.impl</name>
    <value>org.apache.hive.hcatalog.cli.HCatSemanticAnalyzerFactory</value>
  </property>
  <property>
    <name>hive.server2.enable.doAs</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.server2.tez.default.queues</name>
    <value>default</value>
  </property>
  <property>
    <name>hive.server2.tez.initialize.default.sessions</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.server2.tez.sessions.per.default.queue</name>
    <value>1</value>
  </property>
  <property>
    <name>hive.stats.autogather</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.stats.dbclass</name>
    <value>fs</value>
  </property>
  <property>
    <name>hive.tez.container.size</name>
    <value>${mapreduce.map.memory.mb}</value>
  </property>
  <property>
    <name>hive.tez.input.format</name>
    <value>org.apache.hadoop.hive.ql.io.HiveInputFormat</value>
  </property>
  <property>
    <name>hive.tez.java.opts</name>
    <value>${mapreduce.map.java.opts} -XX:NewRatio=8 -XX:+UseNUMA -XX:+UseParallelGC</value>
  </property>
  <property>
    <name>hive.txn.manager</name>
    <value>org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager</value>
  </property>
  <property>
    <name>hive.txn.max.open.batch</name>
    <value>1000</value>
  </property>
  <property>
    <name>hive.txn.timeout</name>
    <value>300</value>
  </property>
  <property>
    <name>hive.vectorized.execution.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.vectorized.groupby.checkinterval</name>
    <value>4096</value>
  </property>
</configuration>
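
For illustration, the two placeholder properties in the file above might look as follows once actual values are filled in. The numbers are assumptions chosen only to show the shape of the entries (a 4096 MB container with a Java heap kept somewhat below the container size); appropriate values depend on the memory available to your cluster.

```xml
  <!-- Sketch only: 4096 MB is an assumed container size, not a recommendation. -->
  <property>
    <name>hive.tez.container.size</name>
    <value>4096</value>
  </property>

  <!-- Assumed heap size (-Xmx) below the container size; the remaining flags are those in the example file. -->
  <property>
    <name>hive.tez.java.opts</name>
    <value>-Xmx3276m -XX:NewRatio=8 -XX:+UseNUMA -XX:+UseParallelGC</value>
  </property>
```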

