Known Issues in Apache Spark
This topic describes known issues and workarounds for using Spark in this release of Cloudera Runtime.
- CDPD-22670 and CDPD-23103: Spark lineage configuration conflict
 - The "Atlas dependency" and "spark_lineage_enabled" configurations in Spark can conflict. The issue occurs when the Atlas dependency is turned off but spark_lineage_enabled is turned on. In that state, Spark applications log error messages and cannot continue. To recover, correct the configurations, redistribute the client configurations, and restart the Spark component.
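As an illustration only (the helper below is hypothetical and not part of Spark or Cloudera Manager), the invalid combination described above can be expressed as a simple consistency check:

```python
# Hypothetical helper illustrating the conflict described above:
# spark_lineage_enabled requires the Atlas dependency to be enabled.
def lineage_config_valid(atlas_dependency_enabled: bool,
                         spark_lineage_enabled: bool) -> bool:
    """Return False for the conflicting combination (lineage on, Atlas off)."""
    return not (spark_lineage_enabled and not atlas_dependency_enabled)

print(lineage_config_valid(atlas_dependency_enabled=True, spark_lineage_enabled=True))    # True
print(lineage_config_valid(atlas_dependency_enabled=False, spark_lineage_enabled=True))   # False
print(lineage_config_valid(atlas_dependency_enabled=False, spark_lineage_enabled=False))  # True
```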
 
- CDPD-217: HBase/Spark connectors are not supported
 - The Apache HBase Spark Connector (hbase-connectors/spark) and the Apache Spark - Apache HBase Connector (shc) are not supported in the initial CDP release.
- CDPD-3038: Launching pyspark displays several HiveConf warning messages
 - When pyspark starts, several Hive configuration warning messages are displayed, similar to the following:

   19/08/09 11:48:04 WARN conf.HiveConf: HiveConf of name hive.vectorized.use.checked.expressions does not exist
   19/08/09 11:48:04 WARN conf.HiveConf: HiveConf of name hive.tez.cartesian-product.enabled does not exist
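These warnings indicate Hive properties that the bundled Hive client does not recognize and are typically harmless. If they are unwanted, one possible way to quiet them (assuming Spark's logging is driven by a log4j 1.x properties file, as the logger name in the output suggests) is to raise the threshold for the HiveConf logger:

```
# Hypothetical log4j.properties addition: suppress WARN-level
# "HiveConf of name ... does not exist" messages from this logger
log4j.logger.org.apache.hadoop.hive.conf.HiveConf=ERROR
```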
- CDPD-3293: Cannot create views (CREATE VIEW statement) from Spark
 - Apache Ranger in CDP disallows Spark users from running CREATE VIEW statements.
Technical Service Bulletins
- TSB 2021-441: CDP Powered by Apache Spark may incorrectly read/write pre-Gregorian timestamps
 - Spark may incorrectly read or write TIMESTAMP data for values before the start of the Gregorian calendar ('1582-10-15 00:00:00.0'). This could happen when Spark is:
  - Using dynamic partition inserts
  - Reading or writing from an ORC table when:
   - the spark.sql.hive.convertMetastoreOrc property is set to false (its default value is true), or
   - the spark.sql.hive.convertMetastoreOrc property is set to true but the spark.sql.orc.impl property is set to hive (its default is native)
  - Reading or writing from a Parquet table when:
   - the spark.sql.hive.convertMetastoreParquet property is set to false (its default value is true)
 
 - Knowledge article
 - For the latest update on this issue, see the corresponding Knowledge article: TSB 2021-441: Spark may incorrectly read/write pre-Gregorian timestamps
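The affected property combinations can be summarized in a small sketch. The helper below is hypothetical (not part of Spark); it only encodes the conditions listed in this bulletin, using the stated default values when a property is unset:

```python
# Hypothetical check: does a Spark conf match a TSB 2021-441 risk condition
# for the ORC/Parquet read/write paths? (Dynamic partition inserts are a
# separate condition not covered by these properties.)
def at_risk(conf: dict) -> bool:
    """Return True if the given Spark conf matches a listed condition."""
    orc_convert = conf.get("spark.sql.hive.convertMetastoreOrc", "true")
    orc_impl = conf.get("spark.sql.orc.impl", "native")
    parquet_convert = conf.get("spark.sql.hive.convertMetastoreParquet", "true")

    # ORC: conversion disabled, or conversion enabled with the Hive ORC impl
    if orc_convert == "false":
        return True
    if orc_convert == "true" and orc_impl == "hive":
        return True
    # Parquet: conversion disabled
    if parquet_convert == "false":
        return True
    return False

print(at_risk({}))                                                   # False (defaults)
print(at_risk({"spark.sql.orc.impl": "hive"}))                       # True
print(at_risk({"spark.sql.hive.convertMetastoreParquet": "false"}))  # True
```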
 
