Replicating data to Impala clusters
Impala metadata is replicated as part of regular Hive/Impala replication operations. Impala metadata replication is performed as a part of Hive external table replication. Impala replication is only supported between two CDH clusters. The Impala and Hive services must be running on both clusters.
Replicating Impala Metadata
To enable Impala metadata replication, set the field to Yes during Hive external table replication policy creation. After the replication job completes, you can view the Impala UDFs (user-defined functions) on the target cluster, just as on the source cluster. As part of replicating the UDFs, the binaries in which they are defined are also replicated.
Invalidating Impala Metadata
For Impala clusters that do not use LDAP authentication, configure during Hive external table replication policy creation so that the
replication job automatically invalidates Impala metadata after replication completes. If
the clusters use Sentry, the Impala user should have permissions to run INVALIDATE
METADATA.
The configuration causes the Hive/Impala replication job to run the Impala
INVALIDATE METADATA statement per table on the destination cluster after
completing the replication. The statement purges the metadata of the replicated tables and
views within the destination cluster's Impala upon completion of replication, allowing other
Impala clients at the destination to query these tables successfully with accurate results.
However, this operation is potentially unsafe if DDL operations are being performed on any
of the replicated tables or views while the replication is running. In general, directly
modifying replicated data/metadata on the destination is not recommended. Ignoring this can
lead to unexpected or incorrect behavior of applications and queries using these tables or
views.
Alternatively, you can run the INVALIDATE METADATA statement manually for
replicated tables.
