Version: 1.3.1.0

Read Modes

The Hive Warehouse Connector (HWC) supports several read modes. Choose a mode based on your security requirements, performance needs, and data size.

Configure the mode with:

spark.datasource.hive.warehouse.read.mode=secure_access
spark.datasource.hive.warehouse.read.jdbc.mode=cluster

The second property selects between client and cluster result transfer and applies only when the read mode is JDBC.
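For example, these properties can be passed at launch time via --conf. This is a sketch: the HWC assembly JAR path and the application name are placeholders for your environment.

```shell
# Sketch: set HWC read-mode properties when submitting a job.
# The JAR path and app.py are placeholders for your deployment.
spark-submit \
  --jars /opt/hwc/hive-warehouse-connector-assembly.jar \
  --conf spark.datasource.hive.warehouse.read.mode=secure_access \
  --conf spark.datasource.hive.warehouse.read.jdbc.mode=cluster \
  app.py
```

The same properties can also go in spark-defaults.conf if every job on the cluster should use the same mode.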

Direct reader (LLAP)

direct_reader_v1 and direct_reader_v2 read ORC data directly via LLAP without routing queries through HiveServer2 (HS2). This is the fastest option, but it does not enforce HS2/Ranger authorization policies; use it only for trusted ETL pipelines.

Key characteristics:

  • Reads a consistent snapshot of a single table at query time.
  • Requires HDFS permissions to access table data.
  • Does not apply Hive authorization (Ranger) at HS2.
  • Does not support writes or streaming inserts.
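The characteristics above show up directly in how a session is launched. A sketch, assuming direct_reader_v2 as the variant and a placeholder assembly path:

```shell
# Sketch: open a spark-shell in direct-reader mode.
# The JAR path is a placeholder for your deployment.
spark-shell \
  --jars /opt/hwc/hive-warehouse-connector-assembly.jar \
  --conf spark.datasource.hive.warehouse.read.mode=direct_reader_v2
# Inside the shell, queries read ORC table data directly; access is gated
# only by the HDFS permissions noted above, not by Ranger policies at HS2.
```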

JDBC mode

jdbc_cluster and jdbc_client send queries to HS2 and return results to Spark. Use JDBC when you need HS2 authorization enforcement and query semantics provided by Hive.

  • jdbc_client: HS2 returns results through the Spark driver (simpler, but the driver becomes a bottleneck for large results).
  • jdbc_cluster: HS2 streams results directly to the executors (better for larger result sets).
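A JDBC-mode launch also needs the HS2 connection URL. A sketch, assuming the spark.sql.hive.hiveserver2.jdbc.url property; the hostname and port are placeholders:

```shell
# Sketch: route reads through HS2, with results streamed to executors.
# The hostname, port, and app.py are placeholders.
spark-submit \
  --conf spark.datasource.hive.warehouse.read.mode=jdbc_cluster \
  --conf spark.sql.hive.hiveserver2.jdbc.url=jdbc:hive2://hs2-host:10000 \
  app.py
```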

Secure access mode

secure_access executes the query in HS2 and stages the results in a temporary directory using a CTAS (CREATE TABLE AS SELECT) workflow. Spark then reads the staged ORC data. This is the recommended mode for Ranger-protected clusters, because HS2 enforces authorization policies before any data is staged.

Requirements:

  • Set spark.datasource.hive.warehouse.load.staging.dir to a fully qualified URI.
  • Ensure the staging directory is readable and writable by both the Spark user and the Hive (HS2) user.
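Putting the two requirements together, a secure-access launch might look like this. A sketch: the staging URI and application name are placeholders, and the URI must be fully qualified as noted above.

```shell
# Sketch: secure access mode with an explicit staging directory.
# hdfs://ns1/tmp/hwc_staging and app.py are placeholders.
spark-submit \
  --conf spark.datasource.hive.warehouse.read.mode=secure_access \
  --conf spark.datasource.hive.warehouse.load.staging.dir=hdfs://ns1/tmp/hwc_staging \
  app.py
```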

Notes:

  • Spark UDFs are not supported because the query is executed by Hive.
  • Reads can fail with lock errors if another transaction holds an exclusive lock; retry after commit.
  • The cache can be disabled with spark.hadoop.secure.access.cache.disable=true.