Version: 1.3.1.0

Integrations and Examples

This page summarizes how to integrate the Hive Warehouse Connector (HWC) into Spark applications, with example code snippets.

spark-submit

Include the assembly jar and required Spark configs:

spark-submit \
--class com.yourorg.YourApp \
--jars target/scala-2.12/hive-warehouse-connector-assembly-1.3.1.jar \
--conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://hs2-host:10001/;transportMode=http;httpPath=cliservice;ssl=true" \
--conf spark.sql.hive.hiveserver2.jdbc.url.principal=hive/hs2-host@EXAMPLE.COM \
--conf spark.datasource.hive.warehouse.read.mode=secure_access \
--conf spark.datasource.hive.warehouse.load.staging.dir=hdfs://nameservice/apps/hwc_staging \
your-app.jar

PySpark

Add the assembly jar and Python package:

pyspark \
--jars target/scala-2.12/hive-warehouse-connector-assembly-1.3.1.jar \
--py-files python/pyspark_hwc-1.3.1.zip
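
Once the shell is up, the session builder mirrors the Scala example below. A minimal sketch, assuming the pyspark_llap module shipped in pyspark_hwc-1.3.1.zip and an illustrative default.source table:

from pyspark_llap import HiveWarehouseSession

# Build an HWC session on the pyspark shell's SparkSession
hwc = HiveWarehouseSession.session(spark).build()

# Read a Hive table through HWC (table and filter are illustrative)
df = hwc.executeQuery("SELECT * FROM default.source WHERE flag = true")
df.show()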

Example: simple ETL

import com.hortonworks.hwc.HiveWarehouseSession

// Build an HWC session on top of the active SparkSession
val hwc = HiveWarehouseSession.session(spark).build()

// Read the source table through HWC and apply a simple filter
val src = hwc.executeQuery("SELECT * FROM default.source")
val transformed = src.filter("flag = true")

// Write the result to the target Hive table via the connector
transformed.write
  .format("com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector")
  .option("database", "default")
  .option("table", "target")
  .mode("overwrite")
  .save()