Version: 1.3.1.0

Deploying Trino on Kubernetes via Ambari

Tech Preview — ODP 1.3.2.0

This feature will be included in ODP 1.3.2.0 as a Tech Preview, currently in qualification. It is available for early enterprise testing.

Interested in early access? Contact our team to join the enterprise early access program.

Why Trino on Kubernetes

Trino is a distributed SQL query engine designed for interactive analytics at scale. While ODP includes Hive and Impala for SQL workloads on the cluster, Trino on Kubernetes addresses a distinct set of requirements:

Elastic scaling: Kubernetes makes it straightforward to scale Trino workers horizontally based on query load. You can run 2 workers during off-peak hours and 20 during peak analytics workloads, without provisioning dedicated cluster nodes.

Workload isolation: Trino runs in containers separate from the Hadoop cluster nodes, preventing heavy analytical queries from competing for resources with YARN jobs on the same machines.

Federation: Trino can query data from multiple sources simultaneously — Iceberg tables in HDFS, PostgreSQL databases, Kafka topics — in a single query. This federation capability makes it a natural hub for exploratory analytics across heterogeneous data.

Superset integration: Apache Superset, also deployable through Ambari, connects to Trino through the Trino SQLAlchemy driver. Running both Trino and Superset on Kubernetes and connecting them to ODP data creates a complete, container-native BI stack backed by governed Hadoop storage.

Trino Helm Chart Managed by Ambari

Ambari deploys Trino using the official Trino Helm chart, with values generated from the ODP cluster configuration. The chart creates:

  • Trino Coordinator: 1 pod (configurable) — receives queries, plans execution, manages workers
  • Trino Workers: N pods (configurable) — execute query fragments in parallel
  • ConfigMaps: config.properties, jvm.config, log.properties, node.properties for coordinator and workers
  • Secrets: Kerberos keytab, Ranger plugin configuration, TLS certificates (if configured)
  • Service: ClusterIP service for internal access; optionally a LoadBalancer or Ingress for external JDBC access
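The generated values file follows the official trino/trino chart's schema. A minimal sketch of the kind of file Ambari might emit, using the wizard defaults from this page (the field names come from the official chart; the exact content Ambari generates is cluster-specific):

```yaml
# Hypothetical excerpt of Ambari-generated Helm values
# (schema per the official trino/trino chart; actual content is cluster-specific)
server:
  workers: 3                # Worker Replicas from the wizard
coordinator:
  resources:
    requests:
      memory: 4Gi           # Coordinator Memory
worker:
  resources:
    requests:
      cpu: 2
      memory: 8Gi
    limits:
      cpu: 4
      memory: 16Gi
service:
  type: ClusterIP           # switch to LoadBalancer for external JDBC access
```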

Deploying Trino from Ambari

Step 1: Open the Kubernetes View

In Ambari, navigate to Views > Kubernetes Manager (or the name you gave your view instance). The application catalog appears.

Step 2: Select Trino and Click Deploy

Click Deploy next to Trino. The configuration wizard opens.

Step 3: Configure the Deployment

General tab:

| Setting | Description | Default |
| --- | --- | --- |
| Helm Release Name | Name for the Helm release | trino |
| Namespace | Kubernetes namespace | odp-apps |
| Coordinator Replicas | Number of coordinator pods | 1 |
| Worker Replicas | Number of worker pods | 3 |
| Worker CPU Request | CPU request per worker | 2 |
| Worker CPU Limit | CPU limit per worker | 4 |
| Worker Memory Request | Memory request per worker | 8Gi |
| Worker Memory Limit | Memory limit per worker | 16Gi |
| Coordinator Memory | Memory for coordinator pod | 4Gi |

Connectivity tab (pre-populated from ODP cluster):

| Setting | Source | Example |
| --- | --- | --- |
| Hive Metastore URI | Hive config in Ambari | thrift://master02.example.com:9083,thrift://master03.example.com:9083 |
| HDFS Default FS | HDFS config in Ambari | hdfs://mycluster |
| Ranger REST URL | Ranger config in Ambari | https://master01.example.com:6182 |

Security tab:

| Setting | Description |
| --- | --- |
| Kerberos Principal | Service principal for Trino (e.g., trino/k8s-worker.example.com@REALM) |
| Keytab | Generated by Ambari from the cluster KDC |
| Kerberos Realm | Pre-populated from cluster Kerberos config |
| KDC Address | Pre-populated from cluster Kerberos config |

Step 4: Submit

Click Deploy. Ambari creates a background operation. Monitor progress in Background Operations. A typical Trino deployment takes 2–5 minutes.

Trino Configuration Details

Coordinator config.properties

Ambari generates the following coordinator configuration (abbreviated):

coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8080
query.max-memory=50GB
query.max-memory-per-node=1GB
discovery.uri=http://trino-coordinator:8080

Worker config.properties

coordinator=false
http-server.http.port=8080
discovery.uri=http://trino-coordinator:8080
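The jvm.config ConfigMap pairs the JVM heap with the pod memory limit, leaving headroom for off-heap allocations. A representative sketch using flags commonly recommended for Trino (the values shown are illustrative, not the ones Ambari generates):

```properties
# Illustrative worker jvm.config for a pod with a 16Gi memory limit
-server
-Xmx12G
-XX:+UseG1GC
-XX:+ExplicitGCInvokesConcurrent
-XX:+ExitOnOutOfMemoryError
-XX:+HeapDumpOnOutOfMemoryError
-Djdk.attach.allowAttachSelf=true
```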

Connecting Trino to Iceberg via Hive Metastore

Ambari configures the Hive catalog for Trino with Iceberg support:

# /etc/trino/catalog/hive.properties (inside container)
connector.name=iceberg
hive.metastore.uri=thrift://master02.example.com:9083,thrift://master03.example.com:9083
hive.metastore.authentication.type=KERBEROS
hive.metastore.service.principal=hive/_HOST@REALM.EXAMPLE.COM
hive.metastore.client.principal=trino/k8s-worker.example.com@REALM.EXAMPLE.COM
hive.metastore.client.keytab=/etc/trino/keytabs/trino.keytab
hive.config.resources=/etc/trino/conf/core-site.xml,/etc/trino/conf/hdfs-site.xml
iceberg.file-format=PARQUET

The core-site.xml and hdfs-site.xml files are generated by Ambari from the current HDFS configuration and mounted as ConfigMaps.
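A sketch of what such a ConfigMap looks like (the name, mount layout, and property shown are illustrative; Ambari's actual naming may differ):

```yaml
# Hypothetical ConfigMap carrying the Hadoop client configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: trino-hadoop-config   # illustrative name
  namespace: odp-apps
data:
  core-site.xml: |
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mycluster</value>
      </property>
    </configuration>
  # hdfs-site.xml is included the same way
```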

Kerberos Authentication for Trino

Trino uses the service keytab provisioned by Ambari to authenticate to:

  • Hive Metastore: to resolve table locations and schema
  • HDFS: to read data files directly (for tables stored in HDFS/Ozone)
  • Ranger REST API: to fetch authorization policies

The keytab is stored as a Kubernetes Secret:

apiVersion: v1
kind: Secret
metadata:
  name: trino-kerberos-keytab
  namespace: odp-apps
type: Opaque
data:
  trino.keytab: <base64-encoded keytab>

And mounted into all Trino pods at /etc/trino/keytabs/trino.keytab.
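In the pod template, the mount looks roughly like this (excerpted; the volume name is illustrative):

```yaml
# Excerpt of the Trino pod spec (volume name is illustrative)
volumes:
  - name: kerberos-keytab
    secret:
      secretName: trino-kerberos-keytab
containers:
  - name: trino
    volumeMounts:
      - name: kerberos-keytab
        mountPath: /etc/trino/keytabs
        readOnly: true
```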

Ranger Authorization for Trino

Ambari configures the Trino Ranger plugin at deployment time. The plugin intercepts every Trino query and evaluates it against Ranger policies before execution.

How It Works

  1. A user submits a query via JDBC or the Trino CLI.
  2. The Trino Coordinator passes each access check to the Ranger plugin, together with the user identity, the resource (catalog/schema/table/column), and the action (SELECT, INSERT, etc.).
  3. The plugin evaluates the request against its local policy cache, which it refreshes periodically from the Ranger REST API; policies are not fetched per query.
  4. Both resource-based and tag-based policies are considered.
  5. If access is denied, Trino returns an authorization error. If permitted, the query proceeds.
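For illustration, a resource-based policy granting a group SELECT on one schema might look like this when exported from the Ranger REST API (abbreviated; field names follow Ranger's policy model, while the service, schema, and group names are hypothetical):

```json
{
  "service": "trino_k8s",
  "name": "analytics_read",
  "resources": {
    "catalog": { "values": ["hive"] },
    "schema":  { "values": ["iceberg_demo"] },
    "table":   { "values": ["*"] }
  },
  "policyItems": [
    {
      "groups":   ["analysts"],
      "accesses": [ { "type": "select", "isAllowed": true } ]
    }
  ]
}
```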

Ranger Service Definition for Trino

In Ranger, Trino appears as a service of type Trino (or Presto). Create a Trino service in Ranger pointing to the Trino coordinator:

Service Name: trino_k8s
Trino URL: http://trino-coordinator.odp-apps.svc.cluster.local:8080
Username: ranger_trino_lookup
Password: <password>

Policies are then created in this service to control access to Trino catalogs, schemas, and tables.

Relationship with Hive Ranger Policies

Users who can access a Hive table through Ranger do not automatically get access to the same table through Trino. Trino and Hive are separate Ranger services. You need to grant access in both services, or use Ranger tag-based policies to grant access via Atlas tags, which apply across services.

Tag-Based Policies for Consistency

To avoid maintaining duplicate policies for Hive and Trino, use Atlas to tag your sensitive tables (e.g., PII, RESTRICTED) and create tag-based policies in Ranger that apply to all services. This ensures consistent access control regardless of which engine a user queries through.
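As a sketch, a tag-based policy keyed on a PII tag grants per-engine access types in a single place (abbreviated; the tag, group, and policy names are hypothetical — note the serviceType:accessType form used in tag policies):

```json
{
  "service": "tags",
  "name": "pii_access",
  "resources": {
    "tag": { "values": ["PII"] }
  },
  "policyItems": [
    {
      "groups":   ["privacy-cleared"],
      "accesses": [
        { "type": "hive:select",  "isAllowed": true },
        { "type": "trino:select", "isAllowed": true }
      ]
    }
  ]
}
```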

Accessing Trino

JDBC

The Trino JDBC driver connects to the coordinator service. The connection URL format:

jdbc:trino://<trino-coordinator-host>:<port>/<catalog>/<schema>

If Trino is exposed via a Kubernetes LoadBalancer or NodePort:

jdbc:trino://trino.example.com:8080/hive/default

With Kerberos authentication (from a client with a valid Kerberos ticket):

jdbc:trino://trino.example.com:8080/hive/default?KerberosRemoteServiceName=trino&KerberosPrincipal=user@REALM.EXAMPLE.COM&KerberosConfigPath=/etc/krb5.conf&KerberosKeytabPath=/home/user/user.keytab&SSL=true&SSLTrustStorePath=/etc/ssl/certs/truststore.jks

Trino CLI

trino \
--server https://trino.example.com:8080 \
--krb5-remote-service-name trino \
--krb5-principal user@REALM.EXAMPLE.COM \
--krb5-config-path /etc/krb5.conf \
--catalog hive \
--schema default

Verifying Access to ODP Data

Once connected, verify that Iceberg tables on ODP are accessible:

-- List available schemas
SHOW SCHEMAS FROM hive;

-- Query an Iceberg table
SELECT * FROM hive.iceberg_demo.my_table LIMIT 10;

-- Check Trino's view of Iceberg table history
SELECT * FROM hive.iceberg_demo."my_table$snapshots";

Monitoring Trino from Ambari

The Kubernetes View displays the following for the Trino deployment:

  • Pod status: coordinator and worker pod states (Running, Pending, CrashLoopBackOff)
  • Replica count: actual vs. desired worker count
  • Helm release status: current revision and Flux reconciliation status
  • Recent events: Kubernetes events for the Trino namespace

For deeper Trino monitoring, access the Trino Web UI at http://<trino-coordinator>:8080. It shows:

  • Active and queued queries
  • Worker node status and resource usage
  • Query execution plans and stage details

Scaling Workers

To adjust the number of Trino workers after initial deployment:

  1. In the Kubernetes View, select the Trino deployment.
  2. Click Configure.
  3. Update the Worker Replicas field.
  4. Click Upgrade. Ambari performs a helm upgrade with the new value.

Kubernetes scales the worker Deployment to the new replica count. Existing queries continue running on current workers; new workers become available for new queries within a minute or two.
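Under the hood, the upgrade amounts to changing a single value in the chart. With the official trino/trino chart's schema, the equivalent override is:

```yaml
# Equivalent Helm values change (official trino/trino chart schema)
server:
  workers: 10   # new Worker Replicas value
```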