Version: 1.3.1.0

Encryption in ODP

ODP enforces encryption at multiple layers: in-transit TLS for all service communication, at-rest transparent encryption for HDFS, and wire encryption for RPC protocols. This document covers each layer and how to configure it through Ambari.

Encryption in Transit (TLS)

All inter-service communication in a secured ODP cluster runs over TLS. This includes:

  • HDFS block transfers (DataNode ↔ client, DataNode ↔ DataNode)
  • YARN Resource Manager ↔ Node Manager communication
  • HBase master ↔ RegionServer communication
  • ZooKeeper client connections
  • Ambari server ↔ agent communication
  • Knox gateway (client-facing TLS)
  • Ranger Admin, Ranger KMS, and all web UIs

Ambari's Auto-TLS feature automates certificate lifecycle management across the entire cluster. It:

  1. Generates a cluster-internal CA or integrates with FreeIPA's Dogtag CA.
  2. Issues signed certificates for every Ambari agent (one per host).
  3. Distributes certificates and configures all services automatically.
  4. Handles certificate renewal.

Enable Auto-TLS during initial cluster setup:

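# Run on the Ambari server host.
# configs.py also needs the Ambari server address and admin credentials
# (the values in angle brackets are placeholders).
/var/lib/ambari-server/resources/scripts/configs.py \
  --host=<ambari-host> \
  --user=<admin-user> \
  --password=<admin-password> \
  --action=set \
  --cluster=<cluster-name> \
  --config-type=cluster-env \
  --key=security_enabled \
  --value=true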

In practice, Auto-TLS is enabled through the Ambari setup wizard before deploying services. Once enabled, all services are configured with TLS automatically.
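Once services are deployed, one way to confirm that an endpoint really presents a cluster-issued certificate is to inspect it with OpenSSL. The helper below is a minimal sketch and not part of ODP: the function name `show_tls_cert` is an assumption, and the host and port in the example are placeholders for any TLS-enabled endpoint from the list above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with ODP): print the subject, issuer and
# validity dates of the certificate served at host:port.
show_tls_cert() {
  local host="$1" port="$2"
  # s_client reads stdin; redirecting it from /dev/null ends the session
  # right after the TLS handshake.
  openssl s_client -connect "${host}:${port}" -servername "${host}" </dev/null 2>/dev/null \
    | openssl x509 -noout -subject -issuer -dates
}

# Example (host and port are placeholders):
# show_tls_cert master01.dev01.hadoop.clemlab.com <https-port>
```

With Auto-TLS active, the issuer shown should be the cluster-internal CA or the FreeIPA CA, depending on which one was chosen at setup.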

Manual Certificate Management

For environments where Auto-TLS is not used, certificates must be generated and distributed manually. Key locations by service:

Service                      Keystore / Truststore path
HDFS (NameNode, DataNode)    /etc/security/serverKeys/hdfs.jks
YARN                         /etc/security/serverKeys/yarn.jks
HBase                        /etc/security/serverKeys/hbase.jks
Ranger Admin                 /etc/ranger/admin/conf/ranger-admin-keystore.jks
Knox                         /usr/odp/current/knox-server/data/security/keystores/gateway.jks

Generate a signed certificate for a service:

# Generate the key pair
keytool -genkeypair -alias hdfs-nn \
  -keyalg RSA -keysize 2048 \
  -validity 730 \
  -keystore /etc/security/serverKeys/hdfs.jks \
  -storepass <keystore-password> \
  -dname "CN=master01.dev01.hadoop.clemlab.com, OU=ODP, O=Clemlab, C=FR"

# Create the certificate signing request (CSR)
keytool -certreq -alias hdfs-nn \
  -keystore /etc/security/serverKeys/hdfs.jks \
  -storepass <keystore-password> \
  -file hdfs-nn.csr

# Sign hdfs-nn.csr with your CA, then import the CA root certificate
# followed by the signed certificate:
keytool -importcert -alias ca-root \
  -keystore /etc/security/serverKeys/hdfs.jks \
  -storepass <keystore-password> \
  -file ca.crt

keytool -importcert -alias hdfs-nn \
  -keystore /etc/security/serverKeys/hdfs.jks \
  -storepass <keystore-password> \
  -file hdfs-nn.crt
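How the CSR gets signed depends on your PKI. As a minimal sketch, assuming a plain OpenSSL-based CA whose certificate and private key are available as `ca.crt` and `ca.key` (the key file name and the `sign_csr` wrapper are assumptions, not part of the procedure above), the signing step could look like this:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: sign a CSR with a local OpenSSL CA.
# Arguments: CSR file, output certificate, CA certificate, CA private key.
sign_csr() {
  local csr="$1" out="$2" ca_crt="$3" ca_key="$4"
  openssl x509 -req -sha256 -days 730 \
    -in "$csr" \
    -CA "$ca_crt" -CAkey "$ca_key" -CAcreateserial \
    -out "$out"
}

# Example, matching the file names used above:
# sign_csr hdfs-nn.csr hdfs-nn.crt ca.crt ca.key
```

Note that `openssl x509 -req` does not carry over extensions such as subjectAltName from the CSR unless it is explicitly told to, so a production CA setup will usually need more than this one command.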

Encryption at Rest — HDFS Transparent Data Encryption (TDE)

HDFS TDE encrypts data blocks on disk without requiring any changes to applications. Encryption and decryption happen inside the HDFS client library, so applications read and write plaintext while DataNodes only ever receive and store ciphertext.

Architecture

HDFS Client
     │ plaintext read/write
     ▼
NameNode (manages encryption zones + DEK metadata)
     │ DEK wrapped with zone key
     ▼
Ranger KMS (Key Management Server)
     │ stores master keys (zone keys)
     ▼
DataNode (stores encrypted blocks)

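  • Zone key: Master key stored in Ranger KMS. Never leaves KMS.
  • Data Encryption Key (DEK): Per-file key, generated by the KMS at the NameNode's request and used by the HDFS client to encrypt and decrypt file contents. It is never stored in plaintext.
  • Encrypted Data Encryption Key (EDEK): The DEK wrapped (encrypted) with the zone key. This is what the NameNode persists in file metadata, and therefore in its fsimage.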
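
The relationship between zone key, DEK and EDEK is ordinary envelope encryption. The sketch below reproduces the idea locally with OpenSSL, purely for illustration: it does not talk to Ranger KMS or HDFS, and the function names, the use of AES-256-CTR for wrapping, and the base64 encoding of the wrapped key are all assumptions made for the example.

```shell
#!/usr/bin/env bash
# Local illustration of envelope encryption (no KMS or HDFS involved).

# Random 256-bit key and 128-bit IV, both printed as hex.
new_key() { openssl rand -hex 32; }
new_iv()  { openssl rand -hex 16; }

# wrap_dek ZONE_KEY IV DEK: print the wrapped DEK (EDEK) as base64.
wrap_dek() {
  printf '%s' "$3" | openssl enc -aes-256-ctr -K "$1" -iv "$2" -a -A
}

# unwrap_dek ZONE_KEY IV EDEK: print the original DEK (hex).
unwrap_dek() {
  printf '%s\n' "$3" | openssl enc -d -aes-256-ctr -K "$1" -iv "$2" -a -A
}

zone_key="$(new_key)"                         # stands in for the key held by the KMS
dek="$(new_key)"                              # per-file key
iv="$(new_iv)"
edek="$(wrap_dek "$zone_key" "$iv" "$dek")"   # the only form that gets persisted

# Only a party that can have the EDEK unwrapped recovers the DEK.
[ "$(unwrap_dek "$zone_key" "$iv" "$edek")" = "$dek" ] && echo "DEK recovered"
```

In the real system the unwrap step is a KMS call guarded by Ranger policy, which is why access to file data can be revoked without touching the stored blocks.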

Installing and Configuring Ranger KMS

Ranger KMS is a separate service deployed through Ambari (Add Service → Ranger KMS).

After installation, configure the KMS backing store in Ambari under Ranger KMS → Configs:

# Database for key store (MySQL / PostgreSQL)
ranger.ks.jpa.jdbc.url=jdbc:mysql://db-host:3306/rangerkms
ranger.ks.jpa.jdbc.user=rangerkms
ranger.ks.jpa.jdbc.password=<password>

Ranger KMS listens on port 9292 (HTTP) or 9293 (HTTPS).

Creating Encryption Zones

An encryption zone is an HDFS directory where all files are automatically encrypted with a specific zone key.

# Step 1: Authenticate as HDFS admin
kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs@DEV01.HADOOP.CLEMLAB.COM

# Step 2: Create a key in Ranger KMS
hadoop key create my-zone-key \
-size 256 \
-provider kms://https@ranger-kms-host:9293/kms

# Step 3: Create the HDFS directory (it must be empty when the zone is created)
hdfs dfs -mkdir -p /encrypted/finance

# Step 4: Create the encryption zone
hdfs crypto -createZone -keyName my-zone-key -path /encrypted/finance

Verify the zone:

hdfs crypto -listZones
# Output:
# /encrypted/finance my-zone-key

All files written to /encrypted/finance are automatically encrypted. Existing files are not encrypted retroactively: they must be copied into the zone (for example with distcp), because HDFS rejects renames across an encryption zone boundary.
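
Bringing existing data into a zone is typically done with distcp. A minimal sketch, assuming the unencrypted data lives in a hypothetical `/data/finance` directory; the `copy_into_zone` function name is also an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy an unencrypted directory into an encryption zone.
copy_into_zone() {
  local src="$1" zone_dir="$2"
  # Checksums differ between plaintext and encrypted blocks, so skip the
  # CRC comparison that distcp performs by default.
  hadoop distcp -update -skipcrccheck "$src" "$zone_dir"
}

# Example (the source path is a placeholder):
# copy_into_zone /data/finance /encrypted/finance
# hdfs crypto -getFileEncryptionInfo -path /encrypted/finance/<some-file>
```

After verifying the copy, the unencrypted source still has to be deleted separately, including any copies left in the HDFS trash.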

Key Management

Create, rotate, and delete keys through the Ranger KMS REST API or the hadoop key CLI:

# List all keys
hadoop key list -provider kms://https@ranger-kms-host:9293/kms

# Roll (rotate) a key; new files use the new key version
hadoop key roll my-zone-key \
  -provider kms://https@ranger-kms-host:9293/kms

# Delete a key (only if no encryption zone uses it)
hadoop key delete my-zone-key \
  -provider kms://https@ranger-kms-host:9293/kms

Access to keys is controlled by Ranger KMS policies. Only HDFS and authorised users can retrieve DEKs.
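
The REST API follows the Hadoop KMS endpoint layout (`/kms/v1/...`). A minimal sketch with curl, assuming SPNEGO (Kerberos) authentication with a valid ticket from kinit and a CA bundle at the hypothetical path `ca.crt`; the `kms_get` function and the `KMS_URL` and `KMS_CACERT` variables are illustrative names.

```shell
#!/usr/bin/env bash
KMS_URL="${KMS_URL:-https://ranger-kms-host:9293/kms}"   # same host and port as above
KMS_CACERT="${KMS_CACERT:-ca.crt}"                        # assumed CA bundle path

# kms_get PATH: issue an authenticated GET against the KMS REST API.
kms_get() {
  curl --silent --fail --negotiate -u : \
    --cacert "$KMS_CACERT" \
    "${KMS_URL}/v1/$1"
}

# Examples:
# kms_get keys/names                   # list key names
# kms_get key/my-zone-key/_metadata    # metadata for one key
```

The same Ranger KMS policies apply to REST calls as to the CLI, so a request from a principal without the matching permission is expected to be refused.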

Accessing Encrypted Data

Applications do not need code changes to read encrypted data. However:

  • The application's Kerberos principal must have DECRYPT_EEK permission in the Ranger KMS policy for the relevant key.
  • The HDFS client library handles decryption transparently during reads.

Grant decrypt access in Ranger KMS UI under Service → KMS → Policies:

Resource: my-zone-key
User: hive — Permissions: GET, GET_KEYS, GET_METADATA, GENERATE_EEK, DECRYPT_EEK
User: spark — Permissions: GET, GET_METADATA, GENERATE_EEK, DECRYPT_EEK

Wire Encryption for RPC

Beyond TLS for HTTP/REST endpoints, ODP also supports in-flight encryption for Hadoop RPC (binary protocol used between HDFS clients and NameNode/DataNode, between MapReduce tasks, etc.).

Hadoop RPC Encryption

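Configure hadoop.rpc.protection in core-site.xml and the dfs.encrypt.data.transfer properties in hdfs-site.xml via Ambari:

<!-- core-site.xml -->
<property>
  <name>hadoop.rpc.protection</name>
  <!-- Options: authentication (authentication only), integrity (adds integrity checks), privacy (adds full encryption) -->
  <value>privacy</value>
</property>

<!-- hdfs-site.xml -->
<property>
  <name>dfs.encrypt.data.transfer</name>
  <value>true</value>
</property>

<property>
  <name>dfs.encrypt.data.transfer.algorithm</name>
  <!-- Options: 3des, rc4. To use AES instead, set dfs.encrypt.data.transfer.cipher.suites to AES/CTR/NoPadding. -->
  <value>3des</value>
</property>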

Setting hadoop.rpc.protection=privacy enables full encryption of all RPC traffic. integrity provides HMAC-based message authentication without encryption. authentication provides only Kerberos authentication without protection of the data stream.
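
After Ambari has pushed the configuration and the services have restarted, the effective client-side values can be read back with `hdfs getconf`. The check below is a minimal sketch; the `check_wire_encryption` function name is an assumption, and it is deliberately strict in that it only accepts the single value `privacy`.

```shell
#!/usr/bin/env bash
# Hypothetical check: report whether RPC and block-transfer encryption are
# requested by the Hadoop configuration visible on this host.
check_wire_encryption() {
  local rpc transfer
  rpc="$(hdfs getconf -confKey hadoop.rpc.protection)"
  transfer="$(hdfs getconf -confKey dfs.encrypt.data.transfer)"
  echo "hadoop.rpc.protection=${rpc}"
  echo "dfs.encrypt.data.transfer=${transfer}"
  # Succeed only when both settings request encryption.
  [ "$rpc" = "privacy" ] && [ "$transfer" = "true" ]
}

# Example:
# check_wire_encryption || echo "wire encryption is not fully enabled" >&2
```

This only inspects configuration files on the local host; it does not prove that a running daemon has picked the values up.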

HBase RPC Encryption

Configure in hbase-site.xml:

<property>
<name>hbase.rpc.protection</name>
<value>privacy</value>
</property>

ZooKeeper TLS

ZooKeeper client-server and quorum TLS is configured in zoo.cfg:

sslQuorum=true
client.secure=true
ssl.keyStore.location=/etc/security/serverKeys/zookeeper.jks
ssl.keyStore.password=<password>
ssl.trustStore.location=/etc/security/serverKeys/zookeeper.truststore.jks
ssl.trustStore.password=<password>

Ambari manages these settings when Auto-TLS is enabled.


Ozone Encryption

Apache Ozone 2.0.0 supports encryption at the object level through the same Ranger KMS infrastructure used for HDFS TDE.

Encrypting an Ozone Bucket

# Authenticate
kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs@DEV01.HADOOP.CLEMLAB.COM

# Create an encrypted bucket using a KMS key
ozone sh bucket create \
--replicationFactor=THREE \
--type=RATIS \
-k my-zone-key \
o3://ozone-host:9862/myvolume/encrypted-bucket

# Verify
ozone sh bucket info o3://ozone-host:9862/myvolume/encrypted-bucket
# Look for: encryptionKeyName: my-zone-key

All objects (keys) written to an encrypted Ozone bucket are encrypted with a per-object DEK that the KMS wraps with the bucket's encryption key, following the same KMS delegation pattern as HDFS TDE.
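
Because encryption is transparent to clients, a simple write-then-read round trip is a reasonable smoke test for an encrypted bucket. A minimal sketch, assuming the bucket URI from the example above; the `ozone_roundtrip` function name and the `.roundtrip` file suffix are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical smoke test: upload a local file to a bucket, download it
# again, and compare the two copies byte for byte.
ozone_roundtrip() {
  local bucket_uri="$1" local_file="$2" key_name
  key_name="$(basename "$local_file")"
  ozone sh key put "${bucket_uri}/${key_name}" "$local_file" &&
    ozone sh key get "${bucket_uri}/${key_name}" "${local_file}.roundtrip" &&
    cmp -s "$local_file" "${local_file}.roundtrip"
}

# Example (the local file is a placeholder):
# ozone_roundtrip o3://ozone-host:9862/myvolume/encrypted-bucket /tmp/sample.txt
```

A successful round trip shows that the calling principal can obtain and unwrap DEKs for the bucket key; it fails if the Ranger KMS policy does not grant that principal access.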

Ozone In-Transit Encryption

Ozone uses TLS for all gRPC-based communication between OzoneManager, SCM (Storage Container Manager), and DataNodes. Configure through Ambari under Ozone → Configs → Advanced ozone-site:

ozone.security.enabled=true
hdds.grpc.tls.enabled=true

When Ambari Auto-TLS is active, these are set automatically.