Apache Iceberg Overview
Apache Iceberg is an open table format for large analytical datasets stored on distributed file systems. It brings database-grade reliability — ACID transactions, schema evolution, time travel, and partition evolution — to data that lives in HDFS, Ozone, or object storage, while remaining readable by multiple engines simultaneously. ODP 1.3.1.0 ships Iceberg 1.6.1.
What is Iceberg?
Iceberg defines how data files (Parquet, ORC, Avro) and their metadata are organized on a file system. Rather than relying on directory-naming conventions and a central Metastore to track partitions (as traditional Hive tables do), Iceberg stores a self-contained metadata tree alongside the data:
table/
├── metadata/
│   ├── v1.metadata.json   ← snapshot history, schema, partition spec
│   ├── snap-*.avro        ← manifest lists (pointers to manifest files)
│   └── manifests/
│       └── *.avro         ← manifest files (lists of data files + statistics)
└── data/
    └── *.parquet          ← actual data files
This architecture enables atomic, consistent table operations without requiring a lock on the Metastore, and allows any engine with an Iceberg reader to query the table correctly without coordination.
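Engines that expose Iceberg's metadata tables (Spark SQL does) let you inspect this tree directly with ordinary queries. A minimal sketch, assuming an existing Iceberg table named `events`:

```sql
-- Inspect the snapshot history recorded in the metadata files
-- (Spark SQL metadata-table syntax; the table name is illustrative)
SELECT snapshot_id, committed_at, operation
FROM events.snapshots;

-- List the data files tracked by the current snapshot, with the
-- per-file statistics carried in the manifests
SELECT file_path, record_count, file_size_in_bytes
FROM events.files;
```

Because these queries read only the metadata tree, they answer "what does this table contain?" without scanning any data files.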
Why Iceberg vs Traditional Hive Tables
| Capability | Hive partition tables | Iceberg |
|---|---|---|
| ACID transactions | ORC only, limited | All formats, full ACID |
| Schema evolution | Limited (add columns only) | Add, drop, rename, reorder, widen types |
| Partition evolution | Requires full table rewrite | Add/change partitions without rewriting data |
| Hidden partitioning | No (partition in query) | Yes (transparent to queries) |
| Time travel | No | Yes (query any snapshot) |
| Concurrent writers | No | Optimistic concurrency control |
| Row-level deletes | ORC ACID only | Position and equality deletes (all formats) |
| Multi-engine reads | Limited | Native (Hive, Spark, Impala, Trino) |
ACID Transactions
Iceberg uses optimistic concurrency control: each write operation reads the current snapshot, makes changes, and commits a new snapshot atomically. If two writers conflict (both modify the same rows), one commit fails and must be retried. This provides serializable isolation for INSERT, UPDATE, DELETE, and MERGE operations without requiring a central lock manager.
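A typical transactional write is a MERGE, which Iceberg commits as a single new snapshot. A sketch in Spark SQL syntax, with illustrative table and column names:

```sql
-- Upsert incoming rows into an Iceberg table as one atomic commit.
-- If a concurrent writer commits a conflicting snapshot first, this
-- statement's commit is retried or fails cleanly -- readers never
-- observe a partially applied merge.
MERGE INTO events t
USING events_updates s
  ON t.event_id = s.event_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```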
Schema Evolution
Iceberg tracks schema changes in the metadata using field IDs rather than positional column indices. This means:
- Adding a column does not require rewriting existing data files.
- Dropping a column is recorded in metadata; existing files still contain the dropped column's data but readers ignore it.
- Renaming a column is a metadata-only operation — data files remain unchanged.
- Widening a type (e.g., `INT` to `LONG`) is safe and backward-compatible.
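Each of these operations is a single DDL statement. A sketch in Spark SQL syntax (table and column names are illustrative); none of these rewrites any data files:

```sql
-- Metadata-only schema changes on an Iceberg table
ALTER TABLE events ADD COLUMN session_id STRING;
ALTER TABLE events RENAME COLUMN ts TO event_ts;
ALTER TABLE events ALTER COLUMN item_count TYPE BIGINT;  -- INT -> LONG widening
ALTER TABLE events DROP COLUMN legacy_flag;
```

Because columns are tracked by field ID, a column added after a rename or drop can even reuse an old name without resurrecting the old column's data.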
Time Travel and Rollback
Every write to an Iceberg table creates a new snapshot. Old snapshots are retained until explicitly expired, enabling:
-- Query the table as it was on January 1, 2025
SELECT * FROM events FOR SYSTEM_TIME AS OF '2025-01-01 00:00:00';
-- Query a specific snapshot
SELECT * FROM events FOR SYSTEM_VERSION AS OF 8462937;
Time travel supports data auditing, debugging pipeline errors, and reproducing ML training datasets at a point in time. Rollback restores the table to a previous snapshot without data deletion.
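In the Spark integration, rollback is exposed as a stored procedure. A sketch, assuming a catalog named `spark_catalog` and a snapshot ID taken from the table's snapshot history:

```sql
-- Restore the table to an earlier snapshot (Spark SQL procedure call);
-- the snapshot ID here is illustrative
CALL spark_catalog.system.rollback_to_snapshot('db.events', 8462937);

-- Or roll back to the table state as of a timestamp
CALL spark_catalog.system.rollback_to_timestamp(
  'db.events', TIMESTAMP '2025-01-01 00:00:00');
```

The rolled-back-from snapshot remains in the history until it is expired, so a rollback is itself reversible.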
Partition Evolution and Hidden Partitioning
Traditional Hive tables embed partition values in directory names (dt=2025-01-01/), requiring query authors to specify partition predicates explicitly. Iceberg separates partition specification from data layout:
- Hidden partitioning: Iceberg derives partition values from column transforms (e.g., `bucket(16, user_id)`, `month(event_date)`) transparently. Query authors filter on `event_date = '2025-01-01'` without knowing the physical partitioning scheme.
- Partition evolution: As data grows, the partitioning scheme can be changed (e.g., from monthly to daily partitions) without rewriting historical data. New data uses the new scheme; old data remains on the old scheme. Readers handle both transparently.
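Both ideas can be seen in a short DDL sequence. A sketch in Spark SQL syntax, with illustrative names:

```sql
-- Create a table with hidden partitioning: queries never reference
-- the transforms, only the source columns
CREATE TABLE events (
  event_id   BIGINT,
  user_id    BIGINT,
  event_date DATE
)
USING iceberg
PARTITIONED BY (month(event_date), bucket(16, user_id));

-- Later, evolve the spec from monthly to daily granularity.
-- Existing data files stay on the old spec; only new writes use it.
ALTER TABLE events REPLACE PARTITION FIELD month(event_date) WITH day(event_date);
```

A filter such as `WHERE event_date = DATE '2025-01-01'` is pruned correctly against both the old monthly files and the new daily files.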
Iceberg 1.6.1 in ODP 1.3.1.0
ODP integrates Iceberg as a first-class table format across the stack:
- Hive 4.0.1: Native Iceberg catalog support via the `STORED BY ICEBERG` storage handler clause. Hive DDL operations (`CREATE TABLE`, `ALTER TABLE`, `DROP TABLE`) work on Iceberg tables identically to native Hive tables.
- Spark 3.5.6: Iceberg Spark Runtime included. Full read/write support including time travel, schema evolution, and streaming writes.
- Impala 4.5.0: Read support for Iceberg v2 tables (equality deletes, position deletes).
- Trino 476: Full Iceberg connector with read/write and time travel support.
This means a dataset written by a Spark pipeline is immediately queryable by Hive, Impala, and Trino — with consistent ACID guarantees — without any format conversion or data copying.
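On the Hive side, creating an Iceberg table is a one-clause change to ordinary DDL. A sketch in Hive 4 syntax (table name and columns are illustrative):

```sql
-- Hive 4 DDL: the STORED BY ICEBERG clause selects the Iceberg
-- storage handler; everything else is standard Hive CREATE TABLE
CREATE TABLE events (
  event_id   BIGINT,
  event_date DATE
)
STORED BY ICEBERG
STORED AS PARQUET;
```

A table created this way is registered in the Hive Metastore catalog and is immediately visible to Spark, Impala, and Trino.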
Multi-Engine Lakehouse Architecture
The combination of Iceberg's engine-agnostic metadata and ODP's multi-engine integration enables a true lakehouse architecture:
NiFi ingestion ──► Kafka ──► Spark Streaming ──► Iceberg table on HDFS
                                                        │
                    ┌───────────────────────────────────┤
                    │                                   │
            Hive (batch ETL)                  Impala (BI dashboards)
            Spark (ML training)               Trino (federated queries)
All engines share the same physical data files and the same metadata snapshot, eliminating the consistency problems that arise when data is copied between systems.
Polaris REST Catalog (Coming in ODP 1.3.2.0)
ODP 1.3.1.0 uses the Hive Metastore as the Iceberg catalog (tables are registered in the HMS). ODP 1.3.2.0 will introduce Apache Polaris as a tech preview: a REST-based Iceberg catalog that implements the Iceberg REST Catalog specification.
The Polaris REST catalog provides:
- A standard HTTP API for catalog operations (create/list/drop namespaces and tables)
- Fine-grained access control at the catalog and namespace level
- Support for multiple warehouse locations
- Compatibility with any Iceberg-capable engine that implements the REST catalog spec
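Because the REST catalog is just an endpoint, pointing an engine at it is a configuration change rather than a code change. A hedged sketch of Spark catalog properties, where the catalog name `polaris` and the endpoint URI are placeholders, not ODP defaults:

```
# Hypothetical Spark configuration for an Iceberg REST catalog;
# the catalog name and URI below are illustrative placeholders
spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.polaris.type=rest
spark.sql.catalog.polaris.uri=https://polaris.example.com/api/catalog
```

Any engine implementing the REST catalog spec would use its own equivalent of these three settings: a catalog name, the `rest` catalog type, and the endpoint URI.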
Use Cases
Data Lakehouse
Iceberg is the foundation of the lakehouse pattern in ODP: HDFS provides scalable, cost-effective storage; Iceberg provides the reliability and query performance layer that traditionally required a data warehouse. Teams can run SQL analytics directly on the data lake without ETL pipelines to a separate warehouse system.
ML Training Datasets
Reproducibility is a core requirement for responsible ML. Iceberg's time travel capability allows data scientists to snapshot a training dataset at a specific point in time, ensuring that the exact dataset used to train a model can be reproduced for debugging, retraining, or regulatory review — even after the underlying data has been updated.
Audit Trails via Time Travel
Compliance use cases often require answering questions about historical data states: "What did this record look like on a specific date?" or "Who changed this record and when?". With Iceberg, these questions can be answered by querying past snapshots without maintaining a separate audit table.