Apache Iceberg Overview
Apache Iceberg is an open table format for large analytical datasets stored on distributed file systems. It brings database-grade reliability — ACID transactions, schema evolution, time travel, and partition evolution — to data that lives in HDFS, Ozone, or object storage, while remaining readable by multiple engines simultaneously. ODP 1.3.1.0 ships Iceberg 1.6.1.
What is Iceberg?
Iceberg defines how data files (Parquet, ORC, Avro) and their metadata are organized on a file system. Rather than relying on directory-naming conventions and a central Metastore to track partitions (as traditional Hive tables do), Iceberg stores a self-contained metadata tree alongside the data:
table/
├── metadata/
│   ├── v1.metadata.json   ← snapshot history, schema, partition spec
│   ├── snap-*.avro        ← manifest lists (pointers to manifest files)
│   └── manifests/
│       └── *.avro         ← manifest files (lists of data files + statistics)
└── data/
    └── *.parquet          ← actual data files
This architecture enables atomic, consistent table operations without requiring a lock on the Metastore, and allows any engine with an Iceberg reader to query the table correctly without coordination.
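Engines that expose Iceberg's metadata tables (Spark SQL does) let you inspect this tree directly with ordinary queries. A minimal sketch, assuming an existing Iceberg table named `events`:

```sql
-- Inspect the snapshot history recorded in the metadata files
-- (Spark SQL metadata-table syntax; the table name is illustrative)
SELECT snapshot_id, committed_at, operation
FROM events.snapshots;

-- List the data files tracked by the current snapshot, with the
-- per-file statistics carried in the manifests
SELECT file_path, record_count, file_size_in_bytes
FROM events.files;
```

Because these queries read only the metadata tree, they answer "what does this table contain?" without scanning any data files.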
Why Iceberg vs Traditional Hive Tables
| Capability | Hive partition tables | Iceberg |
|---|---|---|
| ACID transactions | ORC only, limited | All formats, full ACID |
| Schema evolution | Limited (add columns only) | Add, drop, rename, reorder, widen types |
| Partition evolution | Requires full table rewrite | Add/change partitions without rewriting data |
| Hidden partitioning | No (partition in query) | Yes (transparent to queries) |
| Time travel | No | Yes (query any snapshot) |
| Concurrent writers | No | Optimistic concurrency control |
| Row-level deletes | ORC ACID only | Position and equality deletes (all formats) |
| Multi-engine reads | Limited | Native (Hive, Spark, Impala, Trino) |
ACID Transactions
Iceberg uses optimistic concurrency control: each write operation reads the current snapshot, makes changes, and commits a new snapshot atomically. If two writers conflict (both modify the same rows), one commit fails and must be retried. This provides serializable isolation for INSERT, UPDATE, DELETE, and MERGE operations without requiring a central lock manager.
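A typical transactional write is a MERGE, which Iceberg commits as a single new snapshot. A sketch in Spark SQL syntax, with illustrative table and column names:

```sql
-- Upsert incoming rows into an Iceberg table as one atomic commit.
-- If a concurrent writer commits a conflicting snapshot first, this
-- statement's commit is retried or fails cleanly -- readers never
-- observe a partially applied merge.
MERGE INTO events t
USING events_updates s
  ON t.event_id = s.event_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```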
Schema Evolution
Iceberg tracks schema changes in the metadata using field IDs rather than positional column indices. This means:
- Adding a column does not require rewriting existing data files.
- Dropping a column is recorded in metadata; existing files still contain the dropped column's data but readers ignore it.
- Renaming a column is a metadata-only operation — data files remain unchanged.
- Widening a type (e.g., `INT` to `LONG`) is safe and backward-compatible.
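Each of these operations is a single DDL statement. A sketch in Spark SQL syntax (table and column names are illustrative); none of these rewrites any data files:

```sql
-- Metadata-only schema changes on an Iceberg table
ALTER TABLE events ADD COLUMN session_id STRING;
ALTER TABLE events RENAME COLUMN ts TO event_ts;
ALTER TABLE events ALTER COLUMN item_count TYPE BIGINT;  -- INT -> LONG widening
ALTER TABLE events DROP COLUMN legacy_flag;
```

Because columns are tracked by field ID, a column added after a rename or drop can even reuse an old name without resurrecting the old column's data.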
Time Travel and Rollback
Every write to an Iceberg table creates a new snapshot. Old snapshots are retained until explicitly expired, enabling:
-- Query the table as it was on January 1, 2025
SELECT * FROM events FOR SYSTEM_TIME AS OF '2025-01-01 00:00:00';
-- Query a specific snapshot
SELECT * FROM events FOR SYSTEM_VERSION AS OF 8462937;
Time travel supports data auditing, debugging pipeline errors, and reproducing ML training datasets at a point in time. Rollback restores the table to a previous snapshot without data deletion.
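In the Spark integration, rollback is exposed as a stored procedure. A sketch, assuming a catalog named `spark_catalog` and a snapshot ID taken from the table's snapshot history:

```sql
-- Restore the table to an earlier snapshot (Spark SQL procedure call);
-- the snapshot ID here is illustrative
CALL spark_catalog.system.rollback_to_snapshot('db.events', 8462937);

-- Or roll back to the table state as of a timestamp
CALL spark_catalog.system.rollback_to_timestamp(
  'db.events', TIMESTAMP '2025-01-01 00:00:00');
```

The rolled-back-from snapshot remains in the history until it is expired, so a rollback is itself reversible.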
Partition Evolution and Hidden Partitioning
Traditional Hive tables embed partition values in directory names (dt=2025-01-01/), requiring query authors to specify partition predicates explicitly. Iceberg separates partition specification from data layout:
- Hidden partitioning: Iceberg derives partition values from column transforms (e.g., `bucket(16, user_id)`, `month(event_date)`) transparently. Query authors filter on `event_date = '2025-01-01'` without knowing the physical partitioning scheme.
- Partition evolution: As data grows, the partitioning scheme can be changed (e.g., from monthly to daily partitions) without rewriting historical data. New data uses the new scheme; old data remains on the old scheme. Readers handle both transparently.
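Both ideas can be seen in a short DDL sequence. A sketch in Spark SQL syntax, with illustrative names:

```sql
-- Create a table with hidden partitioning: queries never reference
-- the transforms, only the source columns
CREATE TABLE events (
  event_id   BIGINT,
  user_id    BIGINT,
  event_date DATE
)
USING iceberg
PARTITIONED BY (month(event_date), bucket(16, user_id));

-- Later, evolve the spec from monthly to daily granularity.
-- Existing data files stay on the old spec; only new writes use it.
ALTER TABLE events REPLACE PARTITION FIELD month(event_date) WITH day(event_date);
```

A filter such as `WHERE event_date = DATE '2025-01-01'` is pruned correctly against both the old monthly files and the new daily files.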
Iceberg 1.6.1 in ODP 1.3.1.0
ODP integrates Iceberg as a first-class table format across the stack:
- Hive 4.0.1: Native Iceberg catalog support via the `STORED BY ICEBERG` storage handler clause. Hive DDL operations (`CREATE TABLE`, `ALTER TABLE`, `DROP TABLE`) work on Iceberg tables identically to native Hive tables.
- Spark 3.5.6: Iceberg Spark Runtime included. Full read/write support including time travel, schema evolution, and streaming writes.
- Impala 4.5.0: Read support for Iceberg v2 tables (equality deletes, position deletes).
- Trino 476: Full Iceberg connector with read/write and time travel support.
This means a dataset written by a Spark pipeline is immediately queryable by Hive, Impala, and Trino — with consistent ACID guarantees — without any format conversion or data copying.
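On the Hive side, creating an Iceberg table is a one-clause change to ordinary DDL. A sketch in Hive 4 syntax (table name and columns are illustrative):

```sql
-- Hive 4 DDL: the STORED BY ICEBERG clause selects the Iceberg
-- storage handler; everything else is standard Hive CREATE TABLE
CREATE TABLE events (
  event_id   BIGINT,
  event_date DATE
)
STORED BY ICEBERG
STORED AS PARQUET;
```

A table created this way is registered in the Hive Metastore catalog and is immediately visible to Spark, Impala, and Trino.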
Multi-Engine Lakehouse Architecture
The combination of Iceberg's engine-agnostic metadata and ODP's multi-engine integration enables a true lakehouse architecture:
NiFi ingestion ──► Kafka ──► Spark Streaming ──► Iceberg table on HDFS
                                                        │
                    ┌───────────────────────────────────┤
                    │                                   │
            Hive (batch ETL)                  Impala (BI dashboards)
            Spark (ML training)               Trino (federated queries)
All engines share the same physical data files and the same metadata snapshot, eliminating the consistency problems that arise when data is copied between systems.
Polaris REST Catalog (Coming in ODP 1.3.2.0)
ODP 1.3.1.0 uses the Hive Metastore as the Iceberg catalog (tables are registered in the HMS). ODP 1.3.2.0 will introduce Apache Polaris as a tech preview: a REST-based Iceberg catalog that implements the Iceberg REST Catalog specification.
The Polaris REST catalog provides:
- A standard HTTP API for catalog operations (create/list/drop namespaces and tables)
- Fine-grained access control at the catalog and namespace level
- Support for multiple warehouse locations
- Compatibility with any Iceberg-capable engine that implements the REST catalog spec
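Because the REST catalog is just an endpoint, pointing an engine at it is a configuration change rather than a code change. A hedged sketch of Spark catalog properties, where the catalog name `polaris` and the endpoint URI are placeholders, not ODP defaults:

```
# Hypothetical Spark configuration for an Iceberg REST catalog;
# the catalog name and URI below are illustrative placeholders
spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.polaris.type=rest
spark.sql.catalog.polaris.uri=https://polaris.example.com/api/catalog
```

Any engine implementing the REST catalog spec would use its own equivalent of these three settings: a catalog name, the `rest` catalog type, and the endpoint URI.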
Use Cases
Data Lakehouse
Iceberg is the foundation of the lakehouse pattern in ODP: HDFS provides scalable, cost-effective storage; Iceberg provides the reliability and query performance layer that traditionally required a data warehouse. Teams can run SQL analytics directly on the data lake without ETL pipelines to a separate warehouse system.
ML Training Datasets
Reproducibility is a core requirement for responsible ML. Iceberg's time travel capability allows data scientists to snapshot a training dataset at a specific point in time, ensuring that the exact dataset used to train a model can be reproduced for debugging, retraining, or regulatory review — even after the underlying data has been updated.
Audit Trails via Time Travel
Compliance use cases often require answering questions about historical data states: "What did this record look like on a specific date?" or "Who changed this record and when?". With Iceberg, these questions can be answered by querying past snapshots without maintaining a separate audit table.