Version: 1.3.1.0

Hardware Recommendations for ODP

This page provides hardware sizing guidance for deploying an ODP cluster. These recommendations cover the typical three-tier topology (master nodes, worker nodes, and edge nodes) and include specific guidance for hardware-sensitive components such as Kudu and Kafka.

Typical ODP Cluster Topology

An ODP cluster is generally organized into three categories of nodes:

| Node Type | Count (typical) | Purpose |
|---|---|---|
| Master nodes | 3–5 | Coordination services: NameNode, ResourceManager, HBase Master, ZooKeeper, Ambari Server, Ranger, Atlas, Knox |
| Worker nodes | 3–N | Data storage and processing: HDFS DataNode, YARN NodeManager, Impala daemon, Kafka broker, Kudu tablet server |
| Edge nodes | 1–2 | Client-facing: Hadoop clients, Knox gateway, NiFi, HiveServer2 client connections |

This separation of concerns ensures that master coordination services are not impacted by the resource consumption of data workloads running on worker nodes.

Master Node Recommendations

Master nodes host coordination services that must remain highly available and responsive. ODP supports NameNode HA and ResourceManager HA, which require a minimum of two master nodes for the HA pairs. A third master (or a dedicated quorum node) is typically required to give ZooKeeper an odd-sized quorum of three and to host an HBase Master standby.

| Resource | Minimum | Recommended |
|---|---|---|
| CPU | 8 cores | 16–32 cores |
| RAM | 32 GB | 64–128 GB |
| OS / logs disk | 1x SSD 200 GB | 2x SSD 400 GB (RAID 1 or mirrored) |
| Network | 10 GbE | 25 GbE |

Notes:

  • Use SSD for the operating system volume and log directories (/var/log/) to avoid disk I/O becoming a bottleneck during NameNode edit log flushes or Ambari agent activity
  • The Ambari Server database (PostgreSQL, MySQL, or Oracle) should reside on a volume with low-latency I/O; SSD is strongly recommended
  • Ranger and Atlas both benefit from dedicated heap and fast disk for their embedded or external Solr audit stores

Worker Node Recommendations

Worker nodes carry the bulk of the storage and compute load. Sizing depends heavily on your expected data volume, replication factor, and processing workloads (batch, interactive, streaming).

| Resource | Minimum | Recommended (medium cluster) |
|---|---|---|
| CPU | 8 cores | 16–24 cores |
| RAM | 32 GB | 64–256 GB |
| Data disks | 4x HDD 4 TB | 6–12x HDD 6–12 TB (JBOD, no RAID) |
| OS disk | 1x SSD 200 GB | 1x SSD 200 GB |
| Network | 10 GbE | 25 GbE |

Notes:

  • HDFS DataNode data disks should be configured as JBOD (Just a Bunch of Disks) — do not use RAID for data disks. HDFS provides its own replication (default factor: 3) and hardware RAID is unnecessary and wasteful
  • Size total raw worker storage as: required usable storage × replication factor × 1.25 (for overhead)
  • YARN NodeManager available memory should be set to total RAM minus OS overhead and any co-located service heap. A good starting point is total RAM - 8 GB for YARN containers
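The storage and memory rules above can be sketched as a small calculation. The 100 TB usable-capacity and 128 GB RAM figures below are illustrative assumptions, not ODP defaults:

```python
# Rough worker-fleet sizing sketch; all input figures are illustrative.

def raw_storage_tb(usable_tb, replication=3, overhead=1.25):
    """Total raw HDFS capacity needed across the cluster:
    usable storage x replication factor x 1.25 overhead."""
    return usable_tb * replication * overhead

def yarn_container_memory_gb(total_ram_gb, os_overhead_gb=8, colocated_heap_gb=0):
    """Starting point for YARN container memory per worker:
    total RAM minus OS overhead minus any co-located service heap."""
    return total_ram_gb - os_overhead_gb - colocated_heap_gb

# 100 TB of usable data at the default replication factor of 3:
print(raw_storage_tb(100))  # 375.0 TB raw across the cluster

# 128 GB workers with 16 GB reserved for co-located Kafka/Kudu heaps:
print(yarn_container_memory_gb(128, colocated_heap_gb=16))  # 104 GB
```

The 8 GB OS overhead matches the starting point given above; adjust the co-located heap figure to whatever actually runs on your workers.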

Kudu — SSD Requirement

> **Warning:** Kudu tablet servers require SSD storage for tablet data directories. Running Kudu on spinning HDDs results in severe performance degradation and is not supported in ODP.

If Kudu is deployed, worker nodes hosting Kudu tablet servers should have at minimum 2–4 NVMe or SATA SSDs dedicated to Kudu data directories, separate from HDFS data disks.
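As a sketch, the tablet server's WAL and data locations are set with Kudu's `--fs_wal_dir` and `--fs_data_dirs` flags. The gflag file path and SSD mount points below are illustrative assumptions for a packaged install, not ODP defaults:

```
# /etc/kudu/conf/tserver.gflagfile (path and mount points are illustrative)
# Keep WAL and data directories on SSDs, separate from HDFS data disks.
--fs_wal_dir=/kudu/ssd1/wal
--fs_data_dirs=/kudu/ssd1/data,/kudu/ssd2/data,/kudu/ssd3/data
```

Placing the WAL on its own SSD (or at least its own partition) avoids write-ahead-log flushes competing with tablet data I/O.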

Kafka — Dedicated Disk Recommendation

> **Tip:** Kafka brokers are I/O intensive and perform best with dedicated disks for Kafka log directories.

If Kafka is co-located with HDFS DataNodes on worker nodes (a common pattern for smaller clusters), configure Kafka log directories on separate disks from HDFS data directories. For large Kafka deployments, consider dedicated Kafka broker nodes with high-throughput SSDs or high-capacity HDDs.
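As an illustrative sketch, the broker's `log.dirs` property in `server.properties` can list mount points reserved for Kafka; the `/kafka/disk*` paths below are assumptions, not defaults:

```properties
# server.properties — Kafka log directories on disks separate from HDFS
# data directories (mount points are illustrative)
log.dirs=/kafka/disk1,/kafka/disk2,/kafka/disk3
```

Kafka spreads partitions across the listed directories, so each entry should map to a distinct physical disk rather than directories on the same volume.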

Edge Node Recommendations

The edge node serves as the entry point for users and applications connecting to the cluster. It hosts Hadoop client libraries, Knox (API gateway / SSO), and optionally NiFi for data ingestion pipelines.

| Resource | Minimum | Recommended |
|---|---|---|
| CPU | 4 cores | 8 cores |
| RAM | 16 GB | 32 GB |
| Disk | 1x SSD 200 GB | 1x SSD 500 GB |
| Network | 10 GbE | 10 GbE |

Notes:

  • Knox handles TLS termination and Kerberos SPNEGO; a faster CPU reduces authentication latency for concurrent users
  • If NiFi is deployed on the edge node, increase RAM to at least 64 GB and ensure sufficient local disk for NiFi repositories (FlowFile, Content, Provenance)
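If NiFi is used, its three repositories can be placed on separate local volumes via `nifi.properties`. This is a hedged sketch; the mount points are illustrative assumptions, while the property names are NiFi's standard repository settings:

```properties
# nifi.properties — repositories on separate volumes (paths are illustrative)
nifi.flowfile.repository.directory=/nifi/disk1/flowfile_repository
nifi.content.repository.directory.default=/nifi/disk2/content_repository
nifi.provenance.repository.directory.default=/nifi/disk3/provenance_repository
```

Separating the Content repository from the FlowFile and Provenance repositories matters most, since content writes dominate I/O under heavy ingestion.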

Network Requirements

| Requirement | Minimum | Recommended |
|---|---|---|
| Internal cluster bandwidth | 10 GbE | 25 GbE |
| Switch | Layer 2 (same VLAN) | Layer 3 with dedicated VLAN |
| Latency (intra-cluster) | < 1 ms | < 0.5 ms |

For clusters with more than 20 worker nodes, 25 GbE interconnects are strongly recommended so that the network does not become a bottleneck for HDFS replication traffic and Spark shuffle operations.
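A back-of-envelope calculation illustrates the impact: when a worker fails, HDFS must re-replicate its data over the network. The 48 TB node capacity and 50% usable-bandwidth fraction below are illustrative assumptions, not ODP guidance:

```python
# Rough estimate of HDFS re-replication time after losing one worker.
# All input figures are illustrative assumptions.

def rereplication_hours(lost_tb, link_gbps, efficiency=0.5):
    """Hours to restore replication for lost_tb of data, assuming only
    a fraction (efficiency) of per-link bandwidth is usable for repair."""
    usable_gbps = link_gbps * efficiency
    gigabits = lost_tb * 8000  # 1 TB = 8000 Gb (decimal units)
    return gigabits / usable_gbps / 3600

# A worker holding 48 TB of data, at 50% usable bandwidth:
print(round(rereplication_hours(48, 10), 1))  # 21.3 hours on 10 GbE
print(round(rereplication_hours(48, 25), 1))  # 8.5 hours on 25 GbE
```

In practice recovery is parallelized across many source and destination nodes, so actual times are shorter, but the ratio between link speeds holds.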

See the Network Requirements page for DNS, NTP, firewall, and Kerberos connectivity details.