Blog

DDN Infinia Expands to Power Inference, Intelligent Data Lakes, and Multi-Protocol AI Workloads

Why 2026 is the year storage architecture has to change — and why Infinia is the platform leading that change.

By Sanjay Jagad, VP of Product Management, DDN

The conversation about AI infrastructure is changing.

For the past several years, the dominant question was: can your storage keep up with training? Sustained throughput, checkpoint speed, metadata at scale — the storage layer’s job was to keep GPU clusters fed without becoming the bottleneck. DDN built Infinia to answer that question, and the platform has done exactly that, validated today at scales exceeding 100,000 GPUs across some of the world’s largest AI deployments.

But the question the market is asking in 2026 is different, and more interesting.

It is no longer just about training. AI factories are now running continuous, multi-stage pipelines: inference serving millions of requests in parallel with training, RAG pipelines querying the same data lake that ingestion pipelines are still filling, multi-protocol workloads from research and life sciences teams that need POSIX semantics alongside the object store. The infrastructure layer underneath all of this has to do something legacy architectures were never designed to do — serve all of these workloads simultaneously, on shared infrastructure, with hard isolation and consistent performance guarantees, at any scale.

This is the moment DDN has been building toward. And this release marks the moment Infinia steps fully into it.

Three Markets. One Platform.

The Infinia release we are announcing today is not a maintenance update. It is a strategic expansion into three of the highest-growth areas in AI infrastructure, enabled by the architectural foundations that have been in place since day one.

AI Inference at Scale

As AI factories shift from training to production inference, the data infrastructure requirements change fundamentally. Low-latency access to model weights, prompt data, and context stores is now a first-class requirement, not an afterthought. Infinia’s performance profile, built on fast S3 access, intelligent metadata-driven retrieval, and direct GPU data paths, makes it well suited for inference pipelines serving millions of requests at scale.

The headline proof point: 18× faster time-to-first-token and a 75% reduction in input token cost via KV cache offload. These are not benchmark numbers. They come from production inference deployments where Infinia serves as the persistent KV cache layer for vLLM, allowing previously computed context to be loaded directly from Infinia and bypassing prefill computation entirely. The GPU-Direct SDK transfers tensor data over RDMA directly to GPU memory, with no CPU hops and no protocol overhead.
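
To make the pattern concrete, here is a minimal sketch of a prefix-keyed persistent KV cache, assuming an S3-compatible endpoint and hypothetical bucket and helper names; it is not the Infinia or vLLM integration itself, and the production path moves tensors over GPU-Direct RDMA rather than through the plain S3 API shown here.

```python
# Conceptual sketch only: a prefix-keyed persistent KV cache on an
# S3-compatible endpoint. Endpoint, bucket, and helper names are hypothetical;
# this is not the Infinia GPU-Direct SDK or the vLLM connector API.
import hashlib
import io

import boto3
import torch

s3 = boto3.client("s3", endpoint_url="https://infinia.example.local:9000")  # placeholder endpoint
BUCKET = "kv-cache"  # placeholder bucket


def prefix_key(prompt_prefix: str) -> str:
    """Derive a stable object key from already-processed context."""
    return "kv/" + hashlib.sha256(prompt_prefix.encode()).hexdigest()


def load_cached_kv(prompt_prefix: str):
    """Return previously computed KV tensors for this prefix, or None on a miss."""
    try:
        obj = s3.get_object(Bucket=BUCKET, Key=prefix_key(prompt_prefix))
    except s3.exceptions.NoSuchKey:
        return None
    return torch.load(io.BytesIO(obj["Body"].read()))


def store_kv(prompt_prefix: str, kv_tensors) -> None:
    """Persist KV tensors so later requests can skip prefill for this prefix."""
    buf = io.BytesIO()
    torch.save(kv_tensors, buf)
    s3.put_object(Bucket=BUCKET, Key=prefix_key(prompt_prefix), Body=buf.getvalue())
```

On a cache hit, decoding can start from the restored KV state instead of recomputing prefill; on a miss, prefill runs once and its result is persisted for subsequent requests that share the same context.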

This is what it means to have storage in the inference stack rather than behind it.

Retrieval-Augmented Generation and Intelligent Data Lakes

RAG is only as good as the data behind it. The infrastructure challenge is not storing the data — it is making it queryable, fresh, and retrievable at inference time without manual curation, redundant copies, or async indexing lag.

Infinia’s approach is architectural. Because metadata lives in the same distributed KV engine as data, catalog freshness is guaranteed at write-commit time: the moment data is written, it is searchable. No background indexing. No ETL pipelines keeping a separate database in sync. For organizations building scalable AI data lakes, this means the data lake is genuinely intelligent: accessible by frameworks including Apache Spark, Trino, PyTorch, and TensorFlow, without data movement or reformatting, and delivering 22× faster RAG pipeline performance than a traditional object store.
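
As a small illustration of the access pattern, the sketch below writes an object with descriptive metadata through the standard S3 API and immediately lists and inspects it from the same endpoint; the endpoint, bucket, credentials, and keys are placeholders, and the snippet shows only generic S3 calls, not an Infinia-specific interface.

```python
# Generic S3 access sketch; endpoint, bucket, credentials, and keys are
# placeholders rather than Infinia-specific values.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://infinia.example.local:9000",  # placeholder endpoint
    aws_access_key_id="TENANT_KEY",
    aws_secret_access_key="TENANT_SECRET",
)

# Ingest a document chunk with metadata attached at write time.
s3.put_object(
    Bucket="rag-corpus",
    Key="docs/2026/q1/report-001.txt",
    Body=b"Quarterly findings ...",
    Metadata={"source": "finance", "embedding-status": "pending"},
)

# The freshly written object is immediately visible to downstream readers,
# whether that is Spark, Trino, or a PyTorch/TensorFlow data loader pointed
# at the same endpoint.
listing = s3.list_objects_v2(Bucket="rag-corpus", Prefix="docs/2026/q1/")
for item in listing.get("Contents", []):
    head = s3.head_object(Bucket="rag-corpus", Key=item["Key"])
    print(item["Key"], head["Metadata"])
```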

For enterprise AI teams, this is the difference between a data lake and an AI data lake.

Multi-Protocol Workloads: Research, Life Sciences, FSI, Manufacturing

Not every workload speaks S3. A genomics pipeline, a computational fluid dynamics simulation, a quantitative research workflow, a digital twin environment — these require standard POSIX file semantics. Until now, serving these workloads alongside AI training and inference meant separate infrastructure, separate operations, and data copies moving between them.

With native POSIX file access entering customer validation, Infinia extends its reach to these workloads without requiring separate infrastructure. File-based applications run on the same platform, against the same data, with the same performance and isolation guarantees as the AI-native object workloads Infinia already serves. This also brings Infinia directly into DDN’s long-standing stronghold in high-performance computing — offering HPC users a modern, AI-ready platform that is familiar and immediately productive.

Deepening the NVIDIA Partnership

DDN and NVIDIA share a common mission: ensuring that the world’s most demanding AI workloads never wait on data.

This release deepens that partnership with integrations across the NVIDIA inference stack, spanning data transfer, inference orchestration, KV cache management, and storage I/O optimization. For organizations building on NVIDIA DGX, HGX, or cloud-based GPU infrastructure, Infinia is designed to be the storage layer that keeps the stack performing. DDN and NVIDIA are actively co-developing and co-validating these integrations, with the goal of making Infinia a recommended and supported data platform for NVIDIA customers at every scale.

Details of the full NVIDIA integration roadmap are available to qualified customers and partners under NDA.

The Foundation That Makes This Possible

The reason Infinia can expand into inference, intelligent data lakes, and multi-protocol workloads simultaneously — without becoming a different product for each — is the architecture.

A distributed, log-structured KV engine where every I/O is independently optimized. Metadata in the same engine as data, guaranteeing real-time catalog freshness. Layout indirection that enables zero-downtime cluster scaling without moving a byte of data. Per-I/O dynamic erasure coding that gives every write optimal protection without fixed stripe penalties. Hierarchical key-space multi-tenancy that provisions isolated tenants in seconds.

These are not features. They are structural properties of the platform. They are why Infinia can run training, inference, RAG retrieval, and POSIX file workloads on the same cluster, with hard isolation between tenants, without architectural compromise.

The performance numbers are the result of getting the architecture right:

  • RAG pipeline vs. traditional object store: 22× faster
  • Checkpoint speed vs. legacy systems: 600× faster
  • Inference time-to-first-token: 18× improvement
  • Input token cost with KV cache offload: 75% reduction
  • Throughput per node (S3 GET): ~12.5 GiB/s, linear and validated
  • Cluster bring-up: under 10 minutes
  • Tenant provisioning: ~10 seconds via 4 REST API calls

What’s New in This Release

The platform expansion is enabled by a set of capabilities shipping in this release and entering customer validation:

POSIX File Access

POSIX file access is now available as a technology preview for approved design partners. Standard Linux file semantics (cp, mv, ls, rm, mkdir) work natively on Infinia mounts, with file-level distributed locking and mmap() support for HPC and analytics workloads. The preview is supported on RHEL 9.5/9.6 and Ubuntu 24.04, and GA production support is targeted for summer 2026.
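
From an application’s point of view, the behavior is just standard POSIX, as in the short sketch below; the mount path is a placeholder, and the snippet exercises only ordinary Python file, rename, and mmap calls against whatever filesystem is mounted there.

```python
# Standard POSIX behavior an application would expect on an Infinia mount;
# /mnt/infinia is a placeholder path, not a product default.
import mmap
import os

MOUNT = "/mnt/infinia/project"  # placeholder mount point
os.makedirs(MOUNT, exist_ok=True)

path = os.path.join(MOUNT, "sample.dat")
with open(path, "wb") as f:
    f.write(b"\x00" * 4096)      # ordinary buffered write

os.rename(path, path + ".v1")    # POSIX rename, same as on any local filesystem

# Memory-map the file read-only, the access pattern HPC and analytics codes rely on.
with open(path + ".v1", "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
    print(len(mm), mm[:8])
```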

AI Pipeline Management

A new operations experience purpose-built for AI infrastructure teams: a simplified installer with pre-flight validation, unified APIs, and automation-first lifecycle management that integrates with Ansible, Terraform, and cloud-native orchestration tooling. A cluster can go from zero to operational in under 10 minutes, with full observability from the moment it comes up.
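
To illustrate what an automation-first flow can look like, here is a hedged sketch of a four-call tenant provisioning sequence over REST; the endpoints, payloads, and field names are hypothetical stand-ins rather than the documented Infinia API, and in practice the same steps would typically be driven from Ansible or Terraform.

```python
# Hypothetical four-call provisioning sequence; endpoint paths and payload
# fields are illustrative only and do not reflect the actual Infinia REST API.
import requests

BASE = "https://infinia.example.local/api/v1"  # placeholder management endpoint
HEADERS = {"Authorization": "Bearer <admin-token>"}

# 1. Create a tenant (an isolated namespace in the hierarchical key space).
tenant = requests.post(f"{BASE}/tenants", json={"name": "genomics-team"},
                       headers=HEADERS, timeout=30).json()

# 2. Create a sub-tenant scoped to a specific workload or project.
project = requests.post(f"{BASE}/tenants/{tenant['id']}/subtenants",
                        json={"name": "variant-calling"},
                        headers=HEADERS, timeout=30).json()

# 3. Attach a capacity/QoS policy so the tenant gets hard isolation guarantees.
requests.put(f"{BASE}/subtenants/{project['id']}/policy",
             json={"capacity_gib": 10240, "qos_class": "gold"},
             headers=HEADERS, timeout=30)

# 4. Issue S3 credentials the team can drop straight into their pipelines.
creds = requests.post(f"{BASE}/subtenants/{project['id']}/s3-keys",
                      headers=HEADERS, timeout=30).json()
print(creds)
```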

NVIDIA Ecosystem Integrations

This release brings the KV cache fabric for vLLM demonstrated at GTC and direct GPU data paths, alongside the co-validation work underway across the NVIDIA DGX and HGX stack. The inference stack integration is the most strategically significant development in Infinia’s market positioning in the past year.

What Comes Next

The 2.4 roadmap continues the platform build-out with commitments that unlock the next tier of enterprise and NCP deployments:

  • Scale validation: 500–600 node clusters with published linear benchmarks
  • POSIX GA: Production-ready with GPUDirect Storage, POSIX ACLs, S3/POSIX interop on the same dataset
  • Enterprise security: RBAC with AD/LDAP, KMS/KMIP with per-tenant encryption keys, WORM/Object Lock
  • Data protection: Replication engine and data mover for business continuity
  • NVIDIA certification: DGX positioning and a formal certification path

Infinia is becoming the comprehensive AI data intelligence layer — from raw data ingest through inference serving — with the security, ecosystem integration, and operational tooling that production AI factories demand.

DDN Infinia is generally available now. POSIX file access, AI pipeline management capabilities, and NVIDIA ecosystem integrations are in active customer validation with GA targeted for summer 2026. Contact your DDN representative or visit ddn.com to schedule a technical briefing or begin a proof of concept.