DDN’s AI Storage Appliance Proves Why It Is the Solution of Choice for Generative AI, Large Language Models, and Other AI Use Cases
Founded in 2018, MLCommons is an open engineering consortium with a mission to make machine learning better for everyone through benchmarks and data. Its first benchmark was the MLPerf™ Training suite, and it has since added benchmarks for other essential parts of AI workflows. The tests are designed to provide an unbiased way to evaluate AI hardware and software. MLCommons continually iterates on its benchmarks and adds new test suites to keep pace with industry trends.
The Introduction of MLPerf Storage Benchmark
MLCommons recently introduced its first suite of benchmarks to evaluate the performance of storage for ML training workloads. They presented these new tests because many AI workloads are extraordinarily data-intensive and demand high-performance storage to ensure good overall system performance and availability. MLPerf Storage attempts to accurately reflect the I/O patterns imposed by ML workloads and provide a standard by which a user can evaluate storage architectures.
The suite measures how fast storage systems can supply training data when a model is trained across a scalable number of AI accelerators. It currently consists of two workload types:
- Medical Image Segmentation (3D UNET)
- Natural Language Processing (BERT-large)
DDN Shines in Benchmark Results
DDN’s submissions cover both MLPerf Storage Benchmark categories and five different GPU compute infrastructure configurations to illustrate the AI400X2 appliance’s ability to effectively support workloads as they scale.
The results demonstrate DDN’s industry-leading efficiency. Using only a single storage appliance, DDN attained the highest bandwidth and the most accelerators supported among on-premises solutions in both categories of the MLPerf Storage benchmark. The results show 700% better efficiency on a per-storage-node basis than the competing on-premises submissions.
DDN’s reported results include the following highlights:
- In the single compute node benchmark, one DDN AI400X2 NVMe appliance running DDN’s EXAScaler 6.2 parallel filesystem served 40 accelerators at a throughput of 16.2 GB/s.
- In the multi-node benchmark, one DDN AI400X2 NVMe appliance served 160 accelerators across ten GPU compute nodes at a throughput of 61.6 GB/s.
- Notably, the second submission was limited by the performance of the compute clients rather than by the single AI400X2 system, which demonstrates the efficiency of the 2U appliance.
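To put the two throughput figures in perspective, the arithmetic behind them can be sketched in a few lines of Python. This is only an illustration based on the numbers quoted above: the per-accelerator figure and the "scaling efficiency" ratio are derived here for convenience and are not official MLPerf Storage metrics, and the variable and function names are hypothetical.

```python
# Figures quoted in the highlights above (one AI400X2 appliance in both cases).
results = {
    "single-node": {"accelerators": 40, "throughput_gb_s": 16.2},
    "multi-node": {"accelerators": 160, "throughput_gb_s": 61.6},
}


def per_accelerator(result):
    """Average throughput delivered to each accelerator, in GB/s."""
    return result["throughput_gb_s"] / result["accelerators"]


single = results["single-node"]
multi = results["multi-node"]

# How much the accelerator count and the delivered throughput grew
# between the single-node and multi-node submissions.
accelerator_ratio = multi["accelerators"] / single["accelerators"]
throughput_ratio = multi["throughput_gb_s"] / single["throughput_gb_s"]

# Derived ratio (an assumption of this sketch, not an MLPerf metric):
# 1.0 would mean throughput grew exactly in step with accelerator count.
scaling_efficiency = throughput_ratio / accelerator_ratio

for name, result in results.items():
    print(f"{name}: {per_accelerator(result):.3f} GB/s per accelerator")
print(f"accelerators grew {accelerator_ratio:.1f}x, "
      f"throughput grew {throughput_ratio:.2f}x, "
      f"scaling efficiency {scaling_efficiency:.1%}")
```

By this arithmetic, each accelerator receives roughly 0.4 GB/s in both submissions, and throughput grows about 3.8 times while the accelerator count grows 4 times, all from the same single appliance.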
AI Data Centers Need Efficient Storage
As accelerated computing becomes more widely adopted, the efficiency of the storage system supporting AI workloads becomes more important. Power and space in data centers are increasingly valuable commodities. By efficiently delivering superior bandwidth to computing systems, DDN’s solutions keep GPUs fully utilized while minimizing storage footprint.
The System Behind the Numbers
As demand grows for ever-larger AI datasets to feed generative AI and large language models, the need for simple but capable storage becomes more apparent. As demonstrated in the MLPerf Storage benchmark, the DDN AI400X2 appliance delivers leadership performance in the world’s most efficient package, complementing the largest GPU systems as part of an end-to-end solution that delivers outstanding results.
The AI400X2’s shared parallel architecture and client protocol ensure high levels of performance, scalability, security, and reliability for accelerated computing systems. Multiple parallel data paths extend from the drives all the way to containerized applications running on the GPUs for the fastest AI processing. With DDN’s true end-to-end parallelism, data is delivered with high throughput, low latency, and massive transaction concurrency. This ensures that applications achieve the greatest productivity and efficiency, with all GPU cycles put to productive use. Optimized parallel data delivery translates directly into increased application performance and faster training completion times.