DDN breaks its own record for the fastest AI storage with DGX A100, delivering 162 GiB/s directly to GPUs (60x more than NFS), and co-authors a Reference Architecture with NVIDIA for DGX SuperPOD customers.
How do organizations ensure that their environment keeps running optimally at large scale? Are your investments in GPU infrastructure, data scientists, data sources, and ingestion all optimized, with the headroom to accelerate the value you get from data? Simplicity and performance go hand in hand at scale. The complexity of managing data silos, multiple storage tiers on different systems, and storage appliances not designed for scale can be a significant risk factor for AI projects.
DDN is a proven supplier for AI at scale and in production. Solutions built on standard protocols like NFS and designed for modest-sized workloads become increasingly difficult to manage as they grow, and ultimately fail to scale. Even with accelerating technologies such as RoCE, NFS is a bottleneck for applications running on GPUs, especially as the number of GPUs increases. In this post, we discuss a technology we have advanced in close collaboration with NVIDIA to ensure that DDN data platforms get the most from DGX A100 systems, as well as the Reference Architecture that makes SuperPOD an enterprise-deployable product.
Organizations are seeking to derive more competitive value from AI, so their data infrastructure must deliver saturation-level performance to AI workflows with a minimum of infrastructure and manual data movement. DDN A3I Storage outperforms all others while minimizing configuration complexity through efficient performance and easy scaling.
DDN has decades of experience delivering solutions to some of the world’s largest computing centers, and one of its most recent deployments is NVIDIA’s Selene, the 7th-ranked system on the TOP500 list and the largest SuperPOD with DGX A100 currently in operation. The Selene cluster is made up of 280 DGX A100 systems. These GPU-based systems, like GPU computing platforms in general, represent a significant revolution in how enormous amounts of data are processed.
NVIDIA GPUs provide massively concurrent processing capabilities, and the DDN shared parallel architecture is proven to keep them fully saturated for all types of unstructured data workloads (large-file, small-file, and mixed-file environments), even at the largest GPU scales. This makes DDN A3I Solutions the ideal data platform for end-to-end AI workflows such as deep learning and inference. In fact, DDN enables the largest autonomous driving programs in the world with data platforms that scale to several hundred petabytes of capacity and deliver terabytes per second of performance.
Not content to rest on our laurels, DDN and NVIDIA are constantly looking for ways to improve the performance of AI workflows. One way we are doing this together is with GPUDirect Storage (GDS), part of NVIDIA’s Magnum IO set of APIs.
GDS makes data movement between storage and GPU more efficient by providing a faster, more direct data path. Data is transferred directly from a network interface card (NIC) on the host into GPU memory, without being staged in system memory by the CPU. This removes the CPU and system memory as a bottleneck in the IO path, eliminates unnecessary data copies, reduces latency, and frees CPU resources for other tasks, such as image processing during ingest for deep learning applications.
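To make the copy savings concrete, here is a toy Python model of the two paths. It assumes the traditional path stages data through a single bounce buffer in system memory (one DMA write into host RAM, then one read back out toward the GPU); real IO stacks may add further copies, for example through the page cache, so treat the result as a lower bound on host-memory traffic, not a measurement. The `DataPath` class and hop names are illustrative only and are not part of any NVIDIA API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPath:
    """Illustrative model of the hops a read takes on its way into a GPU."""

    name: str
    hops: tuple[str, ...]

    def host_memory_bytes(self, nbytes: int) -> int:
        # Assumption: each stop in system memory costs one DMA write in
        # plus one read back out toward the GPU (a single bounce buffer).
        return 2 * nbytes * self.hops.count("system memory")


# Traditional path: the NIC lands data in a host bounce buffer, then a
# CPU-initiated host-to-device copy moves it into GPU memory.
CPU_PATH = DataPath("CPU bounce buffer", ("NIC", "system memory", "GPU"))

# GDS path: the NIC DMAs straight into GPU memory.
GDS_PATH = DataPath("GPUDirect Storage", ("NIC", "GPU"))

if __name__ == "__main__":
    one_gib = 1 << 30
    for path in (CPU_PATH, GDS_PATH):
        moved = path.host_memory_bytes(one_gib) / one_gib
        print(f"{path.name}: {moved:.1f} GiB through system memory per GiB read")
```

Under this simplified model, every GiB read over the CPU path moves 2 GiB through system memory, while the GDS path moves none.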
The NVIDIA DGX A100 system includes eight single-port Mellanox ConnectX-6 network interfaces. To manage traffic inside the DGX A100, there are four PCIe switches, each connecting two GPUs and two NICs, and the four switches are interconnected through the system’s two AMD CPUs. DDN’s fully integrated GDS support and shared parallel architecture manage the data path better than other suppliers’ solutions by guaranteeing that data transfers pass directly through the interface nearest each GPU. The combination of DDN A3I Storage with GDS therefore ensures that data takes the shortest, most efficient path possible between application and storage.
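The nearest-interface idea can be illustrated with a small Python sketch of the topology described above: four PCIe switches, each holding two GPUs and two NICs. The numbering (switch `s` hosts GPUs and NICs `2s` and `2s+1`) is a hypothetical simplification, not the actual DGX A100 device enumeration, and `nearest_nic` and `crosses_cpu` are illustrative helpers, not DDN or NVIDIA functions.

```python
# Hypothetical, simplified numbering of the DGX A100 topology described above:
# PCIe switch s hosts GPUs 2s, 2s+1 and NICs 2s, 2s+1. Real device IDs differ.
NUM_SWITCHES = 4
GPUS_PER_SWITCH = 2
NICS_PER_SWITCH = 2


def switch_of_gpu(gpu: int) -> int:
    return gpu // GPUS_PER_SWITCH


def switch_of_nic(nic: int) -> int:
    return nic // NICS_PER_SWITCH


def nearest_nic(gpu: int) -> int:
    """Return the NIC on the same PCIe switch, one dedicated NIC per GPU."""
    local_index = gpu % GPUS_PER_SWITCH
    return switch_of_gpu(gpu) * NICS_PER_SWITCH + local_index


def crosses_cpu(gpu: int, nic: int) -> bool:
    """True if GPU<->NIC traffic must leave its switch and go via the CPUs."""
    return switch_of_gpu(gpu) != switch_of_nic(nic)


if __name__ == "__main__":
    for gpu in range(NUM_SWITCHES * GPUS_PER_SWITCH):
        nic = nearest_nic(gpu)
        print(f"GPU {gpu} -> NIC {nic} (crosses CPU: {crosses_cpu(gpu, nic)})")
```

In this model every GPU gets its own NIC on the same switch, so no transfer has to detour through the CPUs; pairing GPU 0 with NIC 7, by contrast, would cross between switches.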
For our lab testing, we connected four AI400X appliances (the same appliances and configurations we have been shipping for the past six months) to a DGX A100 system over an HDR200 InfiniBand network, using all eight NICs on the DGX A100, the ones closest to the GPUs. We measured performance using all eight GPUs twice: first through the standard IO path via the CPU, and second with GDS enabled, through the optimized IO path.
There are three very exciting points to note in our results:
- First, we deliver 162 GiB/s read and 143 GiB/s write performance directly to GPUs with GDS. That is close to the full line rate of the eight HDR200 NICs, and 60x more than what enterprise file-sharing protocols such as NFS can provide. The highly scalable DDN shared parallel architecture makes it simple to sustain this level of performance with multiple systems engaged simultaneously, for example in a POD or SuperPOD deployment.
- Second, GDS enables us to deliver 57% more read throughput to GPUs than the traditional path through the CPU. Our performance via the CPU path is already very impressive, at 103 GiB/s read throughput to a single client, but because the data path through the CPU limits throughput, adding GDS is what brings the DGX A100 GPUs closest to full data saturation.
- Third, GDS is completely integrated into our DDN AI400X appliance and requires no changes after installation. The performance benchmark application running on the eight GPUs took advantage of the system’s GDS capability seamlessly and immediately, delivering 162 GiB/s to a DGX A100 system out of the box, with no additional licensing or cost.
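The headline figures can be sanity-checked with simple arithmetic. The sketch below assumes a nominal 200 Gb/s data rate per HDR200 port and ignores protocol overhead, so the practically achievable ceiling is somewhat lower than the computed line rate. The implied NFS figure is simply 162 GiB/s divided by the quoted 60x, not an independent measurement.

```python
# Nominal figures; protocol overhead is ignored, so the true ceiling is lower.
NUM_NICS = 8
NIC_GBIT_PER_S = 200            # HDR200 nominal data rate per port
GIB = 2 ** 30                   # bytes per GiB

# Measured results quoted in the post (GiB/s).
GDS_READ, GDS_WRITE, CPU_READ = 162, 143, 103
NFS_FACTOR = 60                 # "60x more than NFS"

line_rate_gib = NUM_NICS * NIC_GBIT_PER_S * 1e9 / 8 / GIB   # ~186.3 GiB/s
read_share = GDS_READ / line_rate_gib                        # ~0.87
write_share = GDS_WRITE / line_rate_gib                      # ~0.77
gds_gain_pct = (GDS_READ / CPU_READ - 1) * 100               # ~57%
implied_nfs_gib = GDS_READ / NFS_FACTOR                      # ~2.7 GiB/s

if __name__ == "__main__":
    print(f"Nominal line rate: {line_rate_gib:.1f} GiB/s")
    print(f"GDS read: {read_share:.0%} of line rate, write: {write_share:.0%}")
    print(f"GDS gain over CPU path: {gds_gain_pct:.0f}%")
    print(f"Implied NFS throughput: {implied_nfs_gib:.1f} GiB/s")
```

Under these assumptions, the eight ports add up to roughly 186 GiB/s, so the 162 GiB/s read result is about 87% of nominal line rate, and 162 versus 103 GiB/s reproduces the quoted 57% gain.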
It is very exciting to see how our shared parallel architecture has evolved. A3I delivers data with high throughput, low latency, and massive concurrency equally well to a single-system supercomputer like the DGX A100 and to a large supercomputing cluster with many clients, such as NVIDIA SuperPOD. DDN technology is proven to deliver maximum flexibility and capability at any scale. Using our unique MultiRail algorithm, DDN A3I appliances automatically balance network traffic across multiple network interfaces, making optimized high-performance networking a simple out-of-the-box feature.
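The post does not describe how MultiRail works internally, so the following is only a generic illustration of multi-rail load balancing, not DDN’s algorithm: a greedy Python sketch that sends each transfer to whichever rail (network interface) currently has the fewest bytes queued, placing larger transfers first (the longest-processing-time heuristic). The function name `assign_to_rails`, the byte-count load metric, and the example sizes are assumptions made for the example.

```python
import heapq


def assign_to_rails(transfer_sizes: list[int], num_rails: int) -> list[list[int]]:
    """Greedily place each transfer on the currently least-loaded rail.

    Generic longest-processing-time (LPT) heuristic for illustration only;
    this is NOT DDN's MultiRail algorithm.
    """
    if num_rails <= 0:
        raise ValueError("num_rails must be positive")
    # Min-heap of (bytes queued on rail, rail index).
    heap = [(0, rail) for rail in range(num_rails)]
    heapq.heapify(heap)
    assignment: list[list[int]] = [[] for _ in range(num_rails)]
    # Placing the largest transfers first tends to even out the final loads.
    for size in sorted(transfer_sizes, reverse=True):
        load, rail = heapq.heappop(heap)
        assignment[rail].append(size)
        heapq.heappush(heap, (load + size, rail))
    return assignment


if __name__ == "__main__":
    # Hypothetical transfer sizes in MiB, spread across two rails.
    rails = assign_to_rails([8, 7, 6, 5, 4, 3, 2, 1], num_rails=2)
    for i, rail in enumerate(rails):
        print(f"rail {i}: {rail} -> {sum(rail)} MiB")
```

With the sizes shown, both rails end up carrying 18 MiB. A production balancer would likely also weigh link speed, locality to the issuing GPU, and in-flight completions rather than a static byte count.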
Another key to securing our customers’ competitive advantage is ensuring that performance scales not just across a few GPUs but to massive systems, like the 2,240 GPUs in NVIDIA’s Selene deployment. Through extensive testing in cooperation with NVIDIA, we have together delivered the industry’s first SuperPOD Reference Architecture.
We authored this Reference Architecture jointly to give enterprise organizations the quickest, safest, and most effective path to AI supercomputing. By removing uncertainty around deployment scenarios and performance expectations, it lets IT organizations concentrate on delivering leadership-level AI computing to their lines of business. The Reference Architecture is an end-to-end platform that includes computing, storage, networking, infrastructure management, and data science workflow tools optimized to work together for maximum performance at scale, backed by expert implementation services for the fastest time to operation.
DDN has worked with NVIDIA and its Mellanox InfiniBand networks for many years, so we can ensure that the storage solution matches the system’s performance, density, and flexibility needs with the most efficient data path possible. DDN is the only vendor certified to meet those needs at SuperPOD scale.
Through repeatable deployments at multiple customers worldwide, DDN and NVIDIA, together with partners such as Atos and WWT, have experience adapting and integrating the SuperPOD architecture for a variety of industries and use cases. Any application that needs image classification, object detection, language processing, or reinforcement learning at large scale will benefit from SuperPOD, and the combined flexibility of our solution makes it possible to run these and other use cases on a single platform.
Are you deploying AI and HPC at scale with NVIDIA GPUs and DGX A100 systems? If so, get in touch with us and let’s discuss how our data platforms with GDS or the SuperPOD Reference Architecture can help you get the most performance from your applications.
To hear more about the NVIDIA Selene SuperPOD deployment and DDN’s plans for the future, set up a one-on-one meeting with DDN leadership or technical experts.