DDN breaks its own performance record with DGX A100: 178 GB/s delivered directly to GPUs, 60x more than NFS.
How do organizations ensure that their environment runs optimally as scale grows? Is your investment in GPU infrastructure, data scientists, data sources and ingestion optimized, with the headroom to accelerate value from data? Simplicity and performance go hand in hand when it comes to scale. The complexity of managing data silos, multiple tiers of storage on different systems, and storage appliances not designed for scale can be a significant risk factor for AI projects.
DDN is a proven supplier of storage that performs at scale and in production. Enterprise solutions built on protocols like NFS are designed for modestly sized workloads; they become increasingly difficult to manage and ultimately fail to scale. Even with accelerating technologies such as RoCE, NFS is a bottleneck for applications running on GPUs. In this post, we’ll discuss an advanced technology that we’ve developed in close collaboration with NVIDIA to ensure that DDN data platforms get the most from DGX A100 systems.
Maximizes Computing Performance
Solutions based on the DDN AI400X can supply up to 178 GB/s to a single NVIDIA DGX A100.
Scales Easily and Seamlessly
DDN AI400X-based solutions maximize the investment in a single DGX A100 and are proven at scale with up to 280 clients in an NVIDIA SuperPOD with DGX A100.
Proven in Production
DDN has decades of experience in supplying the absolute best in storage for data-intensive organizations who are driving innovation and discovery.
As organizations seek more competitive value from their AI, data infrastructure should deliver saturation-level performance to AI workflows with a minimum of infrastructure and manual data movement. DDN A3I storage outperforms all others while minimizing configuration complexity and AI infrastructure.
With decades of experience delivering solutions to some of the world’s largest computing centers, DDN recently deployed storage for NVIDIA’s Selene, the 7th-ranked system on the IO500 and the largest SuperPOD with DGX A100 currently in operation. The Selene cluster comprises 280 DGX A100 systems. These GPU-based systems, and GPU computing platforms in general, represent a significant revolution in how enormous amounts of data are processed.
NVIDIA GPUs provide massively concurrent processing capabilities, and DDN’s shared parallel architecture is proven to keep them fully saturated across all types of unstructured data workloads – large, small and mixed file environments – even at the largest GPU scale. This makes DDN A3I solutions the ideal data platform for end-to-end AI workflows such as deep learning and inference. In fact, DDN enables the largest autonomous driving programs in the world with data platforms that scale to several hundred petabytes in capacity and deliver terabytes per second of performance.
But not ones to rest on our laurels, DDN and NVIDIA are constantly looking to improve the performance of AI workflows. One way we are working together to do this is with GPUDirect Storage (GDS), which is part of NVIDIA’s Magnum IO set of APIs.
GDS improves the efficiency of data movement between storage and GPU with a faster, more direct data path. Data is transferred directly from a network interface card (NIC) on a host to a GPU without passing through system memory and the CPU. This removes the system-architecture bottleneck from the IO path, eliminates unnecessary data copies, reduces latency, and frees CPU resources for other tasks, such as image processing during ingest for deep learning applications.
The NVIDIA DGX A100 system includes eight single-ported Mellanox ConnectX-6 network interfaces. To manage traffic inside the DGX A100, there are four interconnected PCIe switches, each hosting two GPUs and two NICs. The four PCIe switches are interconnected through the two AMD CPUs. DDN’s fully integrated GDS support and shared parallel architecture manage the data path even better than other suppliers by guaranteeing that data transfers pass directly through the nearest interface. The combination of DDN A3I storage with GDS thus ensures data takes the shortest, most efficient path possible between application and storage.
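The nearest-interface idea above can be sketched in a few lines. This is purely an illustrative model, not DDN's actual path-selection code: the switch-to-device mapping below is a hypothetical stand-in for the DGX A100 topology, where each of the four PCIe switches hosts two GPUs and two NICs, and a transfer should use a NIC on the same switch as its target GPU.

```python
# Illustrative sketch only (not DDN's implementation): nearest-NIC selection
# for a DGX A100-like layout. Four PCIe switches, each hosting two GPUs and
# two NICs; hopping across switches means traversing the CPUs.
PCIE_SWITCHES = {
    0: {"gpus": [0, 1], "nics": [0, 1]},
    1: {"gpus": [2, 3], "nics": [2, 3]},
    2: {"gpus": [4, 5], "nics": [4, 5]},
    3: {"gpus": [6, 7], "nics": [6, 7]},
}

def nearest_nics(gpu_id: int) -> list:
    """Return the NICs that share a PCIe switch with the given GPU."""
    for switch in PCIE_SWITCHES.values():
        if gpu_id in switch["gpus"]:
            return switch["nics"]
    raise ValueError(f"unknown GPU {gpu_id}")

# A transfer targeting GPU 5 stays on switch 2, using NIC 4 or 5.
print(nearest_nics(5))  # -> [4, 5]
```

Keeping transfers on the local switch is what avoids the detour through system memory and the CPUs that the GDS data path is designed to eliminate.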
Our lab testing connected four AI400X appliances – the same appliances and configurations we’ve been shipping for the past six months – to a DGX A100 system over an HDR 200Gb/s InfiniBand network. We connected all eight single-ported NICs on the DGX A100 system, those closest to the GPUs. We measured performance using all eight GPUs, first through the standard IO path via the CPU, then with GDS enabled through the optimized IO path.
There are three very exciting points to note in our results:
- First, we deliver 178 GB/s read and 154 GB/s write performance directly to GPUs with GDS. That’s almost the full line rate capability of the eight HDR200 NICs! It’s 60X more than what enterprise file sharing protocols like NFS can provide. The infinitely scalable nature of the DDN shared parallel architecture makes it simple to achieve this level of performance with multiple systems engaged simultaneously – for example, with a POD or SuperPOD deployment.
- Second, GDS enables us to deliver 50% more throughput to GPUs compared to the traditional path through the CPU. Our performance via the CPU path is very impressive – 108 GB/s of read throughput to a single client – but because of DGX system limitations in the data path through the CPU, we need GDS to truly saturate the GPUs in a DGX A100.
- Third, GDS is fully-integrated in our DDN AI400X appliance. The performance benchmark application running on the eight GPUs seamlessly and instantly took advantage of the GDS capability of our system. 178 GB/s performance delivered to a DGX A100 system out of the box. This does not involve additional licensing or cost.
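The "almost the full line rate" claim in the first point can be checked with back-of-the-envelope arithmetic: eight HDR ports at 200 Gb/s each give an aggregate of 200 GB/s (using decimal units and ignoring protocol overhead), so 178 GB/s is roughly 89% of line rate.

```python
# Back-of-the-envelope check of the "almost full line rate" claim.
nics = 8
hdr_gbps = 200                        # HDR InfiniBand: 200 Gb/s per port
line_rate_gbs = nics * hdr_gbps / 8   # aggregate line rate in GB/s (bits -> bytes)
measured_read_gbs = 178               # GDS read throughput measured in the lab

utilization = measured_read_gbs / line_rate_gbs
print(f"{line_rate_gbs:.0f} GB/s aggregate, {utilization:.0%} utilized")
# -> 200 GB/s aggregate, 89% utilized
```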
It’s very exciting to see how our shared parallel architecture has evolved. A3I delivers data with high throughput, low latency and massive concurrency equally well to a single-system supercomputer like the DGX A100 and to a large supercomputing cluster with many clients, such as the NVIDIA SuperPOD. DDN technology is proven to deliver maximum flexibility and capability at any scale. Using our unique MultiRail algorithm, DDN A3I appliances automatically balance network traffic, making optimized high-performance networking a simple out-of-the-box feature.
Are you deploying AI and HPC at scale with NVIDIA GPUs and DGX A100 systems? If so, get in touch with us and let’s discuss how our data platforms with GDS can help you get the most performance from your applications running on GPUs.