The need for faster data infrastructure to support real-world AI platforms has increased markedly over the past few months. Broadening use cases for natural language processing and real-time video inference, higher-definition images in greater volume from more diverse sources, and AI framework updates that improve inference efficiency have all been met with new, more powerful GPU platforms. Each step forward in the AI ecosystem adds to the risk that your storage infrastructure will become the limiting factor that prevents successful AI, and one that will be very difficult to resolve once your data volumes reach multiple petabytes.

In our previous post, Dr. James Coomer reported on initial testing of our DDN A³I storage appliances with the latest GPU-based computing systems. Our out-of-the-box results broke all records for data throughput into an AI platform. In short, just two of our AI400X appliances delivered a whopping 99 GB/s of throughput to applications running on a GPU system through a single mount point. That’s over 33X more throughput than enterprise file sharing protocols like NFS. It’s an excellent demonstration of the enablement and acceleration that the DDN shared parallel architecture provides for HPC and AI workloads at scale, particularly with GPUs. Our customers using GPUs for deep learning report that higher throughput can translate directly into higher application performance and shorter run times, which allows them to get more out of their AI infrastructure.

Data platform performance is multi-dimensional, reflecting the varying demands of different approaches to running AI workloads. Throughput is one aspect of capability, measuring the volume of data that can be moved in a given amount of time. IOPS is another important consideration, measuring the number of individual data operations handled per second. Historically, data platforms required customers to choose between optimizing for one or the other, depending on the underlying media and architecture type. With the DDN AI400X, we can satisfy both equally and simultaneously, even when deployed as a hybrid configuration that uses flash for performance and large-capacity disk for the best economics. Our appliances perform equally well with any volume and mix of small and large data.
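To make the relationship between the two metrics concrete, here is a minimal sketch. It assumes a fixed I/O size per operation, and the function names and example sizes are hypothetical, chosen for illustration rather than taken from any DDN test configuration. It shows why a workload of small operations stresses IOPS while a workload of large operations stresses throughput.

```python
def throughput_gb_per_s(iops: float, io_size_bytes: int) -> float:
    """Throughput (decimal GB/s) implied by an IOPS rate at a fixed I/O size."""
    return iops * io_size_bytes / 1e9


def iops_needed(throughput_gb_s: float, io_size_bytes: int) -> float:
    """IOPS required to sustain a given throughput at a fixed I/O size."""
    return throughput_gb_s * 1e9 / io_size_bytes


# Hypothetical I/O sizes, for illustration only.
SMALL_IO = 4 * 1024       # 4 KiB, e.g. many small files or random reads
LARGE_IO = 1024 * 1024    # 1 MiB, e.g. large sequential reads

if __name__ == "__main__":
    # The same 1 GB/s target needs very different operation rates.
    print(f"4 KiB I/O: {iops_needed(1.0, SMALL_IO):,.0f} IOPS for 1 GB/s")
    print(f"1 MiB I/O: {iops_needed(1.0, LARGE_IO):,.0f} IOPS for 1 GB/s")
```

Under these assumptions, a system tuned only for large sequential transfers can post a high throughput figure while still struggling with small-file workloads, which is why both numbers matter.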

This enables us to deliver excellent performance for modern HPC and AI workloads on GPUs, most of which require dynamic and flexible data access to a wide variety of data types. As part of our initial testing, we delivered over 4.8 million IOPS through a single mount point. That’s almost 50X more IOPS than NFS.
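The two comparisons quoted in this post can be turned into rough bounds on the NFS baseline with simple arithmetic. The sketch below is an inference from the stated ratios only: the post does not give the NFS figures directly, and the function name is hypothetical.

```python
# Figures quoted in this post.
DDN_THROUGHPUT_GB_S = 99.0   # two AI400X appliances, single mount point
THROUGHPUT_RATIO = 33.0      # "over 33X more throughput than ... NFS"
DDN_IOPS = 4_800_000         # "over 4.8 million IOPS"
IOPS_RATIO = 50.0            # "almost 50X more IOPS than NFS"


def implied_baseline(measured: float, ratio: float) -> float:
    """Baseline value implied by a measured result and a stated speed-up ratio."""
    return measured / ratio


# "Over 33X" implies the NFS throughput baseline is below this value.
nfs_throughput_ceiling = implied_baseline(DDN_THROUGHPUT_GB_S, THROUGHPUT_RATIO)

# "Almost 50X" of "over 4.8 million" implies the NFS IOPS baseline is above this value.
nfs_iops_floor = implied_baseline(DDN_IOPS, IOPS_RATIO)

if __name__ == "__main__":
    print(f"Implied NFS throughput: under {nfs_throughput_ceiling:.1f} GB/s")
    print(f"Implied NFS IOPS: over {nfs_iops_floor:,.0f}")
```

These are only bounds derived from rounded marketing figures, so treat them as an order-of-magnitude check rather than a measured NFS result.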

We achieve this by fully integrating, aligning and optimizing every layer of our solution, from the NVMe drive all the way through the network to your application, and by doing so automatically, without the need for careful and complicated management of NFS shares and mounts. Our shared parallel architecture gives us the ability to handle massive data concurrency at nearly limitless scale, even across extremely large GPU clusters. With this latest testing, we demonstrate that we can deliver unrivaled amounts of throughput and IOPS to a single client, and that performance is proven to scale predictably as more clients are engaged as part of the workload.

DDN’s all-around performance is well proven with the largest NVIDIA SuperPOD currently in operation. Selene, recently announced by NVIDIA, is the largest NVIDIA SuperPOD. It uses DGX A100 systems, NVIDIA Mellanox networking and DDN A³I storage, and it ranked #7 on the latest TOP500 supercomputer list. NVIDIA deployed several DDN AI400X appliances to present data at scale to hundreds of DGX A100 nodes.

Are you deploying AI and HPC at scale with GPUs? If so, get in touch with us and experience how DDN can help you build AI Data Infrastructure that eliminates your risks to AI success.

  • William Beaudin
  • Date: August 27, 2020