Historically, deploying the largest supercomputing systems was a months-long process involving extensive customization and tuning to extract maximum performance from the available resources. NVIDIA’s recent announcement of its NVIDIA DGX SuperPOD infrastructure is a game-changer for complex AI modeling and other HPC-like workloads that require extreme multi-node scale. DDN and NVIDIA have committed extensive effort to create an end-to-end deployment that pairs the power of the NVIDIA DGX-2 system with the parallel data-delivery capability of DDN’s A³I appliances, producing a high-performance environment that is easy to deploy and manage.
DGX SuperPOD implements a reference architecture that integrates 64 NVIDIA DGX-2 systems with Mellanox InfiniBand™ networking and the DDN AI400X™ to create shared supercomputing infrastructure designed not just for the lab, but for businesses exploring data science at scale.
With highly efficient storage as the backend, DDN can fulfill the promise of simplicity and deployability.
Now, commercial customers struggling to deploy their AI models at scale with massive data sets have a readily available recipe that requires little to no customization to drive business innovation. IT can consolidate silos of data science within the organization. And with features like secure multi-tenancy and auditing available in the EXA5 data management solution found on the AI400X, customers have a reliable, high-performance data infrastructure with isolation and access management across divisions.
This journey isn’t over, either. We continue to collaborate with NVIDIA to further enhance I/O efficiency. The recently announced NVIDIA Magnum IO tools will further streamline the path for AI and deep learning workloads. GPUDirect Storage, part of the Magnum IO release, takes the compute system’s CPU out of the data path, delivering data directly from storage into GPU memory. Early testing between DDN and NVIDIA has revealed a 20x speedup for some of the most data-intensive applications. Stay tuned for more information on our relentless pursuit of optimal AI data infrastructure.
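For readers curious what that direct storage-to-GPU path looks like in practice, here is a minimal sketch of a read through NVIDIA’s cuFile API, the programming interface behind GPUDirect Storage. The file path, buffer size, and build command are illustrative assumptions rather than part of any SuperPOD deployment, and error handling is omitted for brevity.

    /*
     * Minimal GPUDirect Storage read sketch using the cuFile API.
     * Assumes the cuFile library is installed and the file lives on a
     * GDS-enabled filesystem mount (path below is hypothetical).
     * Example build: nvcc gds_read.cu -lcufile
     */
    #include <cufile.h>
    #include <cuda_runtime.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <string.h>
    #include <stdio.h>

    int main(void) {
        const size_t size = 1 << 20;                  /* 1 MiB read, for illustration */
        const char *path = "/mnt/ai400x/sample.dat";  /* hypothetical mount point */

        cuFileDriverOpen();                           /* initialize the GDS driver */

        int fd = open(path, O_RDONLY | O_DIRECT);     /* O_DIRECT bypasses the page cache */

        CUfileDescr_t descr;
        memset(&descr, 0, sizeof(descr));
        descr.handle.fd = fd;
        descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;

        CUfileHandle_t fh;
        cuFileHandleRegister(&fh, &descr);            /* register the file with cuFile */

        void *devPtr;
        cudaMalloc(&devPtr, size);                    /* destination buffer in GPU memory */
        cuFileBufRegister(devPtr, size, 0);           /* register the GPU buffer for DMA */

        /* Data moves from storage directly into GPU memory; the CPU never
           stages it through a bounce buffer in host RAM. */
        ssize_t n = cuFileRead(fh, devPtr, size, 0 /* file offset */, 0 /* device offset */);
        printf("read %zd bytes into GPU memory\n", n);

        cuFileBufDeregister(devPtr);
        cuFileHandleDeregister(fh);
        cudaFree(devPtr);
        close(fd);
        cuFileDriverClose();
        return 0;
    }

The design point this illustrates is simple: because the read lands directly in GPU memory, the CPU and host RAM drop out of the transfer, which is where the I/O efficiency gains described above come from.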