NVIDIA DGX SuperPOD Reference Architecture: DDN A³I AI400 Appliance with EXA5

The NVIDIA DGX SuperPOD™ is a first-of-its-kind artificial intelligence (AI) supercomputing infrastructure. DDN A³I with the EXA5 parallel file system is a turnkey AI data storage infrastructure built for rapid deployment, featuring faster performance, effortless scale, and simplified operations through deeper integration. The combined solution delivers groundbreaking performance, deploys in weeks as a fully integrated system, and is designed to solve the world's most challenging AI problems.

The groundbreaking performance delivered by the DGX SuperPOD enables rapid training of deep learning models at scale. Creating the most accurate image classification, object detection, and natural language models requires large amounts of training data, and this data must be accessed rapidly across the entire SuperPOD. To make full use of the computational capabilities of the DGX SuperPOD, it is essential to pair it with a storage system fitted to the task.
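As a rough illustration of why storage throughput matters here, the aggregate read bandwidth a training job needs can be estimated from the number of GPUs, the per-GPU sample rate, and the average sample size. The sketch below is a back-of-the-envelope calculation only: the function name and every input value are hypothetical placeholders chosen for illustration, not figures from this paper or measurements of the DGX SuperPOD.

```python
def required_read_bandwidth_gb_per_s(num_gpus, samples_per_sec_per_gpu, bytes_per_sample):
    """Estimate the aggregate read bandwidth (GB/s) needed to keep all GPUs fed.

    Assumes every sample is read from shared storage exactly once per pass
    and ignores caching, compression, and metadata overhead.
    """
    total_bytes_per_sec = num_gpus * samples_per_sec_per_gpu * bytes_per_sample
    return total_bytes_per_sec / 1e9  # decimal gigabytes per second


# Hypothetical inputs for illustration only (not taken from the paper).
estimate = required_read_bandwidth_gb_per_s(
    num_gpus=64,
    samples_per_sec_per_gpu=2500,
    bytes_per_sample=110_000,
)
print(f"Estimated aggregate read bandwidth: {estimate:.1f} GB/s")
```

Because the estimate scales linearly with GPU count, a storage system that is adequate for a single node can become the bottleneck once the same job is spread across the whole cluster.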

In this paper, the DDN® A³I AI400 appliance was evaluated for its suitability to support deep learning (DL) workloads when connected to the DGX SuperPOD. The AI400 appliance is a compact, low-power storage solution that delivers high raw performance by using NVMe drives for storage and InfiniBand as its network transport. It runs the EXAScaler file system, an enterprise version of the Lustre parallel file system that adds hardening and data management capabilities. Parallel file systems such as Lustre simplify data access and support additional use cases in which fast data access is required for efficient training and local caching is not adequate.