DDN BLOG

With all the possible impediments to a successful AI program, DDN ensures that managing data isn’t one of them. There are many requirements for AI storage, but the top three are scalability, performance, and good economics. We’ve found that getting all three right is (a) not easy and (b) comes down to fundamental architecture. It is relatively easy to get two of the three right; getting all three right, and making the result easy to deploy and manage on top of that, is what makes DDN unique.

DDN has the fastest, most scalable solutions that remain cost-effective at any scale because of good architectural choices: end-to-end parallel data movement, the fastest protocols, and a deep understanding of scalability in every element of the architecture. DDN also has the experience of deploying and scaling solutions from a single unit to hundreds of systems.

When you build a storage software platform, the way it manages, moves, and protects data is fundamental. Get that wrong, and you’ve built storage for databases, or storage for document management, or storage for email servers, but not storage for data-intensive AI.

At the technical heart of AI is massive, granular parallelism. GPUs have pushed aside the old world of highly capable, high-clock-speed cores and replaced them with thousands of simple cores. The essence of deep learning’s success lies in this use of parallelism: chopping a problem into thousands or millions of pieces is far more efficient than loading up a single do-everything process. The same is true for storage. Almost every storage system on the planet fails in this respect by relying on some form of NFS, which immediately removes true parallelism, kills the main source of application performance, and opens the door to complexity at scale.
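To make that concrete, here is a minimal, self-contained Python sketch (not DDN code; the shard count and sizes are invented for illustration) contrasting a serial read loop with the same work chopped into independent pieces and read in parallel. On a local disk with a warm page cache the gap is modest; on a parallel filesystem serving real AI datasets, it is dramatic.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

# Build 64 dummy "shards" of 1 MiB each in a temp directory (illustrative sizes).
SHARDS, SHARD_SIZE = 64, 1024 * 1024
tmpdir = tempfile.mkdtemp()
paths = []
for i in range(SHARDS):
    path = os.path.join(tmpdir, f"shard_{i:03d}.bin")
    with open(path, "wb") as f:
        f.write(os.urandom(SHARD_SIZE))
    paths.append(path)

def read_shard(path):
    with open(path, "rb") as f:
        return len(f.read())

# Serial: a single do-everything loop touches each shard in turn.
t0 = time.perf_counter()
total = sum(read_shard(p) for p in paths)
serial_s = time.perf_counter() - t0

# Parallel: the same work chopped into independent pieces.
t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=16) as pool:
    total = sum(pool.map(read_shard, paths))
parallel_s = time.perf_counter() - t0

print(f"read {total >> 20} MiB  serial={serial_s:.3f}s  parallel={parallel_s:.3f}s")
```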

So last Thursday we took a leading GPU client platform with 8 of the latest GPUs and high-performance networking technology and asked: “How much data can we get into that single client from the minimum amount of storage?”

DDN’s AI400X appliance is an all-NVMe, true-parallel filesystem in just 2 rack units. We used just two of them and connected them over our network to the GPU system we had just deployed in our lab. With the AI400X, making the filesystem connection to the GPU client is very easy: even though the system has 8 network connections, our intelligent client software uses all of them with fine-grained parallelism, and the single mount point is available without tuning or configuration.
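For readers who want to reproduce this kind of measurement, the sketch below shows the general shape of the test: many parallel streams driven through one mount point, with aggregate throughput computed at the end. The mount point, file layout, and process count are hypothetical placeholders, not our actual test harness.

```python
import glob
import os
import time
from multiprocessing import Pool

MOUNT = "/mnt/ai400x"        # hypothetical single mount point
BLOCK = 16 * 1024 * 1024     # 16 MiB read size per call

def stream_file(path):
    """Read one file end to end and return its byte count."""
    n = 0
    with open(path, "rb") as f:
        while chunk := f.read(BLOCK):
            n += len(chunk)
    return n

if __name__ == "__main__":
    # Assumed layout: large pre-staged files under <mount>/dataset/.
    files = glob.glob(os.path.join(MOUNT, "dataset", "*.bin"))
    t0 = time.perf_counter()
    with Pool(processes=32) as pool:  # 32 parallel streams, one mount point
        total = sum(pool.map(stream_file, files))
    elapsed = time.perf_counter() - t0
    print(f"{total / elapsed / 1e9:.1f} GB/s aggregate across {len(files)} files")
```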

Within minutes we were pushing 99GB/s (GigaBYTES, not Gigabits) of data into the GPU system. That’s like condensing two weeks of video streaming into every second between our storage and the AI system. And this is accomplished while minimizing the GPU system’s CPU utilization: where a protocol like NFS will drive CPU utilization to 100% during data transfer, leaving no cycles for running your AI workloads, DDN’s optimized data path consumed a maximum of 24%, even while moving 99GB/s of data, leaving plenty of cycles for the real workload. As a bonus, you get all of this from just 4 rack units of storage.
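The arithmetic behind the video analogy is easy to check. The two-week figure works out at a standard-definition stream of roughly 0.65 Mb/s; that bitrate is our assumption for illustration, and even at a 5 Mb/s HD rate the equivalent is still nearly two days of video every second:

```python
# One second at 99 GB/s, expressed as seconds of streaming video.
THROUGHPUT_BPS = 99e9 * 8  # 99 gigabytes/s in bits per second

for label, mbps in [("SD stream", 0.65), ("HD stream", 5.0)]:  # assumed bitrates
    seconds_of_video = THROUGHPUT_BPS / (mbps * 1e6)
    print(f"{label} at {mbps} Mb/s: {seconds_of_video / 86_400:.1f} days of video per second")

# SD stream at 0.65 Mb/s: 14.1 days of video per second (~2 weeks)
# HD stream at 5.0 Mb/s:  1.8 days of video per second
```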

That makes the AI400X the fastest storage on the planet for AI. More importantly, it is both easy and cost-effective to deliver that performance to your AI applications. The knock-on effect is that you get maximum performance from all your AI resources: the data, the storage, the network, the compute, and the people running it.

Whenever you see a claim of top performance, always ask how much storage was needed to achieve it, and always question the complexity of managing the system efficiently. Make sure your data partner has serious experience deploying production systems in the most data-intensive organizations worldwide. DDN has done this for over two decades and has distilled that experience into something easy for everyone to consume.

  • Dr. James Coomer
  • Sr. Vice President of Products
  • Date: July 21, 2020