For some time, I have been planning to write a blog post about the merits of site-wide file systems and how they change the supercomputer purchase and TCO paradigm.
Allow me to set the stage for the discussion…
Since the beginning of time, large-scale supercomputer purchases have followed a procurement methodology in which customers buy new storage to directly support the new supercomputers they’ve purchased. This concept dates back to an era when monolithic supercomputers were configured with Direct Attached Storage (DAS) and the systems mounted a local file system.
When you look at it, the idea made sense…
…buy a new computer, upgrade your storage system at the same time.
Today, advances in scalable file storage technology enable a whole new level of system optimization, application acceleration and cost savings to be achieved by changing that paradigm.
Fast forward to 2013: what’s changed?
- Massively parallel systems are built from 1,000s of low-cost computers
- Scalable file systems have been engineered to scale to support 10s of 1,000s of cluster nodes, even across different network technologies
- Storage has become an increasingly significant cost component for HPC data centers and big data computing environments
Yet, despite rapid change in supercomputer technology, the deployment model is still largely the same as it was ~10 years ago: customers build a storage upgrade into their new HPC system purchase, and the storage they buy is directly connected to their new machines. In single-cluster environments, this makes perfect sense: one cluster, one storage system. The challenge (and opportunity) arises in multi-cluster environments, where customers are now looking for ways to liberate themselves from islands of HPC storage. These multi-cluster environments are becoming increasingly common as customers deploy visualization systems, aggressively roll out new clusters, introduce new GPGPU-based clusters and more.
Consider the following scenario where a facility buys three machines and three dedicated storage systems to support these machines. A picture is worth 1,000 words, so here goes:
Assuming a law of averages, the peak I/O of each cluster is expected to happen mostly at separate (random) times, since applications burst or do heavy I/O for only a fraction of their runtime. Considering this fact, as well as other aspects of the HPC workflow, the facility has introduced a number of architectural and workflow inefficiencies into the cluster environment.
- The facility has purchased 100 GB/s of combined throughput, even though the peak requirement of any one cluster is only half the total performance sold.
- Performance utilization is low, as the resources are not shared across clusters and each storage system is sized for the sporadic burst I/O of its dedicated cluster.
- Data sharing across HPC clusters requires data copies and wall-clock wait times as applications move data between islands of storage.
- The care and feeding of three separate storage resources is more complex and expensive than provisioning and maintaining just one.
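The law-of-averages argument above can be made a little more concrete with a back-of-the-envelope sketch. It assumes that each cluster bursts independently and that each one does heavy I/O for 10% of the time; both the independence and the 10% duty cycle are illustrative assumptions, not measurements from any real facility. The function name is likewise made up for this example.

```python
from itertools import product


def overlap_probability(duty_cycles, min_bursting=2):
    """Probability that at least `min_bursting` clusters are doing heavy I/O
    at the same instant, assuming every cluster bursts independently."""
    total = 0.0
    # Enumerate every combination of bursting / idle clusters.
    for states in product([True, False], repeat=len(duty_cycles)):
        if sum(states) < min_bursting:
            continue
        probability = 1.0
        for bursting, duty in zip(states, duty_cycles):
            probability *= duty if bursting else (1.0 - duty)
        total += probability
    return total


# Hypothetical duty cycles: three clusters, each bursting 10% of the time.
duty_cycles = [0.10, 0.10, 0.10]

p_two_or_more = overlap_probability(duty_cycles, min_bursting=2)
p_all_three = overlap_probability(duty_cycles, min_bursting=3)

print(f"Two or more clusters bursting at once: {p_two_or_more:.1%}")
print(f"All three clusters bursting at once:   {p_all_three:.1%}")
```

Under these assumptions, simultaneous bursts are rare, which is why dedicated storage sized for each cluster's own peak sits idle most of the time. Real workloads are not perfectly independent, so treat the numbers as a rough guide only.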
Leading computing organizations in the US such as TACC and ORNL are promoting a new site-wide storage architecture strategy that enables cost savings, faster application burst performance, workflow efficiencies and a much simpler approach to deploying HPC resources.
One more picture…
As seen in this admittedly simplistic representation, the benefits are clear:
- The customer can save on total storage purchase by sharing resources
- imagine only building for the performance of your one fastest cluster!
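As a rough illustration of that sizing argument, here is a minimal sketch that compares three storage islands with one shared system sized for the fastest cluster. The 100 GB/s total and the 50 GB/s single-cluster peak come from the scenario above; the 50/30/20 split between clusters and the cluster names are assumptions made up for the example.

```python
# Hypothetical per-cluster storage throughput in GB/s.
# Only the 100 GB/s total and the 50 GB/s maximum come from the scenario;
# the individual split is an assumption for illustration.
dedicated_gbps = {"cluster_a": 50, "cluster_b": 30, "cluster_c": 20}

# Islands of storage: every cluster gets its own system, so throughput adds up.
islands_total = sum(dedicated_gbps.values())

# Site-wide storage: size one shared system for the single fastest cluster.
site_wide = max(dedicated_gbps.values())

savings_fraction = 1 - site_wide / islands_total

print(f"Dedicated islands:  {islands_total} GB/s purchased")
print(f"Site-wide system:   {site_wide} GB/s purchased")
print(f"Throughput avoided: {savings_fraction:.0%}")
```

Sizing for the single fastest cluster is the optimistic lower bound. A real design would likely add headroom for the occasions when bursts from different clusters overlap.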
Simplicity is the fourth dimension of value that comes from a site-wide cluster strategy, but this predominantly applies to what I’ll call power users. For organizations that add to their computing environment periodically, it’s a massive benefit not to have to validate and test new storage architectures while they’re also shaking out new clusters. The storage is already online and ubiquitous.
To give more references and context, I’ll check back on this blog as we announce some of the new customers who are rolling out site-wide systems with DDN technology. Until then, I’ll leave you with some understanding of WHY they choose DDN when they do go down this road.
- Wide and Deep Scaling: Customers can scale out to TB/s of performance, or scale deep capacity behind a small number of SFA storage appliances to always build to the precise levels of performance and capacity required while optimizing the cost of configuration at every step.
- Best-In-Class Performance: DDN supports leadership levels of both throughput and IOPS via its real-time, parallel storage processing architecture. So, whether you’re running DDN’s EXAScaler or GRIDScaler or building Lustre or GPFS environments, DDN systems accelerate mixed workloads with a cache-centric architecture designed for unpredictable I/O.
- Quality of Service: DDN systems mask the impact of drive failures from applications while also performing very fast drive rebuilds. Because parallel file systems stripe data across 1,000s of hard drives, a single slow or rebuilding drive can hold back every operation that touches its stripe. DDN technology keeps production performance consistent to deliver the highest levels of sustained application performance, thereby limiting the kind of serial bottleneck described by Amdahl’s law [I’ll write a blog post on just this one in a bit…].
- DirectMon™ and SFA APIs: Anywhere from one to 100s of file storage appliances can be managed simply from a single pane of glass. DDN built its platform from the ground up to truly scale in clustered storage environments.
Centralizing and liberating storage from the DAS paradigm takes serious thought and planning. Factors such as platform interoperability and scaling need to be taken into account from day one. That said, the capabilities described above have become key architectural considerations when designing site-wide environments, and they are not always apparent when storage is taken at face value.
It’s a fun time in high-scale HPC. The stability and scalability of scale-out file systems are enabling customers to translate this technology into dramatic savings and data center-wide efficiencies for multi-cluster environments.
So, to paraphrase Sting, if you love fast computational wall-clock time and optimized budget dollars, set your storage free. Architect your site-wide storage systems smart and your storage will love you and your science back.