DDN Actually Has One & Customers Are Using It Today. And BTW, Its Capabilities Go Beyond What The Other Guys Can Only Talk About.
It’s November and the weeks of preparation for Supercomputing are now complete. We’re locked, loaded, here and ready for another exciting few days of showcasing DDN’s latest innovations and sharing them with old and new friends in New Orleans. It’s been a busy year across DDN, but especially for the IME (Infinite Memory Engine) Product Team. Though our functions are diverse (Engineering, Sales Engineering, Product Management, Marketing), we all share the same goal: delivering a breakthrough piece of technology that will revolutionize the way our HPC customers provision and accelerate I/O. This has always been our mission at DDN, and we’ve proven to be the clear leaders in engineering storage and file systems that deliver the highest levels of performance. Beyond the technology, our SEs are both technology and application experts who know how to identify and remove I/O bottlenecks across the topology better than anyone else. They are the “Formula 1 Pit Crew” of I/O, masterfully tuning and optimizing hardware to extract every last ounce of performance and deliver the fastest run times and results for our users.
Now, with IME a reality, these SEs have a new arrow in their quiver to accelerate I/O with more power than ever before. For those of you who haven’t been briefed yet, IME is a remarkable piece of software that virtualizes commodity SSDs into a single pool of non-volatile-memory-based fast data storage that sits between your compute cluster and the parallel file system. Not only have we moved data closer to the compute to remove storage and network latency, we can now shield parallel file systems from the small-file I/O and fragmented data patterns that bring Lustre® and GPFS™ to their knees and slow down the entire cluster when running “problem” applications.
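To make the “shielding” idea a little more concrete, here is a minimal Python sketch of the general technique a fast buffer tier can use. It absorbs many small, out-of-order writes and then hands the parallel file system a few large, contiguous extents. This is not IME code. The `WriteCoalescer` class is hypothetical, and bytes are tracked one at a time purely for clarity, not efficiency.

```python
class WriteCoalescer:
    """Toy fast tier that absorbs small writes and drains contiguous extents.

    Hypothetical illustration of write coalescing in general; not IME code.
    Bytes are tracked individually for clarity, not efficiency.
    """

    def __init__(self) -> None:
        self._buf: dict[int, int] = {}  # absolute file offset -> byte value

    def write(self, offset: int, data: bytes) -> None:
        """Absorb a write of any size; a later write to the same byte wins."""
        for i, b in enumerate(data):
            self._buf[offset + i] = b

    def drain(self) -> list[tuple[int, bytes]]:
        """Merge buffered bytes into contiguous (offset, data) extents and clear."""
        extents: list[tuple[int, bytes]] = []
        start, prev, run = -1, -2, bytearray()
        for off in sorted(self._buf):
            if off != prev + 1 and run:
                # Gap in offsets: close the current extent.
                extents.append((start, bytes(run)))
                run = bytearray()
            if not run:
                start = off
            run.append(self._buf[off])
            prev = off
        if run:
            extents.append((start, bytes(run)))
        self._buf.clear()
        return extents


if __name__ == "__main__":
    tier = WriteCoalescer()
    # Eight 4 KiB writes arriving out of order, as interleaved ranks might issue them.
    for rank in (3, 0, 7, 1, 5, 2, 6, 4):
        tier.write(rank * 4096, bytes([rank]) * 4096)
    extents = tier.drain()
    # All eight fragments merge into one 32 KiB extent for the PFS.
    print(len(extents), extents[0][0], len(extents[0][1]))  # 1 0 32768
```

In this toy version, the parallel file system sees one sequential write where it would otherwise have seen eight scattered ones. A production tier would also have to bound memory, split very large extents, and protect buffered data, which this sketch leaves out.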
We started our development journey on IME in 2010 by collaborating with Top 20 supercomputing sites that were seeking to remove the bottlenecks in today’s technologies that inhibited exascale planning. Early on, people recognized that parallel file systems and disk-based storage wouldn’t scale performance effectively enough to support the exponential increase in cores and concurrent threads required to achieve exascale. Most notably, fears about a significantly longer checkpoint/restart process in an exascale world took center stage – and inspired the concept of Burst Buffers to reduce this operation’s duration. From there, a handful of strategic compute, storage and memory vendors began work on burst buffer designs. And IME was born.
With innovation at the core of who DDN is, it became clear from our design inception that NVM had the potential to accelerate all sorts of application I/O beyond checkpoint/restart – and could do it more efficiently today, without waiting for exascale implementations. Our R&D efforts began by characterizing I/O across a wide array of our customers’ HPC applications and codes to examine data patterns and build requirements for this new tier of fast data storage. While other vendors were focusing on and purpose-building their buffers to solve checkpointing’s write-intensive, large-file bandwidth challenges, we discovered that 90 percent of HPC application data was in fact less than 32KB in size. We also found that these applications weren’t just in the business of writing; lots of reads were taking place as well. Additionally, there were application ensembles that read and process data from another application’s larger dataset, and these would also benefit from IME acceleration to achieve real-time insight. Bravely, the IME Team decided to go well beyond the burst buffer concept to create IME. This added a ton more complexity, requirements and work than merely caching and draining a big blob of sequential scratch data. Going beyond a mere burst buffer, IME became a new fast data storage tier – and the “storage tier” part created a lot of design implications:
-Protecting data in this new tier
-Supporting a wide array of applications
-Dealing with both large AND small I/O, then “draining” it into the PFS without the contention that POSIX semantics impose
-Handling both writes AND reads
-Supporting existing APIs, so applications can run transparently within this new tier without modifications.
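The requirements for handling writes as well as reads, and for draining data to the PFS, can be pictured with a second toy model. This one is also hypothetical. It is a fast tier that serves reads from its own copy when it already holds the data, falls back to the backing parallel file system on a miss (as an ensemble member reading another application’s dataset would), and drains dirty files back later. The `FastTier` class and the plain dict standing in for the PFS are assumptions made for illustration. They do not represent IME’s API, and real POSIX semantics such as locking and consistency are deliberately omitted.

```python
class FastTier:
    """Toy fast data tier in front of a slower backing store (whole-file granularity).

    Hypothetical sketch of read-through caching plus write-back draining;
    not IME's design or API. POSIX locking and consistency are omitted.
    """

    def __init__(self, backing: dict[str, bytes]) -> None:
        self.backing = backing             # stand-in for the parallel file system
        self.cache: dict[str, bytes] = {}  # data currently held in the fast tier
        self.dirty: set[str] = set()       # written here but not yet drained
        self.hits = 0
        self.misses = 0

    def write(self, path: str, data: bytes) -> None:
        """Absorb a write in the fast tier; the backing store is untouched."""
        self.cache[path] = data
        self.dirty.add(path)

    def read(self, path: str) -> bytes:
        """Serve from the fast tier if present, else fetch from the backing store."""
        if path in self.cache:
            self.hits += 1
            return self.cache[path]
        self.misses += 1
        data = self.backing[path]  # raises KeyError if the file does not exist
        self.cache[path] = data    # keep a clean copy for later readers
        return data

    def drain(self) -> int:
        """Write every dirty file back to the backing store; return how many."""
        for path in sorted(self.dirty):
            self.backing[path] = self.cache[path]
        drained = len(self.dirty)
        self.dirty.clear()
        return drained


if __name__ == "__main__":
    pfs = {"mesh.dat": b"input mesh"}  # hypothetical file produced upstream
    tier = FastTier(pfs)
    tier.read("mesh.dat")               # miss: fetched from the PFS
    tier.read("mesh.dat")               # hit: served from the fast tier
    tier.write("ckpt.0", b"state")      # absorbed; PFS not touched yet
    print(tier.hits, tier.misses, "ckpt.0" in pfs)  # 1 1 False
    print(tier.drain(), "ckpt.0" in pfs)            # 1 True
```

Working at whole-file granularity keeps the sketch short. A real tier would track extents, as in the coalescing idea, and would have to coordinate with other clients before draining.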
In addition, we recognized two important things about our customers’ varied hardware environments and preferences:
-The need to support any compute or storage vendor’s platform, and both open and proprietary interconnects
-Their growing desire to utilize commodity hardware to reduce cost whenever possible.
After analyzing all of the requirements above, it became very clear that IME needed to be a software-based approach. DDN is predominantly known as a hardware company – providing the biggest, “baddest”, fastest storage systems out there. But in the six years since we moved away from the FPGA silicon that drove our 2001-08 era S2A® storage systems, we’ve developed a tremendous amount of software: millions of lines of code in our SFA® Operating System, WOS® (our object storage software) and DirectMon™ (our end-to-end storage management application), to name a few. And we’ve gotten really good at it, with big software development teams of bright minds coding all around the world.
IME is a software-defined storage application. Yes, I know this is a buzzword. But at its core, the term defines a simple construct: the bulk of the intelligence lives in the application, so you can use off-the-shelf commodity components (or best-in-class specialty components) for the hardware bits of the solution. This is what enables IME to do so much more than other burst buffers, and why more customers can use it than can use the other guys’ products. So, going into SC14, I’m feeling really proud of this technology (to be released in mid-2015) and confident that we are well ahead of the other guys.
In addition to having more functionality and flexibility, IME has been in the hands of over a dozen of the largest supercomputing centers since July. We’ve supplied them with testbeds, dedicated SEs and weekly code drops so they can evaluate IME in detail. This is a highly collaborative engagement, much deeper than a typical beta or POC. From the testbed program, we are receiving real-world application performance benchmarks, stability testing at supercomputing scale, and compatibility testing across a wide array of compute/storage hardware, interconnects and APIs. Some sites are developing plug-ins to integrate open source job schedulers, management tools and more.
All of this investment means users can have greater confidence that IME has been tested and proven in the world’s most demanding environments. Many of the bugs and gotchas associated with first-generation products will have been worked out, and our SEs and Support Teams will have plenty of real-world time clocked on the software across various systems, making both installation and troubleshooting quicker and more complete. Our software development process has also become much more agile as a result of fixing bugs in near real time and bolting on features these environments have identified to make IME easier to use and even more beneficial.
There’s still tons of work to do as we build the bridge to GA next year: more testbed sites, more to learn from them, and more education and evangelization of this new enabling technology that completely changes the I/O provisioning game. It’s exciting, and the IME Team and I are passionate about what this technology can do for our users. It’s a standout product – and it’s solving problems today at our testbed sites.
If you have questions or simply want to talk application or workflow acceleration, hit me up at email@example.com or contact your DDN sales rep at firstname.lastname@example.org. We’d like to hear more about your challenges and answer any questions you have.