For those of you who missed my last blog post, our latest introduction, the Infinite Memory Engine™ or IME, is a distributed buffer cache system designed to accelerate parallel file systems, eliminate the locking issues inherent in traditional parallel file systems, and enable organizations to harness the power of NVRAM to reduce storage expenditures by up to 90% vs. today’s performance architectures. Following on from our most-excellent product debut at SC’13, I wanted to put a finer point on the IME demo we ran live, in the flesh (or should I say “the metal”), in the DDN booth on the show floor.
As the leader in HPC storage, we are essentially firing the first shot in the Exascale era by announcing the future availability of the first commercial burst buffer system. We’ll write about it periodically because it is new and requires some added education. For those of you in the know, skip the prose and jump to the tables below to understand how we can accelerate file system metadata operations by 2,000X.
Full disclosure… I coined a phrase in a previous blog post that I already regret. It read: “In a scale-out world, you can throw enough hardware at any problem.” At the time, I meant to showcase the benefits of the 1TB/s systems we were deploying at ORNL and to show that DDN SW achieves a lot of performance with just a little HW. If you can assume truly parallel processes, the statement still holds up. What it fails to recognize, however, is that parallel systems contain certain inexorably serial elements that will become acute scaling hurdles in the not-too-distant future.
At SC’13 in Denver, we assembled a prototype cluster running our IME software to show how nasty parallel writes can really be to a parallel file system. The tests we showed are why I’m writing this blog today and why I need to eat my words about “throwing HW at any problem.”
At our show booth, we had a small InfiniBand-connected cluster of SuperMicro servers each configured with a small population of server-local SSDs. In total, we had:
- 14 x86 nodes with 56Gb HCAs
  - about 80GB/s of network bandwidth
- 98 SSDs running node-local at 500MB/s each
  - about 50GB/s of bandwidth
  - configured to house both data & metadata
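The bandwidth figures above follow from simple arithmetic on the listed hardware. The Python sketch below reproduces it under a couple of stated assumptions: 8 bits per byte, decimal units (1GB = 1000MB), and no protocol or PCIe overhead. That is why its raw network number comes out higher than the roughly 80GB/s we quoted, which presumably reflects achievable rather than raw link bandwidth.

```python
# Back-of-the-envelope aggregate bandwidth for the SC'13 demo cluster.
# Assumptions: decimal units, 8 bits per byte, no protocol/PCIe overhead.
NODES = 14
HCA_GBIT_PER_S = 56      # one 56Gb InfiniBand HCA per node
SSDS = 98
SSD_MB_PER_S = 500       # per-SSD rate quoted in the list


def network_gb_per_s(nodes: int, hca_gbit: float) -> float:
    """Raw aggregate link bandwidth in GB/s (before any overhead)."""
    return nodes * hca_gbit / 8


def ssd_gb_per_s(ssds: int, mb_per_s: float) -> float:
    """Aggregate SSD bandwidth in GB/s (1GB = 1000MB)."""
    return ssds * mb_per_s / 1000


print(f"SSDs per node: {SSDS // NODES}")
print(f"Raw network:   {network_gb_per_s(NODES, HCA_GBIT_PER_S):.0f} GB/s")
print(f"Aggregate SSD: {ssd_gb_per_s(SSDS, SSD_MB_PER_S):.0f} GB/s")
```

This prints 7 SSDs per node, 98GB/s of raw link bandwidth and 49GB/s of aggregate SSD bandwidth, consistent with the "about 50GB/s" figure for the SSD tier.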
The conversation above happened all too often at the show: IME is SW first and foremost, and we’ll deploy it wherever it makes the most sense in your cluster topology.
The Parallel I/O Grudge Match: File Locking Pressure & Metadata Scaling
On our SC’13 cluster, we performed parallel writes to a single shared file using two different block sizes. In the first test, we used an 8MB block size (a block size GPFS supports today). In the second test, we ran the same process but formatted the system to write in 4KB increments. Writing the same data in 4KB pieces generates about 2,000 times as many write requests, and therefore lock and metadata operations, as writing it in 8MB pieces. That lets us showcase the metadata rates of a system roughly 2,000 times larger (8MB ≈ 2,000 x 4KB; exactly 2,048 in binary units). The testing was done on exactly the same HW, so the comparison is completely apples-to-apples.
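To make the 8MB vs. 4KB comparison concrete, here is a small Python sketch that counts the write requests needed to fill a shared file at each block size. The 64GiB file size is a hypothetical example, and the sketch assumes one write request (and so one round of lock/metadata work) per block; it models the arithmetic, not GPFS internals.

```python
# Count write requests needed for the same data at two block sizes.
# Assumption: one request, and one lock/metadata operation, per block.
KIB = 1024
MIB = 1024 * KIB


def write_requests(total_bytes: int, block_bytes: int) -> int:
    """Ceiling division: a partial final block still costs one request."""
    return -(-total_bytes // block_bytes)


shared_file = 64 * 1024 * MIB   # hypothetical 64GiB shared file
large = write_requests(shared_file, 8 * MIB)
small = write_requests(shared_file, 4 * KIB)
print(large, small, small // large)
```

It prints `8192 16777216 2048`: the ratio depends only on the block sizes, not on the file size, which is why the 4KB run behaves like a system about 2,000 times larger.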
Here are the results – straight from the slides from our R&D team:
The above illustrates the computational cost of processing shared write requests to one file, or one directory, in a parallel file system. In both cases, the metadata lives entirely on SSD… the point here is that faster media doesn’t help the file system handle locks. Lock contention on a single file forces serialization within a parallel file system: write requests must be processed in order, and that ordered processing is exactly the problem we set out to solve by developing the Infinite Memory Engine.
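The serialization effect can be illustrated with a deliberately simplified Python model. It is hypothetical and far cruder than the distributed byte-range locking real parallel file systems use: several writer threads perform a strided, N-to-1 pattern of 4KB writes into one shared file guarded by a single file-wide lock. The model tracks how many writers are ever inside the critical section at once and how many lock acquisitions the workload costs. It is not a performance benchmark.

```python
import threading

BLOCK = 4 * 1024          # 4KB writes, as in the small-block test
WRITERS = 8
BLOCKS_PER_WRITER = 64


class SharedFile:
    """Toy model: one in-memory 'file' guarded by a single file-wide lock."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)
        self.lock = threading.Lock()
        self.inside = 0             # writers currently holding the lock
        self.max_inside = 0         # peak concurrency in the critical section
        self.lock_acquisitions = 0

    def write(self, offset: int, payload: bytes) -> None:
        with self.lock:             # every write serializes on this lock
            self.inside += 1
            self.max_inside = max(self.max_inside, self.inside)
            self.lock_acquisitions += 1
            self.data[offset:offset + len(payload)] = payload
            self.inside -= 1


def writer(f: SharedFile, rank: int) -> None:
    # Strided N-to-1 pattern: rank r owns blocks r, r + WRITERS, r + 2*WRITERS, ...
    for i in range(BLOCKS_PER_WRITER):
        block_index = i * WRITERS + rank
        f.write(block_index * BLOCK, bytes([rank]) * BLOCK)


f = SharedFile(WRITERS * BLOCKS_PER_WRITER * BLOCK)
threads = [threading.Thread(target=writer, args=(f, r)) for r in range(WRITERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f.max_inside, f.lock_acquisitions)
```

It prints `1 512`. The peak stays at 1 no matter how many writers you add, while lock acquisitions grow with the number of blocks, so shrinking the block size from 8MB to 4KB multiplies the serialized work by 2,048 for the same amount of data.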
More importantly, we can do this for an application without requiring users or programmers to jump through unnatural hoops. No special libraries, no need to rewrite your applications. Imagine just using a file system as it was intended, letting the buffer protect your data and handle the interaction with the file system while you get back to using good ol’ POSIX! That’s IME in a nutshell.
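As a generic illustration of what a buffer in front of a file system can do (a hypothetical sketch, not IME’s actual design), the Python class below absorbs small application writes in memory and later drains them with a few large, sorted `os.pwrite()` calls, merging adjacent extents. The class name, its methods and the workload are assumptions made for illustration; it assumes non-overlapping writes and a POSIX system where `os.pwrite` is available.

```python
import os
import tempfile


class CoalescingBuffer:
    """Hypothetical write-back buffer: absorbs small writes in memory, then
    drains them as sorted, merged pwrite() calls. Assumes writes do not overlap."""

    def __init__(self, fd: int) -> None:
        self.fd = fd
        self.pending: dict[int, bytes] = {}   # offset -> payload
        self.pwrite_calls = 0

    def write(self, offset: int, payload: bytes) -> None:
        # Absorb the write; nothing touches the file (or its locks) yet.
        self.pending[offset] = payload

    def flush(self) -> None:
        # Walk extents in file order, merging each one into the current run
        # when it starts exactly where the run ends.
        run_start, run = None, bytearray()
        for offset in sorted(self.pending):
            payload = self.pending[offset]
            if run_start is not None and offset == run_start + len(run):
                run += payload
            else:
                self._drain(run_start, run)
                run_start, run = offset, bytearray(payload)
        self._drain(run_start, run)
        self.pending.clear()

    def _drain(self, offset, data) -> None:
        # One large positional write per contiguous run.
        if offset is not None and data:
            os.pwrite(self.fd, bytes(data), offset)
            self.pwrite_calls += 1


BLOCK, RANKS, BLOCKS_PER_RANK = 4096, 4, 16
fd, path = tempfile.mkstemp()
buf = CoalescingBuffer(fd)
# Strided N-to-1 pattern issued rank by rank, i.e. out of file order.
for rank in range(RANKS):
    for i in range(BLOCKS_PER_RANK):
        buf.write((i * RANKS + rank) * BLOCK, bytes([rank]) * BLOCK)
buf.flush()
os.close(fd)
with open(path, "rb") as fh:
    contents = fh.read()
os.remove(path)
print(buf.pwrite_calls, len(contents))
```

Here 64 out-of-order 4KB writes reach the file as a single 256KiB `pwrite()`, while the application side only ever issued ordinary positional writes.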
Oh, and lest we forget, the HW savings are also staggering. I’ve been studying IME’s impact on today’s architectures, and all I can say is that our customers would be paying a lot less if we were ready to release IME as a GA product today (sorry, DDN sales guys!). I digress, though… I’ll save this for another blog.
There’s lots of really exciting stuff going on at DDN. SC’13 is always a good excuse for me to wrap my mind around all of the compelling efforts we’ve got in flight, and I love telling the story. If you haven’t caught up on our activities, Christmas is here early for you. Between our 400+ customer meetings at the show, we did get the chance to videotape one of my presentations… see below and happy watching 🙂
If you’re short on time, you can also read the slides from my talk here on Slideshare.