As engineers, scientists, and technologists in the field of High Performance Computing (HPC), our domain expertise lies in the simulation and modelling of our environment, be that nanoscale technology, the cosmos, financial markets, or social behaviors. That should give us a leg up when it comes to seeing what’s coming down the line in our own industry… but, as Niels Bohr said, “prediction is very difficult, especially if it’s about the future.” On storage technology futures, the smart money has been riding on non-volatile memory (NVM) to deliver the step change required for the coming decades.
The past five years or so have been notable for the introduction of NAND flash. Other technologies have been present too, albeit in more specific niches. NOR flash, for example, is as mature as NAND flash. NOR logic delivers high random access rates for both read and write and, like DRAM, also allows the retrieval of just a single byte of memory. This byte-addressability is an important property: a read or write can be issued with a single machine instruction where several would otherwise be needed, which improves performance dramatically. NAND devices, in contrast, only allow writes to large pages of flash rather than to individual bits or bytes, but they benefit from far lower cost.
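The difference between byte-addressable and page-granular writes can be sketched with a toy model. This is an illustration only, not a model of any real NOR or NAND part: the class names, the 4096-byte page size, and the `bytes_transferred` counter are all assumptions made for the example.

```python
PAGE_SIZE = 4096  # assumed page size in bytes, for illustration only


class PageDevice:
    """Toy device that can only be read or written one whole page at a time."""

    def __init__(self, num_pages):
        self.pages = [bytearray(PAGE_SIZE) for _ in range(num_pages)]
        self.bytes_transferred = 0

    def update_byte(self, offset, value):
        # Changing one byte forces a read-modify-write of the whole page.
        index, within = divmod(offset, PAGE_SIZE)
        page = bytearray(self.pages[index])  # read the full page
        self.bytes_transferred += PAGE_SIZE
        page[within] = value                 # modify a single byte
        self.pages[index] = page             # write the full page back
        self.bytes_transferred += PAGE_SIZE


class ByteDevice:
    """Toy byte-addressable device: one byte changes, one byte moves."""

    def __init__(self, size):
        self.memory = bytearray(size)
        self.bytes_transferred = 0

    def update_byte(self, offset, value):
        self.memory[offset] = value
        self.bytes_transferred += 1


page_device = PageDevice(num_pages=4)
byte_device = ByteDevice(size=4 * PAGE_SIZE)
page_device.update_byte(5000, 0xFF)
byte_device.update_byte(5000, 0xFF)
print(page_device.bytes_transferred, byte_device.bytes_transferred)  # prints "8192 1"
```

In this toy model a one-byte update moves 8192 bytes on the page device and a single byte on the byte-addressable one. Real devices add further effects, such as erase blocks and wear leveling, that the sketch leaves out.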
The memory industry has been researching devices that hold information, not as magnetic or electrical charges, but by using other physical properties. Such approaches allow them to escape the fundamental issue of electronic degradation in flash cells. The most promising method stores data as a structural alteration in atomic positions within the memory cell. This “phase-change” memory device combines byte-addressability with an economically viable fabrication process. 2017 sees the introduction of 3D XPoint (pronounced “cross point”) technology by Intel® and Micron to the storage marketplace.
Along with increasing NAND capacities and the emergence of these ultra-low-latency, byte-addressable NVM devices, we also have the timely uptake of the NVM Express (NVMe) standard. NVMe allows the low-latency potential of non-volatile memory devices to be realized across fabrics, removing the prior limitations imposed by SCSI layers. Together, these device and protocol advances place the remaining bottleneck to applications squarely at the I/O software layer and parallel file systems. These file systems were developed when the underlying device latency was in the millisecond range, so a thick layer of software incurring millisecond-order latencies was perfectly acceptable. Take that same layer and introduce a flash backend, and the once-fast file system becomes an IOPS barrier between application and storage media.
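A back-of-the-envelope calculation shows why a software layer that was harmless in front of disk dominates in front of flash. The latency figures below are assumed round numbers chosen for illustration (the text only says “millisecond-order”), and the function names are hypothetical.

```python
def software_share(device_latency_s, software_latency_s):
    """Fraction of each request's total latency spent in the software layer."""
    return software_latency_s / (device_latency_s + software_latency_s)


def iops_per_stream(device_latency_s, software_latency_s):
    """Operations per second for one stream of serialized, back-to-back requests."""
    return 1.0 / (device_latency_s + software_latency_s)


# Assumed, illustrative latencies; not measurements of any product.
SOFTWARE_LATENCY_S = 1e-3   # 1 ms of file system software overhead
HDD_LATENCY_S = 5e-3        # 5 ms per disk access
FLASH_LATENCY_S = 100e-6    # 100 microseconds per flash access

for name, device_latency in (("HDD", HDD_LATENCY_S), ("flash", FLASH_LATENCY_S)):
    share = software_share(device_latency, SOFTWARE_LATENCY_S)
    iops = iops_per_stream(device_latency, SOFTWARE_LATENCY_S)
    print(f"{name}: software is {share:.0%} of latency, {iops:.0f} IOPS per stream")
```

Under these assumptions the software layer accounts for about 17% of request latency on disk but about 91% on flash, which caps a single request stream below 1,000 IOPS even though the device alone could serve roughly 10,000.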
DDN’s initiative to bridge this chasm between application and ultra-low-latency storage began from scratch in 2012. The project that was to become Infinite Memory Engine® (IME®) was designed fundamentally to address the sub-microsecond latencies expected by 2020 and beyond and, unlike classic all-flash arrays, to do so at supercomputer scale. Today, IME is a scale-out, flash-native, software-defined storage cache that integrates with parallel file systems to support the most demanding of I/O workloads. IME interfaces directly with applications and secures I/O into an array of NVM servers via a data path that eliminates file system bottlenecks.
IME’s ground-up implementation also allowed DDN® to address many other shortcomings of parallel file systems. IME is “write-anywhere,” allowing clients to change their data transmission rates to servers depending on load. This prevents the classic “Amdahl’s law” drawback of parallel file systems, whereby individual slow-performing storage devices and servers can hold back the whole application workload. IME is also flash-optimized. This brings benefits in SSD management, delivering consistent performance and a longer lifetime for NAND flash. Rebuilds are highly declustered, so complete rebuilds of large-capacity data finish in a few minutes.
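The slow-server effect can be illustrated with an idealized model. This is a sketch of the general idea only and is not IME’s actual placement algorithm: the two functions, the bandwidth figures, and the assumption that load-aware clients split data perfectly in proportion to server bandwidth are all hypothetical.

```python
def static_stripe_time(total_gb, bandwidths_gbs):
    """Every server receives an equal share, so the slowest server sets the finish time."""
    share = total_gb / len(bandwidths_gbs)
    return max(share / bandwidth for bandwidth in bandwidths_gbs)


def load_aware_time(total_gb, bandwidths_gbs):
    """Idealized case: data is split in proportion to each server's bandwidth."""
    return total_gb / sum(bandwidths_gbs)


# Assumed example: four servers, one running at a quarter of the others' speed.
server_bandwidths_gbs = [1.0, 1.0, 1.0, 0.25]
total_gb = 13.0

print(static_stripe_time(total_gb, server_bandwidths_gbs))  # prints 13.0 (seconds)
print(load_aware_time(total_gb, server_bandwidths_gbs))     # prints 4.0 (seconds)
```

With equal-sized stripes, the three healthy servers finish in 3.25 seconds and then wait for the slow one, so the write takes 13 seconds. When clients can send less data to the loaded server, the same write completes in 4 seconds in this idealized model.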
IME is a scale-out cache that sits in front of parallel file systems. As such, it introduces flash-cache economics: system architects can reduce both capital and recurring (power, cooling, footprint) spend by decoupling performance from capacity, using IME to deliver on IOPS and throughput targets and a backing parallel file system with large-capacity drives to meet storage volume requirements.
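The sizing logic behind decoupling performance from capacity can be sketched as simple arithmetic. Every number below (targets, per-device capacity and throughput, cache size, drain rate) is a made-up round figure for illustration and does not describe any real drive or IME configuration; `devices_needed` is a hypothetical helper.

```python
def ceil_div(numerator, denominator):
    """Integer ceiling division."""
    return -(-numerator // denominator)


def devices_needed(capacity_tb, throughput_mbs, device_capacity_tb, device_throughput_mbs):
    """A tier must satisfy both targets, so the larger of the two counts wins."""
    for_capacity = ceil_div(capacity_tb, device_capacity_tb)
    for_throughput = ceil_div(throughput_mbs, device_throughput_mbs)
    return max(for_capacity, for_throughput)


# Assumed targets: 1,000 TB of capacity and 100,000 MB/s of peak throughput.
# Single tier: hypothetical 10 TB, 200 MB/s hard drives must meet both targets.
single_tier_hdds = devices_needed(1000, 100_000, 10, 200)

# Decoupled: hypothetical 2 TB, 2,000 MB/s SSDs absorb the peak in a 50 TB cache,
# while the disk tier only has to hold the data and drain at an assumed 20,000 MB/s.
cache_ssds = devices_needed(50, 100_000, 2, 2000)
backing_hdds = devices_needed(1000, 20_000, 10, 200)

print(single_tier_hdds, cache_ssds, backing_hdds)  # prints "500 50 100"
```

In the single-tier case the drive count is driven by throughput (500 drives where 100 would hold the data); in the decoupled case each tier is sized by the one property it is good at. Whether that lowers total cost depends on real device prices, which this sketch does not model.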
The neatest bit is IME’s ability to solve the broadest spectrum of I/O problems. Managing the saddle distribution of I/O sizes in HPC is a difficult problem for file systems, and new application methods such as multi-scale physics, adaptive mesh refinement, and ever more complex workloads are adding to the tougher components of I/O workloads. The rapidly developing field of supercomputer-scale analytics and machine learning exacerbates the problem, both by introducing tough read workloads and by bringing much greater concurrency (number of threads), since these applications typically take advantage of many-core, often heterogeneous, compute environments. I/O is now characterized not by an ideal pattern of large, sequential accesses, but by a complex mixture of large, small, random, unaligned, high-concurrency I/O in read and write workloads that require both streaming performance and high IOPS. HPC file systems have excelled at extracting the maximum large-I/O throughput from each HDD, but their small-I/O management has been very limited. IME can support reads and writes with I/O sizes ranging from large down to tiny I/Os of 4K with the same blistering performance.