ACCELERATING SAS ANALYTICS WITH PARALLEL FILE SYSTEMS
A formidable challenge for Big Data analytics is architecting a high-performance infrastructure to handle rapid and unpredictable data growth at a reasonable price point. Traditional NAS and SAN enterprise storage protocols (NFS, CIFS, iSCSI, FC) were designed primarily for traditional back-office applications and file-sharing repositories. These protocols are point-to-point and are not designed for concurrent access to data by multiple applications. Furthermore, the SAS Grid computing tool consolidates islands of analytics hardware, increasing the performance requirements for shared back-end storage beyond what traditional protocols can offer.
Successful commercial High Performance Computing firms that rely on SAS Analytics have been changing their infrastructures to take advantage of parallelism to deliver more accurate risk models by testing multiple scenarios against larger data sets in less time. By incorporating parallel file systems, organizations are also optimizing IT budgets by consolidating data silos, reducing data-center footprint, and minimizing licensing costs.
Leading SAS Grid-driven enterprises are also increasingly leveraging parallel file systems, like IBM® Spectrum Scale™ and Lustre*, to deliver several important advantages over traditional direct attached file systems and network attached storage infrastructures. The primary advantages behind parallel storage are sustained high performance and the ability to easily scale upward to support ever larger workloads while minimizing data center footprint and optimizing IT budgets.
Advantages of Running SAS Analytics in Conjunction with DDN® Storage Solutions
More than 50% of the largest oil and gas companies, 40% of the leading financial services companies, and 30% of the top aerospace and automotive companies have deployed DDN solutions to address performance, scalability, and TCO challenges in their organizations.
Built from the ground up, DDN’s GRIDScaler (IBM® Spectrum Scale™) and EXAScaler (Lustre* FS) parallel file system solutions are next-generation analytics storage platforms that eliminate performance bottlenecks, simplify the environment, and greatly increase return on investment.
A parallel file system offers several advantages over a single, direct-attached file system or traditional network-attached storage. When DDN parallel file storage solutions are used in conjunction with SAS Analytics, the advantages include:
- Significantly lower operational latency and higher-bandwidth data transfer, especially for data-intensive and time-sensitive SAS Analytics workflows, achieved by balancing content across multiple file system servers.
- Separation of data and metadata, enabling optimized performance and accessibility and delivering lower-latency file-system metadata access, thereby eliminating I/O bottlenecks.
- Linear scalability of bandwidth and I/O, which delivers significantly higher aggregate performance than traditional architectures, which are extremely complex and expensive to scale up and often suffer performance degradation when they are.
- Parallelization of SAS Analytics in a single, shared namespace, allowing a user to treat any data-intensive workload independently from other data-intensive workloads because of efficient file-locking.
- Consolidation of data silos on an architecture with higher performance, scalability, and reliability and no single point of failure, thereby allowing users to simplify data management, minimize data center footprint and licensing, and cut operations costs.
- 452% improvement in end-to-end workflow performance
- 400% higher throughput per core
- More than 4x as many SAS Grid workflows executed on DDN solutions as on competing solutions in the same amount of time
- More than 50% reduction in data center footprint
- Elimination of data silos and simplification of data management infrastructure
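The idea of balancing content across multiple file system servers, listed among the advantages above, can be illustrated with a small sketch. This is not how GRIDScaler, EXAScaler, IBM Spectrum Scale, or Lustre are implemented. It is a toy model in Python that assumes fixed-size stripes placed round-robin on in-memory stand-ins for servers. The names StorageServer, stripe_write, stripe_read, and ideal_aggregate_bandwidth are hypothetical, and the bandwidth function assumes ideal linear scaling with no network, metadata, or locking overhead.

```python
from dataclasses import dataclass, field


@dataclass
class StorageServer:
    """Hypothetical in-memory stand-in for one file system server."""
    name: str
    stripes: dict = field(default_factory=dict)  # stripe index -> bytes


def stripe_write(data: bytes, servers: list, stripe_size: int) -> int:
    """Split data into fixed-size stripes and place them round-robin.

    Returns the number of stripes written.
    """
    count = 0
    for offset in range(0, len(data), stripe_size):
        index = offset // stripe_size
        target = servers[index % len(servers)]  # round-robin placement
        target.stripes[index] = data[offset:offset + stripe_size]
        count += 1
    return count


def stripe_read(servers: list, stripe_count: int) -> bytes:
    """Reassemble the original data by reading stripes in index order."""
    n = len(servers)
    return b"".join(servers[i % n].stripes[i] for i in range(stripe_count))


def ideal_aggregate_bandwidth(per_server_gbps: float, server_count: int) -> float:
    """Idealized linear scaling (assumption: no shared bottleneck)."""
    return per_server_gbps * server_count


if __name__ == "__main__":
    servers = [StorageServer(f"server{i}") for i in range(4)]
    payload = bytes(range(256)) * 16  # 4096 bytes of sample data
    written = stripe_write(payload, servers, stripe_size=512)
    assert stripe_read(servers, written) == payload
    # Each server holds an equal share of the stripes.
    print({s.name: len(s.stripes) for s in servers})
    print(ideal_aggregate_bandwidth(per_server_gbps=2.0, server_count=4))
```

Because consecutive stripes land on different servers, a large sequential read or write can be serviced by all servers at once, which is the intuition behind the bandwidth scaling claim. Real deployments fall short of the ideal figure to the extent that the network, metadata operations, or lock contention become the limiting factor.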
Fannie Mae: Strengthening the U.S. Housing Finance System Through Faster and More Accurate Risk Analytics by RichReport