ACCELERATING SAS ANALYTICS WITH PARALLEL FILE SYSTEMS
Successful commercial High Performance Computing firms that rely on SAS Analytics have been changing their infrastructures to take advantage of parallelism to deliver more accurate risk models by testing multiple scenarios against larger data sets in less time. By incorporating parallel file systems, organizations are also optimizing IT budgets by consolidating data silos, reducing data-center footprint, and minimizing licensing costs.
Leading SAS Grid-driven enterprises are also increasingly leveraging parallel file systems, such as DDN GRIDScaler® with IBM® Spectrum Scale™ and DDN EXAScaler® with Lustre*, to deliver several important advantages over traditional direct-attached file systems and network-attached storage infrastructures. The primary advantages of parallel storage are sustained high performance and the ability to scale easily to support ever-larger workloads while minimizing data center footprint and optimizing IT budgets.
“With our SAS GRID solution (on DDN GRIDScaler) we achieved 452% faster workflows across the board—some applications were even higher. We also went from over 100 [SAS] servers to 16 for the entire enterprise.”—John Eubanks, Systems Engineer V, Fannie Mae
More Performance with Fewer Cores
SAS Grid users who centralize their analytics workflows on DDN regularly see significant improvements in runtimes and a sizable reduction in the number of processor cores needed. For more information, read our whitepaper to see how parallelism and direct memory access enable faster and more accurate SAS analytics.
FEATURED USER VIDEO: FANNIE MAE
Strengthening the US Housing Finance System Through Faster and More Accurate Risk Analytics
In this video from the DDN User Group at SC15, John Eubanks from Fannie Mae presents: Strengthening the US Housing Finance System Through Faster and More Accurate Risk Analytics.
“The GRIDScaler parallel file system is an excellent choice for SAS Grid deployments. IO intensive SAS Grid workloads have demonstrated excellent performance characteristics utilizing this storage appliance” —Cheryl Doninger, Senior Director, Research and Development, SAS
More than 50% of the largest oil and gas companies, 40% of the leading financial services companies, and 30% of the top aerospace and automotive companies have deployed DDN solutions to address performance, scalability, and TCO challenges in their organizations. Built from the ground up, DDN’s GRIDScaler (IBM® Spectrum Scale™) and EXAScaler (Lustre* FS) parallel file system solutions are next-generation analytics storage platforms that eliminate performance bottlenecks, simplify the environment, and greatly increase return on investment.
- 4.5x improvement in end-to-end workflow performance
- 400% higher throughput per core
- More than 4x SAS Grid workflows executed on DDN solutions vs. competing solutions in the same amount of time
- More than 50% reduction in data center footprint
- Elimination of data silos and simplification of data management infrastructure
A formidable challenge for Big Data analytics is architecting a high-performance infrastructure that can handle rapid and unpredictable data growth at a reasonable price point. Traditional NAS and SAN enterprise storage protocols (NFS, CIFS, iSCSI, FC) were designed for back-office applications and file-sharing repositories. These protocols are point-to-point and are not designed for concurrent access to data from multiple applications. Furthermore, the SAS Grid computing tool consolidates islands of analytics hardware, raising the performance requirements for shared back-end storage beyond what traditional protocols can offer.
A parallel file system offers several advantages over a single, direct-attached file system or traditional network-attached storage. When DDN parallel file storage solutions are used in conjunction with SAS Analytics, the advantages include:
- Significantly lower operational latency and higher-bandwidth data transfer, especially for data-intensive and time-sensitive SAS Analytics workflows, achieved by balancing content across multiple file system servers.
- Separation of data and metadata, enabling optimized performance and accessibility and delivering lower-latency file-system metadata access, thereby eliminating I/O bottlenecks.
- Linear scalability of bandwidth and I/O, which delivers significantly higher aggregate performance than traditional architectures, which are complex and expensive to scale up and often suffer performance degradation when they are.
- Parallelization of SAS Analytics in a single, shared namespace, allowing a user to treat any data-intensive workload independently from other data-intensive workloads because of efficient file-locking.
- Consolidation of data silos on an architecture that delivers higher performance, scalability, and reliability with no single point of failure, thereby allowing users to simplify data management, minimize data center footprint and licensing, and cut operations costs.
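To make the idea of balancing content across multiple file system servers concrete, here is a minimal Python sketch of round-robin striping and parallel reads. It is purely illustrative and is not how GRIDScaler, EXAScaler, Spectrum Scale, or Lustre is implemented: the "servers" are in-memory lists, and the names `stripe`, `read_server`, `parallel_read`, and `STRIPE_SIZE` are hypothetical.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor

# Hypothetical stripe size in bytes; kept tiny so the layout is easy to inspect.
STRIPE_SIZE = 4


def stripe(data: bytes, num_servers: int, stripe_size: int = STRIPE_SIZE) -> list[list[bytes]]:
    """Split data into fixed-size chunks and place them round-robin on servers."""
    servers: list[list[bytes]] = [[] for _ in range(num_servers)]
    for offset in range(0, len(data), stripe_size):
        chunk_index = offset // stripe_size
        servers[chunk_index % num_servers].append(data[offset:offset + stripe_size])
    return servers


def read_server(chunks: list[bytes]) -> list[bytes]:
    """Stand-in for fetching all chunks held by one server (assumed, no real I/O)."""
    return list(chunks)


def parallel_read(servers: list[list[bytes]]) -> bytes:
    """Read from every server concurrently, then interleave chunks in file order."""
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        per_server = list(pool.map(read_server, servers))

    ordered: list[bytes] = []
    rows = max((len(chunks) for chunks in per_server), default=0)
    for row in range(rows):
        for chunks in per_server:
            # Chunk number row * num_servers + server_index lives at this position.
            if row < len(chunks):
                ordered.append(chunks[row])
    return b"".join(ordered)


if __name__ == "__main__":
    payload = b"abcdefghij"
    layout = stripe(payload, num_servers=3)
    print(layout)
    print(parallel_read(layout) == payload)
```

Because each server holds only a fraction of the file, a client can pull from all of them at once, which is the intuition behind aggregate bandwidth growing as servers are added. Real parallel file systems add metadata services, locking, and failure handling that this sketch leaves out.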