Executive Summary
As AI becomes central to enterprise innovation, Salesforce is training and scaling some of the world's most advanced models, an environment in which performance, accuracy, and speed are non-negotiable.
To eliminate critical data bottlenecks and maximize GPU efficiency, Salesforce deployed Google Cloud Managed Lustre, powered by DDN EXAScaler. The result: dramatically improved throughput, reduced latency, and significantly lower cost per training run—enabling faster, more efficient AI at scale.
The Challenge: Data Bottlenecks Limiting AI Performance
Salesforce was training large-scale models, including an 8-billion-parameter Llama 3.1 model, on high-performance GPU clusters. Despite best-in-class compute infrastructure, performance was constrained by the data layer.
Key challenges included:
- I/O bottlenecks starving GPUs, with utilization dropping as low as 40%
- High latency impacting training cycles and time-to-results
- Fragmented storage architecture requiring constant tuning and management
In high-performance AI environments, underutilized GPUs translate directly into wasted investment and delayed innovation.
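The effect of an I/O-starved pipeline can be sketched with a simple back-of-the-envelope model (illustrative numbers only, not Salesforce's measurements): each training step needs a batch in place before the GPU can compute, so any load time beyond the compute time shows up directly as idle GPU.

```python
def gpu_utilization(compute_s: float, load_s: float, prefetch: bool = False) -> float:
    """Fraction of wall-clock time the GPU spends computing per step.

    Without prefetching, the GPU waits out the full batch load; with
    perfect prefetching, loading overlaps compute and only the excess
    load time stalls the GPU. Illustrative model, not a benchmark.
    """
    if prefetch:
        stall = max(0.0, load_s - compute_s)
    else:
        stall = load_s
    return compute_s / (compute_s + stall)

# Hypothetical step times: 100 ms of compute, 150 ms to load a batch.
print(f"{gpu_utilization(0.10, 0.15):.0%}")                 # serial I/O  -> 40%
print(f"{gpu_utilization(0.10, 0.15, prefetch=True):.0%}")  # overlapped  -> 67%
```

With these (made-up) timings, a batch that takes 1.5× longer to load than to process pins utilization at exactly the 40% floor cited above, and even perfect overlap cannot recover full utilization until the storage layer itself gets faster.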
The Solution: Google Managed Lustre Powered by DDN EXAScaler
Salesforce deployed Google Cloud Managed Lustre, a fully managed parallel file system built on DDN EXAScaler, the industry’s leading high-performance AI data platform.
By integrating Lustre with its Vortex training cluster, Salesforce achieved:
- Seamless deployment with minimal operational overhead
- Massively parallel throughput to feed GPUs at scale
- A fully managed environment eliminating infrastructure complexity
This approach aligns with DDN’s core principle: AI performance is determined by the data layer as much as the compute layer.
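The parallel-throughput point is the crux: a parallel file system lets many readers (across threads and nodes) pull training shards concurrently instead of queuing on a single serial stream. A minimal single-node sketch of the pattern, using only the Python standard library (shard names, counts, and sizes are made up for illustration; a real job would read from the Lustre mount point rather than a temp directory):

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a Lustre mount point; a temp directory keeps the sketch
# self-contained and runnable anywhere.
root = tempfile.mkdtemp()
shards = []
for i in range(8):
    path = os.path.join(root, f"shard_{i:03d}.bin")
    with open(path, "wb") as f:
        f.write(os.urandom(4096))  # tiny stand-in for a training shard
    shards.append(path)

def read_shard(path: str) -> int:
    """Read one shard fully; return the number of bytes read."""
    with open(path, "rb") as f:
        return len(f.read())

# Overlap the reads so the consumer (the GPU, in a training job) is
# never blocked behind a single in-flight request.
with ThreadPoolExecutor(max_workers=4) as pool:
    total = sum(pool.map(read_shard, shards))

print(total)  # 8 shards x 4096 bytes = 32768
```

On a parallel file system this pattern scales out across clients and storage targets, which is what lets the data layer keep pace with a GPU cluster.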
Results: Faster Training, Higher Efficiency, Lower Cost
With DDN-powered Google Managed Lustre, Salesforce transformed its AI training pipeline:
- 75% reduction in I/O latency
- 1.5× faster model training
- 70% relative increase in GPU utilization
- 42% reduction in overall training costs
By eliminating storage bottlenecks and fully saturating GPU pipelines, Salesforce unlocked the full value of its AI infrastructure investment.
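Part of the cost figure follows mechanically from the speedup: if GPUs are billed by the hour and a run finishes 1.5× faster on the same hardware, the GPU-hour bill for that run drops by a third before counting any other savings. A quick sanity check with hypothetical numbers (the cluster size, run length, and hourly rate below are assumptions for illustration, not quoted figures):

```python
def cost_per_run(gpu_count: int, hourly_rate: float,
                 baseline_hours: float, speedup: float = 1.0) -> float:
    """GPU bill for one training run: GPU count x rate x wall-clock hours."""
    return gpu_count * hourly_rate * (baseline_hours / speedup)

base = cost_per_run(64, 3.00, 100)               # hypothetical 64-GPU, 100-hour run
fast = cost_per_run(64, 3.00, 100, speedup=1.5)  # same run at 1.5x speed
print(f"${base:,.0f} -> ${fast:,.0f} ({1 - fast / base:.0%} saved)")  # 33% saved
```

That the reported savings (42%) exceed the ~33% attributable to raw speedup alone suggests additional savings beyond shorter wall-clock time, such as fewer stalled-but-billed GPU hours and less rework.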
Business Impact: From Infrastructure Management to AI Innovation
Beyond performance gains, the shift to a managed, high-performance data platform enabled Salesforce to refocus resources:
- Reduced time spent on infrastructure tuning and troubleshooting
- Increased developer productivity and faster iteration cycles
- Scalable foundation for both training and inference workloads
As a result, Salesforce can now build, train, and deploy AI models faster—without being constrained by data access limitations.
Why It Matters
Modern AI factories demand tight integration between compute and data. Even the most advanced GPUs cannot deliver value if they are waiting on data.
With DDN EXAScaler powering Google Managed Lustre, organizations can:
- Maximize GPU utilization and ROI
- Accelerate time-to-model and time-to-insight
- Reduce cost per token and per training cycle
- Scale seamlessly from experimentation to production
Conclusion
For Salesforce, trust in AI starts with performance. By leveraging DDN’s data intelligence platform through Google Cloud Managed Lustre, the company eliminated a critical bottleneck—turning its AI infrastructure into a high-efficiency, production-ready engine.