Salesforce Accelerates AI Model Training with Google Managed Lustre Powered by DDN

How Salesforce Research Cut AI Training Costs by 42% with Google Cloud Managed Lustre

Executive Summary

As AI becomes central to enterprise innovation, Salesforce is scaling some of the world’s most advanced models—where performance, accuracy, and speed are non-negotiable.

To eliminate critical data bottlenecks and maximize GPU efficiency, Salesforce deployed Google Cloud Managed Lustre, powered by DDN EXAScaler. The result: dramatically improved throughput, reduced latency, and significantly lower cost per training run—enabling faster, more efficient AI at scale.

The Challenge: Data Bottlenecks Limiting AI Performance

Salesforce was training large-scale models, including a Llama 3.1 (8B parameter) model, on high-performance GPU clusters. Despite best-in-class compute infrastructure, performance was constrained by the data layer.

Key challenges included:

  • I/O bottlenecks starving GPUs, with utilization dropping as low as 40%
  • High latency impacting training cycles and time-to-results
  • Fragmented storage architecture requiring constant tuning and management

In high-performance AI environments, underutilized GPUs translate directly into wasted investment and delayed innovation.
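The starvation pattern described above is easy to confirm with simple instrumentation: time how much of each training step is spent waiting on the data loader versus computing. The sketch below is a framework-agnostic illustration — the loader and train step are simulated stand-ins (sleeps modeling storage latency and GPU work), not Salesforce's actual pipeline:

```python
import time

def simulated_batch_load():
    """Stand-in for a data-loader fetch; the sleep models storage latency."""
    time.sleep(0.03)

def simulated_train_step():
    """Stand-in for the GPU forward/backward pass."""
    time.sleep(0.02)

def measure_data_wait_fraction(steps=20):
    """Estimate the fraction of wall time spent waiting on data.

    A high fraction means the accelerator is starved by I/O
    rather than bound by compute.
    """
    wait = compute = 0.0
    for _ in range(steps):
        t0 = time.perf_counter()
        simulated_batch_load()
        t1 = time.perf_counter()
        simulated_train_step()
        t2 = time.perf_counter()
        wait += t1 - t0
        compute += t2 - t1
    return wait / (wait + compute)

if __name__ == "__main__":
    frac = measure_data_wait_fraction()
    print(f"fraction of step time spent waiting on data: {frac:.0%}")
```

With the simulated timings above, well over half of each step is spent waiting on data — the same signature that shows up as low GPU utilization on a real cluster.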

The Solution: Google Managed Lustre Powered by DDN EXAScaler

Salesforce deployed Google Cloud Managed Lustre, a fully managed parallel file system built on DDN EXAScaler, the industry’s leading high-performance AI data platform.

By integrating Lustre with its Vortex training cluster, Salesforce achieved:

  • Seamless deployment with minimal operational overhead
  • Massively parallel throughput to feed GPUs at scale
  • A fully managed environment eliminating infrastructure complexity

This approach aligns with DDN’s core principle: AI performance is determined by the data layer as much as the compute layer.
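Once a parallel file system is mounted, a quick sanity check is to measure sequential read throughput against the mount point before pointing a training job at it. A minimal sketch — the `MOUNT_POINT` here is a hypothetical placeholder (it defaults to a temp directory so the probe runs anywhere), and note that re-reading a just-written file may be served from the page cache, so use large files for a realistic number:

```python
import os
import tempfile
import time

# Hypothetical mount point for the parallel file system; any directory
# works for the measurement itself.
MOUNT_POINT = tempfile.gettempdir()

def sequential_read_mb_s(path=MOUNT_POINT, size_mb=64):
    """Write then re-read a file under `path`, returning read MB/s.

    A crude proxy for whether a mounted file system can keep a
    training job's input pipeline fed.
    """
    blob = os.urandom(1024 * 1024)
    fname = os.path.join(path, "fs_probe.bin")
    with open(fname, "wb") as f:
        for _ in range(size_mb):
            f.write(blob)
        f.flush()
        os.fsync(f.fileno())
    t0 = time.perf_counter()
    with open(fname, "rb") as f:
        while f.read(8 * 1024 * 1024):
            pass
    elapsed = time.perf_counter() - t0
    os.remove(fname)
    return size_mb / elapsed

if __name__ == "__main__":
    print(f"sequential read: {sequential_read_mb_s():.0f} MB/s")
```

A single-stream probe like this only scratches the surface of a parallel file system, which is designed to serve many clients concurrently; it is a smoke test, not a benchmark.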

Results: Faster Training, Higher Efficiency, Lower Cost

With DDN-powered Google Managed Lustre, Salesforce transformed its AI training pipeline:

  • 75% reduction in I/O latency
  • 1.5× faster model training
  • 70% increase in GPU utilization
  • 42% reduction in overall training costs

By eliminating storage bottlenecks and keeping GPU pipelines consistently fed, Salesforce unlocked the full value of its AI infrastructure investment.
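As a rough sanity check on how the reported figures relate — illustrative arithmetic only, not Salesforce's internal cost model — a 1.5× speedup alone cuts GPU-hours per run by about a third, with the rest of the 42% savings coming from higher utilization and reduced operational overhead:

```python
# Back-of-envelope arithmetic using the figures reported in this case study.
# (Illustrative only; not Salesforce's actual cost model.)

speedup = 1.5           # reported: 1.5x faster model training
baseline_util = 0.40    # reported low point for GPU utilization
util_gain = 1.70        # reported: 70% increase in GPU utilization

# GPU-hours (and thus cost, at a fixed hourly rate) scale inversely with speed.
cost_from_speedup = 1 / speedup
print(f"cost per run from speedup alone: {cost_from_speedup:.0%} "
      f"of baseline (~{1 - cost_from_speedup:.0%} saved)")

# Utilization after the improvement, from the reported starting point.
new_util = baseline_util * util_gain
print(f"implied GPU utilization: {baseline_util:.0%} -> {new_util:.0%}")
```

The speedup accounts for roughly 33 of the 42 percentage points of cost reduction; the implied utilization of ~68% is consistent with the "70% increase" from the 40% baseline cited earlier.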

Business Impact: From Infrastructure Management to AI Innovation

Beyond performance gains, the shift to a managed, high-performance data platform enabled Salesforce to refocus resources:

  • Reduced time spent on infrastructure tuning and troubleshooting
  • Increased developer productivity and faster iteration cycles
  • Scalable foundation for both training and inference workloads

As a result, Salesforce can now build, train, and deploy AI models faster—without being constrained by data access limitations.

Why It Matters

Modern AI factories demand tight integration between compute and data. Even the most advanced GPUs cannot deliver value if they are waiting on data.

With DDN EXAScaler powering Google Managed Lustre, organizations can:

  • Maximize GPU utilization and ROI
  • Accelerate time-to-model and time-to-insight
  • Reduce cost per token and per training cycle
  • Scale seamlessly from experimentation to production

Conclusion

For Salesforce, trust in AI starts with performance. By leveraging DDN’s data intelligence platform through Google Cloud Managed Lustre, the company eliminated a critical bottleneck—turning its AI infrastructure into a high-efficiency, production-ready engine.