How to Build an AI Center of Excellence

How to Build an AI Center of Excellence to Accelerate AI Expertise

As they race to adopt artificial intelligence, organizations need a disciplined approach to selecting, justifying, and measuring the outcomes of AI projects. The wide range of tools and strategies available for implementing and integrating AI technology creates new skills and integration challenges, accentuating the need to select a standardized approach.

Fortune Business Insights expects the AI market to grow more than 20% annually through 2029 to nearly $1.4 trillion. However, a recent Deloitte survey found that more than 50% of companies recognize significant risks to their AI implementation initiatives, and fewer than half say they have sufficient skills for integrating AI technologies into their existing IT environment.

Organizations should take the opportunity to focus their AI initiatives around a common strategy to fulfill what is expected to be vast demand.

The specter of “shadow AI”

Lack of technology isn’t the problem. There are many free tools and resources to encourage experimentation and adoption of AI technologies, many of which are readily and inexpensively available on cloud platforms. While these “sandboxes” are an excellent way to familiarize people with AI’s potential with minimal risk, they become a problem when users begin moving prototypes into production without adequate data protection, security, privacy controls, and backup.

The whole organization can pay a price if redundant and incompatible AI tools are adopted in silos, adding the overhead of implementing and supporting multiple systems.

A sound AI strategy should lay out clear guidelines for evaluating prospective projects in a business context, specifying time frames, putting metrics in place to measure success, and identifying approved tools and methodologies. Even organizations that are still early on the AI adoption curve can benefit from creating this structure around what will inevitably be an explosion of demand.
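
As a rough illustration, the guidelines described above could be encoded as a simple intake check that flags what a project proposal is missing. This is a minimal sketch, not a prescribed process: the `ProjectProposal` fields, the `review` function, and the tool names in `APPROVED_TOOLS` are all hypothetical placeholders chosen for the example.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Set

# Hypothetical list of tools the organization has approved (placeholder names).
APPROVED_TOOLS = {"tool-a", "tool-b"}


@dataclass
class ProjectProposal:
    """Hypothetical record describing a prospective AI project."""
    name: str
    business_goal: str                      # business outcome the project targets
    target_date: Optional[date] = None      # time frame for delivering a result
    success_metrics: List[str] = field(default_factory=list)
    tools: Set[str] = field(default_factory=set)


def review(proposal: ProjectProposal) -> List[str]:
    """Return the guideline gaps found in a proposal (empty list means none)."""
    gaps = []
    if not proposal.business_goal.strip():
        gaps.append("no business goal stated")
    if proposal.target_date is None:
        gaps.append("no time frame specified")
    if not proposal.success_metrics:
        gaps.append("no success metrics defined")
    unapproved = sorted(proposal.tools - APPROVED_TOOLS)
    if unapproved:
        gaps.append("unapproved tools: " + ", ".join(unapproved))
    return gaps
```

A proposal that returns an empty list has a stated goal, a time frame, at least one metric, and only approved tools. Anything else comes back as a list of gaps to resolve before the project moves forward.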

An AI Center of Excellence (CoE) can help to overcome the most common causes of AI failure:

  • Lack of clear business value.
  • Failure to account for organizational impact.
  • Goals that are beyond the capabilities of the organization to reach.
  • Inability to integrate with existing technologies and workflows.
  • Failure to evaluate projects using clear and understandable metrics rooted in relevance to the business.

Why AI needs its own CoE

The process of building, training, and deploying AI models is not like that used to create enterprise applications. For most of computing history, business software has operated with the same basic constructs: A set of prescribed calculations is applied to well-defined inputs to yield a predictable outcome. As long as outcomes don’t change, the application is assumed to be operating properly.

AI differs in some fundamental ways, however. For one thing, input types vary and are not always predictable. For example, an image recognition algorithm may be created to sort through a wide variety of image types to identify certain similarities or assign context. Results change over time as the learning model develops, data sources evolve, and the organization refines the AI model and objectives. These changes are expected and even desirable.

AI applications are also extremely data intensive. They typically ingest huge volumes of data of varying types that can include video, audio, images, and natural language, sometimes within the same application. This data is needed to build initial deep learning models and to drive reinforcement learning that improves the quality of the model over time. In general, the more data a model is trained on, the better the results.

The data types used to train AI models are also different from the structured data used in production processing. AI models are trained on very large numbers of small files in datasets that can reach petabytes in scale. Conventional enterprise infrastructure is not designed for this kind of processing. Models that perform well ingesting small amounts of data can rapidly become overwhelmed as volumes grow. Entirely new approaches to ingesting, reading, and writing data are required.

Failure to account for these unique characteristics can hurt the organization in several ways, including:

  • Performance bottlenecks that drag down responsiveness for all users on shared infrastructure.
  • Models that underperform because they cannot process sufficient quantities of training data.
  • Corners cut on the volume or quality of training data to achieve acceptable performance.
  • Improperly trained or supported AI models that deliver misleading or wrong results, potentially leading to more serious consequences.

An overarching AI strategy is needed

To achieve success with AI, organizations need to accept and internalize a few core principles. Success begins and ends with data. Biased or poor-quality data yields AI models that can’t be trusted, thus undermining confidence in the results. A disciplined approach to data preparation must be put in place that safeguards data integrity, security, privacy protection, and backup/recovery.

A reference architecture must be built that accounts for optimal processing, storage, and bandwidth. Processes and architecture should be designed to support growth and application to different use cases. Projects should be judged in the context of business value. Experimentation is acceptable, but appropriate rigor must be applied before experimental projects become part of the formal development process.

If the organization’s early AI initiatives are designed and built using the principles defined above, individual teams will be more likely to collaborate and leverage expertise from the AI Center of Excellence rather than develop their approaches in isolation.

An AI CoE is a centralized group or team that guides and oversees the implementation of organization-wide AI projects. It can encompass all types of AI projects, including machine learning, neural networks, image and speech processing, intelligent process design, robotic process automation, and other “hyperautomation” technologies.

The CoE should combine the specialized talent, knowledge, and resources required to enable AI-based projects to scale. Three core disciplines are needed:

  • DevOps is a commonly used methodology to innovate at the application and operational levels through rapid iteration and frequent releases.
  • MLOps is a collaborative function focused on streamlining the transition of machine learning models into production and maintaining them there.
  • DataOps uses data engineering practices to optimize data flows between managers and consumers.

The CoE sets AI vision, goals, and metrics around tangible business outcomes such as improving customer experience or streamlining processes. It develops an easily understood system to track the progress and measure the benefits of AI initiatives. It also serves as an internal counsel to identify new opportunities for leveraging AI to solve various business problems and to evaluate new technologies and practices.

A CoE should seek the following outcomes:

  • Clearly define business goals, identify high-impact use cases, and set priorities for AI investments.
  • Define strategies for managing the teams that use AI and build AI models.
  • Define roles, data owners, and structures for AI-led innovation.
  • Reimagine workflows and roles.
  • Evaluate the organizational impact of AI projects and suggest accommodation strategies.
  • Define processes to sustain continual AI innovation, typically using agile development techniques.
  • Establish tactics for data reliability, collection, preparation, and storage that are consistent with the needs of the business.
  • Define the process for evaluating and adopting a consistent set of tools that match the organization’s needs, with reusability as a guiding principle.


There’s no question that the age of AI has arrived. Given the dramatic improvements that early adopters have seen in efficiency, organizational agility, and customer satisfaction, there is little doubt that AI use cases will continue to proliferate.

Even businesses that aren’t yet using AI extensively in production can get out in front of this trend by preparing an organizational and technology foundation that sets them up to thrive in the age of intelligent machines. Start putting the skills and infrastructure in place now to ride the wave when it reaches you. Learn more from DDN and NVIDIA.

This content was commissioned by NVIDIA and DDN and produced by TechTarget Inc.
The NVIDIA DGX™ A100 system features eight NVIDIA GPUs and two 2nd Gen AMD EPYC™ processors.
