Though some view Artificial Intelligence (AI) as too immature for widespread adoption, AI and Machine Learning (ML) are already making an impact across many aspects of our lives. From autonomous driving and image recognition to research breakthroughs in healthcare, drug discovery and climate projections, AI is already shaping our world.
Despite that, some of the initial promise of AI hasn’t lived up to expectations. From a business perspective, many organizations have also struggled to get their AI projects off the ground. In fact, according to a commonly cited figure attributed to research firm Gartner, 9 out of 10 companies that have started AI projects haven’t been able to achieve what they originally hoped to.
The reasons for these challenges are many, including inadequate planning, limited resources and insufficient source data, among others. In addition, many organizations don’t have the in-house skill sets they need to properly manage the process.
Despite these challenges, companies continue to pursue AI projects because they recognize the enormous potential those projects offer. From time and cost savings, to improvements in efficiency and accuracy, to entirely new types of analytical insights, there’s widespread recognition of the value that a properly trained, appropriately fed AI model can deliver. A successful project can also inspire others within the organization to use AI to meet their own goals.
The key to success with AI is creating a clear, simple concept of what you want to achieve. Fundamentally, that means thinking through the kind of data you have (or need) to feed into your AI model and the kind of results you need (or hope) to get from its output. Of course, getting from point A (input) to point B (output) takes a lot of effort along the way, but the basic concepts are straightforward.
Those principles of simplicity and balance extend to the technical requirements for AI as well. One of the key lessons from successful AI projects is that you need to start with a balanced system of compute, storage and network components. Most AI projects can be summarized as the creation of huge numerical matrices that are used to estimate the probability of a specific output from a given input. It turns out that creating those matrices (collectively known as an AI model) and running data through them to generate valuable output takes an enormous amount of compute power. To operate reliably and efficiently, they also need a consistent, high-speed supply of good-quality, relevant input data. The real trick is to get all of these components working together in a balanced way.
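To make the “huge matrices” idea concrete, here is a toy sketch in Python. It assumes a simple linear model followed by a softmax, which is one common way of turning raw scores into probabilities; the post itself doesn’t specify a model type. The weights, bias and input values are hypothetical, and a real model would learn vastly larger matrices during training.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(weights, bias, x):
    """Multiply input vector x by a weight matrix, add a bias, apply softmax.

    weights has one row per possible output and one column per input feature.
    """
    scores = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, bias)]
    return softmax(scores)

# Hypothetical 3-output model over 2 input features (illustrative values only).
W = [[0.5, -1.0],
     [1.5, 0.2],
     [-0.3, 0.8]]
b = [0.0, 0.1, -0.2]

probs = predict(W, b, [1.0, 2.0])
print([round(p, 3) for p in probs])
```

Every prediction is a matrix multiplication, so scaling the matrices up to millions or billions of entries is what drives the compute demands described above.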
Think of it like a car. You can have the largest, most powerful engine in the world, but if you don’t feed it with the gas or electric power it needs at a given moment or if you pair it with a tiny set of wheels, you won’t get anywhere near the performance that a finely tuned sports car can provide. So it is with hardware systems that are intended to run large AI models. Yes, you need to have powerful compute engines, like today’s immensely capable GPUs and CPUs, but you also need high-speed interconnects and storage systems that can deliver the data at rates the computing engines demand.
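The car analogy boils down to a simple rule: data has to flow through every stage, so the system as a whole can go no faster than its slowest component. Below is a minimal sketch of that arithmetic. It assumes, as a big simplification, that each stage can be described by a single sustained data rate, and the stage names and numbers are made up for illustration rather than taken from real hardware.

```python
def pipeline_throughput(stage_rates_gbps):
    """Return the slowest stage and the rate the whole pipeline can sustain.

    stage_rates_gbps maps a stage name to the data rate (GB/s) it can handle.
    Data must pass through every stage, so the slowest one sets the pace.
    """
    bottleneck = min(stage_rates_gbps, key=stage_rates_gbps.get)
    return bottleneck, stage_rates_gbps[bottleneck]

def compute_utilization(stage_rates_gbps, compute_stage="compute"):
    """Fraction of the compute engines' data appetite that actually gets fed."""
    _, sustained = pipeline_throughput(stage_rates_gbps)
    return sustained / stage_rates_gbps[compute_stage]

# Hypothetical figures for illustration only, not measurements.
system = {"storage": 10.0, "network": 25.0, "compute": 40.0}
name, rate = pipeline_throughput(system)
print(f"bottleneck: {name} at {rate} GB/s")
print(f"compute utilization: {compute_utilization(system):.0%}")
```

With these hypothetical figures, storage caps the pipeline at 10 GB/s, so the compute engines receive only a quarter of the data they could process. A faster GPU alone wouldn’t help; faster storage would.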
Plus, it’s important to remember that building, training and retraining AI models is essentially a never-ending iterative process. That means maintaining a balanced system is an ongoing need, not a one-time concern for the initial training run. It’s also important to note that building and running AI models relies on a huge number of operations happening in parallel. This is the key reason why GPUs, which were originally designed for graphics and display applications, have become so essential for AI. The massively parallel architecture required to update all the pixels on a screen at the same time proved to be a perfect match for the parallel processing demands of creating and running AI models. As I’ll discuss in subsequent posts in this blog series, those parallel computing demands also place unique parallel data-feeding requirements on the storage used in AI-focused systems.
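Because each chunk (shard) of training data can usually be read independently, data loaders commonly fetch several at once so the processors aren’t left waiting. The sketch below shows that pattern with Python’s standard `concurrent.futures` module. `read_shard` and `load_parallel` are hypothetical helpers; `read_shard` fabricates a few numbers instead of touching real storage, so the example runs anywhere.

```python
from concurrent.futures import ThreadPoolExecutor

def read_shard(shard_id):
    """Hypothetical stand-in for reading one shard of training data.

    A real loader would fetch a file or object from storage; here we just
    build a small list of numbers derived from the shard id.
    """
    return [shard_id * 10 + i for i in range(3)]

def load_parallel(shard_ids, workers=4):
    """Fetch several independent shards concurrently.

    pool.map() returns results in the same order as shard_ids, even though
    the reads themselves may finish in any order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(read_shard, shard_ids))

batches = load_parallel(range(8))
print(batches[:2])
```

The more workers issue requests at the same time, the more concurrent reads the storage system has to serve quickly, which is exactly the parallel data-feeding pressure described above.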
Even with the right type of systems, you also need a large amount of the right kind of data if you want to generate useful output from an AI project. Exactly what kind of data is best suited to a specific AI model, and which parameters need to be gleaned from that data, will vary dramatically from project to project. Similarly, the amount of data required is likely to vary with the complexity of a given model or its intended output, but the general consensus is the more, the merrier.
From a storage perspective, that typically means very large capacity requirements. In many cases, it also implies different types of storage: some for the “hottest” data that needs to be loaded into a computing system’s memory, and others for “colder” data that is used only occasionally.
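One way to reason about the hot/cold split is to tag each dataset by how often it is read and then total up the capacity each tier must hold. The `assign_tier` policy, its read-frequency threshold and the catalog entries below are hypothetical illustrations, not a recommended configuration.

```python
from dataclasses import dataclass

@dataclass
class Dataset:
    name: str
    size_tb: float
    reads_per_day: int  # how often training or inference jobs touch it

def assign_tier(ds, hot_threshold=100):
    """Hypothetical policy: frequently read data goes on the fast ("hot") tier."""
    return "hot" if ds.reads_per_day >= hot_threshold else "cold"

def capacity_by_tier(datasets, hot_threshold=100):
    """Sum the capacity (TB) each tier needs under the policy above."""
    totals = {"hot": 0.0, "cold": 0.0}
    for ds in datasets:
        totals[assign_tier(ds, hot_threshold)] += ds.size_tb
    return totals

# Made-up catalog for illustration only.
catalog = [
    Dataset("current_training_set", 50.0, 500),
    Dataset("validation_set", 5.0, 200),
    Dataset("raw_archive_2021", 400.0, 1),
]
print(capacity_by_tier(catalog))
```

Even in this toy catalog, most of the capacity lands in the cold tier, while the smaller hot tier is the part that has to keep up with the processors.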
In sum, a successful and efficient AI project depends on many interlinked assets, on both the data side and the hardware side. Feeding processors with large amounts of data at the appropriate speed, in particular, can make a big difference in how efficiently AI models are created and used.
In the next blog post, I’ll dive into more of the details involved in creating a balanced system for AI and discuss how companies can use those principles to avoid the bottlenecks that often occur when defining and using AI models.