Blog

Architect’s Guide To Agentic AI

By Moiz Kohari, VP of Enterprise & Cloud Programs

Philosophy, physics and the field of AI are so deeply interconnected that if one tries to separate them, one may miss the most fundamental lessons hidden in plain sight.  Brian Greene, one of my heroes, writes “Reality is not what it seems; it is shaped by hidden dimensions and fundamental forces we have yet to fully comprehend”.  String theory suggests that reality consists of multiple layers of interconnected dimensions beyond our immediate perception.  Just as string theory unifies disparate elements of physics, AI unifies vast amounts of data to derive meaning and intelligence.  Architects of Agentic AI should consider:

  • Multidimensional Thinking – integrating data across multiple sources, contexts and scales.
  • Hidden Variables – AI operates on weights and biases that may be unseen but influence outcomes profoundly.
  • Continuous Discovery – unifying fragmented knowledge into coherent intelligence.

The resulting AI systems that we help create must account for ethical usage while leveraging reinforcement learning and recommendation engines.  AI models optimize experiences by curating results, making predictions and tailoring responses – as if the universe of data and algorithms is conspiring to help humans achieve goals.  You may remember Paulo Coelho’s famous book The Alchemist, where he reflects “When you want something, all the universe conspires in helping you to achieve it”.

Seeing the Forest for the Trees

Architecting an Agentic AI system requires some level of detail regarding the technologies you use to drive results.  The technological details presented here reflect my personal preferences rather than an invitation for ideological debate.  I recently saw an extremely intelligent Microsoft engineer arguing that using a parallel file system for AI training was a useless idea; I would respectfully suggest that there are many cases where it is appropriate.  The ideal architecture should account for the fact that you may need to interface with many different types of environments, and that certain challenges may call for tools that seem unorthodox.  Suffice it to say, massive amounts of data drive AI systems.  This is how you knock on the doorstep of knowledge.  That data may reside on storage architectures that are modern or slightly dated; if we can leverage these storage systems as is, we save expensive and time-consuming replication.  For example, replicating an exabyte of data over the fastest networking technology of today (400 Gigabit Ethernet) would take approximately 7.7 months.
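For the curious, here is the back-of-the-envelope arithmetic behind that figure – a minimal sketch assuming one fully saturated 400 Gb/s link, no protocol overhead, and 30-day months:

```python
# Back-of-the-envelope: time to move 1 exabyte over 400 Gigabit Ethernet.
# Assumes a single, fully saturated 400 Gb/s link with no protocol overhead.
exabyte_bits = 1e18 * 8            # 1 EB = 10^18 bytes = 8 * 10^18 bits
link_bps = 400e9                   # 400 Gb/s line rate

seconds = exabyte_bits / link_bps  # 2 * 10^7 seconds
months = seconds / (3600 * 24 * 30)
print(f"{months:.1f} months")      # ~7.7 months at pure line rate
```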

Highly sophisticated models are trained on corpora that are far larger than the total model size.  You do not store this training data in the model itself, nor do you load the entire dataset in GPU/CPU memory at once.  Instead, you stream training batches from your storage.  The data is typically stored in distributed file systems, object stores or large disk arrays.  The AI model reads this data in chunks into GPU/CPU memory as the training loop proceeds, processes each chunk and updates parameters, then moves on to the next chunk.  Large training clusters rely on high-throughput pipelines to feed new data.  Maybe this is why Jensen Huang, Nvidia’s CEO, expressed his preference: “we’re going to build faster and faster brains, and faster and faster computers but those fast computers can only learn if the storage system that DDN creates is able to provide fuel to that brain”.  Jensen is referring to the high bandwidth DDN storage, but I digress.
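To make the streaming pattern concrete, here is a minimal sketch using a PyTorch-style iterable dataset; the shard path, file format and batch size are illustrative assumptions, not a prescription:

```python
# Minimal sketch: stream training shards from storage instead of loading the
# full corpus into memory.  Paths and batch size are illustrative.
import glob
import torch
from torch.utils.data import IterableDataset, DataLoader

class ShardStream(IterableDataset):
    def __init__(self, pattern: str):
        self.shards = sorted(glob.glob(pattern))  # e.g. files on a parallel file system

    def __iter__(self):
        for shard in self.shards:
            samples = torch.load(shard)           # read one chunk into CPU memory
            for sample in samples:
                yield sample

loader = DataLoader(ShardStream("/mnt/pfs/corpus/shard-*.pt"), batch_size=32)
for step, batch in enumerate(loader):
    batch = batch.to("cuda", non_blocking=True)   # move the chunk to GPU memory
    # forward pass, loss, backward pass and optimizer step would go here
```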

Having a large dataset helps the model learn richer patterns; I have seen clients leverage multiple exabytes of data.  The model’s parameter count (you may hear numbers like 1.8 trillion parameters) determines how much memory is needed at once to store the learned weights.  Obviously this is much smaller than the data corpus, which will likely continue to be reused each time a new model is trained.  However, LLM inference (when you ask a question of your favorite LLM) doesn’t typically retrieve the training corpus.  Model weights hold the learned knowledge; these are the statistical patterns the LLM has generated from the huge data corpus during training.
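As a rough illustration of that gap, assuming 16-bit weights and using the 1.8-trillion-parameter figure mentioned above:

```python
# Rough footprint of model weights versus an exabyte-scale training corpus.
params = 1.8e12                    # illustrative parameter count
bytes_per_param = 2                # fp16/bf16 weights
weight_bytes = params * bytes_per_param

print(weight_bytes / 1e12, "TB of weights")        # ~3.6 TB
print(1e18 / weight_bytes, "x larger corpus (1 EB)")  # corpus is ~280,000x larger
```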

Core Properties of Agentic AI

Agentic AI systems are goal-oriented, autonomous systems that can make decisions and take actions.  This sets them apart from LLMs that typically respond to direct queries.  An AI agent will usually exhibit:

  • Autonomy and goal-directed behavior, where continuous human oversight is not necessary and the AI can set its own goals in pursuit of a defined objective.
  • The ability to monitor its environment and adapt based on feedback from that environment.
  • Self-reflection, employing reasoning engines that may simulate consequences before selecting an action.
  • Memory that persists across interactions, so the agent can learn from experience.

High-Level Architecture

An Agentic AI system should be architected to allow optimized storage and retrieval of information at massive scale.  AI training requires scalable long-term storage and efficient similarity search.  I am a fan of integrating POSIX-based file systems for real-time access in high-performance AI workloads alongside object storage for hybrid cloud deployments.  However, given the massive scale of information that is potentially dispersed across multiple geographies, one of the most basic components of this architecture is a vector database.  A vector database helps convert raw context into searchable embeddings (more on this in a second).  As information flows into this environment, it is critical to run what may be known as an ETL process on it and create metadata that provides clues about the data being written to long-term storage.  A high-level architecture for such a deployment may look as follows:

Vector Databases – a Critical Component

Vector databases are a crucial part of any Agentic AI system.  AI agents need to recall past interactions, plans or reasoning steps to act coherently over time.  Vector databases enable very fast retrieval of information via something known as embeddings.  Embeddings are generated as data is persisted in long-term storage (e.g., PDF, image, audio, video).  This requires a pre-processing pipeline based on the file type (a sketch of such a pipeline appears after the examples below).  Imagine the following examples:

  • A text based file (such as doctor’s notes) will require Natural Language Processing (NLP) models like “text-embedding-ada-002” from OpenAI (or others) to extract textual embeddings.
  • Image files (such as MRIs or X-rays) may use Vision Transformers (ViT) or CNN-based models to extract visual embeddings.
  • Audio files like conversations can be transcribed with speech-to-text models like Whisper or DeepSpeech, and the resulting text embedded.
  • Video files can be sampled into frames, from which image embeddings are extracted.

Once the embeddings are stored in a vector database, each embedding is connected to its parent file (in the S3 or POSIX namespace) using a metadata store (think MongoDB).
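Below is a minimal sketch of such an ingest step; the embedder callables, vector database client and metadata collection are placeholders standing in for real services (an OpenAI embedding call, a ViT model, Whisper, a vector database, a MongoDB collection and so on):

```python
# Minimal sketch of an ingest step: pick an embedding model by file type,
# store the vector, and link it back to the parent object via a metadata record.
# The embedder callables and client objects are placeholders for real services.
from pathlib import Path
from typing import Callable, Dict, List

def ingest_file(
    path: str,
    s3_url: str,
    embedders: Dict[str, Callable[[str], List[float]]],  # suffix -> embedding function
    vector_db,        # placeholder for a vector database client
    metadata_store,   # placeholder for e.g. a MongoDB collection
) -> None:
    suffix = Path(path).suffix.lower()
    if suffix not in embedders:
        raise ValueError(f"no embedder registered for {suffix}")

    vector = embedders[suffix](path)       # modality-specific pre-processing + embedding
    vector_id = vector_db.upsert(vector)   # persist the embedding for similarity search
    metadata_store.insert_one({            # link embedding -> parent object
        "vector_id": vector_id,
        "s3_url": s3_url,
        "modality": suffix,
    })
```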

When a new query comes in and information retrieval is initiated, the query itself is vectorized, and using similarity search the AI system can quickly retrieve the most relevant information.  For example, if the query is regarding a “knee MRI with MCL tear” (a code sketch follows these steps):

  1. The query is converted into an embedding
  2. The vector database retrieves the most similar embeddings
  3. The embedding enables us to retrieve the object storage URL from the metadata store
  4. The full file can then be fetched from the S3 object store
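A minimal sketch of that retrieval path, reusing the hypothetical clients from the ingest sketch above (the object-store client here is a simplified placeholder rather than a specific SDK):

```python
# Minimal sketch of the query-time retrieval path; clients and field names
# mirror the hypothetical ingest sketch above.
def retrieve(query: str, embed_text, vector_db, metadata_store, object_store, k: int = 5):
    query_vector = embed_text(query)                            # 1. embed the query
    hits = vector_db.search(query_vector, top_k=k)              # 2. nearest embeddings
    results = []
    for hit in hits:
        record = metadata_store.find_one({"vector_id": hit.id}) # 3. look up object URL
        results.append(object_store.get(record["s3_url"]))      # 4. fetch the full file
    return results

# e.g. retrieve("knee MRI with MCL tear", embed_text, vector_db, metadata_store, store)
```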

Flexibility of the Back-end (POSIX/S3)

For the sake of simplicity, people may choose one approach over the other.  The choice usually comes down to low-latency file access (typically POSIX-based) versus object storage accessed via an HTTP(S) API.  One thing to consider is that POSIX usually provides strict consistency, while many object stores operate on eventual consistency (MinIO is a well-known exception, providing strict consistency).  If possible, I would opt for a hybrid approach that gives your applications the greatest flexibility in the future.
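One way to preserve that flexibility is a thin storage abstraction that the rest of the system codes against; the sketch below is hypothetical, with the S3 variant assuming a boto3-style client:

```python
# Hypothetical sketch of a thin storage abstraction so application code does not
# hard-wire itself to either a POSIX mount or an S3-style object store.
from abc import ABC, abstractmethod

class BlobStore(ABC):
    @abstractmethod
    def read(self, key: str) -> bytes: ...

class PosixStore(BlobStore):
    def __init__(self, root: str):
        self.root = root
    def read(self, key: str) -> bytes:
        with open(f"{self.root}/{key}", "rb") as f:   # low-latency, strictly consistent
            return f.read()

class S3Store(BlobStore):
    def __init__(self, client, bucket: str):
        self.client, self.bucket = client, bucket     # e.g. a boto3 S3 client
    def read(self, key: str) -> bytes:
        return self.client.get_object(Bucket=self.bucket, Key=key)["Body"].read()
```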

I know of a few current projects that are trying to integrate these two namespaces so that operations can be performed on files through either protocol, or so that requests can be routed to the appropriate storage back end.

Decision and Reasoning Engine

The nerve center of this design is the core intelligence layer, where autonomous actions are planned and executed.  The decision and reasoning engine is responsible for making real-time decisions based on knowledge retrieved via interactions with the vector database, metadata store and object storage.  A typical reasoning engine may use an LLM (GPT-4, Llama, etc.) for natural language reasoning, multi-modal models for analysis of objects, and incorporate logic-based rules or reinforcement learning.  The AI control loop may look like the following (a minimal sketch appears after the steps):

  1. Sense (Ingest data or query)
  2. Retrieve (the storage life cycle above)
  3. Reason (LLM, AI models)
  4. Act (generate insights or trigger workloads)
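A minimal sketch of one pass through that loop; `retrieve` is the hypothetical helper sketched earlier, and `reason` and `act` stand in for an LLM call and whatever downstream workflow gets triggered:

```python
# Minimal sketch of one pass through the sense -> retrieve -> reason -> act loop.
# `retrieve` is the hypothetical helper sketched earlier; `reason` stands in for
# an LLM / multi-modal model / rules engine, and `act` for the triggered workflow.
def control_loop(event, retrieve, reason, act, memory: list):
    query = event["query"]                      # 1. Sense: ingest new data or a query
    context = retrieve(query)                   # 2. Retrieve: vector DB -> metadata -> object store
    decision = reason(query, context, memory)   # 3. Reason: LLM, rules or RL policy
    memory.append(decision)                     # persist memory across interactions
    return act(decision)                        # 4. Act: generate insights or trigger work
```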

Ethical Considerations

The properties above behoove us to proceed with caution: if the AI’s constraints and error handling are not completely defined, unintended consequences can follow.

UNESCO has a global AI ethics and governance observatory trying to develop ethical guardrails.  You can read more about this here:

https://www.unesco.org/en/artificial-intelligence/recommendation-ethics

The World Economic Forum has also been collaborating with stakeholders to ensure AI adoption addresses the challenges arising from these developments.  The link is below:

https://initiatives.weforum.org/ai-governance-alliance/home

However, there is currently no single, comprehensive worldwide AI regulation and compliance program.  Having worked through the challenges of creating a data privacy startup in an environment where a lack of governmental enforcement created a huge vacuum, I think it is in all of our interests to continue to push on this subject.
