The Anatomy of AI Agency: Perception, Reflection, Action, Memory
AI Agents are here. More than a chatbot, an AI Agent can perform tasks for us on its own, but there is a lot of variation in the definitions and capabilities promoted by companies in the space. Taking a step beyond automation, agency is a concept worth exploring for builders and buyers alike. What makes a system agentic? How can we distinguish between different levels of agency? How much agency is actually needed for a given use case?
I’ll answer these questions by proposing four core agentic competencies: perception, reflection, action, and memory. With these definitions in hand, I’ll outline a framework for evaluating the level of agency in an AI system.
A note on alignment: Primary and Secondary Agency
I’ll quickly address the high-level concern of alignment by dividing agency into two top-level categories: primary agency, which acts for itself, and secondary agency, which acts on behalf of another.
Let’s assume we, as human beings with awareness of our physical bodies, all have primary agency at our core. That puts some animals in this bucket as well. I’ll add certain organizations like companies and countries to the first category, while leaving the nuances of organizational agency for another time.
Let’s agree that machines, anything we build, should remain in the second category. We can achieve AGI while keeping it in the second category, working for us. Secondary agency then necessarily includes an external source of motivation, which I’ll call a User. A machine can have vast knowledge and perform tasks better than a human without being intrinsically motivated to act. It can maintain a boundary of self vs. other, and even have records of its previous states, without having primary agency. It can even produce extremely convincing language that will have you believing it is more agentic than you, without actually having primary agency itself (perhaps an artifact of our fascination with Terminator scenarios having permeated its training data). I welcome your messages discussing the finer points of this position if you feel compelled.
Building Blocks of Secondary Agency
Now that we are clear on the nature of the agency we seek to imbue in our machines, let’s look at the components required. The core of any Agent can be loosely called a generative process, which in practice is an LLM. Agents are built on top of LLMs, and we can more or less take this for granted. An LLM alone, though, is not quite agentic. It can receive an input and decide what to produce as an output, but it has no ability to engage with the world on its User’s behalf. Basic LLMs notably lack persistence: every new context is a blank slate. Some see this as a feature rather than a shortcoming, but we need more from our digital representatives. They also lack self-critique, the kind of behaviour that catches mistakes and refines decisions, a standard we often expect from those we ask to act on our behalf.
Perception
Perception is input: the most basic capability, and the foundation of agency. Perception implies a boundary between self and environment, with information flowing across it from the environment into the self. Basic LLMs perceive the world through text, and multimodal LLMs have the senses of sight and hearing as well. Agents are built on LLMs and inherit those senses, though their perceptual modalities don’t map neatly to ours, owing to the differences between physical and digital environments.
We can think about Agent perception in three categories: User input, static knowledge, and data feeds. Each can take the form of text, imagery, or audio, but the data type is somewhat beside the current point. The important distinctions in Agent perception are the data source and how it is ingested. User input is privileged data pushed in, and it is crucial for maintaining alignment. Most workflows today require users to provide continuous feedback and correction during a task. We are beginning to see Agents that can operate without this in-the-moment feedback, but they still require very specific initial instructions. One can imagine an Agent running without any User input, and someone will build it (already has?), but for the moment I’m not sure they should.
Static knowledge is information pulled from the environment, popularized as RAG (retrieval-augmented generation). It typically comes from internet search, file systems such as a code base or knowledge base, or structured stores like a vector database or a typical relational database. This data may or may not overlap with the base LLM’s original training data; what characterizes it as perception is the act of pulling pieces of information from outside the Agent’s reasoning processes, as directed by those processes.
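To make the “pull” mechanics concrete, here is a minimal sketch of static-knowledge perception. The bag-of-words embedding and the tiny in-memory knowledge base are toy placeholders, not any particular vector database or embedding model; a real Agent would swap in proper retrieval infrastructure.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real Agent would call an embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

KNOWLEDGE_BASE = [
    "Refund requests must be filed within 30 days of purchase.",
    "Enterprise customers are assigned a dedicated support contact.",
]

def perceive_static_knowledge(query: str, k: int = 1) -> list[str]:
    # Pull: the Agent's own reasoning decides what to retrieve from its environment.
    q = embed(query)
    ranked = sorted(KNOWLEDGE_BASE, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]

context = perceive_static_knowledge("How long does a customer have to request a refund?")
prompt = f"Answer using this context:\n{context}\n\nQuestion: how long do I have to get a refund?"
```

The point is the direction of flow: the retrieval call happens because the Agent’s reasoning process asked for it, which is what makes it perception rather than memory.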
Data feeds as a perceptual apparatus are somewhat more interesting, bearing a closer resemblance to the kinds of perception we are used to. They might carry real-time, near-real-time, or batched information; the distinction is that the information is pushed into the Agent. The examples of data streams are close to endless; I suspect you can already imagine at least three. Given my current work, I’m most excited about streaming financial data, social content, and all manner of sensor data.
Streaming data into an LLM presents a challenge and a new horizon for Agent builders. First, there’s the question of cost. Fortunately, recent improvements in inference unit economics may have already addressed this, making near-continuous generation affordable. However, LLMs are still inherently transactional in nature: input, then output, not both at once. This is an interesting engineering problem for someone to solve (likely with buffers and the like). Then there’s context management. Even the longest context windows would be exhausted by a continuous data feed, so some pruning and summarization will be essential. This implies multiple LLMs working in a perceptual hierarchy, which again is something that only recently became affordable for the typical Agent builder. There is much opportunity to build here.
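Here is a rough sketch of the buffer-and-summarize pattern described above, assuming a small, cheap “perception” model sits between the feed and the main reasoning LLM. The `summarize_with_small_llm` function and the buffer limit are placeholders for whatever model and thresholds a builder chooses.

```python
from collections import deque

BUFFER_LIMIT = 50  # raw events held before the perception layer compresses them

def summarize_with_small_llm(events: list[str]) -> str:
    # Placeholder: a cheap perception model would compress raw events here.
    return f"{len(events)} events observed; latest: {events[-1]}"

class FeedPerception:
    """Buffers a pushed data feed and hands compressed summaries to the reasoning LLM."""

    def __init__(self) -> None:
        self.buffer: deque[str] = deque()
        self.summaries: list[str] = []

    def push(self, event: str) -> None:
        # The feed pushes events in; the Agent does not request them.
        self.buffer.append(event)
        if len(self.buffer) >= BUFFER_LIMIT:
            self.summaries.append(summarize_with_small_llm(list(self.buffer)))
            self.buffer.clear()

    def context_for_reasoner(self) -> str:
        # Only summaries plus the most recent raw events reach the main model's context.
        recent = list(self.buffer)[-5:]
        return "\n".join(self.summaries + recent)
```

This is the simplest possible two-level perceptual hierarchy; real systems would likely add recency weighting, topic routing, and multiple summarization tiers.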
Reflection
Reflection enables self-monitoring, error correction, and iterative refinement. It is internal processing of tokens, meant to improve coherence and enhance outputs. It happens at inference time, when a trained model is processing its inputs. The original LLMs have no reflective capacity; they simply produce a response directly based on the input. Current reasoning models like OpenAI's o1 and DeepSeek's R1 use self-prompting, looping over their own outputs to refine the final response. This reflective capacity allows agents to catch mistakes, adjust strategies, and optimize their output in the moment.
Contemporary LLM reasoning works through a process called chain-of-thought (CoT), honed through reinforcement learning. This approach allows the models to break down complex problems into steps, reflecting on each step before moving to the next. The process typically involves dissecting a query into manageable components before working through these components in sequence. As it progresses, the model generates intermediate thoughts or steps, each building upon the previous ones, while attempting to maintain logical consistency.
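A simplified sketch of the generate-critique-revise pattern is below. `call_llm` is a stand-in for whatever model is being used (it returns a canned string here so the sketch runs without an API key), and the prompts are illustrative, not a specific vendor’s reasoning procedure.

```python
def call_llm(prompt: str) -> str:
    # Placeholder: wire this to whichever model you use.
    return "OK"

def reflect_and_answer(task: str, max_rounds: int = 3) -> str:
    # First pass: decompose and answer step by step.
    draft = call_llm(f"Solve step by step:\n{task}")
    for _ in range(max_rounds):
        # Self-critique: the model examines its own intermediate output.
        critique = call_llm(
            f"Task: {task}\nDraft answer: {draft}\n"
            "List any errors or gaps, or reply OK if the draft is sound."
        )
        if critique.strip() == "OK":
            break
        # Revision: fold the critique back into the next attempt.
        draft = call_llm(
            f"Task: {task}\nDraft: {draft}\nCritique: {critique}\nWrite an improved answer."
        )
    return draft
```

The `max_rounds` cap is one concrete example of the cost/benefit trade-off discussed below: more reflection loops mean more tokens spent for diminishing returns.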
Reflective capabilities contribute to increased accuracy, perhaps even allowing smaller models to rival larger ones that do not have reflective procedures. By critically examining their own outputs, reflective agents can discover innovative approaches to problems that weren't explicitly programmed, leading to novel problem-solving techniques. This self-observation mechanism allows the models to operate more independently, requiring less human intervention for performance tuning and strategy adjustment.
This is the area with the most potential for future advancements. As reflective capacities begin integrating with memory and external environments, we’ll see ever more complex behaviours emerge. However, the integration of reflection into agent architectures presents both challenges and opportunities. Balancing the computational overhead of continuous self-monitoring with the benefits of enhanced decision-making will require careful design choices by Agent builders, tailored to the task at hand.
Action
The capacity for autonomous action lies at the heart of agentic potential. Agents must move beyond passive analysis to actively influence their environment in pursuit of our objectives. Tool use enables agents to extend their cognitive capabilities and shape their surroundings. The spectrum of potential actions available to AI agents is vast, encompassing digital and physical domains. In software contexts, agents may manipulate code, interact with APIs, or navigate complex information systems.
Critically, the design of an agent's action space must align with its intended purpose and operating context. Overly constrained actions may limit an agent's ability to adapt and innovate, while too much ability to act raises risks of unintended consequences. Striking the right balance between flexibility and control is a central challenge in action space engineering.
The coupling between an agent's action and perception capabilities plays a vital role in its overall effectiveness. Agents must be able to perceive the consequences of their actions and adjust their behavior accordingly. An ideal perception-action loop enables adaptive learning, ultimately driving toward desired outcomes.
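To illustrate a deliberately narrow action space coupled to perception, here is a sketch of parameterized tools driving a perceive-act loop. The tool names, the customer ID, and the `decide` placeholder are hypothetical, not any particular framework’s API; the LLM would normally fill the role of `decide`.

```python
from typing import Callable

# A small, constrained action space: each tool is a parameterized action.
TOOLS: dict[str, Callable[..., str]] = {
    "search_orders": lambda customer_id: f"orders for {customer_id}: [2 open orders]",
    "send_email": lambda to, body: f"email sent to {to}",
}

def decide(observation: str) -> tuple[str, dict]:
    # Placeholder for the LLM choosing an action and its parameters
    # from the latest observation.
    return "search_orders", {"customer_id": "C-42"}

def act_observe_loop(goal: str, steps: int = 3) -> list[str]:
    observations = [goal]
    for _ in range(steps):
        tool, params = decide(observations[-1])
        result = TOOLS[tool](**params)  # act on the environment
        observations.append(result)     # perceive the consequence, feeding the next decision
    return observations
```

Keeping the action space to a short, explicit dictionary is one way of trading flexibility for control; widening it is a design decision, not a default.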
Memory
Memory is the crucial persistence layer of agentic systems, allowing agents to learn from experience, build rich world models, and pursue long-term objectives. It is also the cumulative result of a foundation model’s training, the traces of all the feedback and guidance it has received. The interplay between different memory systems - working, episodic, semantic, and procedural - shapes an agent's cognitive capabilities and behavioral patterns.
Working memory, often corresponding to an LLM's active context, enables agents to hold and manipulate information in real time. The capacity and management of this immediate memory buffer play a critical role in an agent's ability to reason, plan, and problem-solve. I see a lot of differentiation emerging, and perhaps potential for moats to be built, in the management of working memory. Any agent will always have a finite working memory, and there exists some optimal compression and selection policy to fit the right information into working memory at any given moment. I know it’s popular to propose new scaling laws right now, so here’s one to chew on: as the complexity of an agent's environment increases, the effectiveness of its decision-making depends on the efficiency of its strategy for managing working memory.
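As a concrete (and deliberately naive) example of such a policy, here is a sketch that keeps the items most relevant to the current goal within a fixed budget and compresses the rest. The word-overlap scoring is a stand-in; a real policy might use embeddings, recency, or a learned value function.

```python
def relevance(item: str, goal: str) -> float:
    # Stand-in scoring: word overlap with the current goal.
    goal_words = set(goal.lower().split())
    item_words = set(item.lower().split())
    return len(goal_words & item_words) / max(len(item_words), 1)

def pack_working_memory(items: list[str], goal: str, budget: int = 4) -> list[str]:
    # Keep the highest-relevance items within the budget; summarize the
    # remainder into a single line rather than dropping it silently.
    ranked = sorted(items, key=lambda i: relevance(i, goal), reverse=True)
    kept, overflow = ranked[:budget], ranked[budget:]
    if overflow:
        kept.append(f"[compressed: {len(overflow)} older items omitted]")
    return kept
```

The interesting engineering lives in the scoring and compression steps, which is exactly where I expect differentiation between Agent builders to show up.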
Episodic memory, by contrast, encodes an agent's autobiographical history - a record of its unique experiences and interactions. Often implemented using vector databases or similar retrieval mechanisms, episodic memory allows agents to learn from past successes and failures, and to adapt their strategies based on accumulated knowledge (Shinn et al., 2023).
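A minimal sketch of episodic memory follows: each completed task is written back with its outcome, and similar past episodes are recalled when a new task arrives. The string-similarity lookup is a naive stand-in for the vector-database retrieval mentioned above, and the example episode is invented for illustration.

```python
import difflib

EPISODES: list[dict] = []  # stand-in for a vector database of past experiences

def record_episode(task: str, actions: list[str], outcome: str) -> None:
    EPISODES.append({"task": task, "actions": actions, "outcome": outcome})

def recall_similar(task: str, k: int = 2) -> list[dict]:
    # Retrieve the past experiences most similar to the new task so prior
    # successes and failures can inform the next attempt.
    scored = sorted(
        EPISODES,
        key=lambda e: difflib.SequenceMatcher(None, e["task"], task).ratio(),
        reverse=True,
    )
    return scored[:k]

record_episode("book a flight to Lisbon", ["search", "compare", "purchase"], "success")
lessons = recall_similar("book a flight to Porto")
```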
Semantic memory stores an agent's factual knowledge about the world, including concepts, relationships, and abstractions. This memory system enables agents to reason about their environment, draw inferences, and communicate effectively with users. The structure and organization of semantic memory - whether through knowledge graphs, ontologies, or other representational frameworks - shapes an agent's understanding of its domain (Modarressi et al., 2023). Semantic memory is distinct from perceived static knowledge in that it has been processed by the agent. It is information that has been transformed by experience and synthesis to form a concept vector that is very likely unique to the Agent.
Procedural memory encodes an agent's skills, routines, and action sequences. It is implicit, and it enables agents to perform tasks efficiently and automatically, without explicit reasoning or recall. Procedural memory in an Agent lives in the weights: the result of all pre-training, post-training, and fine-tuning. Even the connective code that forms the Agent’s runtime, the pieces that developers today are quickly building out, can be considered procedural memory.
The integration and coordination of these memory systems pose significant architectural challenges. Agents must be able to efficiently store, retrieve, and update memories across different timescales and levels of abstraction. Moreover, the ability to generalize from past experiences, to transfer knowledge across domains, and to selectively forget irrelevant or outdated information are critical for long-term learning and adaptability.
Simplicity is Effective
While the capabilities outlined above - perception, reflection, action, and memory - are essential components of fully autonomous, agentic behaviour, it is important to recognize that not every agent requires all of these capabilities in equal measure. The design of effective agents often involves strategic tradeoffs and a focus on simplicity over unnecessary complexity.
In Building Effective Agents, Anthropic proposes a guiding principle for agent design: the simplest agent capable of successfully completing a given task is likely to be the most effective. Overengineering agents with extraneous capabilities can lead to inefficiency, brittleness, and unintended consequences.
In practice, this means carefully considering the specific requirements and constraints of a given application domain, and tailoring an agent's architecture accordingly. For some tasks, a simple reactive agent with minimal memory and reflection may suffice. Other domains may demand more sophisticated cognitive capabilities, such as long-term planning of action sequences or counterfactual reasoning.
The key is to identify the minimum viable agent configuration that can reliably achieve the desired objectives. This focus on simplicity not only streamlines development and deployment, but also enhances interpretability and robustness. To achieve this simplicity, I’ve developed the four core agentic competencies into a precise framework for outlining Agent requirements.
The PRAM Framework
The PRAM framework breaks down each of the four agentic competencies into levels of complexity that allow us to be more specific about requirements for a task. These levels will certainly evolve as the technology grows, but here’s a useful starting point. Many of the higher levels reach beyond what’s possible today, but can be conceived of based on current tech and development trends. We will see many papers and whole categories of companies emerge to tackle these future problems, but for now they will be aspirational bullet points in this framework.
PERCEPTION (Input Data)
P1: Only text messages from a User
P2: Web search, or RAG
P3: Continuous data feeds
REFLECTION (Internal Processing)
R1: Generation only
R2: Self-prompting; Chain of Thought reasoning
R3: CoT + retrieval from environment
R4: Continuous self-prompting with tight integration to long-term memory; persistent thoughts
R5: Continuous self-modification (updating internal weights) based on live feedback
ACTION (Output Capability)
A1: Respond to user, generate tokens
A2: Binary actions that affect environment (do X or don't do X)
A3: Parameterized actions that affect environment (do X with variations)
A4: Actions conditional on future inputs
A5: Sequences of conditional actions (planning)
MEMORY (Information Retention)
M1: Working memory only
M2: Storage of previous working memory states
M3: Storage of curated, processed knowledge to semantic memory
M4: Internal management of working memory
This framework allows precise characterization of any agentic system by understanding the level of each core capability. When building agents to solve business problems, this framework gives us a tool to define clear technical requirements in a field that is changing every week.
Before building any Agent, it’s worth considering which capabilities are actually necessary. It’s easy to imagine a solution that is more complex than is feasible. To set expectations, consider some widely known AI products from OpenAI. ChatGPT is level 1 in each competency. The new Deep Research is arguably the most advanced publicly available agent; to my knowledge it achieves level 2 in Perception, 3 in Reflection, and 1 in both Action and Memory. In the coming months we’ll see Agents reaching levels 2 and 3 in all competencies, and there are already papers beginning to address level 4. As these abilities begin to interact, we’ll see powerful new forms of behaviour emerge. It’s an exciting time to build.
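One way to make these characterizations concrete is a small profile structure. The levels below simply restate my assessments above; they are readings against the framework, not official ratings from OpenAI, and the requirements-check helper is a hypothetical convenience.

```python
from dataclasses import dataclass

@dataclass
class PRAMProfile:
    perception: int  # P1-P3
    reflection: int  # R1-R5
    action: int      # A1-A5
    memory: int      # M1-M4

# The assessments from the text above, encoded as profiles.
chatgpt = PRAMProfile(perception=1, reflection=1, action=1, memory=1)
deep_research = PRAMProfile(perception=2, reflection=3, action=1, memory=1)

def meets_requirements(agent: PRAMProfile, required: PRAMProfile) -> bool:
    # A candidate agent satisfies a use case if it reaches the required
    # level in every competency.
    return (agent.perception >= required.perception
            and agent.reflection >= required.reflection
            and agent.action >= required.action
            and agent.memory >= required.memory)
```

Writing requirements this way forces the conversation from “we need an agent” to “we need P2, R2, A3, M1,” which is a much easier specification to build, buy, or evaluate against.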
Conclusion
AI agents hold tremendous promise for transforming the way we live, work, and interact with technology. As we build out the systems that will drive tomorrow’s economic activity, it's crucial that we develop a shared conceptual framework for understanding and designing agentic systems. The PRAM framework presented in this article offers a step in that direction.
Are you evaluating Agents for use cases in your business? Maybe you are already using LLMs extensively and seeking to extend their capabilities? Please feel free to reach out to me. I am applying this framework to build systems that solve real problems, and I’d be happy to help.
Written with help from Claude 3.5 Sonnet, Claude Opus, Perplexity, and DeepSeek R1.