LLM Agents, Part 2 - What the Heck are Agents, anyway?
An Intelligent Agent (IA) is an autonomous entity that observes and acts upon an environment to achieve specific goals. These agents can range from simple systems, such as thermostats or basic control mechanisms, to highly complex AI-powered systems. The exact definitions and the thresholds necessary to attribute agency to a system are up for debate and can only be contextually discussed. However, most IAs possess some or all of the following key properties:
Autonomous operations
Reactive to the environment
Proactive (goal-directed)
Interactive with other agents (via the environment)
To better understand how our understanding of the concept has evolved, the current state, and the potential future of IAs, it's essential to trace their history and examine the key milestones that have shaped the field. But first…
Why should I care?
Are agents just another hype cycle that will soon die out? Or are they the next major platform shift?
Why are we excited about LLM Agents?
The emergence of LLM agents has sparked considerable excitement in the AI community. Their ability to comprehend and generate coherent text, undertake complex tasks, and exhibit autonomous behavior has opened up a wide array of possibilities. One of the key factors contributing to this excitement is the potential of LLM agents to serve as planning modules for autonomous agents.
Open-source LLMs have also reached a point where they can effectively drive agentic workflows. For example, the integration of LLMs into systems where they can call tools has further enhanced their capabilities, allowing them to perform more complex and diverse tasks.
This growth has led to a significant rise in both the development and adoption of LLMs as core components of autonomous agents. The excitement surrounding them is further fueled by their potential to function as artificial general intelligence (AGI) systems, capable of performing a wide range of tasks with human-like proficiency. However, it is important to note that there are still significant challenges to be addressed before LLM agents can truly achieve such advanced capabilities.
I am working on use case X, should I really care about LLM agents?
No, if:
Your task is well-defined and specific.
It needs only a single function, like grammar checking, text summarization, or code generation.
It doesn't require remembering past interactions or context.
It operates solely on the information provided in the current prompt.
Examples:
Highlighting grammatical errors in a document.
Creating a concise summary of a lengthy article.
Translating a simple sentence from one language to another.
Generating different creative text formats based on a single prompt (e.g., poems, scripts).
Yes, if:
Your task is more complex and involves multiple steps.
It needs the ability to remember past interactions and context.
It benefits from accessing and interacting with external tools or resources.
It requires a level of autonomy in completing the task.
Examples:
A virtual assistant that manages your schedule, checks weather data, and books appointments.
A system that analyzes customer reviews and recommends product improvements.
A chatbot that can answer complex questions by searching the web and integrating information from different sources.
A content creation tool that understands your previous creative decisions and generates content that aligns with your overall vision.
Brief History of Intelligent Agents
The concept of Intelligent Agents has evolved alongside the development of Artificial Intelligence (AI), with its roots dating back to the 1950s. Let's take a look at a brief history of intelligent agents and how they have progressed over time.
1950s and before: The Dawn of AI
Turing Machine (1936): Though not an agent, Alan Turing's theoretical model provided a foundation for defining computation and intelligence.
Turing Test (1950): Proposed by Alan Turing, this test established a benchmark for a machine's ability to exhibit human-level intelligence.
These early concepts laid the groundwork for the development of autonomous agents in the following decade.
1960s: The Rise of Autonomous Agents
ELIZA: Created by Joseph Weizenbaum in the mid-1960s, this natural language processing program was one of the earliest intelligent agents, capable of simulating a psychotherapist through natural language conversations.
General Problem Solver (GPS): Developed by Herbert Simon, J.C. Shaw, and Allen Newell in the late 1950s, GPS was an early intelligent agent system that could solve problems by searching through a space of possible solutions, laying the foundation for future problem-solving agents.
SHRDLU: Developed by Terry Winograd, SHRDLU demonstrated rudimentary natural language processing capabilities to solve tasks in a simulated block world.
Building on these early successes, the 1970s and 1980s saw intelligent agents finding applications in specialized domains.
1970s-1980s: Growth and Specialization
MYCIN: An early expert system designed for medical diagnosis, MYCIN showcased the potential of knowledge-based systems in specialized domains.
Shakey the Robot (1970s): A mobile robot from SRI International, Shakey pioneered basic navigation and manipulation tasks in a controlled environment.
As AI technology advanced, the 1990s and 2000s witnessed the rise of intelligent agents in more practical and everyday applications.
1990s-2000s: The Rise of Practical Applications
Deep Blue (1997): IBM's Deep Blue, a chess-playing computer, defeated chess grandmaster Garry Kasparov, demonstrating AI's potential for complex decision-making.
Roomba Vacuum Cleaner (2002): The Roomba became a popular example of IAs entering everyday life, performing basic cleaning tasks autonomously.
In the 21st century, intelligent agents have become increasingly sophisticated and integrated into various aspects of our lives.
2000s-Present: Evolution to Advanced Intelligent Agents
Virtual personal assistants such as Siri, Alexa, and Google Assistant are prime examples of intelligent agents.
Self-driving cars, recommendation systems, and game-playing AI are other examples of intelligent agents.
NASA's mobile agents for human planetary exploration are among the most advanced intelligent machines created to date.
The 21st century has witnessed a remarkable surge in the development and deployment of intelligent agents across various domains. The evolution of powerful machine learning algorithms, coupled with the exponential growth in computing power and data availability, has enabled the creation of highly sophisticated autonomous systems. One of the most significant breakthroughs in this era has been the emergence of Reinforcement Learning (RL) as a key approach for training intelligent agents.
RL has proven to be a game-changer in the realm of game-playing AI, with notable examples such as AlphaGo, which made history by defeating world champion Go players in 2016. This achievement highlights the potential of RL in enabling agents to learn and adapt to complex environments through trial-and-error learning and reward maximization.
Agents in Reinforcement Learning
Reinforcement Learning (RL) is a subfield of machine learning that focuses on training agents to make sequential decisions in an environment to maximize a cumulative reward signal. In RL, an agent interacts with its environment by taking actions, observing the resulting state, and receiving rewards or penalties based on its actions. The goal here is to learn a policy, which is a mapping from states to actions, that maximizes the expected cumulative reward over time.
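To make that objective concrete, here is the standard textbook formulation (the notation is a common convention, not something spelled out in this article): the agent seeks the policy π that maximizes the expected discounted return,

$$\pi^{*} = \arg\max_{\pi}\ \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_{t}\right],$$

where $r_t$ is the reward received at step $t$ and $\gamma \in [0, 1)$ discounts rewards that arrive further in the future.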
In RL, an “agent” is strictly defined by its policy: a function mapping the current state to the most appropriate next action, learned from the rewards earned by previous actions.
Let’s look at an example. Say an RL agent is in charge of controlling the conversation flow inside a customer service chat system.
“State” in this case is some indicator of progress extracted from the customer's last utterance (e.g., an intent such as needs more information to purchase).
“Reward” could include positive points for retrieving relevant information, positive customer sentiment, or an improved likelihood of a purchase or service renewal, and negative points for signs of frustration (e.g., repeated requests), negative sentiment, or an abandoned conversation.
The “environment” in this case is the current conversation, all the data we have about the customer (their purchase history, demographics, previous communication transcripts), and, say, data related to competing services.
“Actions” could include offering a solution, retrieving data to answer questions, and asking clarifying questions.
The “policy” (and therefore the agent) is the decision-making function that selects the next best action given the current state; it could be learned from historical data, designed from business logic, or both. For example, a “perception” function might evaluate the intent of the last utterance (e.g., a complaint) to infer the state, and using that information the policy determines the best next action (e.g., apologize and offer a discount).
Note that while the reward has a significant impact on how the agent behaves, it is not an internal property of the agent. It is instead determined by the designer of the system (part art, part science) to encourage the desired goal-seeking behavior. In other words, the reward here is externally imposed (as opposed to human behavior, for example, where incentives can be internal or external).
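To ground these definitions, here is a minimal sketch of one turn of this loop in Python. Everything in it is a hypothetical stand-in: infer_state is a toy perception function, compute_reward is a toy reward, and the policy is a simple epsilon-greedy rule over a tabular value estimate. A production system would use learned models for all three.

```python
import random

ACTIONS = ["offer_solution", "retrieve_data", "ask_clarifying_question"]

def infer_state(utterance: str) -> str:
    """Perception: map the customer's last utterance to a coarse state (an intent)."""
    if any(w in utterance.lower() for w in ("again", "still", "frustrated")):
        return "frustrated"
    if "?" in utterance:
        return "needs_information"
    return "browsing"

def compute_reward(state: str, action: str) -> float:
    """Reward shaping: positive for progress, negative for signs of frustration."""
    if state == "frustrated":
        return -1.0
    if state == "needs_information" and action == "retrieve_data":
        return 1.0
    return 0.0

# Tabular value estimate for each (state, action) pair.
q_values = {}

def policy(state: str, epsilon: float = 0.1) -> str:
    """Epsilon-greedy policy: usually exploit the best-known action, sometimes explore."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_values.get((state, a), 0.0))

def update(state: str, action: str, reward: float, lr: float = 0.1) -> None:
    """Bandit-style update; a full RL agent would also bootstrap from the next state."""
    key = (state, action)
    q_values[key] = q_values.get(key, 0.0) + lr * (reward - q_values.get(key, 0.0))

# One turn of the conversation loop.
state = infer_state("I'm still waiting for my refund!")
action = policy(state)
update(state, action, compute_reward(state, action))
```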
Properties of RL agents
Let's revisit the key properties we mentioned at the beginning of this article in the context of RL agents.
Autonomy: RL agents can make decisions independently based on their policy, without requiring explicit instructions or supervision.
Interactivity: Agents continuously interact with their environment, including other agents, by taking actions and receiving feedback in the form of rewards or penalties.
Adaptability: Through trial-and-error learning, RL agents can adapt their behavior based on the feedback they receive, allowing them to improve their performance over time.
Goal-orientation: RL agents are driven by the objective of maximizing cumulative rewards, which enables them to learn optimal strategies for achieving specific goals.
Examples of RL agents
Game-playing agents:
AlphaGo: Developed by DeepMind, AlphaGo is an RL agent that learned to play the complex game of Go at a superhuman level, defeating world champion players.
OpenAI Five: Created by OpenAI, this RL agent mastered the multiplayer video game Dota 2, showcasing the potential of RL in complex strategic environments.
Robotics and autonomous systems:
Autonomous vehicles: Reinforcement learning is actively being explored for training self-driving cars. Companies like Waymo and Tesla utilize RL for tasks like lane following, obstacle avoidance, and optimizing driving behavior.
Robotic manipulation: RL agents can learn to perform dexterous manipulation tasks, such as grasping and assembling objects, by learning from trial-and-error interactions with the environment.
Recommendation systems:
News recommendation: RL agents can be employed to personalize news article recommendations based on user preferences and engagement, optimizing for long-term user satisfaction. The DRN framework, for instance, uses Deep Q-learning to deliver personalized news content.
E-commerce recommendations: By learning from user interactions and purchase history, RL agents can provide personalized product recommendations that maximize user engagement and revenue. Researchers have, for example, proposed using deep reinforcement learning to recommend product sequences that sustain user interest and drive purchases.
The examples we've explored demonstrate the wide range of applications for RL agents across various domains, from game-playing and robotics to recommendation systems. As research in RL continues to advance, we can expect to see even more innovative and impactful use cases for these adaptive, goal-oriented agents.
RL and NLP
One particularly exciting area where RL is making significant advancements is in the field of Natural Language Processing (NLP). By leveraging the power of RL, researchers and practitioners are developing agents that can effectively tackle complex language tasks, such as text generation, dialogue management, and summarization.
The intersection of Reinforcement Learning and Natural Language Processing has given rise to a new generation of language-based agents that can learn to generate, manipulate, and understand human language in increasingly sophisticated ways.
Text Generation Control:
RL agents can be employed to control the style and content of text generation tasks, enabling the creation of tailored writing styles for different audiences.
For example, Reinforcement Learning with Human Feedback (RLHF) has been used to train models that can generate text with desired style or tone and even content, opening up new possibilities for creative writing and content generation.
Dialogue Management in Chatbots:
RL agents can learn optimal conversation strategies in chatbots, allowing them to engage users more effectively and achieve specific goals.
By training RL agents to select appropriate responses based on user input and conversation context, chatbots can maintain engaging discussions, provide relevant information, and even assist in tasks like booking appointments or making recommendations.
Text Summarization:
RL agents can be applied to the task of text summarization, learning to generate concise and informative summaries of longer documents. This has also been used in the context of making language model prompts more efficient.
By designing reward functions that encourage faithfulness to the original text, coherence, and brevity, RL agents can produce high-quality summaries that capture the key points of a document while maintaining readability.
The potential of Reinforcement Learning in Natural Language Processing is remarkable, enabling the development of intelligent agents that can generate, manipulate, and understand human language in increasingly sophisticated ways. However, the concept of intelligent agents in NLP is not a recent development. In fact, it has been deeply rooted in the field since its early days, aiming to create systems that can understand and respond to human language in a meaningful way.
Agents in Natural Language Processing
The idea of agents in NLP can be traced back to the field's early focus on interaction and reasoning. As researchers sought to develop systems capable of engaging in human-like communication, the notion of language-based agents naturally emerged.
Focus on Interaction and Reasoning:
NLP has long been motivated by the desire to create systems that can understand and respond to human language, mimicking human-like interaction and reasoning capabilities.
This focus on interaction and reasoning naturally led to the conceptualization of agents as entities that can engage with users using natural language.
Early NLP Systems as Agents:
Some of the earliest NLP systems, such as SHRDLU (1972), can be considered simple agents in their own right.
SHRDLU, for example, could understand and respond to natural language questions about a simulated block world, showcasing basic reasoning capabilities within a limited domain.
These early systems laid the foundation for the development of more sophisticated language-based agents in the years to come.
Dialogue Systems and Chatbots:
The development of dialogue systems and chatbots heavily relied on the concept of agents, as these systems needed to process user input, understand intent, and generate appropriate responses.
Early chatbots, while less advanced than modern language models, were essentially software agents operating in the domain of human-computer conversation.
These systems paved the way for the more sophisticated conversational agents we interact with today.
As NLP technologies continue to evolve, the concept of agents has taken on new dimensions, particularly with the advent of large language models. LLMs have unlocked unprecedented possibilities for creating intelligent, language-based agents that can understand and generate human-like text with remarkable coherence and contextual awareness.
LLM Agents
What are they?
LLM agents are a new class of AI systems that combine large language models (LLMs) with the ability to make informed decisions, take actions, and work towards specific goals. They can be described as systems that use an LLM to reason through a problem, create a plan to solve it, and execute that plan with the help of a set of tools. LLM agents are typically characterized by three properties:
Memory (equivalent to environment in the context of RL)
Tool usage (equivalent to actions in the context of RL)
Planning (equivalent to the policy in RL: a mapping from states to actions that maximizes reward)
This combination enables LLMs to analyze the information they encounter and choose the most appropriate tool for the task at hand, empowering them to make informed decisions and achieve their goals. It mirrors what we humans do: when we have a task to solve, we gather information and look for tools that help us solve it as easily as possible. Memory and tool usage are relatively well established, but planning still has significant room for debate and improvement.
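Here is a deliberately bare-bones sketch of how these three properties fit together in a single agent loop. The call_llm function, the tool set, and the JSON step format are all hypothetical placeholders; real frameworks and LLM APIs add structured function calling, validation, and error handling on top of this shape.

```python
import json

def call_llm(messages: list) -> str:
    """Hypothetical placeholder for any chat-completion API."""
    raise NotImplementedError

# Tool usage: actions the agent can take on its environment.
TOOLS = {
    "search_web": lambda query: f"(search results for {query!r})",
    "calculator": lambda expr: str(eval(expr)),  # demo only; never eval untrusted input
}

SYSTEM_PROMPT = (
    "You are an agent. At each step reply with JSON: "
    '{"thought": "...", "tool": "<search_web|calculator|finish>", "input": "..."}'
)

def run_agent(goal: str, max_steps: int = 5) -> str:
    # Memory: the growing message history records what the agent has seen and done.
    memory = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": goal},
    ]
    for _ in range(max_steps):
        # Planning: the LLM reasons about the current state and picks the next action.
        step = json.loads(call_llm(memory))
        if step["tool"] == "finish":
            return step["input"]
        # Tool usage: execute the chosen action and feed the observation back in.
        observation = TOOLS[step["tool"]](step["input"])
        memory.append({"role": "assistant", "content": json.dumps(step)})
        memory.append({"role": "user", "content": f"Observation: {observation}"})
    return "Stopped: step budget exhausted."
```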
Early LLM agents like AutoGPT and BabyAGI have shown promise in complex tasks like web searches and code generation. However, these agents are still under development, and their stability, reliability, and applicability to real-world problems remain open questions.
Agents and Autonomy
One of the important characteristics of agents is their level of autonomy, characterized by their ability to execute increasingly complex tasks with little to no supervision (correlated with the complexity of their policies). A software system with a low level of autonomy might resemble a tool, while one with a high level of autonomy might behave like an agent. While there is no clear-cut differentiation, a comparison to autonomous driving, which is commonly described on a scale from Level 0 (no automation) to Level 5 (full automation), can be illuminating.
Levels 0 and 1 are largely what we have seen in industry over the past two decades for narrow and specialized use cases. Level 2 is what we have seen in the past five years or so, especially with generative models that can convert effectively between modalities (e.g., text to image) or that exhibit emergent capabilities despite being trained on a single task (i.e., large language models). The important property of Level 2 is that systems at this level are combinations of smaller, narrow systems coupled to each other via carefully designed interfaces (the most widely speculated architecture for GPT-4 is a mixture of experts). At all of these levels, systems have a low level of autonomy: they can perform specific actions based on clear instructions but have limited decision-making authority.
Level 3 is where things start to get interesting. A medium level of autonomy means that the system can choose between different predefined options or strategies based on context. Many of the cases we call “agentic workflows” today fall into this category, where a pre-trained classifier (e.g., an intent classifier in a chatbot) routes queries to specific pipelines or action chains. This is a bit of a gray area in terms of the strict definition of agents, but the robustness demands of most near-term applications mean we will likely see most systems follow this design pattern until more robust infrastructure is available for higher levels of autonomy.
Levels 4 and 5 have high autonomy which means that the agent can learn from its interactions, set its own goals within a broader framework, and make complex decisions without needing explicit instructions. Very much like what we have experienced with autonomous driving at levels 4 and 5, there are significant infrastructure prerequisites for these levels to exist and deliver robust performance. For example, there might be a need for significant organizational changes to allow a software system to execute a large number of tasks within a business workflow.
What are the challenges of deploying LLM Agents in business workflows in the near term?
While LLM agents have the potential to enhance business workflows and enable more intelligent systems, there are significant challenges that need to be addressed before these systems can be widely deployed in real-world settings.
Technical Challenges
Stability and reliability: Early experiments with LLM agents have shown that they can be prone to erratic or unexpected behavior, often deviating from intended goals or producing nonsensical outputs.
Measuring progress and performance: Evaluating the effectiveness of LLM agents can be complex, as they may take unexpected approaches to achieve goals or deviate from desired outcomes. Developing robust metrics and evaluation frameworks is an active area of research.
Organizational Challenges
Integration with existing processes and infrastructures: Deploying LLM agents may require complex setup and management of data sources, APIs, and tools. Compatibility issues with legacy systems and the need for custom interfaces and integrations are also challenges.
Human oversight and intervention: Mechanisms for humans to monitor, guide, and correct the behavior of LLM agents are important. Designing workflows and interfaces that allow for seamless collaboration between humans and AI agents is a challenge.
These challenges illustrate that while LLM agents have the potential to augment and automate various business tasks, their deployment requires careful planning, iterative testing, and ongoing monitoring and refinement.
Agents and Control Flow
In the near term, overcoming these challenges hinges on robust control flow mechanisms. Control flow dictates how the LLM agent navigates interactions, makes decisions, and ultimately achieves its goals. Without it, LLM agents risk producing nonsensical outputs, deviating from intended tasks, or simply becoming unstable.
Imagine an LLM agent designed to write customer service emails. Control flow ensures it understands the situation (e.g., complaint, inquiry), retrieves relevant information (e.g., customer details, order history), and crafts a professional and appropriate response. This might involve routing the user's request to the appropriate department or dynamically generating different email templates based on the issue. Control flow keeps the agent on track, preventing irrelevant tangents or factual errors.
Effective control flow also addresses the challenges of measuring progress and integrating with existing systems. By establishing clear decision points and expected behaviors, developers can create metrics to track the LLM agent's performance and identify areas for improvement. Furthermore, control flow allows for the integration of human oversight and intervention. Developers can design control mechanisms that allow humans to guide the LLM towards desired outcomes or step in when the agent encounters unexpected situations. In essence, control flow acts as the bridge between the raw power of LLMs and the need for stability, reliability, and human oversight in real-world applications.
You can think of control flows as generalized policies that come from a deep understanding of the workflow the agent is trying to automate. They could include things like:
Guardrails: A type of control flow specifically designed to restrict the LLM's behavior in certain ways. They act like safety rails, preventing the LLM from venturing into undesirable areas or generating harmful outputs. Examples include preventing offensive language, staying on topic, and fact-checking.
Routing: LLMs can be used to make decisions about how to respond to a user's query. This can involve classifying the query type (e.g., question, request, instruction) and then directing the response accordingly. For instance, an LLM might predict the best response to a question is a factual summary, while a request might require completing an action (like booking a flight).
Error Handling and Recovery: When an agentic system encounters an issue, control flow mechanisms allow it to diagnose the problem and take corrective actions. This might involve prompting the user for clarification, reformulating a request, or attempting alternative strategies to complete the task.
Prioritization and Decision Making: Agentic systems often juggle multiple tasks or goals. Control flow structures help them prioritize based on urgency, importance, or available resources. For instance, a virtual assistant might prioritize responding to an urgent message over completing a less time-sensitive task.
State Management: Many agentic systems track their internal state (e.g., conversation history, user preferences) to provide a more consistent and personalized experience. Control flow dictates how the system updates its state based on new information and uses it to inform future actions. Imagine a chatbot remembering your previous order preferences while recommending a new product.
Learning and Adaptation: Advanced agentic systems can learn and adapt their behavior over time. Control flow allows them to integrate newly acquired knowledge into their decision-making process. For instance, a recommendation system might adjust its suggestions based on your past interactions and positive feedback.
The most common implementations involve training small, specialized models (not necessarily language models) that carry out these tasks and provide information or constraints to the overall system, including crafting prompts that maximize the likelihood of the desired LLM response. The sketch below illustrates the general shape of this pattern.
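In this hedged illustration, a router and a guardrail wrap the LLM call. Both are reduced to trivial keyword checks standing in for the small, specialized models described above, and call_llm, BLOCKLIST, and the pipeline prompts are hypothetical.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical placeholder for any LLM completion API."""
    raise NotImplementedError

BLOCKLIST = {"offensive_term"}  # toy guardrail vocabulary

def route(query: str) -> str:
    """Routing: classify the query type and choose a pipeline accordingly."""
    if query.rstrip().endswith("?"):
        return "qa"
    if query.lower().startswith(("book", "schedule", "order")):
        return "action"
    return "chitchat"

PIPELINES = {
    "qa": lambda q: call_llm(f"Answer factually and concisely: {q}"),
    "action": lambda q: call_llm(f"Extract the requested action as JSON: {q}"),
    "chitchat": lambda q: call_llm(f"Reply in a friendly tone: {q}"),
}

def guardrail(text: str) -> str:
    """Guardrail: block undesirable outputs and fall back to a safe response."""
    if any(term in text.lower() for term in BLOCKLIST):
        return "I'm sorry, I can't help with that."
    return text

def handle(query: str) -> str:
    # Control flow: route, generate, then check before anything reaches the user.
    return guardrail(PIPELINES[route(query)](query))
```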
Multi-agent LLM systems
Why multi-agent systems?
Multi-agent systems involve multiple single-agent systems that interact to achieve complex tasks. This is particularly useful when a monolithic agent (one policy based on a singular reward function) would be inefficient or impossible. In our customer service example, imagine an agent rewarded for maximizing the likelihood of upselling products. This can hurt the business by sacrificing long-term retention if the customer starts to feel they are being “sold to”. One solution could be adding more terms to the reward function to account for long-term retention, but collapsing all those contradictory requirements into one function might not result in the best policy learned by the system. Alternatively, one could create a system where two agents, one rewarded for maximizing short-term profit and one rewarded for reducing the risk of churn, collaborate and keep each other in check (see the sketch after the list below).
Some of the other reasons for breaking up the system into multiple agents are:
Optimized task allocation: It might be more efficient (from a design, implementation, and maintenance point of view) to break down a complex problem into smaller subproblems and have agents rewarded for solving those subproblems specifically. This more modular design, although unnecessary from a functionality point of view, could be easier to improve and scale.
Enhanced response time: The modular design of multiple agents is useful not only when the sub-problems are different, but also when they are similar yet can be executed in parallel to save time.
More robust specialization: While it is plausible for one agent to learn how to choose amongst a large number of actions, it might be more robust to partition actions based on some relevant property and have agents specialize in using them more effectively.
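Here is a minimal sketch of the check-and-balance idea from the customer service example: one agent drafts an upsell-oriented reply, a second agent scores it for retention risk, and a reply is only sent if both are satisfied. The call_llm function and the prompts are hypothetical stand-ins.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical placeholder for any LLM completion API."""
    raise NotImplementedError

def seller_agent(context: str) -> str:
    """Optimizes for short-term revenue: drafts a reply that includes an upsell."""
    return call_llm(f"Draft a helpful reply with a relevant upsell:\n{context}")

def retention_agent(draft: str) -> bool:
    """Optimizes for long-term retention: approves only non-pushy drafts."""
    verdict = call_llm(
        "Does this reply risk making the customer feel 'sold to'? "
        f"Answer YES or NO.\n{draft}"
    )
    return verdict.strip().upper().startswith("NO")

def respond(context: str, max_attempts: int = 3) -> str:
    for _ in range(max_attempts):
        draft = seller_agent(context)
        if retention_agent(draft):
            return draft
        # Feed the veto back so the next draft tones down the upsell.
        context += "\n(Previous draft was too pushy; tone down the upsell.)"
    # Fall back to a plain helpful reply with no upsell at all.
    return call_llm(f"Draft a helpful reply with no upsell:\n{context}")
```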
Agents UX and HCI
Software agents have become integral components of various Human-Computer Interaction (HCI) applications. These agents, powered by large language models (LLMs), serve various roles, from acting as personal assistants to customizing user interactions based on individual preferences. This section explores the roles of LLM agents in HCI and their impact on user experiences.
Intelligent User Interfaces (IUIs): Agents can act as intelligent assistants that understand user needs, provide recommendations, and automate tasks. They offer a more intuitive and efficient means of interaction, reducing the cognitive load on users. Virtual assistants like Siri, Alexa, and Google Assistant are prime examples of IUIs, where agents interpret user queries and provide relevant information.
Personalization and Recommendation Systems: Recommendation systems in e-commerce or streaming services can be powered by agents. These agents learn user preferences and recommend products, movies, or music based on that information. However, it's important to acknowledge that LLM agents can inherit biases from the data they're trained on. Transparency in how recommendations are generated is crucial for user trust.
Adaptive Interfaces: Agents can be used to create adaptive interfaces that adjust to user behavior or skill level. For instance, an educational software program might use an agent to tailor the difficulty of exercises based on the user's performance.
Embodied Conversational Agents (ECAs): These are virtual characters that can interact with users through spoken language or gestures. ECAs powered by LLMs can be used for customer service, education, or even companionship. Imagine an ECA tutor that personalizes learning experiences and provides emotional support.
Augmented Reality (AR) and Virtual Reality (VR): Agents can be integrated into AR/VR experiences to guide users, provide information, or even act as companions within the virtual environment. An LLM agent in an AR museum experience could provide historical context about exhibits or answer visitor questions in a natural, conversational way.
LLM agents are revolutionizing HCI by creating more intuitive, efficient, and personalized user experiences. They reduce cognitive load, improve accessibility, and offer a more natural way to interact with technology. As LLM agents continue to evolve and become more sophisticated, we can expect even more innovative applications that enhance human-computer interaction in the years to come.
Parting words
Hopefully this article has helped you learn more about agents and left you as excited about them as we are! Agents are the holy grail of much of what we have done, and have wanted to do, with computers for the past several decades. Today, with natural language as a new way to interface with machines, we are closer than ever to that dream!
Happy building!