17. AI Agents
Introduction¶
AI Agents represent an exciting development in Generative AI, enabling Large Language Models (LLMs) to evolve from assistants into agents capable of taking actions. AI Agent frameworks enable developers to create applications that give LLMs access to tools and state management. These frameworks also enhance visibility, allowing users and developers to monitor the actions planned by LLMs, thereby improving experience management.
The lesson will cover the following areas:
- Understanding what an AI Agent is - What exactly is an AI Agent?
- Exploring four different AI Agent Frameworks - What makes them unique?
- Applying these AI Agents to different use cases - When should we use AI Agents?
Learning goals¶
After taking this lesson, you'll be able to:
- Explain what AI Agents are and how they can be used.
- Have an understanding of the differences between some of the popular AI Agent Frameworks, and how they differ.
- Understand how AI Agents function in order to build applications with them.
What Are AI Agents?¶
AI Agents are a very exciting field in the world of Generative AI. With this excitement comes sometimes a confusion of terms and their application. To keep things simple and inclusive of most of the tools that refer to AI Agents, we are going to use this definition:
AI Agents allow Large Language Models (LLMs) to perform tasks by giving them access to a state and tools.
Let's define these terms:
Large Language Models - These are the models referred throughout this course such as GPT-3.5, GPT-4, Llama-2, etc.
State - This refers to the context that the LLM is working in. The LLM uses the context of its past actions and the current context, guiding its decision-making for subsequent actions. AI Agent Frameworks allow developers to maintain this context easier.
Tools - To complete the task that the user has requested and that the LLM has planned out, the LLM needs access to tools. Some examples of tools can be a database, an API, an external application or even another LLM!
These definitions will hopefully give you a good grounding going forward as we look at how they are implemented. Let's explore a few different AI Agent frameworks:
LangChain Agents¶
LangChain Agents is an implementation of the definitions we provided above.
To manage the state , it uses a built-in function called the AgentExecutor
. This accepts the defined agent
and the tools
that are available to it.
The Agent Executor
also stores the chat history to provide the context of the chat.
LangChain offers a catalog of tools that can be imported into your application in which the LLM can get access to. These are made by the community and by the LangChain team.
You can then define these tools and pass them to the Agent Executor
.
Visibility is another important aspect when talking about AI Agents. It is important for application developers to understand which tool the LLM is using and why.. For that, the team at LangChain have developed LangSmith.
AutoGen¶
The next AI Agent framework we will discuss is AutoGen. The main focus of AutoGen is conversations. Agents are both conversable and customizable.
Conversable - LLMs can start and continue a conversation with another LLM in order to complete a task. This is done by creating AssistantAgents
and giving them a specific system message.
Python | |
---|---|
1 |
|
Customizable - Agents can be defined not only as LLMs but be a user or a tool. As a developer, you can define a UserProxyAgent
which is responsible for interacting with the user for feedback in completing a task. This feedback can either continue the execution of the task or stop it.
Python | |
---|---|
1 |
|
State and Tools¶
To change and manage state, an assistant Agent generates Python code to complete the task.
Here is an example of the process:
LLM Defined with a System Message¶
Python | |
---|---|
1 |
|
This system messages directs this specific LLM to which functions are relevant for its task. Remember, with AutoGen you can have multiple defined AssistantAgents with different system messages.
Chat is Initiated by User¶
Python | |
---|---|
1 |
|
This message from the user_proxy (Human) is what will start the process of the Agent to explore the possible functions that it should execute.
Function is Executed¶
Bash | |
---|---|
1 2 3 4 5 |
|
Once the initial chat is processed, the Agent will send the suggest tool to call. In this case, it is a function called get_weather
. Depending on your configuration, this function can be automatically executed and read by the Agent or can be executed based on user input.
You can find a list of AutoGen code samples to further explore how to get started building.
Taskweaver¶
The next agent framework we will explore is Taskweaver. It is known as a "code-first" agent because instead of working strictly with strings
, it can work with DataFrames in Python. This becomes extremely useful for data analysis and generation tasks. This can be things like creating graphs and charts or generating random numbers.
State and Tools¶
To manage the state of the conversation, TaskWeaver uses the concept of a Planner
. The Planner
is a LLM that takes the request from the users and maps out the tasks that need to be completed to fulfill this request.
To complete the tasks the Planner
is exposed to the collection of tools called Plugins
. This can be Python classes or a general code interpreter. This plugins are stored as embeddings so that the LLM can better search for the correct plugin.
Here is an example of a plugin to handle anomaly detection:
Python | |
---|---|
1 |
|
The code is verified before executing. Another feature to manage context in Taskweaver is experience
. Experience allows for the context of a conversation to be stored over to the long term in a YAML file. This can be configured so that the LLM improves over time on certain tasks given that it is exposed to prior conversations.
JARVIS¶
The last agent framework we will explore is JARVIS. What makes JARVIS unique is that it uses an LLM to manage the state
of the conversation and the tools
are other AI models. Each of the AI models are specialized models that perform certain tasks such as object detection, transcription or image captioning.
The LLM, being a general purpose model, receives the request from the user and identifies the specific task and any arguments/data that is needed to complete the task.
Python | |
---|---|
1 |
|
The LLM then formats the request in a manner that the specialized AI model can interpret, such as JSON. Once the AI model has returned its prediction based on the task, the LLM receives the response.
If multiple models are required to complete the task, it will also interpret the response from those models before bringing them together to generate to the response to the user.
The example below shows how this would work when a user is requesting a description and count of the objects in a picture:
Assignment¶
To continue your learning of AI Agents you can build with AutoGen:
- An application that simulates a business meeting with different departments of an education startup.
- Create system messages that guide LLMs in understanding different personas and priorities, and enable the user to pitch a new product idea.
- The LLM should then generate follow-up questions from each department to refine and improve the pitch and the product idea
Learning does not stop here, continue the Journey¶
After completing this lesson, check out our Generative AI Learning collection to continue leveling up your Generative AI knowledge!