Building AI Agents from Scratch
Learn how to design and implement autonomous AI agents capable of reasoning, planning, and executing tasks.
Introduction
The landscape of Artificial Intelligence is shifting from static models to dynamic, autonomous agents. AI agents are systems that can perceive their environment, reason about how to achieve goals, and take actions to accomplish them. Unlike simple chatbots, agents can use tools, browse the web, and execute complex workflows.
In this tutorial, we will explore the architecture of AI agents and build a functional agent from scratch using Python and Large Language Models (LLMs).
What is an AI Agent?
At its core, an AI agent is a loop:
- Observation: The agent receives input or observes the state of the world.
- Reasoning: The agent uses an LLM to decide what to do next based on the observation and its goal.
- Action: The agent executes a tool or action.
- Feedback: The result of the action is fed back into the loop.
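The four steps above can be sketched in plain Python before we bring in any framework. In this sketch, `mock_llm` and `run_tool` are hypothetical stand-ins for a real model call and a real tool layer, used only to make the loop's shape concrete:

```python
# A minimal sketch of the agent loop, assuming a mock "LLM" that
# returns either a tool call or a final answer (hypothetical names).

def mock_llm(observation: str) -> dict:
    # Hypothetical reasoning step: decide on an action or finish.
    if "result: 42" in observation:
        return {"type": "final", "answer": "The answer is 42."}
    return {"type": "action", "tool": "compute", "args": {}}

def run_tool(name: str, args: dict) -> str:
    # Hypothetical action step: execute the chosen tool.
    return "result: 42"

def agent_loop(goal: str, max_steps: int = 5) -> str:
    observation = goal                        # 1. Observation
    for _ in range(max_steps):
        decision = mock_llm(observation)      # 2. Reasoning
        if decision["type"] == "final":
            return decision["answer"]
        result = run_tool(decision["tool"], decision["args"])  # 3. Action
        observation = f"{observation}\n{result}"               # 4. Feedback
    return "Stopped: step limit reached."

print(agent_loop("What is the answer?"))
```

Note the `max_steps` guard: real agent loops always need a step limit, otherwise a model that never emits a final answer loops forever.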
Prerequisites
- Python 3.10+
- API Key for an LLM provider (OpenAI, Anthropic, etc.)
- Basic understanding of prompt engineering
Setting Up the Environment
First, let's install the necessary libraries. We'll use LangChain as a framework to simplify agent construction, although we could build everything with raw API calls.
```bash
pip install langchain langchain-openai python-dotenv
```
Designing the Agent Core
The brain of our agent is the LLM. We need to prompt it effectively to act as a reasoning engine. This is often called the "ReAct" (Reasoning + Acting) pattern.
```python
from langchain_openai import ChatOpenAI
from langchain.agents import tool
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.agents.format_scratchpad.openai_tools import format_to_openai_tool_messages
from langchain.agents.output_parsers.openai_tools import OpenAIToolsAgentOutputParser

llm = ChatOpenAI(model="gpt-4-turbo", temperature=0)
```
Defining Tools
Agents need tools to interact with the world. Let's define a simple calculator tool and a search tool.
```python
@tool
def multiply(a: int, b: int) -> int:
    """Multiply two numbers."""
    return a * b

@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    # Mock implementation
    return f"The weather in {city} is sunny and 25°C"

tools = [multiply, get_weather]
```
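Under the hood, tool binding works by exposing each function's signature and docstring to the model as a schema it can fill in. A rough, framework-free sketch of that idea using the standard `inspect` module (the helper name `to_tool_schema` is hypothetical, and `multiply` is redefined here without the decorator so the snippet stands alone):

```python
import inspect

def to_tool_schema(fn) -> dict:
    # Hypothetical helper: derive a minimal, JSON-schema-like
    # description from a function's signature and docstring.
    sig = inspect.signature(fn)
    params = {
        name: {
            "type": p.annotation.__name__
            if p.annotation is not inspect.Parameter.empty
            else "string"
        }
        for name, p in sig.parameters.items()
    }
    return {
        "name": fn.__name__,
        "description": (fn.__doc__ or "").strip(),
        "parameters": params,
    }

def multiply(a: int, b: int) -> int:
    """Multiply two numbers."""
    return a * b

schema = to_tool_schema(multiply)
print(schema)
```

This is why docstrings and type hints on tools matter: they become the model's only documentation for when and how to call the tool.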
The Reasoning Loop
Now we bind the tools to the LLM and create the execution loop.
```python
llm_with_tools = llm.bind_tools(tools)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. You can use tools to answer questions."),
    ("user", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])

agent = (
    {
        "input": lambda x: x["input"],
        "agent_scratchpad": lambda x: format_to_openai_tool_messages(x["intermediate_steps"]),
    }
    | prompt
    | llm_with_tools
    | OpenAIToolsAgentOutputParser()
)
```
Running the Agent
To run the agent, we need an AgentExecutor, which handles the loop of calling the agent, executing tools, and feeding outputs back.
```python
from langchain.agents import AgentExecutor

agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
result = agent_executor.invoke({"input": "What is 55 times 12? Also, what's the weather in Tokyo?"})
print(result['output'])
```
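The tool-execution half of what AgentExecutor does can be approximated without the framework: keep a registry of tools by name, look up each requested call, run it, and return the result as a string. A hypothetical sketch (the tool functions are repeated here, minus the decorator, so the snippet is self-contained):

```python
def multiply(a: int, b: int) -> int:
    """Multiply two numbers."""
    return a * b

def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"The weather in {city} is sunny and 25°C"

# Hypothetical registry mapping tool names to callables.
TOOL_REGISTRY = {"multiply": multiply, "get_weather": get_weather}

def dispatch(tool_call: dict) -> str:
    # tool_call mimics the shape of a model's tool call: name + arguments.
    fn = TOOL_REGISTRY[tool_call["name"]]
    return str(fn(**tool_call["args"]))

print(dispatch({"name": "multiply", "args": {"a": 55, "b": 12}}))
print(dispatch({"name": "get_weather", "args": {"city": "Tokyo"}}))
```

Results are stringified because they are fed back to the model as text observations in the next loop iteration.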
Advanced Concepts: Memory and Planning
Real-world agents need memory to maintain context over long conversations. We can add memory to our agent using RunnableWithMessageHistory.
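Before reaching for the framework helper, the core idea of memory is simple: store messages keyed by a session ID and replay them on each turn. A minimal plain-Python sketch, with hypothetical `remember`/`recall` helper names:

```python
# A minimal sketch of per-session conversation memory, assuming a
# simple in-memory store keyed by session ID (hypothetical names).
from collections import defaultdict

history: dict = defaultdict(list)

def remember(session_id: str, role: str, message: str) -> None:
    # Append one (role, message) turn to the session's history.
    history[session_id].append((role, message))

def recall(session_id: str) -> list:
    # Return the full conversation so far for this session.
    return list(history[session_id])

remember("s1", "user", "What is 55 times 12?")
remember("s1", "assistant", "55 times 12 is 660.")
print(recall("s1"))
```

RunnableWithMessageHistory plays the same role for a LangChain agent: it injects the recalled messages into the prompt and records new turns after each invocation.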
Furthermore, for complex tasks, we might implement a "Planning" step where the agent breaks down a high-level goal into sub-tasks before execution.
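The planning step can also be sketched in miniature. Here `mock_planner` is a hypothetical stand-in for an LLM call that decomposes the goal; each sub-task would normally be handed to the ReAct loop in turn:

```python
# A sketch of plan-then-execute, assuming a mock planner in place
# of a real LLM call (all names here are hypothetical).

def mock_planner(goal: str) -> list:
    # In a real agent this would be an LLM prompt such as:
    # "Break this goal into numbered sub-tasks: {goal}"
    return [
        f"Research: {goal}",
        f"Draft an answer for: {goal}",
        "Review and finalize",
    ]

def execute_plan(goal: str) -> list:
    results = []
    for subtask in mock_planner(goal):
        # Each sub-task would normally run through the agent's
        # reasoning loop; here we just mark it as handled.
        results.append(f"done: {subtask}")
    return results

for line in execute_plan("Summarize this week's AI news"):
    print(line)
```

Separating planning from execution keeps each LLM call focused: the planner sees only the goal, and each execution step sees only its sub-task.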
Conclusion
Building AI agents opens up a new dimension of software development. By combining the reasoning capabilities of LLMs with functional tools, we can create software that is more flexible and capable than ever before.
Start experimenting with different tools and prompts to see what your agent can achieve!