Building AI Agents from Scratch
Learn how to design and implement autonomous AI agents capable of reasoning, planning, and executing tasks.
Introduction
The landscape of Artificial Intelligence is shifting from static models to dynamic, autonomous agents. AI agents are systems that can perceive their environment, reason about how to achieve goals, and take actions to accomplish them. Unlike simple chatbots, agents can use tools, browse the web, and execute complex workflows.
In this tutorial, we will explore the architecture of AI agents and build a functional agent from scratch using Python and Large Language Models (LLMs).
What is an AI Agent?
At its core, an AI agent is a loop:
- Observation: The agent receives input or observes the state of the world.
- Reasoning: The agent uses an LLM to decide what to do next based on the observation and its goal.
- Action: The agent executes a tool or action.
- Feedback: The result of the action is fed back into the loop.
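The four steps above can be sketched in plain Python before we bring in any framework. In this sketch, `mock_llm` and `run_tool` are hypothetical stand-ins for a real model call and a real tool layer, used only to make the loop's shape concrete:

```python
# A minimal sketch of the agent loop, assuming a mock "LLM" that
# returns either a tool call or a final answer (hypothetical names).

def mock_llm(observation: str) -> dict:
    # Hypothetical reasoning step: decide on an action or finish.
    if "result: 42" in observation:
        return {"type": "final", "answer": "The answer is 42."}
    return {"type": "action", "tool": "compute", "args": {}}

def run_tool(name: str, args: dict) -> str:
    # Hypothetical action step: execute the chosen tool.
    return "result: 42"

def agent_loop(goal: str, max_steps: int = 5) -> str:
    observation = goal                        # 1. Observation
    for _ in range(max_steps):
        decision = mock_llm(observation)      # 2. Reasoning
        if decision["type"] == "final":
            return decision["answer"]
        result = run_tool(decision["tool"], decision["args"])  # 3. Action
        observation = f"{observation}\n{result}"               # 4. Feedback
    return "Stopped: step limit reached."

print(agent_loop("What is the answer?"))
```

Note the `max_steps` guard: real agent loops always need a step limit, otherwise a model that never emits a final answer loops forever.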
Prerequisites
- Python 3.10+
- API Key for an LLM provider (OpenAI, Anthropic, etc.)
- Basic understanding of prompt engineering
Setting Up the Environment
First, let's install the necessary libraries. We'll use LangChain as a framework to simplify agent construction, although we could build everything with raw API calls.
```bash
pip install langchain langchain-openai python-dotenv
```
Designing the Agent Core
The brain of our agent is the LLM. We need to prompt it effectively to act as a reasoning engine. This is often called the "ReAct" (Reasoning + Acting) pattern.
```python
from langchain_openai import ChatOpenAI
from langchain.agents import tool
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.agents.format_scratchpad.openai_tools import format_to_openai_tool_messages
from langchain.agents.output_parsers.openai_tools import OpenAIToolsAgentOutputParser

llm = ChatOpenAI(model="gpt-4-turbo", temperature=0)
```
Defining Tools
Agents need tools to interact with the world. Let's define a simple calculator tool and a search tool.
```python
@tool
def multiply(a: int, b: int) -> int:
    """Multiply two numbers."""
    return a * b

@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    # Mock implementation
    return f"The weather in {city} is sunny and 25°C"

tools = [multiply, get_weather]
```
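Under the hood, tool binding works by exposing each function's signature and docstring to the model as a schema it can fill in. A rough, framework-free sketch of that idea using the standard `inspect` module (the helper name `to_tool_schema` is hypothetical, and `multiply` is redefined here without the decorator so the snippet stands alone):

```python
import inspect

def to_tool_schema(fn) -> dict:
    # Hypothetical helper: derive a minimal, JSON-schema-like
    # description from a function's signature and docstring.
    sig = inspect.signature(fn)
    params = {
        name: {
            "type": p.annotation.__name__
            if p.annotation is not inspect.Parameter.empty
            else "string"
        }
        for name, p in sig.parameters.items()
    }
    return {
        "name": fn.__name__,
        "description": (fn.__doc__ or "").strip(),
        "parameters": params,
    }

def multiply(a: int, b: int) -> int:
    """Multiply two numbers."""
    return a * b

schema = to_tool_schema(multiply)
print(schema)
```

This is why docstrings and type hints on tools matter: they become the model's only documentation for when and how to call the tool.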
The Reasoning Loop
Now we bind the tools to the LLM and create the execution loop.
```python
llm_with_tools = llm.bind_tools(tools)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. You can use tools to answer questions."),
    ("user", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])

agent = (
    {
        "input": lambda x: x["input"],
        "agent_scratchpad": lambda x: format_to_openai_tool_messages(x["intermediate_steps"]),
    }
    | prompt
    | llm_with_tools
    | OpenAIToolsAgentOutputParser()
)
```
Running the Agent
To run the agent, we need an AgentExecutor, which handles the loop of calling the agent, executing tools, and feeding outputs back.
```python
from langchain.agents import AgentExecutor

agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
result = agent_executor.invoke({"input": "What is 55 times 12? Also, what's the weather in Tokyo?"})
print(result['output'])
```
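The tool-execution half of what AgentExecutor does can be approximated without the framework: keep a registry of tools by name, look up each requested call, run it, and return the result as a string. A hypothetical sketch (the tool functions are repeated here, minus the decorator, so the snippet is self-contained):

```python
def multiply(a: int, b: int) -> int:
    """Multiply two numbers."""
    return a * b

def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"The weather in {city} is sunny and 25°C"

# Hypothetical registry mapping tool names to callables.
TOOL_REGISTRY = {"multiply": multiply, "get_weather": get_weather}

def dispatch(tool_call: dict) -> str:
    # tool_call mimics the shape of a model's tool call: name + arguments.
    fn = TOOL_REGISTRY[tool_call["name"]]
    return str(fn(**tool_call["args"]))

print(dispatch({"name": "multiply", "args": {"a": 55, "b": 12}}))
print(dispatch({"name": "get_weather", "args": {"city": "Tokyo"}}))
```

Results are stringified because they are fed back to the model as text observations in the next loop iteration.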
Advanced Concepts: Memory and Planning
Real-world agents need memory to maintain context over long conversations. We can add memory to our agent using RunnableWithMessageHistory.
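Before reaching for the framework helper, the core idea of memory is simple: store messages keyed by a session ID and replay them on each turn. A minimal plain-Python sketch, with hypothetical `remember`/`recall` helper names:

```python
# A minimal sketch of per-session conversation memory, assuming a
# simple in-memory store keyed by session ID (hypothetical names).
from collections import defaultdict

history: dict = defaultdict(list)

def remember(session_id: str, role: str, message: str) -> None:
    # Append one (role, message) turn to the session's history.
    history[session_id].append((role, message))

def recall(session_id: str) -> list:
    # Return the full conversation so far for this session.
    return list(history[session_id])

remember("s1", "user", "What is 55 times 12?")
remember("s1", "assistant", "55 times 12 is 660.")
print(recall("s1"))
```

RunnableWithMessageHistory plays the same role for a LangChain agent: it injects the recalled messages into the prompt and records new turns after each invocation.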
Furthermore, for complex tasks, we might implement a "Planning" step where the agent breaks down a high-level goal into sub-tasks before execution.
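The planning step can also be sketched in miniature. Here `mock_planner` is a hypothetical stand-in for an LLM call that decomposes the goal; each sub-task would normally be handed to the ReAct loop in turn:

```python
# A sketch of plan-then-execute, assuming a mock planner in place
# of a real LLM call (all names here are hypothetical).

def mock_planner(goal: str) -> list:
    # In a real agent this would be an LLM prompt such as:
    # "Break this goal into numbered sub-tasks: {goal}"
    return [
        f"Research: {goal}",
        f"Draft an answer for: {goal}",
        "Review and finalize",
    ]

def execute_plan(goal: str) -> list:
    results = []
    for subtask in mock_planner(goal):
        # Each sub-task would normally run through the agent's
        # reasoning loop; here we just mark it as handled.
        results.append(f"done: {subtask}")
    return results

for line in execute_plan("Summarize this week's AI news"):
    print(line)
```

Separating planning from execution keeps each LLM call focused: the planner sees only the goal, and each execution step sees only its sub-task.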
Conclusion
Building AI agents opens up a new dimension of software development. By combining the reasoning capabilities of LLMs with functional tools, we can create software that is more flexible and capable than ever before.
Start experimenting with different tools and prompts to see what your agent can achieve!