
Building an AI agent from scratch isn’t as intimidating as it sounds at first. You don’t need a PhD in machine learning or years of experience. What you need is curiosity, basic Python skills, and this step-by-step guide.
In this tutorial, you’ll build a functional AI agent that can answer questions and perform custom tasks using tools. We’re talking about a real agent that uses a large language model (LLM) for reasoning and can execute actions like web searches or calculations. No heavy frameworks, no black boxes – just pure, understandable code that demystifies how agents actually work. By the end, you’ll have an MVP agent running on your machine and the knowledge to expand it however you want.
What Is an AI Agent?
An AI agent is a software program that can perceive, think, and act to achieve goals autonomously. Think of it as the difference between a calculator (you press buttons, it computes) and a personal assistant (you state a goal, it figures out the steps).
Consider a customer service bot. A basic chatbot has scripted responses: “For billing, press 1.” An AI agent, however, understands natural language, accesses multiple tools (database queries, payment systems, knowledge bases), and dynamically determines the best path to solve your problem. It might check your account status, calculate a refund, and send a confirmation email – all from a single request like “I was overcharged last month.”
Here’s what makes something an agent versus simple automation: autonomy (it acts without manual triggers for each step), decision-making ability (it chooses between options based on context), and AI-powered reasoning (it doesn’t just follow rules; it interprets, plans, and adapts). According to research on AI agent architectures, true agents exhibit goal-directed behavior and can handle dynamic, unpredictable situations – not just predetermined workflows.
An AI agent is autonomous and action-oriented, while a regular chatbot typically just responds with pre-written answers or simple ML predictions. A chatbot might tell you the weather if it’s in its database, but an AI agent can call a weather API, calculate differences, set reminders, and perform complex multi-step tasks on its own. Agents don’t just chat – they accomplish goals by deciding what actions to take and executing them.
Core Components of an AI Agent
Every AI agent, no matter how simple or complex, shares fundamental building blocks. Understanding these components is your foundation for building anything from a basic assistant to a sophisticated autonomous system.
The “Brain” (Large Language Model)
Modern AI agents typically use a Large Language Model as their reasoning engine – their brain. This is what interprets your instructions, understands context, and generates intelligent decisions. Models like GPT-4, Claude, or open-source alternatives like Llama provide the cognitive horsepower that makes agents feel smart.
For our tutorial, we’ll use OpenAI’s API (GPT-5.2 is the latest version at the time of writing) because it’s accessible, well-documented, and powerful enough to demonstrate core concepts. The LLM handles the heavy lifting: understanding what you want, deciding what to do next, and formulating responses. Without it, we’d need to hand-code every possible scenario – an impossible task for truly flexible agents.
Memory: Keeping Context
Imagine having a conversation where every response treats you like a complete stranger. Frustrating, right? That’s an agent without memory. Memory systems allow agents to maintain conversation context and reference previous interactions.
For our simple agent, memory means maintaining chat history – storing what the user asked and what the agent responded. As explained in agent memory systems, this gives the LLM context for each new query. Without it, asking “What did I just tell you?” would fail every time because the agent literally doesn’t remember.
There’s short-term memory (the current conversation) and potentially long-term memory (persistent storage of user preferences, facts, or past sessions). Our MVP focuses on short-term memory using a simple list structure, but the principle scales up to sophisticated vector databases for production systems.
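Concretely, that “simple list structure” is just a Python list of role-tagged messages – a small sketch of the shape we’ll build in Step 3:
# Short-term memory: a plain list of messages that grows as the conversation continues
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "I have 4 apples."},
    {"role": "assistant", "content": "Nice! What would you like to do with them?"},
]
# Each new turn appends to this list, and the whole list is sent with every API call.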
Tools and Actions
Here’s where agents become genuinely useful beyond conversation. Tools are external functions or APIs that extend the agent’s capabilities beyond its training data. Think of them as hands and eyes for a brain that lives in language.
Your agent might have tools for:
- Searching the web for current information
- Performing calculations accurately
- Querying databases
- Sending emails or messages
- Controlling smart home devices
According to agent architectures, this is what separates agents from pure language models. The LLM can decide when a tool is needed and interpret results, but the tools perform actual actions in the real world (or at least in your software ecosystem).
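In code, a tool is usually just a plain Python function the agent is allowed to call. For example, a hypothetical current-time tool (purely illustrative – not part of the agent we build below) could be as simple as:
from datetime import datetime, timezone

def get_current_time():
    """A simple tool: return the current date and time, something the LLM can't know on its own."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")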
The Agent Loop (Perception-Action Cycle)
Agents don’t just think once and quit. They operate in a continuous loop: perceive the situation, reason about it, take action, observe results, then adjust and repeat. This is the ReAct pattern (Reasoning + Acting) that modern AI agents use to tackle complex, multi-step problems.
The cycle looks like this:
- Perceive: Receive user input or environmental data
- Reason: Use the LLM to think about what to do (might need a tool? Have enough information? Task complete?)
- Act: Either respond to the user or call a tool
- Observe: See the result of the action (tool output, user feedback)
- Adjust: Use new information to inform the next reasoning step
This loop continues until the agent determines it has achieved its goal. For simple queries (“What’s 2+2?”), the loop might execute once. For complex tasks (“Research and summarize the top 3 alternatives to X”), the loop might iterate a dozen times, gathering information piece by piece.
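In code, the whole cycle often boils down to a while loop around an LLM call. Here’s a minimal sketch – reason and execute_tool are placeholders for whatever LLM call and tool dispatcher you plug in, not the actual implementation we build later:
def run_agent(goal, reason, execute_tool):
    """Minimal sketch of the perceive-reason-act loop (the helpers are passed in as placeholders)."""
    observations = [goal]                      # Perceive: start from the user's request
    while True:
        kind, payload = reason(observations)   # Reason: ask the LLM what to do next
        if kind == "answer":
            return payload                     # Task complete - exit the loop
        result = execute_tool(payload)         # Act: run the requested tool
        observations.append(result)            # Observe: feed the result back in and repeat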
Pro Tip: The ReAct pattern is powerful because it makes the agent’s thinking visible. Instead of a black box, you can see “Thought: I need current data” followed by “Action: SEARCH(query)” – making debugging infinitely easier. 🧠
Planning Your Agent (Goal and Approach)
Before writing a single line of code, let’s define what we’re building. Clear goals prevent scope creep and keep you focused.
Our Agent’s Goal: We want a conversational assistant that can answer general knowledge questions using its built-in training, but can also perform a specific task when its knowledge isn’t enough. Specifically, our agent will be able to search for information (simulated Wikipedia lookup) when it encounters a question it can’t answer from memory alone.
Think about it: the LLM knows a lot, but it doesn’t know real-time information or extremely niche topics. By giving it a search tool, we extend its capabilities dramatically. The user asks a question, the agent determines if it can answer directly or needs to search, then responds appropriately.
Choosing Your Tools: For this tutorial, we’ll implement one tool: a Wikipedia search function. This is perfect for demonstration because information retrieval is a common agent use case. In real applications, you might have tools for weather APIs, database queries, file operations, or external service integrations. Start simple – you can always add more tools later using the same pattern.
Why Python: We’re using Python because it’s the lingua franca of AI development. The ecosystem is unmatched – libraries for everything, excellent OpenAI SDK support, and readable syntax that won’t obscure the concepts we’re teaching. Plus, if you’re reading this, you probably already know Python or can pick it up quickly.
Question to Consider: What problem could your agent solve that would make your life easier? A personal research assistant? A code debugger? A task automator? Keep that vision in mind as we build. 💭
Prerequisites & Setup
Let’s make sure you’re ready to build.
Skill Prerequisites: You should be comfortable with basic Python programming – functions, lists, dictionaries, and simple control flow. You should also understand the concept of APIs at a high level (you send a request, you get a response). That’s it. No machine learning expertise required, no AI degree necessary.
Tools You’ll Need:
- Python 3.8+ installed on your machine
- A code editor (VS Code, PyCharm, or even a simple text editor)
- An OpenAI API key – sign up at OpenAI’s platform and generate an API key.
- Python libraries: We’ll install openai (the official SDK) and python-dotenv (for secure API key management through environment variables)
Environment Setup: Create a dedicated folder for this project. Using a virtual environment keeps dependencies isolated and your system clean. This isn’t strictly necessary for a small project, but it’s good practice.
mkdir ai-agent-tutorial
cd ai-agent-tutorial
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
With your environment active, you’re ready to install libraries and start coding.
Pro Tip 💡: Level up with our python local dev environment setup guide!
Step 1: Setting Up the Python Environment
Time to prepare our development environment with the necessary libraries.
Install Required Libraries:
pip install openai python-dotenv
What are these for?
- openai: The official Python SDK for OpenAI’s API, handling authentication and request formatting
- python-dotenv: Loads environment variables from a .env file, keeping your API key secure and out of your code
Configure Your API Key Securely:
Never hard-code API keys in your source code. Instead, create a .env file in your project directory:
OPENAI_API_KEY=your_api_key_here
Replace your_api_key_here with your actual OpenAI API key. Add .env to your .gitignore if you’re using version control – this ensures you never accidentally commit sensitive credentials.
Project Structure:
For this MVP agent, we’ll keep everything in a single file called agent.py. As your agent grows more complex, you might split tools into separate modules, but for learning purposes, one file makes it easier to see how everything connects.
ai-agent-tutorial/
├── .env
├── agent.py
└── venv/
Simple, clean, and ready to code. Let’s build this agent! 🚀
Step 2: Building the Basic Agent (LLM Chatbot)
We start with the simplest possible version: a stateless chatbot that uses the LLM to respond to queries. This is our foundation.
Initialize the LLM Client:
First, we need to load our API key and set up the OpenAI client. Add this to your agent.py file:
import os
from openai import OpenAI
from dotenv import load_dotenv
# Load environment variables from .env file
load_dotenv()
# Initialize OpenAI client
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
def get_response(user_message):
"""
Send a message to the LLM and get a response.
"""
response = client.chat.completions.create(
model="gpt-5-nano",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": user_message}
],
temperature=0.7
)
    return response.choices[0].message.content
What’s happening here? We’re importing the necessary libraries, loading our API key securely, and creating a function that sends a message to the model (gpt-5-nano in this example). The messages parameter includes two roles: system (sets the agent’s personality and instructions) and user (the actual query). The API returns a completion, and we extract the text content.
Create a Basic Chat Loop:
Now let’s add a simple interactive loop so we can test our agent:
def main():
print("AI Agent initialized. Type 'quit' to exit.")
print("-" * 50)
while True:
user_input = input("\nYou: ")
if user_input.lower() in ['quit', 'exit']:
print("Goodbye!")
break
response = get_response(user_input)
print(f"\nAgent: {response}")
if __name__ == "__main__":
    main()
Test It Out:
Run your agent with python agent.py. Try asking questions:
You: Hello, how are you?
Agent: Hello! I’m here and ready to help. How are you doing today? What can I assist you with?
You: What is the capital of France?
Agent: Paris.
Understanding the Limitation: Notice that this agent has no memory. If you ask a follow-up question that references a previous message, it fails completely. Try this:
You: I have 4 apples.
Agent: Nice! You have 4 apples. What would you like to do with them?...
You: If I eat 1, how many do I have left?
Agent: If you eat 1 of something, you'll have the total minus 1 remaining.
Could you specify what you're referring to?
See the problem? The agent doesn’t remember you mentioned 4 apples. This is where memory comes in – our next step.
Step 3: Adding Memory (Context)
Without memory, our agent is like someone with amnesia – every interaction is brand new. Let’s fix that by implementing conversation history.
Why Memory Matters: Humans expect conversational continuity. When you say “Tell me more about that,” you’re referencing something previously discussed. The LLM itself is stateless – it doesn’t remember past interactions unless we explicitly provide that context. As documented in agent memory implementations, maintaining conversation history is essential for natural interactions.
Implement Memory with Message History:
Modify your agent.py to maintain a list of messages:
def main():
print("AI Agent initialized. Type 'quit' to exit.")
print("-" * 50)
# Initialize conversation history
messages = [
{"role": "system", "content": "You are a helpful assistant."}
]
while True:
user_input = input("\nYou: ")
if user_input.lower() in ['quit', 'exit']:
print("Goodbye!")
break
# Add user message to history
messages.append({"role": "user", "content": user_input})
# Get response with full conversation context
response = client.chat.completions.create(
model="gpt-5-nano",
messages=messages,
)
assistant_message = response.choices[0].message.content
# Add assistant response to history
messages.append({"role": "assistant", "content": assistant_message})
print(f"\nAgent: {assistant_message}")PythonThe Key Change: Instead of sending only the current message, we now send the entire conversation history with each API call. The messages list accumulates both user inputs and assistant responses, giving the LLM full context for every response.
Test the Improvement:
Run the agent again and try the apple scenario:
You: I have 4 apples.
Agent: That's nice! Apples are healthy and delicious. Are you planning
to eat them, use them in a recipe, or something else?
You: If I eat 1, how many do I have left?
Agent: If you eat 1 of your 4 apples, you'll have 3 apples left.
Success! The agent remembers the context and can answer follow-up questions logically.
Important Consideration:
API calls have token limits – each model has a maximum context window, and the full conversation history you send must fit within it.
For an MVP, this isn’t a problem – short conversations work fine. For production systems, you’d implement conversation summarization or sliding window memory (keeping only recent messages). But that’s an optimization for later.
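If you ever reach that point, a sliding window can be as simple as trimming the history before each API call – a minimal sketch, with an arbitrary window size:
def trim_history(messages, max_recent=20):
    """Simple sliding window: keep the initial system prompt plus the most recent messages."""
    if len(messages) <= max_recent + 1:
        return messages
    return [messages[0]] + messages[-max_recent:]
You would call trim_history(messages) right before passing the list to client.chat.completions.create().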
Pro Tip 💡: Print the messages list occasionally during development to see exactly what context you’re sending to the LLM. It’s invaluable for debugging unexpected responses. 🔍
Step 4: Adding Tools/Actions
Now we reach the heart of what makes an agent truly powerful: the ability to take actions through tools. This transforms our chatbot into something that can interact with the world beyond its training data.
Why Tools Matter: The LLM knows a lot, but it doesn’t know real-time information (current weather, stock prices, breaking news) or your specific data (contents of your database, files on your computer). Tools bridge this gap. Tools extend the agent’s reach from pure language into actionable tasks.
Choosing Our Tool: We’ll implement a Wikipedia search tool. When the agent encounters a query it can’t answer confidently from its training, it can search Wikipedia for relevant information. This is both practical and educational – you’ll see exactly how tool integration works.
Implement the Wikipedia Search Tool:
For simplicity, we’ll use Python’s requests library with Wikipedia’s API. Add this function to your agent.py:
import requests
def wikipedia_search(query):
"""
Search Wikipedia and return a summary of the topic.
"""
try:
url = "https://en.wikipedia.org/api/rest_v1/page/summary/" + query.replace(" ", "_")
response = requests.get(url)
if response.status_code == 200:
data = response.json()
return data.get('extract', 'No information found.')
else:
return f"Could not find information about '{query}'."
except Exception as e:
return f"Search error: {str(e)}"PythonDon’t forget to install requests: pip install requests
Integrate Tool with Agent Logic:
Now comes the interesting part: teaching our agent when and how to use this tool. We’ll use a simple approach where the agent can request a search by responding with a special format.
Modify the system prompt to teach the agent about its tool:
def main():
print("AI Agent initialized. Type 'quit' to exit.")
print("-" * 50)
messages = [
{"role": "system", "content": """You are a helpful assistant with access to a Wikipedia search tool.
When you need to search for information you don't know, respond with:
SEARCH: [topic to search]
After receiving search results, use that information to answer the user's question.
Keep answers concise and helpful."""}
]
while True:
user_input = input("\nYou: ")
if user_input.lower() in ['quit', 'exit']:
print("Goodbye!")
break
messages.append({"role": "user", "content": user_input})
response = client.chat.completions.create(
model="gpt-5-nano",
messages=messages,
temperature=0.7
)
assistant_message = response.choices[0].message.content
# Check if agent wants to use the search tool
if assistant_message.startswith("SEARCH:"):
search_query = assistant_message.replace("SEARCH:", "").strip()
print(f"\n[Agent is searching for: {search_query}]")
# Execute the tool
search_result = wikipedia_search(search_query)
# Add tool result to conversation as a system message
messages.append({"role": "assistant", "content": assistant_message})
messages.append({"role": "system", "content": f"Search result: {search_result}"})
# Let agent formulate final answer with search results
response = client.chat.completions.create(
model="gpt-5-nano",
messages=messages,
)
assistant_message = response.choices[0].message.content
messages.append({"role": "assistant", "content": assistant_message})
print(f"\nAgent: {assistant_message}")PythonWhat’s Happening: When the agent determines it needs external information, it responds with “SEARCH: topic”. Our code detects this, executes the Wikipedia search, adds the results to the conversation, then gives the agent another chance to respond – this time with the search results available. It’s a simplified version of the ReAct pattern in action.
Testing Tool Integration:
Try asking about something recent or niche:
You: Who is Satoshi Nakamoto?
[Agent is searching for: Satoshi Nakamoto]
Agent: Satoshi Nakamoto is the pseudonymous person or group who created
Bitcoin, the first cryptocurrency. The true identity remains unknown,
though there have been various claims and speculation over the years.
Nakamoto published the Bitcoin whitepaper in 2008 and was active in the
Bitcoin development community until 2010, after which they disappeared
from public view.
Behind the scenes, the agent recognized it should search for current or detailed information, executed the tool, received Wikipedia’s summary, and formulated a comprehensive answer.
Important Note ⚠️: This implementation is simplified for education. Production agents use more sophisticated approaches like OpenAI’s function calling API or frameworks that parse structured tool requests. However, the core principle remains identical: detect when a tool is needed, execute it, provide results back to the agent, and continue reasoning.
Pro Tip 💡: Log every tool execution during development. Knowing which tools fire, when, and with what results makes debugging agent behavior much easier than trying to guess what happened. 📝
Step 5: Testing the Agent End-to-End
Let’s put everything together and run comprehensive tests to see our agent in action across different scenarios.
Test Scenario 1: Direct Knowledge Response
Questions the LLM can answer from its training data should work without tools:
You: Who wrote To Kill a Mockingbird?
Agent: Harper Lee wrote To Kill a Mockingbird. The novel was published
in 1960 and won the Pulitzer Prize the following year.
Perfect. The agent answered directly – no search needed, because this is well-known information within its training data.
Test Scenario 2: Tool-Assisted Response
Now let’s ask something that requires external lookup:
You: What is quantum entanglement?
[Agent is searching for: quantum entanglement]
Agent: Quantum entanglement is a phenomenon in quantum physics where
pairs or groups of particles interact in such a way that the quantum
state of each particle cannot be described independently. When particles
are entangled, the state of one particle instantaneously influences the
state of the other, regardless of the distance between them. This
phenomenon is often described as "spooky action at a distance" and is
fundamental to quantum mechanics and has applications in quantum
computing and quantum cryptography.
Excellent! The agent recognized it should search for detailed information, executed the Wikipedia tool, and synthesized a clear answer from the results.
Test Scenario 3: Multi-Turn Conversation with Context
Let’s verify memory works alongside tools:
You: Tell me about Python programming language.
[Agent is searching for: Python programming language]
Agent: Python is a high-level, interpreted programming language known
for its clear syntax and readability. Created by Guido van Rossum and
first released in 1991, Python emphasizes code readability and allows
programmers to express concepts in fewer lines of code...
You: Who created it?
Agent: Python was created by Guido van Rossum. He began working on
Python in the late 1980s and first released it in 1991.Code language: HTTP (http)Notice the agent didn’t search again for the follow-up question – it used the context from the previous conversation to answer directly. Memory and tools working together seamlessly.
Console Output Analysis: Each test demonstrates a different capability – direct knowledge, tool-assisted lookup, and contextual memory – all running through the same loop.
Step 6: Enhancements and Next Steps
Congratulations! You’ve built a functional AI agent from scratch. But this is just the beginning. Let’s explore how to make it production-ready and even more capable.
Robust Tool Handling: Our current implementation uses a simple string-matching approach. For multiple tools, consider structured approaches like OpenAI’s function calling API, which lets the model specify which function to call with properly formatted parameters. You could define tools like:
tools = [
{
"name": "wikipedia_search",
"description": "Search Wikipedia for information",
"parameters": {"query": "string"}
},
{
"name": "calculate",
"description": "Perform mathematical calculations",
"parameters": {"expression": "string"}
}
]
The LLM would then return structured requests that your code routes to the appropriate function. This scales much better than pattern matching.
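For reference, here’s a rough sketch of how the same Wikipedia tool could be declared with the OpenAI SDK’s function-calling interface, reusing the client, messages, and wikipedia_search from earlier steps (check the current OpenAI documentation for the exact schema before relying on this):
import json

# Sketch: declaring the Wikipedia tool in OpenAI's function-calling format
tools = [
    {
        "type": "function",
        "function": {
            "name": "wikipedia_search",
            "description": "Search Wikipedia for information about a topic",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "The topic to look up"}
                },
                "required": ["query"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-5-nano",
    messages=messages,
    tools=tools,
)

# If the model decided to call a tool, the request arrives as structured data
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    args = json.loads(tool_calls[0].function.arguments)
    result = wikipedia_search(args["query"])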
Error Handling: Production agents need resilience. Wrap API calls and tool executions in try-except blocks:
try:
response = client.chat.completions.create(...)
except Exception as e:
print(f"API error: {e}")
    # Fallback behavior or retry logic
Handle rate limits, network timeouts, and malformed responses gracefully. Your agent should never crash from a single failed API call.
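One common pattern is a small retry loop with exponential backoff around the API call – a minimal sketch (the retry count and delays are arbitrary choices):
import time

def chat_with_retry(messages, max_retries=3):
    """Call the API with simple exponential backoff; re-raise after the last attempt."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(model="gpt-5-nano", messages=messages)
        except Exception as e:
            if attempt == max_retries - 1:
                raise
            wait = 2 ** attempt  # 1s, 2s, 4s, ...
            print(f"API error: {e} - retrying in {wait}s")
            time.sleep(wait)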
Security Considerations: Tools can be dangerous. For example, if you implement a calculator tool using Python’s eval(), you’ve created a security nightmare – users could execute arbitrary code. Always sanitize inputs, use safe evaluation methods (for instance, a restricted expression evaluator built on the ast module instead of raw eval()), and never give agents unrestricted system access. Sandbox everything.
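As an illustration, here’s a minimal sketch of a calculator tool that never calls eval(); it walks the expression’s syntax tree and only permits numbers and basic arithmetic (the function name and the set of allowed operators are illustrative choices):
import ast
import operator

_ALLOWED_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def safe_calculate(expression):
    """Evaluate a basic arithmetic expression without using eval()."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _ALLOWED_OPS:
            return _ALLOWED_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _ALLOWED_OPS:
            return _ALLOWED_OPS[type(node.op)](_eval(node.operand))
        raise ValueError("Unsupported expression")
    return _eval(ast.parse(expression, mode="eval"))

# Example: safe_calculate("2 * (3 + 4)") returns 14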
Improving Intelligence: Upgrading to GPT-5.2 dramatically improves reasoning quality. The model makes better decisions about tool use, handles complex multi-step tasks more reliably, and produces more accurate responses. For memory-intensive applications, consider vector databases (like Pinecone or Weaviate) to store and retrieve relevant context from thousands of past interactions.
Scaling Up: Your simple agent could evolve into sophisticated systems. You might implement:
- Multiple specialized agents that delegate tasks to each other
- Planning capabilities where the agent breaks complex goals into sub-tasks
- Learning from feedback by logging successful patterns
- Custom knowledge bases that the agent queries before searching the web
The architecture you’ve learned here scales to these advanced use cases – it’s just a matter of adding more sophisticated components to the basic perceive-reason-act loop.
Using AI Agent Frameworks vs. Building from Scratch
Now that you understand how agents work under the hood, let’s talk about when to use frameworks versus rolling your own.
When Frameworks Make Sense: Building from scratch is fantastic for learning and for simple, controlled use cases. But when you need complex multi-step reasoning, dozens of tools, sophisticated memory management, or robust error handling, frameworks save enormous time. They’ve already solved the hard problems.
Popular Agent Frameworks:
- LangChain: The most popular framework for chaining LLM calls with tools and memory. Excellent for rapid prototyping and has integrations with virtually every AI service. Great for agents that need to chain multiple reasoning steps together.
- LlamaIndex: Focused on building agents that interact with your data. If your agent needs to query documents, databases, or knowledge bases, LlamaIndex provides optimized retrieval and indexing.
- Microsoft AutoGen: Designed for multi-agent systems where multiple AI agents collaborate. Perfect if you’re building complex workflows with specialized agents handling different tasks.
- Haystack: Production-focused framework for building search systems and agents that need robust NLP pipelines.
Trade-offs: Frameworks add abstraction layers. You gain speed and features but lose transparency. When debugging, you’re troubleshooting framework code, not just your own. That’s why understanding the fundamentals (what you just learned) is crucial – even when using frameworks, you’ll know what’s happening behind the scenes.
For your next project, consider starting with a framework if you’re building something complex or production-focused. But the knowledge from this tutorial? That’s permanent. Frameworks come and go, but understanding how agents perceive, reason, and act is timeless.
Question❓: After building from scratch, would you choose a framework for your next agent? Or stick with custom code for maximum control? There’s no wrong answer – it depends on your specific needs. 🤔
FAQ: Common Questions About Building AI Agents
Do I need a powerful computer to build an AI agent?
Not at all. Our example uses cloud APIs (OpenAI), so the heavy AI processing happens on their servers. You can run this agent from a standard laptop since you’re just making API calls. If you wanted to run local models, then yes – you’d need a decent GPU to run large language models. But for learning and prototyping with APIs, a regular PC is perfectly fine.
How much does it cost to run an agent like this?
GPT-5-nano costs roughly $0.05 (input) / $0.40 (output) per million tokens, so light experimentation costs only fractions of a cent per conversation. If you run hundreds of queries daily or use GPT-5.2 (which costs more), costs add up – monitor your usage. The free trial credits OpenAI provides are usually enough for learning. Open-source models can run free on your hardware, but you pay in complexity and setup time.
How do I deploy my agent beyond the command line?
Once it’s working locally, integrate it into a web application. Create a simple Flask or FastAPI app that accepts user input via HTTP, routes it to your agent code, and returns responses. Deploy the app on cloud services like Heroku, AWS, or Google Cloud. You could also build chat interfaces with frameworks like Streamlit or Gradio for rapid UI development. The agent logic remains the same – you’re just changing how users interact with it.
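For example, a minimal FastAPI wrapper might look like this (the /chat route and request model are illustrative, and get_response is the function from Step 2):
from fastapi import FastAPI
from pydantic import BaseModel

from agent import get_response  # the function we wrote in Step 2

app = FastAPI()

class ChatRequest(BaseModel):
    message: str

@app.post("/chat")
def chat(request: ChatRequest):
    # Route the incoming message to the agent logic and return its reply
    return {"reply": get_response(request.message)}

# Run with: uvicorn app:app --reload  (assuming this file is saved as app.py)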
Is it safe to give an agent access to tools?
It depends on the tools. A Wikipedia search? Perfectly safe. A tool that executes shell commands or modifies files? Extremely risky without proper safeguards. Always implement sandboxing, input validation, and access controls. Never give agents unrestricted system access. Start with read-only tools, then carefully expand capabilities while maintaining security boundaries.
Conclusion
You’ve just built a functional AI agent from scratch, and that’s no small feat. You’ve learned how an LLM serves as the reasoning engine, how memory enables contextual conversations, and how tools extend capabilities beyond pure language. Most importantly, you understand the perceive-reason-act loop that makes something a true agent rather than just another chatbot.
This foundation opens countless possibilities. Your agent could evolve into a personal research assistant, a code debugging companion, a task automation system, or anything else you imagine. The pattern is universal: define goals, give the agent tools to achieve them, and let AI handle the reasoning.
Your Next Steps: Try integrating another tool – maybe a calculator or a weather API. Connect your agent to a messaging platform like Slack or Discord. Experiment with different LLMs or prompting strategies to see how behavior changes.
As AI technology evolves at breakneck speed, the skills you’ve developed here become increasingly valuable. Companies are scrambling to integrate AI agents into their products. Developers who understand not just how to use frameworks, but how agents actually work under the hood, will have a massive advantage.
Got questions? Hit issues with your implementation? That’s part of the journey. Debug methodically, read error messages carefully, and remember: every AI researcher and engineer started exactly where you are now. The difference between a beginner and an expert is just a few hundred hours of building and breaking things.
Now go build something amazing. The future of AI agents is being written right now, and you’re part of it. 🚀