I built my own AI agent PoC and was testing it to see how useful it was; I hadn't added memory yet. It was 2 AM, I was fueled by cold coffee, and I'd just asked my chatbot about something we had discussed five minutes earlier. It replied with the digital equivalent of "Who are you again?" 🤦
That's when I realized: without memory, AI agents are like goldfish swimming in circles—impressive for thirty seconds, then utterly useless.
Today, we’re going to add memory to AI agent systems the right way. No bloated frameworks, no magic black boxes—just clean Python code that you’ll actually understand. By the end of this tutorial, your agent will remember conversations, build context, and feel genuinely intelligent.
Long-term memory for AI agents is the ability to store, retrieve, and reference past interactions across multiple sessions, enabling contextual awareness and personalized responses based on historical data.
Here’s the brutal truth: stateless agents are party tricks. They answer questions brilliantly but can’t maintain a coherent conversation beyond a single exchange.
Memory transforms your agent from a fancy autocomplete tool into something genuinely useful.
Real-world scenario? I built a code review agent that remembers my team’s style guide, our architecture decisions, and even our running jokes about variable naming. It’s not just helpful—it’s contextualized helpful. That’s the difference memory makes.
We’re building on the foundation from this excellent AI agent tutorial. If you haven’t read it, pause here and skim it—we’ll wait.
Back? Great. Let’s add a memory system that actually works.
Before writing code, understand the two memory types you need: short-term memory, which holds the current session's conversation history, and long-term memory, which persists facts and preferences across sessions.
We’re implementing both using Python dictionaries and JSON for simplicity. No external dependencies beyond what you already have.
Here’s the memory system I use in production. It’s battle-tested and beginner-friendly:
import json
import os
from datetime import datetime
from typing import Dict, List


class AgentMemory:
    """
    Memory system for AI agents with short-term and long-term storage.

    Short-term: In-memory conversation history for current session
    Long-term: Persistent JSON storage across sessions
    """

    def __init__(self, agent_id: str, memory_file: str = "agent_memory.json"):
        self.agent_id = agent_id
        self.memory_file = memory_file
        # Short-term memory: current conversation
        self.conversation_history: List[Dict] = []
        # Long-term memory: persistent user data
        self.long_term_memory: Dict = self._load_long_term_memory()

    def _load_long_term_memory(self) -> Dict:
        """Load persistent memory from disk or create new"""
        if os.path.exists(self.memory_file):
            with open(self.memory_file, 'r') as f:
                all_memories = json.load(f)
            return all_memories.get(self.agent_id, {
                "user_preferences": {},
                "facts": [],
                "context": {}
            })
        return {"user_preferences": {}, "facts": [], "context": {}}

    def _save_long_term_memory(self):
        """Persist memory to disk"""
        all_memories = {}
        if os.path.exists(self.memory_file):
            with open(self.memory_file, 'r') as f:
                all_memories = json.load(f)
        all_memories[self.agent_id] = self.long_term_memory
        with open(self.memory_file, 'w') as f:
            json.dump(all_memories, f, indent=2)

    def add_to_conversation(self, role: str, content: str):
        """Add message to short-term conversation history"""
        self.conversation_history.append({
            "role": role,
            "content": content,
            "timestamp": datetime.now().isoformat()
        })

    def store_fact(self, fact: str):
        """Store important information in long-term memory"""
        # Compare against stored contents: facts are dicts, not raw strings,
        # so a plain membership check would never catch duplicates
        known_facts = [f["content"] for f in self.long_term_memory["facts"]]
        if fact not in known_facts:
            self.long_term_memory["facts"].append({
                "content": fact,
                "stored_at": datetime.now().isoformat()
            })
            self._save_long_term_memory()

    def update_preference(self, key: str, value: str):
        """Update user preferences in long-term memory"""
        self.long_term_memory["user_preferences"][key] = value
        self._save_long_term_memory()

    def get_context_for_llm(self, max_history: int = 10) -> str:
        """
        Format memory into context string for LLM prompts.
        Combines recent conversation + relevant long-term memory.
        """
        context_parts = []

        # Add long-term facts if available
        if self.long_term_memory["facts"]:
            facts_str = "\n".join([f"- {f['content']}" for f in self.long_term_memory["facts"][-5:]])
            context_parts.append(f"Relevant facts about the user:\n{facts_str}")

        # Add user preferences
        if self.long_term_memory["user_preferences"]:
            prefs_str = "\n".join([f"- {k}: {v}" for k, v in self.long_term_memory["user_preferences"].items()])
            context_parts.append(f"User preferences:\n{prefs_str}")

        # Add recent conversation history
        recent_messages = self.conversation_history[-max_history:]
        if recent_messages:
            conv_str = "\n".join([f"{m['role']}: {m['content']}" for m in recent_messages])
            context_parts.append(f"Recent conversation:\n{conv_str}")

        return "\n\n".join(context_parts)

    def clear_conversation(self):
        """Clear short-term memory (new conversation session)"""
        self.conversation_history = []

Pro Tip: Notice how we separate concerns? Conversation history lives in RAM for speed, while facts and preferences persist to disk. This architecture scales beautifully; I've used variations of this for agents handling thousands of daily interactions.
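Before wiring this into an agent, it's worth poking at the class on its own. A quick sanity check (the file name and stored values are just illustrative):

# Quick standalone check of AgentMemory (illustrative values)
memory = AgentMemory(agent_id="demo_user", memory_file="demo_memory.json")

memory.add_to_conversation("user", "I prefer tabs over spaces.")
memory.store_fact("User prefers tabs over spaces")   # persisted to disk
memory.update_preference("indentation", "tabs")      # persisted to disk

print(memory.get_context_for_llm())
# Re-running the script and re-instantiating with the same agent_id reloads
# the facts and preferences; only conversation_history starts empty.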
Now we enhance the basic agent from the referenced tutorial with our memory system:
import anthropic
import os


class MemoryEnabledAgent:
    """
    AI Agent with conversation and long-term memory capabilities.
    Extends basic agent architecture with persistent context.
    """

    def __init__(self, agent_id: str = "default_agent"):
        self.client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))
        self.memory = AgentMemory(agent_id)
        self.model = "claude-sonnet-4-20250514"

    def chat(self, user_message: str, store_facts: bool = True) -> str:
        """
        Process user message with full memory context.

        Args:
            user_message: The user's input
            store_facts: Whether to auto-extract facts for long-term storage

        Returns:
            Agent's response string
        """
        # Add user message to conversation history
        self.memory.add_to_conversation("user", user_message)

        # Build context-aware system prompt
        memory_context = self.memory.get_context_for_llm()
        system_prompt = f"""You are a helpful AI assistant with memory capabilities.

{memory_context}

When users share important information about themselves, their projects, or preferences, acknowledge it naturally.
Refer back to previous context when relevant to show you remember past interactions."""

        # Prepare messages for API call
        messages = [{"role": "user", "content": user_message}]

        # Call Claude API with memory-enriched context
        response = self.client.messages.create(
            model=self.model,
            max_tokens=1024,
            system=system_prompt,
            messages=messages
        )
        assistant_reply = response.content[0].text

        # Store assistant response in conversation history
        self.memory.add_to_conversation("assistant", assistant_reply)

        # Optional: Auto-extract important facts (simple version)
        if store_facts and self._seems_important(user_message):
            self.memory.store_fact(user_message)

        return assistant_reply

    def _seems_important(self, message: str) -> bool:
        """Simple heuristic to detect if message contains storable information"""
        importance_keywords = ["my name is", "i work on", "i prefer", "remember that", "my project"]
        return any(keyword in message.lower() for keyword in importance_keywords)

    def remember_preference(self, key: str, value: str):
        """Explicitly store a user preference"""
        self.memory.update_preference(key, value)

    def new_conversation(self):
        """Start fresh conversation while keeping long-term memory"""
        self.memory.clear_conversation()


# Usage example
if __name__ == "__main__":
    agent = MemoryEnabledAgent(agent_id="user_john")

    # First interaction - agent learns about user
    response1 = agent.chat("Hi! My name is John and I'm working on a Python web scraper project.")
    print(f"Agent: {response1}\n")

    # Second interaction - agent remembers context
    response2 = agent.chat("What was I working on again?")
    print(f"Agent: {response2}\n")

    # Explicitly store preference
    agent.remember_preference("coding_style", "PEP 8 strict")

    # New session (simulated) - long-term memory persists
    agent.new_conversation()
    response3 = agent.chat("What do you know about my coding preferences?")
    print(f"Agent: {response3}")

Run this code and watch the magic happen. Your agent now remembers conversations, tracks preferences, and builds genuine context over time.
“My agent keeps forgetting everything!”
Check your agent_id consistency. Each unique ID gets separate memory storage. If you’re instantiating with different IDs, you’re creating parallel universes where your agent has amnesia. Use consistent identifiers tied to actual users.
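For example (the user object below is hypothetical; substitute whatever stable identifier your app has):

import uuid

# Bad: a fresh ID per request creates a brand-new, empty memory every time
agent = MemoryEnabledAgent(agent_id=str(uuid.uuid4()))

# Good: tie the ID to a stable user identifier (hypothetical `user` object)
agent = MemoryEnabledAgent(agent_id=f"user_{user.id}")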
“The memory file is getting massive”
Implement retention policies. I typically keep the last 100 facts and 30 days of preferences. Add this to your store_fact method:
# Limit facts to the most recent 100
if len(self.long_term_memory["facts"]) > 100:
    self.long_term_memory["facts"] = self.long_term_memory["facts"][-100:]
“Context length errors from the API”
You’re feeding too much history into the LLM. The max_history parameter in get_context_for_llm() exists for this reason. Tune it based on your model’s context window—I use 10-15 messages for most cases.
Pro Tip 💡: Add conversation summarization for long sessions. After 20+ messages, use your LLM to summarize the conversation and store the summary instead of raw messages. This helps compress context while preserving meaning.
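One way to implement that, as a sketch on top of the classes above (the method name, threshold, and prompt wording are my assumptions, not something prescribed here):

def summarize_and_compact(self, threshold: int = 20):
    """Summarize a long conversation, store the summary, and reset history."""
    if len(self.memory.conversation_history) < threshold:
        return
    transcript = "\n".join(
        f"{m['role']}: {m['content']}" for m in self.memory.conversation_history
    )
    response = self.client.messages.create(
        model=self.model,
        max_tokens=512,
        messages=[{
            "role": "user",
            "content": "Summarize the key facts and decisions from this "
                       f"conversation in a few bullet points:\n\n{transcript}"
        }]
    )
    # Persist the summary as a long-term fact, then clear the raw messages
    self.memory.store_fact(f"Conversation summary: {response.content[0].text}")
    self.memory.clear_conversation()

Drop this method into MemoryEnabledAgent and call it at the end of chat().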
Let’s be honest: this approach isn’t perfect.
Scalability ceiling: JSON file storage breaks down around 10,000+ users with heavy usage. At that point, migrate to SQLite or PostgreSQL for production deployments.
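For a taste of that migration, here's a minimal sketch of fact storage on SQLite using only the standard library (the table name and schema are assumptions, just one way to lay it out):

import sqlite3
from datetime import datetime

class SQLiteFactStore:
    """Sketch: persist facts in SQLite instead of a JSON file."""

    def __init__(self, db_path: str = "agent_memory.db"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS facts ("
            "agent_id TEXT, content TEXT, stored_at TEXT, "
            "UNIQUE(agent_id, content))"
        )

    def store_fact(self, agent_id: str, fact: str):
        # UNIQUE constraint + INSERT OR IGNORE gives free deduplication
        self.conn.execute(
            "INSERT OR IGNORE INTO facts VALUES (?, ?, ?)",
            (agent_id, fact, datetime.now().isoformat()),
        )
        self.conn.commit()

    def get_facts(self, agent_id: str, limit: int = 5):
        rows = self.conn.execute(
            "SELECT content FROM facts WHERE agent_id = ? "
            "ORDER BY stored_at DESC LIMIT ?",
            (agent_id, limit),
        ).fetchall()
        return [r[0] for r in rows]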
No semantic search: We’re storing facts as strings, which means we can’t find “that thing about the Python project” without exact keywords. For semantic search, you need vector embeddings—check out Pinecone or Chroma when you’re ready to level up.
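To show the shape of that upgrade, here's a sketch of embedding-based retrieval; embed() is a placeholder for whatever embedding API you adopt:

import math

def embed(text: str) -> list[float]:
    """Placeholder: call your embedding provider of choice here."""
    raise NotImplementedError

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search_facts(query: str, facts: list[dict], top_k: int = 3) -> list[str]:
    """Rank stored facts by semantic similarity to the query."""
    query_vec = embed(query)
    scored = [
        (cosine_similarity(query_vec, embed(f["content"])), f["content"])
        for f in facts
    ]
    scored.sort(reverse=True)
    return [content for _, content in scored[:top_k]]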
Privacy concerns: You’re storing user data in plain text JSON files. Add encryption for sensitive information or comply with GDPR requirements. Never store passwords, personal health data, or financial info without proper security measures.
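If you must keep sensitive data, one option is encrypting the file at rest. A minimal sketch using the cryptography package's Fernet (my choice of library, not the tutorial's), swapping out the plain-text file I/O inside _save_long_term_memory() and _load_long_term_memory():

import json
from cryptography.fernet import Fernet

# Generate once and keep out of source control (e.g., an environment variable)
key = Fernet.generate_key()
fernet = Fernet(key)

# In _save_long_term_memory(): encrypt before writing
encrypted = fernet.encrypt(json.dumps(all_memories).encode())
with open(self.memory_file, "wb") as f:
    f.write(encrypted)

# In _load_long_term_memory(): decrypt after reading
with open(self.memory_file, "rb") as f:
    all_memories = json.loads(fernet.decrypt(f.read()).decode())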
Mitigation strategies: encrypt inside the _save_long_term_memory() method for sensitive deployments (as sketched above), apply the retention limits shown earlier, and plan the database migration before you hit scale.

You've built a memory-enabled AI agent from scratch. That's genuinely impressive. 🎉
Ready to go deeper? Natural next steps are semantic search with vector embeddings, a database-backed memory store, and automatic conversation summarization.
Question to reflect on: What would you build if your agent could remember everything about a user across years of interactions? Customer support that knows your entire product history? A coding assistant that evolves with your project architecture?
Adding memory to AI agent systems transforms them from impressive demos into genuinely useful tools. You now have a production-ready memory architecture that handles conversation context, stores long-term facts, and scales from prototype to production.
The code you’ve written today is the same foundation I use for agents handling real user workloads. The only difference? I’ve added logging, error handling, and swapped JSON for PostgreSQL. But the core architecture? Identical.
Your turn: Clone the code, customize it for your use case, and build something that remembers. Start with a simple personal assistant or a project-specific helper bot. Run it, break it, improve it.
And when your agent successfully recalls a conversation from last week without you prompting it? That’s the moment you’ll understand why memory is the difference between AI toys and AI tools.
Now go build something that sticks around. 🚀