AutoGen Tutorial (Python): adding memory to agents for advanced developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows how to give an AutoGen agent persistent memory in Python using a simple, production-friendly pattern: store facts outside the chat loop, retrieve them before each turn, and inject them back into the agent’s context. You need this when you want agents to remember user preferences, case history, policy details, or prior decisions across sessions instead of treating every conversation like a blank slate.

What You'll Need

  • Python 3.10+
  • autogen-agentchat
  • autogen-ext
  • An OpenAI API key set as OPENAI_API_KEY
  • A working AutoGen setup with basic agent and chat loop familiarity
  • Optional but useful:
    • SQLite for local persistence
    • A vector store if you want semantic retrieval later

Install the packages:

pip install autogen-agentchat autogen-ext openai

Step-by-Step

  1. Start with a normal AutoGen assistant and a memory store.

The simplest reliable pattern is not to stuff everything into the model prompt. Keep memory in your own store, then retrieve relevant items before each response. Here we’ll use SQLite because it is durable, easy to inspect, and good enough for most internal tools.

import sqlite3
from dataclasses import dataclass
from typing import List

@dataclass
class MemoryItem:
    user_id: str
    fact: str

class MemoryStore:
    def __init__(self, db_path: str = "memory.db"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            """
            CREATE TABLE IF NOT EXISTS memories (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                user_id TEXT NOT NULL,
                fact TEXT NOT NULL
            )
            """
        )
        self.conn.commit()

    def add(self, user_id: str, fact: str) -> None:
        self.conn.execute(
            "INSERT INTO memories (user_id, fact) VALUES (?, ?)",
            (user_id, fact),
        )
        self.conn.commit()

    def list(self, user_id: str) -> List[str]:
        rows = self.conn.execute(
            "SELECT fact FROM memories WHERE user_id = ? ORDER BY id DESC",
            (user_id,),
        ).fetchall()
        return [row[0] for row in rows]

  2. Build an AutoGen assistant that can receive memory as extra context.

In AutoGen 0.4+, the clean way is to create an assistant agent and pass memory into the system message or task input before each call. That keeps the agent stateless while your application handles persistence.

import os
import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient

model_client = OpenAIChatCompletionClient(
    model="gpt-4o-mini",
    api_key=os.environ["OPENAI_API_KEY"],
)

assistant = AssistantAgent(
    name="support_assistant",
    model_client=model_client,
    system_message=(
        "You are a support assistant. Use provided memory facts when relevant. "
        "If memory conflicts with the user's latest message, trust the latest message."
    ),
)

  3. Add a helper that injects retrieved memory into the prompt.

This is the core pattern. Fetch facts for the current user, format them as a short block, then prepend them to the request so the model sees them on every turn.

def build_prompt(user_message: str, memories: list[str]) -> str:
    memory_block = "\n".join(f"- {fact}" for fact in memories) if memories else "- None"
    return f"""Relevant memory:
{memory_block}

User message:
{user_message}
"""

store = MemoryStore()
store.add("user_123", "User prefers concise answers.")
store.add("user_123", "User works in insurance claims automation.")

prompt = build_prompt("Draft a follow-up email for my client.", store.list("user_123"))
print(prompt)

  4. Wrap the agent call in an async function and keep writing back new facts.

You usually want two kinds of memory: retrieved context and newly learned facts. In practice, you can store explicit preferences or stable profile data after each interaction instead of trying to auto-extract everything from raw chat logs.

async def run_turn(user_id: str, user_message: str):
    memories = store.list(user_id)
    prompt = build_prompt(user_message, memories)

    result = await assistant.run(task=prompt)
    print(result.messages[-1].content)

    # Example of app-managed memory write-back.
    if "prefer concise" in user_message.lower():
        store.add(user_id, "User prefers concise answers.")

if __name__ == "__main__":
    asyncio.run(run_turn("user_123", "Can you summarize this case update?"))

  5. Add a simple extraction step for durable facts.

For advanced use cases, don’t rely only on manual writes. Extract stable facts, such as preferences, identity attributes, or long-lived project context, from the conversation and persist them separately from transient chat content.

def extract_memory_candidates(user_message: str) -> list[str]:
    candidates = []
    lower = user_message.lower()

    if "i prefer" in lower:
        candidates.append(user_message.strip())
    if "my team" in lower or "our team" in lower:
        candidates.append(user_message.strip())
    if "remember that" in lower:
        candidates.append(user_message.strip())

    return candidates

async def run_turn_with_extraction(user_id: str, user_message: str):
    for fact in extract_memory_candidates(user_message):
        store.add(user_id, fact)

    prompt = build_prompt(user_message, store.list(user_id))
    result = await assistant.run(task=prompt)
    return result.messages[-1].content
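
One practical wrinkle with keyword triggers like these: they will happily store the same fact on every matching turn. A small dedup guard before writing helps; here is a minimal standalone sketch (`add_unique` is a hypothetical helper, not part of the `MemoryStore` above, and real code would dedup at the database layer instead):

```python
def add_unique(existing: list[str], fact: str) -> list[str]:
    """Append fact only if an equivalent entry is not already present.

    Comparison is case- and whitespace-insensitive, so trivially
    reworded duplicates are skipped.
    """
    norm = " ".join(fact.lower().split())
    if all(" ".join(e.lower().split()) != norm for e in existing):
        existing.append(fact)
    return existing

facts = ["User prefers concise answers."]
add_unique(facts, "user prefers  concise answers.")  # duplicate, skipped
add_unique(facts, "User works in insurance claims automation.")
print(facts)
# ['User prefers concise answers.', 'User works in insurance claims automation.']
```

In a production version you would enforce this with a `UNIQUE(user_id, fact)` constraint or an upsert, but the in-memory check shows the idea.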

Testing It

Run the script twice with messages from the same user_id. On the second run, verify that earlier stored facts appear in the “Relevant memory” block before the model responds. Then change one preference explicitly and confirm your application writes a new memory entry instead of overwriting old data blindly.

A good test is to ask something like “What tone should you use with me?” after storing “User prefers concise answers.” If retrieval is working, the model should answer with that preference without being re-taught inside the same prompt.

If you want stronger validation, inspect memory.db directly with SQLite tools and confirm rows are being inserted per user. That catches bugs where retrieval works but persistence silently fails.
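
The persistence check can also be automated. Here is a minimal sketch that simulates two separate "runs" against the same database file with raw `sqlite3` (a temp directory stands in for the tutorial's `memory.db` so the script is self-contained):

```python
import os
import sqlite3
import tempfile

# A throwaway path standing in for memory.db from the tutorial.
db_path = os.path.join(tempfile.mkdtemp(), "memory.db")

# Run 1: create the schema and write a fact.
conn = sqlite3.connect(db_path)
conn.execute(
    """
    CREATE TABLE IF NOT EXISTS memories (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        user_id TEXT NOT NULL,
        fact TEXT NOT NULL
    )
    """
)
conn.execute(
    "INSERT INTO memories (user_id, fact) VALUES (?, ?)",
    ("user_123", "User prefers concise answers."),
)
conn.commit()
conn.close()

# Run 2: a fresh connection must still see the fact, proving the
# write actually reached disk rather than living only in memory.
conn = sqlite3.connect(db_path)
rows = conn.execute(
    "SELECT fact FROM memories WHERE user_id = ?", ("user_123",)
).fetchall()
conn.close()
print(rows)  # [('User prefers concise answers.',)]
```

If the second connection comes back empty, you have a missing `commit()` or a path mismatch between runs.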

Next Steps

  • Replace keyword-based extraction with an LLM-based memory classifier that decides what gets stored.
  • Add semantic retrieval using embeddings so agents recall related facts even when exact keywords don’t match.
  • Separate short-term session state from long-term profile memory so your agent doesn’t accumulate junk over time.
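
As a taste of the semantic-retrieval direction, here is a minimal sketch using a toy bag-of-words "embedding" and cosine similarity. The `embed`, `cosine`, and `retrieve` names are illustrative; in a real system you would swap `embed` for an actual embedding model and keep the vectors in a vector store:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: word counts. A stand-in for a real embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, facts: list[str], top_k: int = 2) -> list[str]:
    # Rank stored facts by similarity to the query and keep the best few.
    q = embed(query)
    ranked = sorted(facts, key=lambda f: cosine(q, embed(f)), reverse=True)
    return ranked[:top_k]

facts = [
    "User prefers concise answers.",
    "User works in insurance claims automation.",
    "User's team ships on Fridays.",
]
print(retrieve("summarize this insurance claims case", facts, top_k=1))
# ['User works in insurance claims automation.']
```

The shape of the code stays the same with real embeddings: embed once at write time, embed the query at read time, rank by similarity, and inject only the top matches into the prompt.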

By Cyprian Aarons, AI Consultant at Topiax.
