
Module 3: Memory & Context

  • Understand the types of agent memory
  • Implement persistent customer memory
  • Learn hierarchical context management (L1/L2/L3)
  • Understand token optimization strategies
  • Explore the Skills.md pattern for modular context
| Type | Scope | Example |
| --- | --- | --- |
| Conversation | Current session | Chat history within one interaction |
| Short-term | Recent sessions | User's last few requests |
| Long-term | Persistent | Customer preferences, past orders |
| Episodic | Learning | “Last time this customer had a billing issue, they needed…” |

Strands Agents maintains conversation memory automatically within a session. For persistent memory, we implement our own storage (or use AgentCore Memory in production).
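The tools below call `load_memory()` and `save_memory()`; a minimal sketch of those file-backed helpers might look like this (the filename is illustrative, and the workshop code may differ):

```python
import json
from pathlib import Path

# Illustrative storage path -- the workshop code may use a different file.
MEMORY_FILE = Path("customer_memory.json")

def load_memory() -> dict:
    """Read the whole memory store; an empty dict if nothing is saved yet."""
    if MEMORY_FILE.exists():
        return json.loads(MEMORY_FILE.read_text())
    return {}

def save_memory(memory: dict) -> None:
    """Persist the memory store to disk as readable JSON."""
    MEMORY_FILE.write_text(json.dumps(memory, indent=2))
```

A plain JSON file keeps the workshop dependency-free; anything that can round-trip a dict would work here.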

  1. Open the memory module

    ```sh
    code module_03_memory/agent_with_memory.py
    ```
  2. Review the memory tools

    We create two tools for persistent memory:

    ```python
    @tool
    def remember_customer_preference(
        customer_id: str, key: str, value: str
    ) -> dict:
        """Store a customer preference for future interactions."""
        memory = load_memory()
        memory.setdefault(customer_id, {})[key] = value
        save_memory(memory)
        return {"stored": True, "key": key}

    @tool
    def recall_customer_info(customer_id: str) -> dict:
        """Recall stored info about a customer."""
        memory = load_memory()
        return memory.get(customer_id, {})
    ```
  3. Run the memory agent

    ```sh
    python module_03_memory/agent_with_memory.py
    ```
  4. Test memory across conversations

    You: Hi, I'm Alice. I prefer email communication and love electronics.
    You: What do you remember about Alice?
    You: What headphones do you have?

    Now restart the agent and ask again:

    You: What do you know about Alice?

    The memory persists across sessions because it’s stored in a JSON file.
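The persistence claim is easy to verify in isolation. This standalone sketch simulates two "sessions" that share one JSON file (the path and keys are illustrative):

```python
import json
import os
import tempfile

# Illustrative path -- any writable location works.
path = os.path.join(tempfile.gettempdir(), "memory_demo.json")

# Session 1: the agent stores what it learned, then shuts down.
with open(path, "w") as f:
    json.dump({"alice": {"contact": "email", "interests": "electronics"}}, f)

# Session 2: a freshly started process reads the same file and recalls it all.
with open(path) as f:
    recalled = json.load(f)

print(recalled["alice"]["contact"])  # -> email
os.remove(path)
```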

One of the most important concepts in production agents is managing the context window efficiently. LLMs have limited context windows, and every token costs money.

```mermaid
block-beta
    columns 1
    block:L1["L1: Active Context (Always Loaded) ~200 tokens"]:1
        A["System Prompt"] B["Current Conversation"] C["Tool Definitions"]
    end
    block:L2["L2: On-Demand Context (Loaded by Tools)"]:1
        D["FAQ Entries"] E["Customer Memory"] F["Policy Details"]
    end
    block:L3["L3: External Storage (API Calls)"]:1
        G["Product Catalog"] H["Order Database"] I["Ticket System"]
    end

    style L1 fill:#4CAF50,color:#fff
    style L2 fill:#FF9800,color:#fff
    style L3 fill:#2196F3,color:#fff
```

L1 (Always loaded): The system prompt and tool descriptions are always in context. Keep this lean: roughly 200-500 tokens for the prompt.

L2 (Loaded on demand): Detailed information loaded only when the agent calls a tool. FAQ entries, customer preferences, and detailed policies live here.

L3 (External): Full databases and APIs accessed through tool calls. Only relevant slices of data enter the context.
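As a sketch of the L2 idea (the FAQ data and lookup function are invented for illustration), a tool returns one small slice of a larger knowledge base only when the agent asks for it:

```python
# Hypothetical L2 store: kept out of the prompt, and entering the context
# only through a tool call, one entry at a time.
FAQ = {
    "returns": "Items can be returned within 30 days with a receipt.",
    "shipping": "Standard shipping takes 3-5 business days.",
}

def lookup_faq(topic: str) -> dict:
    """Return a single FAQ entry rather than the whole knowledge base."""
    entry = FAQ.get(topic.lower())
    if entry is None:
        # Tell the model what it *can* ask for, still in a few tokens.
        return {"found": False, "available_topics": sorted(FAQ)}
    return {"found": True, "topic": topic, "answer": entry}
```

Only the tool's name and docstring sit in L1; the entries themselves stay out of context until a call brings one in.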

| Strategy | How | Savings |
| --- | --- | --- |
| Progressive disclosure | Load details only when needed | 60-80% |
| Response summarization | Summarize long tool outputs | 30-50% |
| Focused tool output | Return only relevant fields | 20-40% |
| Context pruning | Drop old conversation turns | Variable |
| Skills.md pattern | Frontmatter always loaded, body on demand | 70-90% |
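To make "focused tool output" concrete, here is a sketch (the field names and sample record are invented) that keeps only the fields a support conversation needs and measures the character-level saving:

```python
import json

def focused_order_view(order: dict) -> dict:
    """Keep only the handful of fields the agent actually needs."""
    keep = ("order_id", "status", "eta", "total")
    return {k: order[k] for k in keep if k in order}

# An imagined raw API response, padded with internal detail the agent
# never needs to see.
full_order = {
    "order_id": "A-1001",
    "status": "shipped",
    "eta": "2024-06-01",
    "total": 129.99,
    "internal_routing": {"warehouse": "W-7", "carrier_codes": list(range(50))},
    "raw_events": ["created", "picked", "packed", "shipped"] * 25,
}

slim = focused_order_view(full_order)
saved = 1 - len(json.dumps(slim)) / len(json.dumps(full_order))
print(f"kept {list(slim)} -- roughly {saved:.0%} smaller")
```

The same trimming applies to any tool that wraps a chatty API: serialize only what the model will reason over.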

The Skills.md pattern (popularized by Anthropic’s Claude Code) is a powerful context management technique:

```
skills/
├── billing/
│   └── SKILL.md    # Frontmatter: name + description (always visible)
│                   # Body: full instructions (loaded on demand)
├── returns/
│   └── SKILL.md
└── technical/
    └── SKILL.md
```

Frontmatter (lightweight, always in context):

```yaml
---
name: billing-support
description: Handle billing inquiries, payment issues, and refunds
---
```

Body (loaded only when the skill is activated):

```markdown
## Instructions

When handling billing inquiries:

1. Always look up the order first
2. Verify customer identity
3. Check refund eligibility...
```

This lets you register hundreds of skills while keeping the context window lean: only the names and descriptions are always visible.
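A minimal loader for this layout might look like the following sketch (the frontmatter parsing is hand-rolled for illustration; real code would use a YAML parser, and the function names are assumptions):

```python
from pathlib import Path

def split_skill(text: str) -> tuple[dict, str]:
    """Split a SKILL.md into (frontmatter dict, markdown body)."""
    _, frontmatter, body = text.split("---", 2)
    meta = {}
    for line in frontmatter.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, body.strip()

def skill_index(root: str) -> dict[str, str]:
    """Always-loaded view: just name -> description for every skill."""
    index = {}
    for path in sorted(Path(root).glob("*/SKILL.md")):
        meta, _ = split_skill(path.read_text())
        index[meta["name"]] = meta["description"]
    return index

def load_skill_body(root: str, skill_dir: str) -> str:
    """On-demand view: the full instructions for one activated skill."""
    _, body = split_skill((Path(root) / skill_dir / "SKILL.md").read_text())
    return body
```

The index is what goes into L1; `load_skill_body` is the L2 tool call that pulls one skill's instructions into context when it is needed.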

In production, you’d replace the JSON file with Amazon Bedrock AgentCore Memory:

```python
# Conceptual - AgentCore Memory integration
from bedrock_agentcore.memory import AgentCoreMemory

memory = AgentCoreMemory(agent_id="supportbot")
memory.store("alice", {"preference": "email", "interests": "electronics"})
info = memory.recall("alice")
```

AgentCore Memory supports:

  • Semantic memory: Factual knowledge about customers
  • Episodic memory: Learning from past interactions
  • Cross-session persistence: Memory survives agent restarts
  • Automatic indexing: Find relevant memories by context

Key takeaways:

  • Memory transforms agents from stateless to personalized
  • Use hierarchical context (L1/L2/L3) to manage token budgets
  • The Skills.md pattern enables scalable, modular agent capabilities
  • Always load context on demand; don't stuff everything into the prompt
  • In production, use AgentCore Memory for persistent, scalable storage