# Module 3: Memory & Context
## Learning Objectives

- Understand the types of agent memory
- Implement persistent customer memory
- Learn hierarchical context management (L1/L2/L3)
- Understand token optimization strategies
- Explore the Skills.md pattern for modular context
## Types of Agent Memory

| Type | Scope | Example |
|---|---|---|
| Conversation | Current session | Chat history within one interaction |
| Short-term | Recent sessions | User’s last few requests |
| Long-term | Persistent | Customer preferences, past orders |
| Episodic | Learning | “Last time this customer had a billing issue, they needed…” |
Strands Agents maintains conversation memory automatically within a session. For persistent memory, we implement our own storage (or use AgentCore Memory in production).
## Hands-On: Memory-Aware Agent

1. **Open the memory module**

   ```sh
   code module_03_memory/agent_with_memory.py
   ```

2. **Review the memory tools**

   We create two tools for persistent memory:

   ```python
   @tool
   def remember_customer_preference(customer_id: str, key: str, value: str) -> dict:
       """Store a customer preference for future interactions."""
       memory = load_memory()
       memory.setdefault(customer_id, {})[key] = value  # create the customer's entry on first use
       save_memory(memory)
       return {"stored": True, ...}

   @tool
   def recall_customer_info(customer_id: str) -> dict:
       """Recall stored info about a customer."""
       memory = load_memory()
       return memory.get(customer_id, {})
   ```

3. **Run the memory agent**

   ```sh
   python module_03_memory/agent_with_memory.py
   ```

4. **Test memory across conversations**

   ```
   You: Hi, I'm Alice. I prefer email communication and love electronics.
   You: What do you remember about Alice?
   You: What headphones do you have?
   ```

   Now restart the agent and ask again:

   ```
   You: What do you know about Alice?
   ```

   The memory persists across sessions because it’s stored in a JSON file.
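The `load_memory()` and `save_memory()` helpers that the tools rely on aren’t shown above. A minimal sketch of what they might look like, assuming the store lives in a local `memory.json` file (the filename and helper bodies are illustrative, not the workshop’s exact code):

```python
import json
from pathlib import Path

# Hypothetical path -- the workshop module may use a different location.
MEMORY_FILE = Path("memory.json")

def load_memory() -> dict:
    """Load the full memory store, or an empty dict if nothing is saved yet."""
    if MEMORY_FILE.exists():
        return json.loads(MEMORY_FILE.read_text())
    return {}

def save_memory(memory: dict) -> None:
    """Persist the memory store to disk so it survives agent restarts."""
    MEMORY_FILE.write_text(json.dumps(memory, indent=2))
```

Because every call round-trips through the file, whatever one session stores is visible to the next one, which is exactly the cross-restart behavior tested above.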
## Hierarchical Context Management

One of the most important concepts in production agents is managing the context window efficiently. LLMs have limited context windows, and every token costs money.
### The Three-Level Pattern

```mermaid
block-beta
  columns 1
  block:L1["L1: Active Context (Always Loaded) ~200 tokens"]:1
    A["System Prompt"] B["Current Conversation"] C["Tool Definitions"]
  end
  block:L2["L2: On-Demand Context (Loaded by Tools)"]:1
    D["FAQ Entries"] E["Customer Memory"] F["Policy Details"]
  end
  block:L3["L3: External Storage (API Calls)"]:1
    G["Product Catalog"] H["Order Database"] I["Ticket System"]
  end
  style L1 fill:#4CAF50,color:#fff
  style L2 fill:#FF9800,color:#fff
  style L3 fill:#2196F3,color:#fff
```
- **L1 (always loaded):** The system prompt and tool descriptions are always in context. Keep this lean: roughly 200-500 tokens for the prompt.
- **L2 (loaded on demand):** Detailed information loaded only when the agent calls a tool. FAQ entries, customer preferences, and detailed policies live here.
- **L3 (external):** Full databases and APIs accessed through tool calls. Only relevant slices of data enter the context.
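As a sketch of the L2 idea: rather than pasting the whole FAQ into the system prompt, expose a tool that returns only the matching entry. The `FAQ` data and function name here are illustrative, not from the workshop code:

```python
# Illustrative L2 store: the full FAQ lives outside the prompt,
# and only one entry at a time is pulled into the context window.
FAQ = {
    "returns": "Items can be returned within 30 days with a receipt.",
    "shipping": "Standard shipping takes 3-5 business days.",
    "warranty": "Electronics carry a one-year manufacturer warranty.",
}

def lookup_faq(topic: str) -> dict:
    """Load a single FAQ entry on demand; only this slice enters the context."""
    answer = FAQ.get(topic.lower())
    if answer is None:
        # Return just the available topic names, not every answer.
        return {"found": False, "topics": sorted(FAQ)}
    return {"found": True, "topic": topic, "answer": answer}
```

In a Strands agent this function would be wrapped with the same `@tool` decorator used earlier; the system prompt only needs to say the tool exists, which is what keeps L1 small.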
## Token Optimization Strategies

| Strategy | How | Savings |
|---|---|---|
| Progressive disclosure | Load details only when needed | 60-80% |
| Response summarization | Summarize long tool outputs | 30-50% |
| Focused tool output | Return only relevant fields | 20-40% |
| Context pruning | Drop old conversation turns | Variable |
| Skills.md pattern | Frontmatter always loaded, body on demand | 70-90% |
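Context pruning, for instance, can be as simple as keeping the system message plus the most recent turns that fit a token budget. A minimal sketch, where the 4-characters-per-token estimate is a rough heuristic rather than a real tokenizer:

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def prune_messages(messages: list[dict], budget: int) -> list[dict]:
    """Keep the system message plus as many recent turns as fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    kept: list[dict] = []
    used = sum(estimate_tokens(m["content"]) for m in system)
    for msg in reversed(turns):  # walk from newest to oldest
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break  # older turns are dropped
        kept.append(msg)
        used += cost
    return system + list(reversed(kept))
```

Production systems usually summarize dropped turns instead of discarding them outright, but the budget-walk structure is the same.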
## The Skills.md Pattern

The Skills.md pattern (popularized by Anthropic’s Claude Code) is a powerful context management technique:
```
skills/
├── billing/
│   └── SKILL.md    # Frontmatter: name + description (always visible)
│                   # Body: full instructions (loaded on demand)
├── returns/
│   └── SKILL.md
└── technical/
    └── SKILL.md
```

Frontmatter (lightweight, always in context):

```yaml
---
name: billing-support
description: Handle billing inquiries, payment issues, and refunds
---
```

Body (loaded only when the skill is activated):

```markdown
## Instructions

When handling billing inquiries:

1. Always look up the order first
2. Verify customer identity
3. Check refund eligibility
...
```

This lets you register hundreds of skills while keeping the context window lean: only the names and descriptions are always visible.
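A sketch of how an agent might index skills at startup: parse only the frontmatter of each SKILL.md into the always-loaded registry, and read the body later when a skill is activated. The naive `---` split and `key: value` parsing below are simplifications (a real implementation would use a YAML parser); the function names are illustrative:

```python
def split_skill(text: str) -> tuple[dict, str]:
    """Split a SKILL.md file into (frontmatter dict, body)."""
    # Assumes well-formed "---" delimiters around the frontmatter.
    _, frontmatter, body = text.split("---", 2)
    meta = {}
    for line in frontmatter.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, body.strip()

def registry_entry(text: str) -> dict:
    """Only name + description stay in the context window until activation."""
    meta, _ = split_skill(text)
    return {"name": meta["name"], "description": meta["description"]}
```

With hundreds of skills, the registry costs a line or two of context each, while the instruction bodies stay on disk until needed.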
## AgentCore Memory (Production)

In production, you’d replace the JSON file with Amazon Bedrock AgentCore Memory:

```python
# Conceptual - AgentCore Memory integration
from bedrock_agentcore.memory import AgentCoreMemory

memory = AgentCoreMemory(agent_id="supportbot")
memory.store("alice", {"preference": "email", "interests": "electronics"})
info = memory.recall("alice")
```

AgentCore Memory supports:
- Semantic memory: Factual knowledge about customers
- Episodic memory: Learning from past interactions
- Cross-session persistence: Memory survives agent restarts
- Automatic indexing: Find relevant memories by context
## Key Takeaways

- Memory transforms agents from stateless to personalized
- Use hierarchical context (L1/L2/L3) to manage token budgets
- The Skills.md pattern enables scalable, modular agent capabilities
- Always load context on demand; don’t stuff everything into the prompt
- In production, use AgentCore Memory for persistent, scalable storage