Memory Is Half the Job of Building an Intelligent AI Agent

Choosing the memory architecture that fits your agent's actual reasoning pattern — not the one that sounds most sophisticated on a slide — is, in my estimate, fully half the job of building an intelligent agent

Helping startup founders build their products, drive their growth @ WeBuidl

Human intelligence runs on two core systems: memory and creativity.

Everything else — reasoning, judgment, instinct — is built on top of those two. AI agents are no different. We talk endlessly about the creativity side: better prompts, better reasoning chains, better tool use. Memory gets treated like plumbing. It isn't. It's half the job.

An agent without memory isn't an agent. It's a very articulate autocomplete that forgets you exist the moment the session ends. Real agentic behavior — the kind that gets better over time, that remembers what worked last quarter — needs memory architecture, not just a longer context window. Picking the wrong one is one of the most expensive mistakes you can make early, because it's painful to rip out later.

There isn't one right answer. There are four real patterns, each the right call for a different shape of problem.

Chat History and Agent State

This is the baseline — short-term working memory. The running conversation, current task state, maybe a scratchpad the agent writes to mid-execution. Cheap, fast, and what every agent has by default whether you designed for it or not.

Use it for: single-session tasks where continuity beyond "right now" doesn't matter. A coding agent mid-task, a support bot resolving one ticket.

Don't use it for: anything that needs to persist. The moment the agent needs to remember a preference from three weeks ago, working memory falls apart. Long-context models that offer large windows at flat pricing have made "just stuff more into context" cheaper than people assume for smaller agents, but that's a patch, not an architecture. The context still resets when the session ends.
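To make the "it still resets" point concrete, here is a minimal sketch of working memory as a plain in-process object. The `WorkingMemory` class, its `max_messages` cap (a stand-in for a context window), and the sample messages are all hypothetical, not taken from any particular framework.

```python
from dataclasses import dataclass, field


@dataclass
class WorkingMemory:
    """Short-term state for one session: chat history plus a scratchpad."""

    max_messages: int = 6  # hypothetical cap standing in for a context window
    messages: list = field(default_factory=list)
    scratchpad: dict = field(default_factory=dict)

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})
        # Drop the oldest turns once the cap is exceeded.
        if len(self.messages) > self.max_messages:
            self.messages = self.messages[-self.max_messages:]

    def context(self) -> list:
        """Return a copy of what would be sent to the model on the next call."""
        return list(self.messages)


session = WorkingMemory(max_messages=2)
session.add("user", "I prefer answers in bullet points.")
session.add("assistant", "Noted.")
session.add("user", "Summarize this ticket.")
session.scratchpad["ticket_id"] = "T-1"  # mid-task state, also session-scoped

# The stated preference has already been trimmed out of the window.
print(session.context())

# A new session starts empty: nothing carries over.
next_session = WorkingMemory(max_messages=2)
print(next_session.context())
```

A bigger `max_messages` delays the trimming, but the second session still starts from nothing, which is the gap the other three patterns exist to fill.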

RAG and Vector Stores

This is the default most teams reach for, and for good reason — embed your documents, retrieve by semantic similarity, stuff the top results into context. It's mature, well-tooled, and good at one thing in particular: surfacing the most similar piece of information to a query.

Use it for: large, relatively flat document corpora where similarity search is genuinely what you need — FAQ retrieval, support docs, anything where "find me the chunk that sounds like this question" is the actual task.

Don't use it for: anything requiring multi-hop reasoning or relationship-tracing. A vector store hands back a ranked list of similar chunks. It can't tell you that customer X uses product Y, and that product Y had an incident last month resembling a case from a different customer entirely. That's a structural limitation: flat similarity search can't traverse relationships it was never asked to encode.
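The embed, retrieve, and rank loop can be sketched in a few lines. This is a toy, assuming a bag-of-words `embed` function in place of a real embedding model and an in-memory list in place of a real vector database; the `VectorStore` class and the sample documents are made up for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """In-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self.items = []  # list of (text, vector) pairs

    def add(self, text: str) -> None:
        self.items.append((text, embed(text)))

    def search(self, query: str, k: int = 2) -> list:
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = VectorStore()
for doc in [
    "how do i reset my password",
    "billing invoices are emailed monthly",
    "reset your password from the account settings page",
]:
    store.add(doc)

results = store.search("reset password", k=2)
print(results)
```

Note what `search` returns: a ranked list of texts and nothing else. There is no notion of one chunk being connected to another, which is why the multi-hop question above has nowhere to go.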

Knowledge Graphs (Neo4j and friends)

This is the architecture for when relationships are the data. Entities, edges, multi-hop traversal — an agent can walk from a fact to a related fact, which a vector store fundamentally cannot do. The more advanced versions now treat time as a first-class dimension — facts get versioned, and the agent can reason about what was true versus what's true now.
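Here is a rough sketch of what "walking from a fact to a related fact" means, using a plain adjacency map and breadth-first search rather than Neo4j itself. The `Graph` class, the relation names, and the entities (borrowed from the customer X and product Y example earlier) are all hypothetical.

```python
from collections import defaultdict, deque


class Graph:
    """Tiny in-memory stand-in for a graph database."""

    def __init__(self) -> None:
        self.edges = defaultdict(list)  # node -> list of (relation, neighbor)

    def add(self, src: str, rel: str, dst: str) -> None:
        self.edges[src].append((rel, dst))

    def find_path(self, start: str, goal: str, max_hops: int = 4):
        """Breadth-first search; returns a list of (src, rel, dst) hops or None."""
        queue = deque([(start, [])])
        seen = {start}
        while queue:
            node, path = queue.popleft()
            if node == goal:
                return path
            if len(path) >= max_hops:
                continue  # do not expand beyond the hop limit
            for rel, nxt in self.edges.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [(node, rel, nxt)]))
        return None


graph = Graph()
graph.add("customer_x", "USES", "product_y")
graph.add("product_y", "HAD_INCIDENT", "incident_1")
graph.add("incident_1", "SIMILAR_TO", "case_7")
graph.add("case_7", "REPORTED_BY", "customer_z")

path = graph.find_path("customer_x", "customer_z")
for src, rel, dst in path:
    print(f"{src} -[{rel}]-> {dst}")
```

In a real graph database the traversal would be a query rather than hand-written search, but the shape of the answer is the same: a chain of explicit edges, not a similarity score.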

Use it for: domains that are genuinely relational — fraud detection, compliance webs, anything where the value is in the connections between entities, not just the entities themselves.

Don't use it for: everything, which is the trap I almost fell into. Knowledge graphs are powerful and also heavy — schema design, entity extraction, ongoing graph maintenance. If your problem doesn't actually need multi-hop relational reasoning, you're paying graph-database tax for vector-store value.
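On the versioned-facts point from this section, one simple way to picture it is an edge that carries a validity interval. The `Fact` dataclass, the `facts_as_of` helper, and the dates below are illustrative assumptions, not how any specific graph product stores time.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Fact:
    """An edge with a validity window; valid_to=None means still true."""

    subject: str
    relation: str
    obj: str
    valid_from: date
    valid_to: Optional[date] = None


def facts_as_of(facts, subject: str, relation: str, when: date) -> list:
    """Return the objects for which (subject, relation) held on the given date."""
    return [
        f.obj
        for f in facts
        if f.subject == subject
        and f.relation == relation
        and f.valid_from <= when
        and (f.valid_to is None or when < f.valid_to)
    ]


facts = [
    Fact("customer_x", "USES", "product_y", date(2023, 1, 1), date(2024, 6, 1)),
    Fact("customer_x", "USES", "product_z", date(2024, 6, 1)),
]

print(facts_as_of(facts, "customer_x", "USES", date(2024, 1, 15)))  # what was true
print(facts_as_of(facts, "customer_x", "USES", date(2025, 1, 1)))   # what is true now
```

Keeping the old fact with a closed interval, instead of overwriting it, is what lets the agent answer both questions. It is also one more thing to maintain, which is part of the graph tax.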

Wiki-Style Document Memory

This is the one I don't see written about enough, probably because it's less novel than a graph and less default than RAG — but it's been the right answer for a system I recently built.

I built a financial research and analysis system in which the agent's memory is, functionally, a wiki: structured notes, reports, and insights, each its own document. The pattern is two-stage retrieval. First, the agent uses indexing and metadata filters to retrieve a list of candidate notes (titles, tags, summaries) without pulling full content. Then, based on that list, it retrieves the full content of only the notes actually relevant to the task.
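A minimal sketch of that two-stage pattern follows. It is not the system I built: the `Note` and `NoteStore` names, the tag filter, the sample notes, and the keyword check standing in for the agent's own relevance judgment are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    note_id: str
    title: str
    tags: frozenset
    summary: str
    body: str


class NoteStore:
    """In-memory stand-in for a wiki of notes with a metadata index."""

    def __init__(self, notes) -> None:
        self._notes = {n.note_id: n for n in notes}

    def list_candidates(self, required_tags) -> list:
        """Stage 1: return metadata only (no bodies) for notes matching all tags."""
        required = set(required_tags)
        return [
            {"note_id": n.note_id, "title": n.title,
             "tags": sorted(n.tags), "summary": n.summary}
            for n in self._notes.values()
            if required <= n.tags
        ]

    def fetch(self, note_ids) -> list:
        """Stage 2: return full notes for the ids that were selected."""
        return [self._notes[i] for i in note_ids if i in self._notes]


# Made-up notes for illustration only.
store = NoteStore([
    Note("n1", "ACME Q3 earnings review", frozenset({"acme", "earnings"}),
         "Margins compressed on input costs", "Full written analysis of ACME Q3."),
    Note("n2", "ACME supplier risk", frozenset({"acme", "risk"}),
         "Dependence on a single supplier", "Full written analysis of supplier risk."),
    Note("n3", "Globex investment thesis", frozenset({"globex", "thesis"}),
         "Long-term growth case", "Full written thesis on Globex."),
])

candidates = store.list_candidates({"acme"})

# In a real agent the model reads the candidate list and picks ids;
# a keyword check on the summary stands in for that judgment here.
chosen = [c["note_id"] for c in candidates if "margins" in c["summary"].lower()]

full_notes = store.fetch(chosen)
print([n.title for n in full_notes])
```

The point of the split is that stage 1 is cheap enough to run broadly, while stage 2 loads each selected note whole, so the analyst's reasoning arrives in context intact rather than as chunks.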

I almost built this as a knowledge graph first. It felt like the "correct" engineering answer. It took me longer than I'd like to admit to realize research and analysis isn't fundamentally about traversing relationships between entities — it's about retrieving the right documents and reasoning over them as wholes. A graph would've fragmented exactly what made the notes useful: the analyst's full reasoning, written in one place.

Use it for: structured analytical work where documents are the natural unit of knowledge — research notes, investment theses, audit reports, anything a human analyst would also want to read whole, not as a fragmented graph node. It's also the most enterprise-friendly of the four: a wiki of notes is something a finance or research team can browse and manage directly, outside the agent entirely. That matters — memory only an engineer can inspect is memory the rest of the org doesn't trust.

Don't use it for: high-volume, low-structure data where indexing every chunk as a discrete "note" creates more overhead than value, or for problems that are genuinely relational rather than document-centric.


Choosing the memory architecture that fits your agent's actual reasoning pattern — not the one that sounds most sophisticated on a slide — is, in my estimate, fully half the job of building an intelligent agent. The other half is creativity: reasoning, prompting, judgment under ambiguity. I'll get into that next week.

Get memory wrong and no amount of prompt engineering saves you. Get it right and the agent starts looking less like a clever autocomplete and more like something that's actually been paying attention.