RESEARCH / 2026.06.09

Models Are Rentals. Context Is the Asset.

AI MEMORY BRIEFING
ISSUE 1

MEMORIES

MODELClaude Fable 5

CONTEXT KEPT: 100%

DRAG TO ORBIT
TAP TO ADD A MEMORY

The labs are betting you never notice who owns your context.

In the last 60 days, every frontier lab made a major bet on memory. Anthropic announced Dreaming for its Managed Agents and started testing Memory Files. OpenAI shipped Dreaming V3 as its new memory system. Two new YC-backed memory startups launched. Garry Tan open-sourced his personal AI brain. Andrej Karpathy posted a writeup of how he runs his own LLM-maintained wiki, and dozens of independent builders shipped their own versions of it within weeks.

The frontier labs are racing to own the context layer because they have already figured out what most people building on top of them haven't: switching models is cheap, switching context is not.

That is why they are racing to lock you in with proprietary memory: cross-chat memory, Projects, Memory Sources, Saved Info. The more each one learns about you, the more it costs to leave. None of these are built for you. They are walls.

If you are running a business on AI right now, you need to be ready for this. Every month you build deeper into one provider's walled garden is a month of context you cannot take with you when the next better model ships. And the next better model is always 90 days away. The question is not whether you will eventually need to switch. The question is whether you will be able to.

This is Issue 1 of an ongoing briefing on where AI memory is actually going, who is building it, and what the frontier labs cannot ship. I'm Channing Chasko. I build MyChatArchive. Let's get into it.

Why the lead doesn't last

Start with the rotating crown. The week this issue went out, Anthropic shipped Claude Fable 5, a new tier above Opus that took the top spot on most benchmarks the day it landed. Before that, the strongest general models were some mix of Claude Opus 4.8, GPT-5.5, Gemini 3.1 Pro, and DeepSeek V4, and which one was "best" depended on the week and the benchmark. A year ago the list looked different. A year from now it will look different again.

People object that frontier models cannot commoditize, because the cost to train one is enormous and still rising. They are right about the cost and wrong about who it protects. A training run that costs hundreds of millions is a moat around the lab, in that it keeps almost everyone else from fielding a frontier model at all. It does close to nothing to keep you, the person building on top, from switching. From where you sit, moving from one model to another is a config change, and if you route through something like OpenRouter you may not even swap an API key. In any real system you are not switching a bare model anyway. You are switching one component inside a stack of prompts, tools, retrieval, and orchestration, and the model is the most replaceable part of that stack, not the least.

What actually matters to a buyer is how long the lead lasts, and the lead is collapsing. The Llama 3 line took about sixteen months to reach GPT-4. DeepSeek V4 closed most of the gap to GPT-5 in about eight months, by NIST's own evaluation. The interval keeps shrinking, largely because distillation lets a fast follower learn directly from a leader's outputs. Frontier capability now has a shelf life measured in single-digit months before someone, often an open-weight someone, catches up.

None of that means the lead is worthless while it lasts. The opposite is true. At the frontier, the last few points of capability are where the value and the difficulty both concentrate. Reliability gets counted in nines, 99 percent, then 99.9, then 99.99, and each nine is roughly ten times harder to reach than the one before, because you are clearing most of the failures that remain every time. Going from ninety to ninety-nine on a hard benchmark is not nine percent better, it is the difference between a model that can finish the job and one that cannot. That is why labs burn fortunes to hold a lead measured in months, and why brief, exclusive access to whoever is in front commands a real premium. You are still renting it. You are just renting the front of the line.

And there is no longer one frontier. There are several running in parallel: the US and EU paid models, the Chinese models (DeepSeek V4, Qwen 3.7, MiniMax M3 shipped June 1), and the open-weight tier (Llama 4, DeepSeek, Qwen, Mistral, Nous Hermes). At any given moment the top paying customer has access to roughly the same capability across providers. That diversity of supply is not a counterargument to commoditization. It is the proof of it.

One honest exception: you can run a local model you genuinely own, and for some workloads that is a real edge. The frontier model you reach for on the hard problems, though, lives in someone else's cloud, and that is the one you are renting.

This does not mean the models are interchangeable. People pay real premiums for the best model on a given task, and the best model for legal reasoning is not the best for code. That premium is exactly the argument for portability: when a better tool ships for the task in front of you, you want to move the same day, and you can only move that fast if your context is not stranded with the last one. A rotating lead is not a moat. So the work moves to what you feed the model. Karpathy has a name for it: context engineering, putting the right information in the window at the right moment. That is the subject of this briefing.

WHY MODELS COMMODITIZE

The lead has a half-life.

GPT-3

OPT-175B

24MO

GPT-4

Llama 3

16MO

GPT-5

DeepSeek V4

8MO

next?

~4MO

next?

~2MO

FRONTIER LEADER SHIPS FAST FOLLOWER CATCHES UP

PUBLIC RELEASE DATES · NIST CAISI · JUNE 2026

Why context is different

Here is the asymmetry. There are maybe a dozen models that matter, and you could list them on a napkin. You cannot do that with context, because context is not a list. It is generative. Every domain where people do knowledge work spins up its own context market the moment AI shows up in it: CRM context, legal context, medical context, sales calls, customer support, code, personal life. Each one is a different shape, accumulates differently, and is worth something different to the people who own it.

That is the part the napkin math misses. "AI memory" is not one market that one company wins. It is a category that keeps generating new markets, one for every domain it touches.

Two properties make context the thing that sticks. It is personal or domain-specific, so it does not transfer cleanly between users or between verticals. And it accumulates, so its value compounds with time. The model you used last year is gone and you do not miss it. The two years of decisions, corrections, preferences, and history that a system has learned about your business is not something you can regenerate by typing faster.

You can watch this happen in real time. Spend time in a sales org adopting AI and you see context scatter: custom dashboards, a dozen connected MCP servers, call notes in one tool, account history in another, every team wiring up its own stack. All of it flows into the model, and the model quietly becomes the place that holds the through-line, the de facto memory for how the business runs. The accumulated context is the real asset, and right now it pools inside whatever model happens to be plugged in.

That is backwards. Switching the model is trivial. Switching the context is the whole cost. If your context lives somewhere you control, in a format you can move, the model underneath becomes a swappable part and you keep the leverage. If it lives inside one provider's product, the provider keeps the leverage, and every month makes it worse.

This is also exactly why the frontier labs cannot win the interesting part. A lab can ship one context primitive: chat memory inside its own app. It cannot ship CRM-grade memory, or legal-grade memory, or memory that understands your codebase, because the value in those is the domain integration, not the storage. Horizontal storage is the commodity. Domain depth is the product. The labs are structurally built to ship the commodity.

If you run anything on AI, the takeaway is not a prediction, it is a to-do. You need a context portability strategy, a written answer to whether you could walk your accumulated context across the street to a better model next quarter. Most teams do not have one. The labs are betting you never write it down.

WHY CONTEXT IS DIFFERENT

A dozen models. Endless contexts.

MODELS ARE A LIST · CONTEXT KEEPS BRANCHING · ONE MARKET PER DOMAIN

The walled garden problem

Look at what the labs actually shipped, and the pattern is unmistakable. Every memory feature is scoped to its own product.

Claude's memory works in Claude. ChatGPT's Memory Sources work in ChatGPT. Google's Saved Info works in Google. There is no version of any of these that follows you to a competitor, because following you to a competitor is the opposite of the point. Your Claude memory lives on Anthropic's servers and works only inside Claude, so the moment you switch to ChatGPT or Gemini or Cursor you start from zero, and every tool ends up building its own separate profile of you. The wall is not a locked door, it is that profile. The longer you stay, the more a tool's cross-chat memory and Projects know about how you work, and that accumulated understanding is exactly what you would have to rebuild from scratch anywhere else.

Take Anthropic's import tool, the thing that is supposed to make switching painless, and look at how it actually works. Anthropic hands you a prompt. You paste that prompt into ChatGPT or Gemini, the other model dumps everything it has learned about you into a block of text, and you paste that block into Claude. Copy, paste, paste. That is the state of the art in cross-tool portability in 2026, a snapshot you carry across by hand. The arrow points one way: the tool exists to pull ChatGPT and Gemini users into Claude. There is a memory export too, but it is mostly there to satisfy data-portability law, and a compliance dump is not the same as a bridge. Every major lab shipped its own version of this copy-paste move inside about a month this spring, under the same regulatory and competitive pressure. They are all fighting the same fight, and it is over switching costs.

The doubling-down is the tell. In May, testing-channel reports surfaced a feature Anthropic is trialing called Memory Files, which would spread a user's notes across multiple structured documents organized by topic and project instead of one blob. Treat that as a leak rather than a launch, but the direction is clear: more memory, better organized, and still living inside the walls. On June 4, OpenAI rolled out Dreaming V3, a background process that synthesizes context from past conversations, replacing its old memory system for everyone, and using the same "dreaming" language Anthropic introduced weeks earlier. The two leaders are now borrowing each other's memory vocabulary. That is what a land grab looks like.

Memory is the highest-leverage retention mechanic the labs have, because it is the one thing that does not move when the model underneath gets beaten. Which is the whole reason you should not let yours live there.

THE WALLED GARDEN PROBLEM

Every tool keeps its own you.

ONE YOU · FOUR TOOLS, FOUR PARTIAL PROFILES · THE COMPLETE ONE EXISTS NOWHERE

The markdown brain movement

While the labs build walls, a different approach has been spreading in the open, and it is the most interesting thing happening in this space right now. Call it the markdown brain.

It started with Andrej Karpathy. He tweeted the idea on April 3 and published the full writeup the next day, laying out how he runs his own LLM-maintained wiki. The architecture is three layers: raw sources at the bottom, immutable and never edited; a markdown wiki in the middle that the LLM writes and maintains, with cross-references, summary lines, and tags; and a schema on top, a configuration file that tells the agent how to behave. By his own post, his instance runs around a hundred articles and four hundred thousand words, and a single new source can touch ten to fifteen wiki pages. His line for it: "Obsidian is the IDE; the LLM is the programmer; the wiki is the codebase." The point he keeps hammering is that ordinary RAG makes the model rediscover everything from scratch on every query, with no accumulation, and a maintained wiki fixes exactly that.

It did not stay a one-off. Within weeks, dozens of independent builders shipped their own versions: llm-wiki-compiler, claude-obsidian, and a long tail of forks and rewrites, most of them local-first and markdown-native.

Then it went up the status ladder. On April 5, Garry Tan open-sourced GBrain, the personal brain he runs his own agents on. Same shape as Karpathy's: a git-versioned markdown repo at the core, a retrieval layer on Postgres and pgvector, and a set of markdown skill files that route work. The repo's own README puts the scale at 146,646 pages, nearly 25,000 people and over 5,000 companies tracked, and 66 cron jobs ingesting his meetings, emails, and calls while he sleeps. When the head of YC publishes his personal memory stack, builders notice.

A month later it shipped as a product. OpenHuman launched in mid-May, hit number one Product of the Day on Product Hunt, and reported several thousand stars and several thousand users inside its first week. It is local-first, built on a hierarchical markdown Memory Tree, Obsidian-compatible, and openly credits Karpathy's workflow as the inspiration. The same pattern has reached coding specifically: Byterover, out of a Vietnam-based team, published an arXiv paper describing a file-based "Context Tree" of markdown entries carrying provenance and synthesis, running with zero external infrastructure, no vector database, no graph database, just markdown on the local filesystem.

The common thread is the whole story: local-first, markdown-native, schema as configuration, raw sources compiled into useful context and then into output, with the raw layer kept sacred. Not everyone agrees markdown is the endgame, and some teams are betting on HTML as the substrate instead, but the honest version of that debate is that the format matters less than the orchestration around it. None of this is RAG bolted onto a chatbot. It is a full pipeline that maintains knowledge over time, and it is pointed in the exact opposite direction from the walled gardens. This is not a moment. It is a movement.

THE MARKDOWN BRAIN

Raw in. Memory out.

RAW STAYS RAW · THE LLM MAINTAINS THE WIKI · A WIKI ACCUMULATES, RAG RE-DERIVES

The funded landscape

If the open-source movement is one half of the picture, the funded startups are the other. Here is the map as of early June 2026, grouped by the kind of context each one is trying to own. The figures move fast, so treat them as a snapshot.

Five companies are fighting to be the general memory layer for agents, the horizontal infrastructure everyone else builds on. Mem0 leads by almost every metric, with a $24M raise, around 58,000 GitHub stars and north of 14 million downloads, and an exclusive deal as the memory provider for the AWS Strands Agent SDK. Its founder Taranjeet Singh frames the goal plainly: every agentic application needs memory the way every application needs a database. The fine print on that AWS deal is its own data point: AWS made Mem0 exclusive for one agent SDK while shipping its own competing AgentCore Memory in Bedrock. Even the biggest cloud is hedging on who owns the memory layer. Letta, the Berkeley spinout out of the MemGPT paper, raised $10M and bets on memory-first agents that learn, with memory you can inspect and edit rather than a black box. Zep is early-stage on funding but punches above it with a temporal knowledge graph that tracks when a fact was true, not just that it was. Cognee, out of Berlin, raised $7.5M and is already running live in seventy-plus companies, Bayer among them. And Supermemory is the outlier on team size: a $3M round backed by Google and Cloudflare executives, built almost entirely by Dhravya Shah, who started it in his dorm room at Arizona State.

One company is going after personal and relational context specifically. Honcho, from Plastic Labs in New York, raised a $5.35M pre-seed and sits around five thousand stars and growing fast, and its bet is different from the rest: model the user's psychology over time, treat users and agents and groups as first-class "peers," and reason dialectically about who someone is. It is the only one in the set explicitly building a model of the person rather than a store of their facts.

The rest of the action is splitting along verticals and angles the horizontal players are not built for. Byterover, covered above, owns coding context. On the company-and-team layer, two YC startups from the current batch are racing: Memory Store, building "your company's brain" out of Slack, Gmail, and Granola with self-updating briefs, and Wato, pitching an orchestration layer for AI inside institutions. Neither has won that layer yet, and it may be the most valuable one. Two newer entrants are attacking from the edges: Graphon AI raised $8.3M in May for what it calls a pre-model intelligence layer, structured relational graphs that sit in front of the model, and Walrus Memory, a portable layer that the Sui-based storage platform Walrus launched on June 3, puts encrypted memory on decentralized storage so agents can carry context across providers instead of rebuilding it each time. Walrus stands out because provider-independent portability is the whole pitch, which is still rare.

That is a dozen-plus funded attempts at "memory," and most of them are not really competing with each other. They are colonizing different contexts. Which is the thesis again: the context layer is not one market with one winner. It is a category that keeps minting them.

AI MEMORY · FUNDED LANDSCAPE

Where the money went.

$0$6M$12M$18M$24M

Mem0

$24M

Letta

$10M

Graphon

$8.3M

Cognee

$7.5M

Honcho

$5.35M

Supermemory

~$3M

Zep

UNDISCLOSED

New · YC S26

WatoMemory Store

A dozen funded bets. Almost none competing head to head.

DISCLOSED ROUNDS · JUNE 2026

What is still underbuilt

For all that funding and all those stars, the genuinely open problems are not the ones getting the most money. A few stand out.

Real cross-lab portability barely exists. Almost everything above either lives inside one lab's walls or inside one vendor's cloud. Walrus Memory, mentioned above, is the most explicit swing at making provider-independent portability the product itself, and it is only days old. The space where you own your context and point it at whichever model is best this quarter is still mostly empty.

Vertical depth is thin. There is a lot of horizontal "memory layer for agents" and very little memory that deeply understands a specific domain. CRM context, legal context, medical context, education context: these are large, defensible markets, and the winner in each will look more like a domain company that happens to do memory than a memory company that happens to touch a domain. The horizontal players are not positioned to go that deep, and the labs cannot.

Personal cross-model archives are early. Most people now have their actual thinking scattered across ChatGPT, Claude, Cursor, and a couple of others, with no single place that holds it and no clean way to move it. That is the specific gap I am building into, and I will come back to it.

There is one open question I am only going to flag here. Most of these systems maintain a single canonical interpretation of you and update it over time. I think the more interesting design generates several interpretations from the same raw substrate and lets the agent or the user choose, rather than betting everything on one synthesis that has to be corrected forever. That is a future piece of its own, so I will leave it there.

And the team-and-company layer has no clear winner. Memory Store and Wato are both running at it, the labs are circling it with their managed-agent products, and it may be the largest prize of all because the buyer is an organization, not a person. As of today nobody owns it.

What I'm building, and what I'm watching

The reason I am writing any of this is that I hit the problem myself. My own thinking is scattered across ChatGPT, Claude, Cursor, and Grok, with no single place that holds it and no clean way to move it. So I started building MyChatArchive, partly to fix that for myself and partly to understand this space from the inside instead of from the outside. It is open source, local-first, and AGPL: it pulls your exports from those tools into one store you own on your own machine, and exposes it to your agents over MCP so any model can read your history instead of starting cold. Late last month I shipped a plugin that adds it as a memory provider inside the Hermes ecosystem. I started at the personal layer because that is the problem I had, not because I think the opportunity stops there. The same context problem shows up at the team and enterprise level, harder and worth more, and I would build toward whichever version of it turns out to matter most.

Three things I am watching from here: whether Anthropic's Memory Files actually ships and stays locked inside Claude when it does, whether the markdown brain movement matures from weekend projects into something normal people can run without being Karpathy, and whether anyone cracks the team-and-company layer before the labs fence it off inside their managed-agent products.

That is the map as I see it. The pieces are moving fast enough that some of this will be wrong by Issue 2, and I would rather stake a position and be corrected than hedge and say nothing.

The one thing to take away

The compounding returns in AI do not come from the model. They come from everything you build around it: the context you accumulate, the workflows you wire up, the way the whole thing is orchestrated. Everyone has access to the same models now, and people still get wildly different results from them, because the difference is the architecture around the model: MCP tools, memories, knowledge graphs, what context reaches the window and how.

If you run a business on this, the action item is concrete: figure out where your context lives, what format it is in, and whether you could move it to a better model next quarter without starting over. If the answer is that you could not, you have just found the most important thing on your roadmap.

There is more coming, on this and the other lanes I write in. If you want it, subscribe at chasko.ai.