From Pointers to Footnotes: Why Agents will be “Reference-First”

If you’ve built anything non-trivial with LLMs, you’ve felt the temptation: just paste more context. Another chunk. Another thread. Another doc. Another “final final” spec.

It works — until it doesn’t.

Costs climb, prompts sprawl, and reliability gets weirdly brittle. You start debugging where the important sentence sits, not what it says. And you realize you’re treating the context window like a warehouse when it’s really a workbench.

This is where an old idea returns: reference-first thinking.

Not the buzzword kind. The systems-history kind.

Because we’ve been here before — back when memory was measured in kilobytes, and operating systems had to be built on machines like the PDP-11. That era didn’t solve scarcity by “remembering more.” It solved it by inventing better ways to point.

The recurring problem: scarce workspaces

There’s a reason “workbench” is the right metaphor. In cognitive science, working memory isn’t long-term storage; it’s the limited-capacity space where you hold a few things and manipulate them. (PubMed)

And it’s small. Reviews of working-memory capacity often put the core limit around 3–5 chunks under many conditions. (PMC)

That’s not a flaw — it’s a design constraint. Humans compensate with external memory: notebooks, libraries, indexes, footnotes. We don’t carry the whole book in our heads; we remember how to find it.

LLM agents are facing the same constraint in silicon: the context window is a limited working set. You can make it bigger, but you still need to manage it like a scarce resource.

Act I: Pointers weren’t a “C quirk” — they were a scarcity technology

C didn’t “invent” addresses. Hardware has always lived in an addressable world. What C did — famously and unapologetically — was to expose that reality as a programming model.

Dennis Ritchie’s own account of C’s development is inseparable from Unix and the PDP-11 era. He describes early Unix work moving onto the PDP-11 and the evolution from B toward what became C. (Nokia Corporation | Nokia)

The deeper point: in that environment, copying data was expensive, and identity mattered. A pointer is a compact way to say:

“Not this thing — **where this thing lives**.”

This is reference-first as an engineering instinct:

  • Don’t ship the payload if you can ship an address.
  • Don’t duplicate state if you can share a reference.
  • Don’t confuse “same value” with “same thing.”

In other words: addressability is how you scale when the workspace is tiny.
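The three bullets above can be made concrete with a small sketch. Python has no raw pointers, so object references stand in for them here; the analogy is loose, and the names (`payload`, `alias`, `clone`) are illustrative only.

```python
import copy

# A payload standing in for data that is expensive to copy.
payload = {"id": 7, "body": "x" * 10_000}

alias = payload                  # ship the address: no data is copied
clone = copy.deepcopy(payload)   # ship the payload: a full duplicate

# "Same value" is not "same thing".
same_value = clone == payload    # equal contents
same_thing = clone is payload    # but two distinct objects

# A write through one reference is visible through the other...
alias["id"] = 8
seen_via_original = payload["id"]
# ...while the duplicate has silently gone stale.
seen_via_clone = clone["id"]

print(same_value, same_thing, seen_via_original, seen_via_clone)
```

The copy starts out equal to the original but is not the original, which is exactly the distinction a pointer preserves.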

Act II: Unix turns reference-first into a universal interface (the file descriptor)

If pointers are the reference primitive for memory, file descriptors (FDs) are the reference primitive for system resources.

Here’s the key sentence from the Linux man-pages (which is also the cleanest conceptual definition I know):

“A file descriptor is a reference to an open file description.” (man7.org)

And open() doesn’t just “open a path.” It creates an open file description: an entry in the system-wide table of open files that records state such as the file offset and the file status flags. (man7.org)

The killer detail is what comes next: this reference is stable even if the filesystem name changes. The man-page notes that the FD’s reference is “unaffected” if the path is removed or modified to refer to a different file. (man7.org)

That’s a systems definition of identity:

  • Path is a label you can rewrite.
  • FD is a handle to a specific open-instance state.
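A minimal sketch of the label-versus-handle distinction, assuming a POSIX system (on Windows, renaming or deleting an open file behaves differently). The file names are made up for the demonstration.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "notes.txt")
with open(path, "w") as f:
    f.write("original contents")

fd = os.open(path, os.O_RDONLY)       # fd is just a small integer

# Rewrite the label: move the original aside, then reuse the old name.
os.rename(path, path + ".old")
with open(path, "w") as f:
    f.write("a different file")

via_fd = os.read(fd, 100).decode()    # still the file we opened
with open(path) as f:
    via_path = f.read()               # whatever the name points to now

# Remove the old name entirely; the handle stays usable until closed.
os.unlink(path + ".old")
os.lseek(fd, 0, os.SEEK_SET)
after_unlink = os.read(fd, 100).decode()

os.close(fd)
os.unlink(path)
os.rmdir(workdir)
```

The path now resolves to a different file, yet the descriptor keeps returning the contents that were opened in the first place.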

Unix didn’t make you carry the file around. It gave you a small integer — an ID you can pass, store, duplicate, hand off, and compose into pipelines.

That’s not an implementation detail. It’s a design philosophy:

Make the expensive thing live elsewhere. Pass around cheap, stable references.
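To see that a descriptor is a reference rather than a copy, here is a small sketch (POSIX assumed) in which a duplicated descriptor shares the file offset held by the open file description, while a second, independent open() gets its own.

```python
import os
import tempfile

tmp_fd, path = tempfile.mkstemp()
os.write(tmp_fd, b"abcdefgh")
os.close(tmp_fd)

fd = os.open(path, os.O_RDONLY)
dup_fd = os.dup(fd)                     # second reference, same open file description
other_fd = os.open(path, os.O_RDONLY)   # separate open(): a new description

first = os.read(fd, 4)                  # advances the shared offset
second = os.read(dup_fd, 4)             # continues where fd left off
independent = os.read(other_fd, 4)      # own offset, starts from the beginning

for d in (fd, dup_fd, other_fd):
    os.close(d)
os.unlink(path)
```

Duplicating the integer is cheap; the state it refers to lives in one place.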

Act III: LLMs live in a “value world” (and that’s why they drift)

Now swing the mirror to LLM agents.

An LLM doesn’t naturally have “addresses.” It has token sequences and learned associations. The default mode is value-first:

  • “Put the text in the context.”
  • “Summarize the text into the context.”
  • “Rephrase the text into the context.”

Everything becomes content, and content gets duplicated. That’s why agent memory systems often degrade into:

  • re-summarizing summaries,
  • losing provenance,
  • blending sources,
  • and slowly drifting.

Even if you buy a bigger desk (longer context), there’s a second problem: models don’t necessarily use long context robustly.

The paper “Lost in the Middle” tests long-context models on tasks where the relevant information appears at different positions in the prompt. Its authors find that performance is often highest when the key information sits at the beginning or end of the context, and can degrade significantly when it sits in the middle. (arXiv)

So “just add more” fails twice:

  1. it’s expensive and messy, and
  2. it’s not even reliably utilized.

A bigger desk is not a library.

Act IV: Reference-first is the missing primitive for agents

This is why serious agent design trends toward what might be called “use memory like a library”:

  • Context = workbench / working set
  • Memory = external store
  • References = the bridge

In research, retrieval-augmented generation (RAG) is one prominent expression of this idea: combine parametric memory (the model’s weights) with explicit non-parametric memory (an external index), and fetch what you need at generation time. (arXiv)

What’s especially telling is how the RAG paper motivates the approach. It explicitly names providing provenance for decisions and updating world knowledge as open research problems for purely parametric models. (arXiv)

That’s reference-first, stated plainly:

  • If you need provenance, you need links back to sources.
  • If you need updates, you need knowledge to live in a place you can swap, version, and re-index, not re-train into weights.
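A toy sketch of both bullets. This is not the RAG paper's method (which uses a learned dense retriever); naive word overlap stands in for retrieval, and the store contents and source IDs are invented for illustration.

```python
# Hypothetical external store: source id -> passage text.
store = {
    "handbook#3": "Refunds are accepted within 30 days of purchase.",
    "handbook#7": "Support hours are 9am to 5pm on weekdays.",
}

def retrieve(query: str, k: int = 1) -> list[tuple[str, str]]:
    """Rank passages by word overlap; return (source_id, text) pairs."""
    words = set(query.lower().split())
    ranked = sorted(
        store.items(),
        key=lambda item: len(words & set(item[1].lower().split())),
        reverse=True,
    )
    return ranked[:k]

def answer(query: str) -> str:
    """Return the best passage with a link back to its source."""
    source_id, text = retrieve(query)[0]
    return f"{text} [{source_id}]"

before = answer("refunds within how many days")

# Updating knowledge is a write to the store, not a change to any weights.
store["handbook#3"] = "Refunds are accepted within 14 days of purchase."
after = answer("refunds within how many days")
```

The answer carries its source ID both times, and the second answer reflects the edit without anything being retrained.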

This is how “pointers to footnotes” becomes more than a metaphor:

  • Unix handles let you track which open resource instance you’re using.
  • Agent references let you track which source you’re grounding on.

A citation is a human-friendly FD.

The reference-first pattern (for Medium readers who want something actionable)

Here’s the minimal architecture shift:

1) Store handles, not payloads, in the workbench

Instead of pasting full documents into context, carry compact references:

  • doc_id, chunk_id, message_id
  • version, timestamp
  • hash or signature for integrity
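One possible shape for such a handle, as a sketch. The class name `ChunkRef`, its field layout, and the compact string format are assumptions, not an established schema.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class ChunkRef:
    """Hypothetical handle: identifies a chunk without carrying its text."""
    doc_id: str
    chunk_id: int
    version: int
    timestamp: str   # when this version was recorded (ISO 8601 string)
    sha256: str      # hex digest of the chunk text at this version

    def compact(self) -> str:
        # Short form suitable for placing in a prompt.
        return f"{self.doc_id}:{self.chunk_id}@v{self.version}#{self.sha256[:12]}"

def make_ref(doc_id: str, chunk_id: int, version: int, timestamp: str, text: str) -> ChunkRef:
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return ChunkRef(doc_id, chunk_id, version, timestamp, digest)

chunk_text = "Section 4.2: retries use exponential backoff. " * 200
ref = make_ref("design-spec", 12, 3, "2026-01-25T00:00:00Z", chunk_text)

payload_chars = len(chunk_text)      # what pasting the chunk would cost
handle_chars = len(ref.compact())    # what carrying the handle costs
```

The handle stays a few dozen characters no matter how large the chunk behind it grows.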

2) Make dereferencing a first-class operation

A reference isn’t useful unless it’s reliably dereferenceable:

  • fetch the exact chunk,
  • retrieve by version,
  • with consistent permissions.
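A minimal in-memory sketch of what a dereference operation might look like. `ChunkStore`, `deref`, and the principal names are hypothetical; a real system would sit on top of a database or object store.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChunkRef:
    doc_id: str
    chunk_id: int
    version: int

class ChunkStore:
    """Hypothetical store keyed by exact (doc, chunk, version) references."""

    def __init__(self) -> None:
        self._chunks: dict[ChunkRef, str] = {}
        self._readers: dict[str, set[str]] = {}   # doc_id -> allowed principals

    def put(self, ref: ChunkRef, text: str, readers: set[str]) -> None:
        self._chunks[ref] = text
        self._readers.setdefault(ref.doc_id, set()).update(readers)

    def deref(self, ref: ChunkRef, principal: str) -> str:
        # The permission check runs on every dereference, the same way each time.
        if principal not in self._readers.get(ref.doc_id, set()):
            raise PermissionError(f"{principal} may not read {ref.doc_id}")
        try:
            return self._chunks[ref]   # exact chunk, exact version
        except KeyError:
            raise LookupError(f"no such chunk: {ref}") from None

store = ChunkStore()
v1 = ChunkRef("config-guide", 4, 1)
v2 = ChunkRef("config-guide", 4, 2)
store.put(v1, "Timeout is 30 seconds.", {"agent-a"})
store.put(v2, "Timeout is 60 seconds.", {"agent-a"})

old_text = store.deref(v1, "agent-a")   # older version is still retrievable
new_text = store.deref(v2, "agent-a")
```

Because the version is part of the key, writing version 2 never changes what a version 1 reference returns.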

3) Treat “verification” as part of memory, not a bonus

Unix FDs are meaningful because they refer to something real and stable. Similarly, agent references should support:

  • provenance (where did this come from?)
  • integrity (did it change?)
  • reproducibility (can I get the same thing tomorrow?)
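A sketch of how those three properties might be checked at fetch time, assuming the reference stores a SHA-256 digest taken when it was created. The store layout and field names are hypothetical.

```python
import hashlib

def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Hypothetical versioned store: (doc_id, version) -> text.
store = {("runbook", 2): "Restart the worker before clearing the queue."}

# The reference records provenance (source), identity (doc_id, version)
# and an integrity digest captured when the reference was made.
ref = {
    "source": "wiki/runbook",
    "doc_id": "runbook",
    "version": 2,
    "sha256": sha256_hex(store[("runbook", 2)]),
}

def verified_fetch(ref: dict) -> str:
    """Fetch the referenced text, refusing it if the content has changed."""
    text = store[(ref["doc_id"], ref["version"])]
    if sha256_hex(text) != ref["sha256"]:
        raise ValueError(f"content behind {ref['doc_id']}@v{ref['version']} has changed")
    return text

first = verified_fetch(ref)
second = verified_fetch(ref)   # same reference, same text

# Simulate someone editing the stored version in place.
store[("runbook", 2)] = "Clear the queue before restarting the worker."
try:
    verified_fetch(ref)
    tamper_detected = False
except ValueError:
    tamper_detected = True
```

An in-place edit to a supposedly fixed version is caught at dereference time instead of silently flowing into the agent's answer.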

4) Keep a small “working set” on the desk

Working memory isn’t a database. It’s a staging area:

  • the current plan,
  • the current constraints,
  • the current open questions,
  • plus a handful of references.

That’s it.
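As a rough sketch, the desk could be a small structure with a hard cap on references. The cap of five and the oldest-first eviction rule are assumptions chosen for illustration, not a recommendation from any source.

```python
from dataclasses import dataclass, field

MAX_REFS = 5   # assumed cap on how many references stay on the desk

@dataclass
class WorkingSet:
    plan: str
    constraints: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)
    refs: list[str] = field(default_factory=list)

    def add_ref(self, ref: str) -> None:
        # Re-adding a reference moves it to the most-recent position.
        if ref in self.refs:
            self.refs.remove(ref)
        self.refs.append(ref)
        del self.refs[:-MAX_REFS]   # drop the oldest beyond the cap

    def render(self) -> str:
        """Produce the compact text that would go into the prompt."""
        lines = [f"PLAN: {self.plan}"]
        lines += [f"CONSTRAINT: {c}" for c in self.constraints]
        lines += [f"OPEN: {q}" for q in self.open_questions]
        lines += [f"REF: {r}" for r in self.refs]
        return "\n".join(lines)

ws = WorkingSet(
    plan="Draft the migration guide",
    constraints=["No downtime"],
    open_questions=["Which services pin v1?"],
)
for i in range(7):
    ws.add_ref(f"doc-{i}@v1")

prompt_block = ws.render()
```

Everything evicted from the desk is still in the library; only the handle left the working set.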

A useful warning: similarity is not identity

One of the most common agent design mistakes is treating vector similarity like an address.

Similarity gives you neighbors, not the thing itself. Two passages can sit close together in embedding space yet mean different things. Or the same source can drift after edits, leaving your “reference” pointing to a different semantic region over time.

Unix solved this by separating labels (paths) from handles (FDs). (man7.org) Agent memory needs a similar separation:

  • embeddings help you find candidates,
  • but stable references decide what you actually mean.
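A toy sketch of that separation. The three-dimensional vectors are hand-made stand-ins for real embeddings, and the reference strings are a made-up format.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Hypothetical index: stable reference -> (embedding, text).
index = {
    "policy@v1#2": ([0.90, 0.10, 0.0], "Refunds are allowed within 30 days."),
    "policy@v2#2": ([0.88, 0.12, 0.0], "Refunds are not allowed after 14 days."),
    "faq@v1#5":    ([0.10, 0.90, 0.2], "Support is open on weekdays."),
}

def candidates(query_vec: list[float], k: int = 2) -> list[str]:
    """Similarity search returns references to neighbors, not an answer."""
    ranked = sorted(index, key=lambda r: cosine(query_vec, index[r][0]), reverse=True)
    return ranked[:k]

def deref(ref: str) -> str:
    """Identity lookup: the exact text behind one stable reference."""
    return index[ref][1]

found = candidates([1.0, 0.0, 0.0])

# Both policy versions are near neighbors, yet they say opposite things.
# The agent pins one reference explicitly and grounds on that.
pinned = "policy@v2#2"
grounded_text = deref(pinned)
```

The search narrows the field to two close neighbors; only the pinned reference says which one the agent actually means.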

Why this is a “systems history” story, not an AI fad

Once you see the pattern, it stops being about LLMs.

When the workspace is scarce, systems evolve toward addressability:

  • C makes memory addresses a programming model.
  • Unix makes resources handleable.
  • Humans make knowledge citeable.
  • Agents will make context manageable by becoming reference-first.

Not because it’s elegant, but because it’s what survives scale.

And the evidence is already visible:

  • Long-context models don’t consistently exploit “more” context well. (arXiv)
  • Retrieval-based approaches explicitly target provenance and updatability. (arXiv)
  • Even in human cognition, the workbench is inherently limited. (PMC)

Closing: Context is a workbench; memory is a library

If you want a single line to guide your agent designs:

Don’t make your agent “remember more.” Make it “point better.”

That’s the throughline from pointers to file descriptors to footnotes.

And it’s why reference-first isn’t a technique — it’s a re-discovery of an old systems truth.

Citation

Zhang, Di. "From Pointers to Footnotes: Why Agents will be “Reference-First”." Zhang Di Blog, January 25, 2026. Originally published on Medium.

BibTeX
@misc{zhang2026frompointerstofootnotesw,
  title = { From Pointers to Footnotes: Why Agents will be “Reference-First” },
  author = { Zhang, Di },
  year = { 2026 },
  month = { January },
  howpublished = {\url{ https://trotsky1997.github.io/blog/from-pointers-to-footnotes-why-agents-will-be-reference-first/ }},
  note = {Blog post; originally published on Medium: https://medium.com/@di-zhang-fdu/from-pointers-to-footnotes-why-agents-will-be-reference-first-88d0e9730e37}
}