Journaling
Coding agents forget what they just did. The session ends, context gets compacted, or you walk away for lunch, and when you come back the agent is ready to try the same fix that failed an hour ago. The usual memory setups (CLAUDE.md, project wikis, and so on) are fine for things that don’t change much, but they don’t capture what’s actually happening in the task you’re working on right now.
There are plenty of approaches to this: vector stores, knowledge bases, custom memory hooks. Most of them are more machinery than the problem probably needs.
A fix is to have the agent maintain a journal file in the working directory. A new file per attempt or session, named with an incrementing number: journal-1.md, journal-2.md, and so on. The agent appends an entry for every step it takes: command run, file edited, output observed, hypothesis formed, result, etc.
Before starting new work, or after a compaction, the agent reads the current journal back. If it’s starting a fresh attempt at something it’s tried before, it can scan the previous journals too.
That’s the whole pattern. Plain markdown files, one per session, append-only, written as the work happens. Nothing fancy.
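The "one higher than the highest existing number" rule is simple enough to sketch directly. A minimal Python version (the function name is mine, not part of any harness):

```python
import re
from pathlib import Path

def next_journal(root: Path) -> Path:
    """Return the path of the next journal file, journal-N.md,
    where N is one higher than the highest existing number
    (journal-1.md if none exist yet)."""
    highest = 0
    for p in root.glob("journal-*.md"):
        m = re.fullmatch(r"journal-(\d+)\.md", p.name)
        if m:
            highest = max(highest, int(m.group(1)))
    return root / f"journal-{highest + 1}.md"
```

In practice the agent does this arithmetic itself from an `ls`; the sketch just pins down the edge cases (gaps in the numbering are fine, non-matching filenames are ignored).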
This mostly helps on longer tasks, like debugging something tricky or poking around in a live environment, or any bigger chunk of dev work where you’re going to step away and come back and wish you had a record of what was already tried.
What Goes in the Journal
Each entry has a timestamp, a one-line summary, and supporting detail. Things I’ve found worth capturing:
- The exact command that was run, and the actual result or output (not a paraphrase)
- Files edited and why
- Hypotheses, and whether they held up
- Dead-ends, with a note on why the thing didn’t work
- Links read during research
- Decisions made, with the reasoning behind them
The key part is that the agent writes entries as it does the work, not as a summary at the end. A summary after the fact loses the dead ends and wrong turns, which are often the most valuable part to read back later.
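For concreteness, a single entry might look like this. The format, commands, and output are illustrative, not prescribed:

```markdown
## 2025-01-15 14:32 - Reproduced the flaky test

- Ran: `pytest tests/test_sync.py -x`
- Output: `FAILED tests/test_sync.py::test_retry - TimeoutError`
- Hypothesis: the retry backoff is too short under CI load
- Next: bump the backoff and rerun
```

Note the verbatim command and output. "Ran the tests, one failed" would read back as almost no information.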
A CLAUDE.md Example Snippet
Drop this into your CLAUDE.md (or the equivalent instruction file for your harness):
```markdown
## Journal

At the start of a session, create a new journal file at the root of
the working directory: `journal-N.md`, where N is one higher than
the highest existing `journal-*.md` (start at 1 if none exist).

Append an entry for every non-trivial action you take. Write it as
you do the work, not as a summary at the end.

Each entry should include:

- ISO timestamp (`YYYY-MM-DD HH:MM`)
- One-line summary
- The exact command, if one was run, and the actual result or
  output (not a paraphrase)
- Files edited and why
- Hypotheses and whether they held up
- Dead-ends, with a note on why the thing didn't work
- Links read during research
- Decisions made and the reasoning behind them

Before starting new work, or after a context compaction, read the
current journal to orient yourself. If this is a fresh attempt at a
task you've tried before, skim the previous `journal-*.md` files
too.
```
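If you'd rather have a script do the appending than trust the agent's formatting, the entry shape above reduces to a few lines of Python. This is a sketch, assuming the entry format from the snippet; the function name is hypothetical:

```python
from datetime import datetime
from pathlib import Path

def append_entry(journal: Path, summary: str, detail: str = "") -> None:
    """Append one timestamped entry to the journal file,
    creating the file if it doesn't exist yet."""
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M")
    entry = f"\n## {stamp} - {summary}\n"
    if detail:
        entry += f"\n{detail}\n"
    with journal.open("a", encoding="utf-8") as f:
        f.write(entry)
```

Append-only by construction: the file is opened in `"a"` mode, so earlier entries are never rewritten.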
The imperative tone matters. If you say “consider keeping a journal,” the agent will skip it when the task gets interesting. Tell it to do it.
Simplicity
A few reasons this holds up better than it probably should:
- There’s no retrieval layer to misbehave. The agent already knows how to read and write files.
- The journal survives `/compact` and session boundaries. It's on disk.
- It's human-readable. You can scroll it, grep it, paste pieces into a bug report or PR description.
- It’s harness-independent. Claude Code, Codex, Cursor, pi, or a plain CLI wrapper all work the same way.
- Forcing the agent to write down what it just did and what it expected seems to reduce the “confidently wrong next step” pattern. This is observational, not measured.
Caveats
- The journal can capture raw command output, which may include secrets. Don't commit it without review. A `.gitignore` entry for `journal-*.md` is reasonable.
- A single journal for a long session will get unwieldy. Starting a new numbered file per attempt or session keeps each one scannable.
- Context cost. Reading a long journal back eats tokens. The numbered-per-session approach helps, but on a really long session you may still want to point the agent at just the most recent entries rather than the whole file.
- Vague entries are worse than none. If the agent writes “ran the command, it worked” instead of the actual command and output, the journal stops being useful. Push for verbatim output in the instruction.
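The ignore rule from the first caveat is a one-liner:

```gitignore
# keep agent journals out of version control
journal-*.md
```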
Drop the snippet into your CLAUDE.md, run through a real task with it, and see how it goes.