Fresh Hacker News | The Log is the Agent

93 points by iacguy 15 hours ago | 22 comments

▲gcr 3 hours ago

Very cool work!! This is the same pattern we used at $MY_STARTUP to develop $MY_HARNESS which persists the entire graph to disk, unlike all the other agent harnesses which only store the graph nodes and edges.

Event graphs aren’t just the agentic foundation for $MY_HARNESS — they’re the working cognitive substrate, native to what our favorite toolcall gremlins actually consume.

(Looking for lead investors for our angel syndicate btw! DM me if interested)

▲agentdev001 2 hours ago

Nice work! Excited to try $YOUR_HARNESS out!

Reading your comment reminded me; I actually did something quite similar at $MY_BETTER_STARTUP! My approach is slightly different, however, employing what I like to call State-Horizon-Aware-Rercursive-Threaded-Graph-Position-Topology.

With a 400% increase in words, $MY_WAY_BETTER_HARNESS looks to be about four times as performant. SHARTGPT isn’t just a harness engineer's playground — it's a the cyber jungle gym that frees them from $MY_HARNESS.

If you want, I can even include a sentence or two that will really tell those potential investors why they should shower you with money instead of the other commenter! Just say the word!

(Looking for lead investors for our angel syndicate btw! DM me if interested)

▲knollimar 2 hours ago

Did you just emdash not-just-X-it's-Y unironically or am I missing the satire?

▲gcr 2 hours ago

Oh pardon, I’m trying to sarcastically complain how a lot of comments in this thread have a similar form: “I used this pattern in my own agent, which is different from (all the other agents which use the same representation)”

Agentic development tends to encourage siloed individualistic development, so a lot of engineers reinvent similar patterns from first principles. It’s easier to write your own new thing than survey other approaches, so you’re more likely to perceive good ideas as original to your session.

▲dominotw 1 hour ago

https://news.ycombinator.com/item?id=47425470

i had asked about this a while ago

▲lmwnshn 2 hours ago

I was at an AI event and the number of people who have started talking like AI output is crazy.

▲dd8601fn 29 minutes ago

You were right to call that out. We need to identify the real shape of the problem.

▲gcr 2 hours ago

It’s insidious! I used Claude code at my last job enough for it to influence my writing and speaking style without me really noticing, even though I tried to be wary of that.

▲lmwnshn 1 hour ago

Soon we'll be able to carbon date humans based on AI exposure. brb pivoting my startup

▲Terr_ 13 minutes ago

"That new hire tested very well. It only took five minutes of AI interviewing before he threw the monitor out of the window."

▲ 14 minutes ago

▲bob1029 47 minutes ago

> graph-memory research

The problem I have with graph representations relative to LLMs is that you can never directly apply the concept. Everything that speaks to an LLM must ultimately be serialized. There's no getting around the token stream semantics.

I've found that one big flat markdown file tends to outperform everything. You could certainly project an event log into a graph and then serialize that, but it starts to feel like a Rube Goldberg machine at this point. It's a lot easier if you just work with the same terms that the models do.

Remember if that big document rarely changes and everything that comes before it is also constant, you'll pay something like 10% of the normal rate with providers like OAI for the tokens in those documents. The clever schemes to piecemeal out information feel good to the ego and might appeal to accounting at first glance, but I think the bitter lesson will ultimately win out here. We already have a million token context windows. Even if 90% of that is bullshit it's still a lot of tokens to work with.

▲dpc_01234 38 minutes ago

That's pretty much the architecture I'm using in my personal coding harness Tau (tau-agent.dev) . There are some other points in here, but there are relatively minor. I think the observation that event log / event sourcing / cqrs works perfectly for harnesses is not very novel.

▲datadrivenangel 2 hours ago

I think this can be safely ignored.

"...and how it extends the BabyAGI lineage and prior graph-memory research. "

From BabyAGI from two years ago: "This is a framework built by Yohei who has never held a job as a developer. The purpose of this repo is to share ideas and spark discussion and for experienced devs to play with. Not meant for production use. Use with cautioun."

▲lukebuehler 9 hours ago

Very cool. I settled on the same/similar design in my agent harness.

All relevant events that affect the context window are stored in an event log. Forking agents and sessions is simply setting a pointer to the sequence number of another event log.

So if you want to check an implementation of this pattern see: https://github.com/smartcomputer-ai/lightspeed

▲throw1234567891 5 hours ago

pi harness does this by default, sessions go into session jsonl files, is it not how everyone is doing this?

▲lukebuehler 2 hours ago

getting downvoted for my other answer. I wasn't clear: yes, there is of course a lot of prior art in pi, and pi specifically does not just store the session events, but adds an abstraction for easy branching which is great.

But what I tried to get at is the question what additional events you store to construct more than just the llm session log but also more fine grained events around the entire agent state, which of course depends on what you want out of your agent.

The paper here in question is going even further and is event sourcing a larger state than just the session transcript, specifically additional graph structures that are getting built as part of the session.

in my agent, specifically, I focus on the event sourcing all the stuff that makes an agent work well as part of a deterministic workflow, which again is prerequisite to run agents in durable workflow engines like Temporal.

I write more about my approach here: https://github.com/smartcomputer-ai/lightspeed/blob/main/doc...

▲lukebuehler 5 hours ago

most coding harnesses store the exact messages that have been sent to the llm and the messages items that have come back from the llm, not much else. Then, there is also a set of events/commands/etc that are in-memory only. Together they constitute the current state of the agent loop.

In Lightspeed, we store all of them as events, thus can always reconstruct the exact state of the loop (e.g. state of open tool calls, compaction decisions in-flight, etc). This makes it possible to run the agent in a durable workflow engine easily.

▲tern 7 hours ago

Nice work. Excited to check this out.

▲bigcat12345678 10 hours ago

This is true after learning this framing.

It's more like the log is the only user/agent accepted consensus. It has to be the grounding base. Although extending it into an agentic system architecture becomes something not necessarily effective in practice.

▲lmwnshn 9 hours ago

With my database hat on, in the context of agentic systems I would argue that write-ahead logs form a good (and potentially transactional) interface between speculative agent work and durable world mutations [0].

That said, there are a _lot_ of "logs for agents" papers that I've read (and unfortunately gotten assigned to review) which are basically "we asked claude to hack on a graph DB and generate a paper".

[0] https://onewill.ai/blog/2026/stealing-50-years-of-database-i...

▲try-working 6 hours ago

We should probably only interact with the agent by writing to the log, which it executes from, and the agent should probably only interact with the external environment by writing and executing code. That fixes a lot of issues with non-determinism.

▲lmwnshn 59 minutes ago

Agreed. While not directly applicable, I was a huge fan of Mozilla's rr [0] in undergrad. Quoting their site:

> rr records a group of Linux user-space processes and captures all inputs to those processes from the kernel, plus any nondeterministic CPU effects performed by those processes (of which there are very few).

I think the solution will resemble that. You don't control the LLM, sure. But you can control what it sees, and maybe that's good enough.

[0] https://rr-project.org/

▲rufasterisco 9 hours ago

https://activegraph.ai/

The paper’s pip library can be tried here

▲MelonUsk 4 hours ago

Current text-based LLMs are the same old story - text-based vs graphical UIs that ate them whole for most of humanity:

Chatbot is the command line

Agent is the bash script

___ is the GUI (macOS/Windows/GTA 6)

You need Xerox PARC all over again and we have one

▲klntsky 9 hours ago

Can someone explain why such a trivial knowhow is paper-worthy? Event sourcing is well known

▲esafak 3 hours ago

Anybody can publish to arxiv. Consider this more like a formal blog post.

▲gchamonlive 3 hours ago

[dead]

▲ 7 hours ago

▲carterschonwald 7 hours ago

This paper points at an idea, but its really only legible if you have a more developed version of the idea already. I really should write more

▲politician 4 hours ago

As others have commented, this is an obvious application of event sourcing. It's irritating to see the claim of "deterministic replay" in the abstract along with the caveat "we can't actually do deterministic replay, so we store all of the model's responses and reproject off of that". Sure, ok, whatever. You're doing session recording and calling it replay.

▲gcr 3 hours ago

Agree with your critique. I think this work is presenting common ideas as novel without thinking through existing problems. Defining a provider-agnostic event graph that enables full session branching replay was the whole point of pi: https://mariozechner.at/posts/2025-11-30-pi-coding-agent/ , though the language around it perhaps didn’t click until a bit later. I don’t even think pi was the first to do this.

Another critique: the abstract mentions how their system allows for “branch[ing] a run at any event without re-executing the shared prefix,” but that’s only possible with very careful KV caching. Generally, rerunning inference from an earlier point still incurs O(n) input token cost and this paper is working at the wrong layer to see that. In this work, execution refers to tool calls but token generation is the expensive part.

▲jamiegregz 9 hours ago

> In this arrangement the log is a byproduct: an audit artifact written alongside the real computation, never the substrate of it.

I’ve come to the same conclusion building my own agents. It simply feels ‘wrong’ that most frameworks will happily mutate your context. You have to explicitly go out of your way to store the original events. I’ve now started storing an event log for my own agents, this is used as the source of truth for deriving all subsequent context.

The great thing about this is that I have finer control over drift in long runs, as I can look back through the conversation/tool history and build context suitable for the current state of the agent. It also allows me to run compactions across the entire event history instead of ‘compactions on top of compactions’ which happens on long runs with checkpoints.

It definitely feels like this will be a bigger issue going forward as we have agents running longer and more complex workflows, I’ve started building a product aimed at addressing this issue in a framework agnostic way. [0]

[0]: https://statefabric.dev

▲spike021 9 hours ago

Why not save progress and important results of a conversation (i.e. including tool calls and such) to a project markdown (even multiple as needed) and clear your context window completely rather than compacting many times? You can then just specify a markdown file to be included as context. Especially if following any kind of plan document and executing on a part of it.

▲jamiegregz 8 hours ago

[dead]

▲tern 9 hours ago

Arrived at a version of this view as well and building one on Elixir/Ash.

▲barrenko 7 hours ago

Didn't read the paper yet, but if you have a giant log, I'd guess that's RLMable?

▲ 8 hours ago

▲dominotw 1 hour ago

weird why did my same submission 3 days ago not get picked up . how does the algorithm work https://news.ycombinator.com/item?id=48752135

▲PufPufPuf 5 hours ago

The AI folks have discovered CQRS?

▲gcr 3 hours ago

shh you’re ruining the fun! :-)

▲try-working 8 hours ago

This is one of the most interesting papers I've seen. Someone said it's AI slop, well I sent it to 5.5 Pro and it was a great read.

▲corgihamlet 11 hours ago

My log has a message for you.

▲ares623 9 hours ago

if the folks at Anthropic/OpenAI can stop their loops for one second they would've figured this out too

but wouldn't feeding that log for each request/response iteration must get expensive really fast no?

also "We discuss--without claiming to demonstrate--" wtf? someone had a showerthought and slopped this out in 10mins to see what others thought?

▲datadrivenangel 2 hours ago

The author is a VC and the BabyAGI author, and doesn't even have a valid ssl cert on their website...

▲dofm 3 hours ago

> someone had a showerthought and slopped this out in 10mins to see what others thought?

The window on back-of-napkin-idea acquihires is closing fast. ;-)

▲1105714 3 hours ago

[flagged]

▲jkwang 10 hours ago

[flagged]