Fresh Hacker News | Show HN: Context Gateway – Compress agent context before it hits the LLM

▲Show HN: Context Gateway – Compress agent context before it hits the LLM(github.com)

35 points by ivzak 2 hours ago | 14 comments

▲kuboble 1 hour ago

I wonder what the business model is.

It seems like a tool for a problem that won't last longer than a couple of months, and something that e.g. Claude Code can and probably will tackle itself soon.

▲kennywinker 17 minutes ago

Business model is: Get acquired

▲sethcronin 33 minutes ago

I guess I'm skeptical that this actually improves performance. I'm worried that the middleman compressing the tool outputs can strip useful context that the agent actually needs to diagnose problems.

▲thebeas 22 minutes ago

That's why we give the model the chance to call expand() in case it needs more context. We know it's counterintuitive, so we will add benchmarks to the repo soon.

From our observations, the performance depends on the task and the model itself, and the effect is most visible on long-running tasks.
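For readers unfamiliar with the pattern, here is a minimal sketch of a compress-then-expand flow. It is not Context Gateway's implementation: the class name, the `max_chars` parameter, and the key format are hypothetical, and plain truncation stands in for whatever compression the real tool uses. Only the `expand()` name comes from the comment above.

```python
import hashlib


class CompressedOutputStore:
    """Keeps full tool outputs so a shortened view can be expanded later.

    Hypothetical sketch: truncation is a stand-in for real compression.
    """

    def __init__(self, max_chars: int = 200) -> None:
        self.max_chars = max_chars
        self._full = {}  # key -> original tool output

    def compress(self, tool_output: str) -> str:
        # Short outputs pass through untouched.
        if len(tool_output) <= self.max_chars:
            return tool_output
        # Store the original under a short content-derived key.
        key = hashlib.sha256(tool_output.encode("utf-8")).hexdigest()[:12]
        self._full[key] = tool_output
        head = tool_output[: self.max_chars]
        omitted = len(tool_output) - self.max_chars
        # The hint tells the model how to ask for the rest.
        return f"{head}\n[{omitted} chars omitted; call expand('{key}') for the full output]"

    def expand(self, key: str) -> str:
        # Return the original, uncompressed tool output.
        return self._full[key]
```

The model only sees the shortened text plus the hint; whether it asks for more is up to the model, which is the open question raised in the reply below.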

▲fcarraldo 7 minutes ago

How does the model know it needs more context?

▲thebeas 1 minute ago

[dead]

▲tontinton 1 hour ago

Is it similar to rtk? Where the output of tool calls is compressed? Or does it actively compress your history once in a while?

If it's the latter, then users will pay uncached rates for the entire token history after the point of change: https://platform.claude.com/docs/en/build-with-claude/prompt...

How is this better?
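To make the caching concern concrete, here is a rough cost sketch. Prompt caching is prefix-based, so rewriting earlier history means the tokens after the change are billed as uncached input on the next request. The price and the cached-read multiplier below are illustrative assumptions, not quoted rates, and the function ignores any cache-write premium.

```python
def input_cost(
    prefix_tokens: int,
    new_tokens: int,
    price_per_mtok: float,
    cached_read_multiplier: float,
    prefix_cached: bool,
) -> float:
    """Input cost of one request, in the same currency as price_per_mtok.

    prefix_tokens: history already sent on earlier turns.
    new_tokens: tokens appended on this turn (always uncached).
    cached_read_multiplier: assumed discount factor for cached reads.
    """
    # A cached prefix is billed at a discounted rate; otherwise at full price.
    prefix_rate = price_per_mtok * (cached_read_multiplier if prefix_cached else 1.0)
    return (prefix_tokens * prefix_rate + new_tokens * price_per_mtok) / 1_000_000


if __name__ == "__main__":
    # Hypothetical numbers: 100k tokens of history, 2k new tokens.
    hit = input_cost(100_000, 2_000, 3.0, 0.1, prefix_cached=True)
    miss = input_cost(100_000, 2_000, 3.0, 0.1, prefix_cached=False)
    print(f"cache hit:  {hit:.3f}")
    print(f"cache miss: {miss:.3f}")
```

Under these assumed numbers a single cache miss costs several times more than a hit, so history compression only pays off if the savings on later turns outweigh that one-off miss.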

▲BloondAndDoom 42 minutes ago

This is a bit more akin to distill - https://github.com/samuelfaj/distill

The advantage of a small language model (SLM) in between: some outputs cannot be compressed without losing context, so a small model does that job. It works, but most of these solutions still have some tradeoff in real-world applications.

▲thebeas 18 minutes ago

We do both:

We compress tool outputs at each step, so the cache isn't broken during the run. Once we hit 85% of the context window, we preemptively trigger a summarization step and load that summary when the context window fills up.
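A minimal sketch of that threshold logic, assuming a caller-supplied token counter and summarizer. The class and field names are hypothetical and this is not the project's code; it only illustrates "prepare the summary at 85%, swap it in when the window would overflow".

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ContextBuffer:
    window_tokens: int
    count_tokens: Callable[[str], int]      # assumed tokenizer hook
    summarize: Callable[[List[str]], str]   # assumed summarizer hook
    trigger_ratio: float = 0.85
    messages: List[str] = field(default_factory=list)
    pending_summary: Optional[str] = None
    covered: int = 0  # how many messages the pending summary covers

    def used(self) -> int:
        return sum(self.count_tokens(m) for m in self.messages)

    def _prepare_summary(self) -> None:
        self.pending_summary = self.summarize(list(self.messages))
        self.covered = len(self.messages)

    def add(self, message: str) -> None:
        # If this message would overflow the window, swap in the summary,
        # keeping any messages that arrived after it was prepared.
        if self.used() + self.count_tokens(message) > self.window_tokens:
            if self.pending_summary is None:
                self._prepare_summary()
            self.messages = [self.pending_summary] + self.messages[self.covered:]
            self.pending_summary = None
            self.covered = 0
        self.messages.append(message)
        # Past the trigger ratio: prepare a summary ahead of time.
        if self.pending_summary is None and self.used() >= self.trigger_ratio * self.window_tokens:
            self._prepare_summary()
```

In a real system the summarization call would presumably run in the background so the agent does not wait on it; here it is synchronous to keep the sketch short.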

▲root_axis 1 hour ago

Funnily enough, Anthropic just went GA with a 1M-context Claude that has supposedly solved the lost-in-the-middle problem.

▲SyneRyder 57 minutes ago

Just for anyone else who hadn't seen the announcement yet, this Anthropic 1M context is now the same price as the previous 256K context - not the beta where Anthropic charged extra for the 1M window:

https://x.com/claudeai/status/2032509548297343196

As for retrieval, the post shows Opus 4.6 at 78.3% needle retrieval success in 1M window (compared with 91.9% in 256K), and Sonnet 4.6 at 65.1% needle retrieval in 1M (compared with 90.6% in 256K).

▲siva7 49 minutes ago

now that's major news

▲BloondAndDoom 41 minutes ago

In addition to context rot, cost matters. I think lots of people use token compression tools for that, not because of context rot.

▲hinkley 15 minutes ago

From a determinism standpoint it might be better for the rot to occur at ingest rather than arbitrarily five questions later.

▲thesiti92 2 hours ago

do you guys have any stats on how much faster this is than claude's or codex's compression? claude's is super super slow, but codex's feels like an acceptable amount of time. looks cool tho, i'll have to try it out and see if it messes with outputs or not.

▲thebeas 4 minutes ago

[dead]

▲esafak 1 hour ago

I can already prevent context pollution with subagents. How is this better?

▲lambdaone 56 minutes ago

This company sounds like it has months to live, or until the VC money runs out at most. If this idea is good, Anthropic et al. will roll it into their own product, eliminating any purpose for it to exist as an independent product. And if it isn't any good, the company won't get traction.

▲uaghazade 1 hour ago

ok, it's great

▲verdverm 2 hours ago

I don't want some other tooling messing with my context. It's too important to leave to something that needs to optimize across many users, thereby not being the best for my specifics.

The framework I use (ADK) already handles this. It's very low-hanging fruit that should be part of any framework, not something external. In ADK, this is a boolean you can turn on per tool or subagent; you can even decide turn by turn, or based on any context you see fit, by supplying a function.
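As a generic illustration of the "boolean or function" idea described above (this is not ADK's API; every name here is hypothetical), a per-tool compression policy can be modeled as either a fixed flag or a callable that inspects the current turn:

```python
from typing import Any, Callable, Dict, Union

# A policy is either a fixed on/off flag or a function of the turn context.
CompressionPolicy = Union[bool, Callable[[Dict[str, Any]], bool]]


def should_compress(policy: CompressionPolicy, turn_context: Dict[str, Any]) -> bool:
    """Resolve a policy to a yes/no decision for the current turn."""
    if callable(policy):
        # Function form: decide turn by turn from whatever context is supplied.
        return bool(policy(turn_context))
    # Boolean form: always on or always off for this tool.
    return bool(policy)


# Hypothetical per-tool configuration.
TOOL_POLICIES: Dict[str, CompressionPolicy] = {
    "read_file": False,  # never compress
    "run_tests": True,   # always compress
    # compress only when the output is large
    "grep": lambda ctx: ctx.get("output_chars", 0) > 1000,
}
```

For example, `should_compress(TOOL_POLICIES["grep"], {"output_chars": 5000})` resolves to `True`, while the same policy with a 10-character output resolves to `False`.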

YC over-indexed on AI startups too early, not realizing how trivial these startup "products" are; they are more of a line item in the feature list of a mature agent framework.

I've also seen dozens of this same project submitted by the claws that led to our new rule addition this week. If your project can be vibe coded by dozens of people in mere hours...

▲jc-myths 11 minutes ago

[dead]

▲poushwell 44 minutes ago

[flagged]

▲BrianFHearn 2 hours ago

[flagged]

▲zenon_paradox 2 hours ago

[dead]

▲eegG0D 1 hour ago

[flagged]

▲mmastrac 1 hour ago

Please don't dump AI-generated comments into HN. The signal is already pretty hard to find around all the noise.

▲post-it 1 hour ago

> This is a massive win for anyone serious about "Signal over Noise."

Not you, clearly.

▲jameschaearley 2 hours ago

[flagged]

▲metadat 1 hour ago

Don't post generated/AI-edited comments. HN is for conversation between humans https://news.ycombinator.com/item?id=47340079 - 1 day ago, 1700 comments

▲linkregister 33 minutes ago

How do you know this comment is created using generative AI?

▲altruios 1 hour ago

Regardless, these appear to be valid/sound questions, and I am interested in the answers.

▲PufPufPuf 1 hour ago

That comment reads pretty normal to me, and it raises valid points