Show HN: Llmbuffer – Python library for cache-optimized LLM conversation history

Show HN (score: 5)

Found: June 10, 2026

ID: 5072

Description

Show HN: Llmbuffer – Python library for cache-optimized LLM conversation history I was not getting good cache utilization when including dynamic context in agent threads. After a lot of experimentation, I found a good pattern that minimizes how often long lived conversation history gets modified while still supporting dynamic context. It has flexible hooks for doing things like truncating or summarizing tool outputs when transitioning messages to the long term history. And I'm seeing >>90% of tokens hitting the cache for my agents despite including a lot of dynamic user context.

There are a wide range of agent prompting strategies so I'd love to hear where this library works well and where there are patterns that don't fit well into the current API!

More from Show

No other tools from this source yet.

Show HN: Llmbuffer – Python library for cache-optimized LLM conversation history

Description

More from Show

DevTools Assistant