Show HN: Keep large tool output out of LLM context: 3x accuracy, 95% fewer tokens

LLM agents often place raw JSON tool outputs directly in the prompt. After a few tool calls, earlier results get compacted or truncated, and answers become incorrect or inconsistent.

I built Sift, a drop-in MCP gateway that stores tool outputs as local artifacts (filesystem blobs indexed in SQLite) and returns an `artifact_id` plus compact schema hints when responses are large or paginated.
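
The exact envelope Sift returns isn't shown here; as an illustration only, with hypothetical field names rather than Sift's actual schema, the model might see something like this in place of a multi-megabyte response:

    # Illustrative only: field names are hypothetical, not Sift's actual schema.
    {
        "artifact_id": "art_7f3a",     # handle the model uses in later code queries
        "rows": 1200,                  # size hint instead of the raw payload
        "schema_hint": {"item_keys": ["time", "magnitude", "place"]},
    }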

Instead of reasoning over full JSON in the prompt, the model runs a small Python query:

    def run(data, schema, params):
        # data: the stored tool output; schema/params: hints from the gateway.
        # Return the "place" of the entry with the largest "magnitude".
        return max(data, key=lambda x: x["magnitude"])["place"]

Query code runs in a constrained subprocess (AST/import guards + timeout/memory caps). Only the computed result is returned to the model.
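
Sift's actual guard rules and limits are its own; as a minimal sketch of the general approach, assuming a hypothetical module allowlist and a 256 MB cap, the flow looks roughly like this:

    import ast
    import subprocess
    import sys

    # Hypothetical allowlist; Sift's real guard rules may differ.
    ALLOWED_MODULES = {"json", "math", "statistics"}

    def guard(code: str) -> None:
        """Reject imports outside the allowlist and dunder attribute access."""
        for node in ast.walk(ast.parse(code)):
            if isinstance(node, ast.Import):
                if any(a.name.split(".")[0] not in ALLOWED_MODULES for a in node.names):
                    raise ValueError("disallowed import")
            elif isinstance(node, ast.ImportFrom):
                if (node.module or "").split(".")[0] not in ALLOWED_MODULES:
                    raise ValueError("disallowed import")
            elif isinstance(node, ast.Attribute) and node.attr.startswith("__"):
                raise ValueError("dunder attribute access blocked")

    def run_query(code: str, artifact_path: str, timeout: float = 5.0) -> str:
        guard(code)
        runner = (
            "import json, resource, sys\n"
            # Cap the child's address space at 256 MB (resource is POSIX-only).
            "resource.setrlimit(resource.RLIMIT_AS, (256 * 2**20, 256 * 2**20))\n"
            "with open(sys.argv[1]) as f:\n"
            "    data = json.load(f)\n"
            + code + "\n"
            + "print(json.dumps(run(data, None, None)))\n"
        )
        out = subprocess.run(
            [sys.executable, "-c", runner, artifact_path],
            capture_output=True, text=True, timeout=timeout, check=True,
        )
        # Only this small computed string goes back into the model's context.
        return out.stdout.strip()

A subprocess boundary plus rlimits is only a coarse sandbox; the point of the sketch is the shape of the flow: guard the code, execute it against the stored artifact, and return just the printed result.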

Benchmark (Claude Sonnet 4.6, 103 questions across 12 datasets):

- Baseline (raw JSON in prompt): 34/103 (33%), 10.7M input tokens

- Sift (artifact + code query): 102/103 (99%), 489K input tokens

Open benchmark + MIT code: https://github.com/lourencomaciel/sift-gateway

Install:

    pipx install sift-gateway
    sift-gateway init --from claude

Works with Claude Code, Cursor, Windsurf, Zed, and VS Code. Existing MCP servers and tools require no changes.
