🛠️ Hacker News Tools
Last Updated
June 05, 2026 at 04:00 AM
Show HN: Dari-docs – Optimize your docs using parallel coding agents
Hacker News (score: 14)[Other] Show HN: Dari-docs – Optimize your docs using parallel coding agents
It's well known at this point that documentation needs to be optimized for AI agents: we're all pointing our Claude Code / Codex / Pi agents at documentation and expecting the models to figure out how to implement a product.

This changes the optimization problem when writing documentation. Good documentation becomes more objective, because you are solving a very concrete problem: can a dumb harness running the dumbest model implement this reliably?

Humans can typically compensate for inconsistent terminology or context scattered across pages, but for agents this often wastes time or confuses the agent completely.

We've been building a small project around this called dari-docs. You upload your documentation via the website or CLI, provide a list of tasks, and ask agents with varying intelligence / cost levels, across different providers, to complete those tasks in parallel so you can see where they falter. When a run is complete, you get back a list of feedback markdown files, one from each agent run, and can apply changes based on that feedback.

Managed service: https://optimize.dari.dev/, repo link: https://github.com/mupt-ai/dari-docs

The agents actually try to use the product end-to-end. They search through the docs, follow instructions, run commands, try examples, and attempt to debug failures. Importantly, this is not a static LLM review of the documentation: the agents attempt the integration themselves.

You can also enable live verification with test credentials so the agents can verify workflows against real APIs:

    dari-docs check . --live-verify --secret-env DARI_TEST_API_KEY --task "Create a checkout session"

If you're building a CLI, API, MCP server, or SDK and actively maintaining docs for humans or agents, we'd love to work with you and test this on real workflows!
Show HN: Lance – image/video generation and understanding in one model
Show HN (score: 62)[Other] Show HN: Lance – image/video generation and understanding in one model
The model has 3B active parameters. We put the code, homepage, paper and model links here:

- Code: https://github.com/bytedance/Lance
- Homepage: https://lance-project.github.io/
- Paper: https://arxiv.org/abs/2605.18678
- Model: https://huggingface.co/bytedance-research/Lance

P.S. Lance is a research project, not a polished product. The model was trained using fewer than 128 GPUs.
SBCL: the ultimate assembly code breadboard (2014)
Hacker News (score: 133)[Other] SBCL: the ultimate assembly code breadboard (2014)
Formal Verification Gates for AI Coding Loops
Hacker News (score: 66)[Other] Formal Verification Gates for AI Coding Loops
Show HN: Hocuspocus 4 – self-hosted Yjs collaboration backend
Hacker News (score: 25)[Other] Show HN: Hocuspocus 4 – self-hosted Yjs collaboration backend
Hi HN! I'm Philip, one of the founders of Tiptap. Alongside our open-source rich text editor framework, we started developing Hocuspocus about five years ago, and open-sourced it too, to solve one of our biggest challenges back then: real-time collaboration in web editors. We found Yjs by Kevin Jahns, a CRDT library that merges concurrent edits from multiple users in real time without conflicts. Hocuspocus is the WebSocket server built on top of Yjs. It handles real-time sync, presence/awareness, persistence, and Redis-based scaling.

While we use Hocuspocus at Tiptap as the collaboration backend for our cloud services, it also works with any Yjs client (Slate, Quill, Monaco, ProseMirror, or your own setup), and Yjs documents aren't limited to text: you can sync any structured data through them. By now we see projects that rely on Hocuspocus without using the Tiptap editor at all.

We released Hocuspocus v4 under the MIT license a few weeks ago, and the biggest change is that it's no longer tied to Node. Previous versions depended on the ws package, which meant you couldn't run Hocuspocus on Bun, Deno, or Cloudflare Workers. We moved to crossws, a universal WebSocket adapter, so the same server now runs on Node, Bun, Deno, Cloudflare Workers, and Node with uWebSockets. That also lets you run collaboration at the edge.

The other changes are smaller but important if you're using Hocuspocus in production:

1. Every core class and hook payload now takes a generic Context type, so the auth/session shape you build in onAuthenticate flows through every other hook with full type safety (it defaults to any so existing code doesn't break).
2. Document updates are now processed sequentially per connection through an internal queue, which fixes a correctness bug where async hooks could cause CRDT updates to apply out of order under load.
3. Transaction origins are now structured objects with a source field instead of raw values, and there's an isTransactionOrigin() helper for narrowing.
4. Hook payloads use web-standard Request and Headers instead of Node's IncomingMessage.
5. The wire protocol is backward compatible in both directions, so you can roll out servers and providers independently.

If you want to test Hocuspocus: npm install @hocuspocus/server @hocuspocus/provider

Docs at: https://tiptap.dev/docs/hocuspocus

Source at: https://github.com/ueberdosis/hocuspocus

Because running real-time collaboration on Workers or Durable Objects is new in v4, that's the use case we'd most like to hear your questions and feedback on.
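To make change 2 concrete, here is a minimal Python sketch of the general idea, not Hocuspocus's actual implementation (which is TypeScript). It assumes a hypothetical async hook whose latency varies per update (`hook_delay` is invented for illustration) and compares handling each update in its own task against draining a single per-connection queue with one worker.

```python
import asyncio


def hook_delay(update: int, total: int) -> float:
    # Hypothetical hook latency: earlier updates take longer than later ones.
    return 0.002 * (total - update)


async def apply_concurrently(total: int) -> list[int]:
    """Each update runs in its own task, so fast hooks can overtake slow ones."""
    applied: list[int] = []

    async def handle(update: int) -> None:
        await asyncio.sleep(hook_delay(update, total))  # stand-in for an async hook
        applied.append(update)  # stand-in for applying the update

    await asyncio.gather(*(handle(u) for u in range(total)))
    return applied


async def apply_sequentially(total: int) -> list[int]:
    """One queue and one worker per connection: updates apply in arrival order."""
    applied: list[int] = []
    queue: asyncio.Queue = asyncio.Queue()

    async def worker() -> None:
        while True:
            update = await queue.get()
            try:
                # The next update is not started until this one has finished.
                await asyncio.sleep(hook_delay(update, total))
                applied.append(update)
            finally:
                queue.task_done()

    task = asyncio.create_task(worker())
    for update in range(total):
        queue.put_nowait(update)
    await queue.join()  # wait until every queued update has been applied
    task.cancel()
    await asyncio.gather(task, return_exceptions=True)
    return applied


if __name__ == "__main__":
    print("concurrent:", asyncio.run(apply_concurrently(5)))
    print("sequential:", asyncio.run(apply_sequentially(5)))
```

The trade-off in this sketch is throughput for ordering: a slow hook delays every later update on the same connection, while other connections with their own queues are unaffected.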
Testing distributed systems with AI agents
Hacker News (score: 59)[Testing] Testing distributed systems with AI agents
GitHub confirms breach of 3,800 repos via malicious VSCode extension
Hacker News (score: 91)[Other] GitHub confirms breach of 3,800 repos via malicious VSCode extension
Previous thread in sequence: "GitHub is investigating unauthorized access to their internal repositories" - https://news.ycombinator.com/item?id=48201316 - May 2026 (321 comments)
Show HN: Open-Source Agentic QA Harness with Memory
Show HN (score: 12)[Testing] Show HN: Open-Source Agentic QA Harness with Memory
GitHub - https://github.com/vostride/agent-qa
Live Demos - https://vostride.com/demo/agent-qa
Incident Report: May 19, 2026 – GCP Account Suspension
Hacker News (score: 430)[Other] Incident Report: May 19, 2026 – GCP Account Suspension
Previous thread: "Incident Report: Railway Blocked by Google Cloud [resolved]" - https://news.ycombinator.com/item?id=48201484
Testing MiniMax M2.7 via API on three real ML and coding workflows
Hacker News (score: 14)[Testing] Testing MiniMax M2.7 via API on three real ML and coding workflows
Key, in sight – A guide, of sorts, to keyboard customization
Hacker News (score: 16)[Other] Key, in sight – A guide, of sorts, to keyboard customization
GitHub is investigating unauthorized access to their internal repositories
Hacker News (score: 571)[Other] GitHub is investigating unauthorized access to their internal repositories
https://xcancel.com/github/status/2056884788179726685
Remove–AI–Watermarks – CLI and library for removing AI watermarks from images
Hacker News (score: 156)[CLI Tool] Remove–AI–Watermarks – CLI and library for removing AI watermarks from images
Show HN: I built a native macOS Markdown viewer 100% with AI coding agents
Show HN (score: 5)[Other] Show HN: I built a native macOS Markdown viewer 100% with AI coding agents
I built Markdown Viewer because every Markdown app I found was either bloated (VS Code, Obsidian) or too bare-bones. I wanted something that loads instantly, renders Obsidian-style features cleanly, and weighs in at a few megabytes.

Built with Tauri 2 (Rust backend + webview frontend):

- GitHub Flavored Markdown + Obsidian extensions (wikilinks, callouts, emoji, math, Mermaid diagrams)
- Frontmatter rendered as a structured metadata bar above content
- HTML sanitization via ammonia for security
- No heavy dependencies, no Electron

What makes it interesting isn't so much the features as how it was built. Every line of Rust, CSS, and JavaScript was written by AI coding agents (pi.dev/Qwen and Claude Code) without a single human writing code. No hand-holding, no "prompt then copy-paste", just a high-level brief and iterative agent-driven development.

I've been using this project to hone my pi.dev setup, and I'm getting somewhere with pi.dev/Qwen3.6 and a small set of extensions. I'm trying to avoid Claude Code/Opus for this project because I want to see what I can do with a local LLM.

Key stats:

- Instant load (no webview overhead, pure rendering)
- Binary of a few MB
- Sanitized HTML via ammonia (XSS-safe)
- Open source on GitHub

Open source at https://github.com/rajatarya/mdviewer
Gemini CLI will stop working from June 18, 2026
Hacker News (score: 100)[Other] Gemini CLI will stop working from June 18, 2026
Gemini 3.5 Flash
Hacker News (score: 192)[API/SDK] Gemini 3.5 Flash
https://ai.google.dev/gemini-api/docs/models/gemini-3.5-flash
[CLI Tool] Show HN: LibreOffice-rs – I built a pure-Rust LibreOffice using autoresearch
Hey HN,

I built libreoffice-rs: a pure-Rust, std-only library + CLI for reading, writing, converting, and rendering office documents, with *zero* LibreOffice, Java, or C dependencies.

100x faster... I know, I know.

It supports DOCX, XLSX, PPTX, ODT/ODS/ODP, PDF, Markdown, CSV, HTML, SVG, and more. The CLI is designed to feel familiar:

```bash
cargo install libreoffice-pure

# soffice-style usage
libreoffice-pure --headless --convert-to pdf report.docx
libreoffice-pure --headless --convert-to csv spreadsheet.xlsx

# Markdown extraction
libreoffice-pure docx-to-md report.docx report.md
libreoffice-pure pptx-to-md slides.pptx slides.md

# Render pages as images
libreoffice-pure docx-to-pngs report.docx pages/ --dpi 144
```
Show HN: Superlog (YC P26) – Observability that installs itself and fixes bugs
Hacker News (score: 34)[Monitoring/Observability] Show HN: Superlog (YC P26) – Observability that installs itself and fixes bugs
Hey HN, we're Nico and Arseniy, co-founders of Superlog (https://superlog.sh). We're building a self-installing, self-healing observability tool that is meant not to be opened. It has a wizard that sets up proper logging daily and an agent that investigates errors and opens PRs.

Super short demo: https://www.youtube.com/watch?v=xFhU9Mk247M.

In our earlier startups, we tried Sentry, Datadog, Grafana, and Dash0, and nothing was good enough. Proper telemetry and alerting still require a ton of manual setup. We struggled with adding good logs, so debugging was tough, especially as codebases grow at a faster pace. Meanwhile, the Datadog/Dash0 bill kept climbing, and we still spent engineering hours learning, configuring, and maintaining our observability tooling.

With Sentry, we found ourselves flooded by a stream of alerts in our Slack channel. Most were duplicates or lacked context, so alert fatigue and constant interrupts were a real pain. The #ops notification is consistently the worst feeling on a Saturday morning.

Too many times we've seen servers run out of memory and disk, and three AWS metrics give us three different values. Half of the graphs on dashboards are normally empty or outdated, and manually clicking through UIs, especially when the team is small, seems like a huge waste of time.

At some point we realized that solving this problem would be more valuable than the things we had been working on, and that we had the expertise to do it, since Arseniy had spent years at Datadog getting paged during the night to debug production incidents. So we decided to build a platform that would just work: agent-first, MCP-native, zero-setup.

Here's how Superlog works: a wizard scans your repo and automatically instruments it with well-structured logs, traces and metrics via OpenTelemetry. We make sure to highlight main failure modes, endpoint performance, usage per tenant, and LLM/upstream cost (by callsite, tenant and model).

Errors get fingerprinted and grouped into incidents, so you see one issue, not a thousand duplicates. When you get a notification from Superlog, you see a clear failure summary with its inferred severity and impact upfront.

Then the agent investigates and tries to solve the issue. If it has enough context, it produces a concise and tested PR. If it doesn't, it posts its findings for the investigating team and automatically pulls in the engineers who could contribute more context, based on documentation, previous investigations and Slack threads.

Either way the output is one clean PR per incident, posted in Slack, that you can merge, ignore, or open as a Claude Code session and modify.

Three things we think are different from other observability vendors:

(1) We solve the setup pain. The wizard instruments everything with native OTel SDKs, respecting the semantic conventions, with proper service and environment tagging. We're also working on native automatic dashboards and alerts, so that you can see what's going on at a glance and don't miss subtle failure modes.

(2) Our telemetry doesn't decay. The wizard runs daily and keeps adding logs, alerts and dashboards where they're needed. You don't have to remember to instrument new features. The next time something breaks, the data you need to debug it is already there.

(3) Our goal is to solve alert fatigue. We use agents to merge similar errors and refine the summaries, giving you relevant information upfront. We have a custom evaluation setup that makes sure that our summaries are dense and correct, and that severity and impact are on point. We also give you confidence scores for every LLM-enhanced metric so that wrong guesses don't get boosted.

Important: Superlog telemetry is vendor-neutral, so you keep all the logs/metrics/traces we install. Pricing is on the site. We're early, so expect rough edges and please tell us when you find them.

You can try it at https://superlog.sh. We'd love to hear what you're using today, what's broken about it, and whether the "one mergeable PR per incident" model sounds useful or terrifying. We're especially keen to hear from folks running integration-heavy products, anyone who's rolled their own observability, and anyone who has tried Sentry / Datadog MCPs and given up. Comments and feedback welcome!
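The post doesn't say how Superlog fingerprints errors, so the following is only a generic sketch of the fingerprint-and-group idea it describes. The normalization rules, the `type` / `message` / `frame` fields, and all function names are assumptions made for illustration: volatile fragments (IDs, numbers, addresses) are masked, and the remainder is hashed together with the exception type and top stack frame.

```python
import hashlib
import re
from collections import defaultdict

# Volatile fragments that differ between otherwise identical errors.
# Order matters: UUIDs and hex addresses are masked before bare numbers.
_VOLATILE = [
    (re.compile(r"\b[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\b"), "<uuid>"),
    (re.compile(r"0x[0-9a-fA-F]+"), "<hex>"),
    (re.compile(r"\d+"), "<num>"),
]


def normalize(message: str) -> str:
    """Mask IDs and numbers so repeated errors produce the same text."""
    for pattern, token in _VOLATILE:
        message = pattern.sub(token, message)
    return message.strip().lower()


def fingerprint(exc_type: str, message: str, top_frame: str) -> str:
    """Stable short hash of exception type, normalized message and top frame."""
    key = "|".join((exc_type, normalize(message), top_frame))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


def group_into_incidents(errors: list[dict]) -> dict[str, list[dict]]:
    """Bucket raw error events by fingerprint: one bucket per incident."""
    incidents: dict[str, list[dict]] = defaultdict(list)
    for err in errors:
        incidents[fingerprint(err["type"], err["message"], err["frame"])].append(err)
    return dict(incidents)


if __name__ == "__main__":
    events = [
        {"type": "TimeoutError", "message": "Timeout after 3000 ms for user 42", "frame": "billing.py:charge"},
        {"type": "TimeoutError", "message": "Timeout after 1500 ms for user 7", "frame": "billing.py:charge"},
        {"type": "KeyError", "message": "missing key 'plan'", "frame": "plans.py:lookup"},
    ]
    for fp, group in group_into_incidents(events).items():
        print(fp, len(group))
```

A purely syntactic fingerprint like this one merges only near-identical messages; the agent-based merging of "similar" errors mentioned in point (3) would need something beyond it.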
Converting an Integer to a Decimal String in Under Two Nanoseconds
Hacker News (score: 10)[Other] Converting an Integer to a Decimal String in Under Two Nanoseconds
Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks
Hacker News (score: 18)[Other] Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks
Hi HN, I'm Antoine Zambelli, AI Director at Texas Instruments.

I built Forge, an open-source reliability layer for self-hosted LLM tool-calling.

What it does:

- Adds domain-and-tool-agnostic guardrails (retry nudges, step enforcement, error recovery, VRAM-aware context management) to local models running on consumer hardware
- Takes an 8B model from ~53% to ~99% on multi-step agentic workflows without changing the model, just the system around it
- Ships with an eval harness and interactive dashboard so you can reproduce every number

I wanted to run a handful of always-on agentic systems for my portfolio, didn't want to pay cloud frontier costs, and immediately hit the compounding math problem on local models. 90% per-step accuracy sounds great, but over a 5-step workflow that compounds to roughly a 40% failure rate. No existing framework seemed to address this mechanical reliability issue; they all seemed tailor-made for cloud frontier models.

Demo video: https://youtu.be/MzRgJoJAXGc (side-by-side: same model, same task, with and without Forge guardrails)

The paper (accepted to ACM CAIS '26, presenting May 26-29 in San Jose) covers the peer-reviewed findings across 97 model/backend configurations, 18 scenarios, 50 runs each. Key numbers:

- Ministral 8B with Forge: 99.3%. Claude Sonnet with Forge: 100%. The gap between a free local 8B model on a $600 GPU and a frontier API is less than 1 point.
- The same 8B local model with Forge (99.3%) outperforms Claude Sonnet without guardrails (87.2%): an 8B model with framework support beats the best result you can get through a frontier API alone.
- Error recovery scores 0% for every model tested, local and frontier, without the retry mechanism. Not a capability gap, an architectural absence.

I'm currently using this for my home assistant running on Ministral 14B-Reasoning, and for my locally hosted agentic coding harness (8B managed to contribute to the codebase!).

The guardrail stack has five layers, each independently toggleable. The two that carry the most weight (per ablation study with McNemar's test) are retry nudges (24-49 point drops when disabled) and error recovery (~10 point drops, significant for every model tested). Step enforcement is situational: it only fires for models with weaker sequencing discipline. Rescue parsing and context compaction showed no significance in the eval but are retained for production workloads, where they activate once in a while.

One thing I really didn't expect: the serving backend matters. The same Mistral-Nemo 12B weights produce 7% accuracy on llama-server with native function calling and 83% on Llamafile in prompt mode. That is roughly a 75-point swing from infrastructure alone. I don't think anyone's published this, because standard benchmarks don't control for serving backend.

Another surprise: there's no distinction in current LLM tool-calling between "the tool ran successfully and returned data" and "the tool ran successfully but found nothing." Both return a value, the orchestrator marks the step complete, and bad data cascades downstream. It's the equivalent of HTTP having 200 but no 404. Forge adds this as a new exception class (ToolResolutionError), so the model sees the error and can retry instead of silently passing garbage forward.

The biggest technical challenge was context compaction for memory-constrained hardware. Both Ollama and Llamafile silently fall back to CPU when the model exceeds VRAM: no warning, no error, just 10-100x slower inference. Forge queries nvidia-smi at startup and derives a token budget to prevent this.

How to try it:

- Clone the repo and run the eval harness on a model I haven't tested. If you get interesting results I'll add them to the dashboard.
- Try the proxy server mode: point any OpenAI-compatible client at Forge and it handles guardrails transparently. It's the newest mode and I'd love more eyes on it.
- Dogfooding led me to optimize model parameters in v0.6.0. The harder eval suite (26 scenarios) is designed to raise the ceiling so no one sits at 100%. Several models that did on the original suite can't sweep it, including Opus 4.6. Curious if anyone finds scenarios that expose gaps I haven't thought of. Paper numbers are based on pre-v0.6.0 code.

Background: prior ML publication in unsupervised learning (83 citations). This paper was accepted to ACM CAIS '26, presenting May 26-29.

Repo: https://github.com/antoinezambelli/forge

Paper: https://www.caisconf.org/program/2026/demos/forge-agentic-reliability/ https://github.com/antoinezambelli/forge/blob/main/docs/forge_ieee_preprint.pdf

Dashboard: https://github.com/antoinezambelli/forge/docs/results/dashboard.html
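The compounding arithmetic behind the "90% per step, 5 steps" remark can be checked in a few lines. The sketch below is not Forge code. It assumes every step succeeds independently with the same probability, and, as a further simplifying assumption, that each retry is an independent fresh attempt at a failed step. Real retries are correlated, so the retry figures are an upper bound on what retries alone could buy.

```python
def workflow_success(per_step: float, steps: int) -> float:
    """Probability that all steps succeed, assuming independent steps."""
    return per_step ** steps


def per_step_with_retries(per_step: float, retries: int) -> float:
    """Per-step success when a failed step may be retried.

    Assumption: each attempt is independent, so a step fails only if
    all (retries + 1) attempts fail.
    """
    return 1.0 - (1.0 - per_step) ** (retries + 1)


if __name__ == "__main__":
    p, n = 0.90, 5
    base = workflow_success(p, n)
    print(f"no retries: success {base:.3f}, failure {1 - base:.3f}")  # 0.590 / 0.410
    for retries in (1, 2):
        boosted = workflow_success(per_step_with_retries(p, retries), n)
        print(f"{retries} retries: success {boosted:.3f}")
```

Under these assumptions a single retry per step lifts the 5-step success rate from about 59% to about 95%, which gives some intuition for why the retry layer carries so much weight in the ablation.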