🛠️ All DevTools

Showing 1–20 of 4687 tools

Last Updated
May 20, 2026 at 08:00 PM

[Other] Formal Verification Gates for AI Coding Loops

Found: May 20, 2026 ID: 4684

[Other] Show HN: Hocuspocus 4 – self-hosted Yjs collaboration backend

Hi HN! I'm Philip, one of the founders of Tiptap. Next to our open-source rich text editor framework, we started developing Hocuspocus about five years ago and open-sourced it too, to solve one of our biggest challenges back then: real-time collaboration in web editors. We found Yjs by Kevin Jahns, a CRDT library that handles concurrent edits without conflicts. Basically, Yjs merges changes from users without conflicts and in real-time. Hocuspocus is the WebSocket server built on top of Yjs. It handles real-time sync, presence/awareness, persistence, and Redis-based scaling.

While we use Hocuspocus at Tiptap as the collaboration backend for our cloud services, it also works with any Yjs client (Slate, Quill, Monaco, ProseMirror, or your own setup), and Yjs documents aren't limited to text at all. You can sync any structured data through them, and in the meantime we see projects that rely on Hocuspocus without using the Tiptap editor.

We released Hocuspocus v4 under the MIT license a few weeks ago, and the biggest change is that it's no longer tied to Node. The previous versions depended on the ws package, which meant you couldn't run Hocuspocus on Bun, Deno, or Cloudflare Workers. We moved to crossws, a universal websocket adapter, so the same server now runs on Node, Bun, Deno, Cloudflare Workers, and Node with uWebSockets. That also lets you run collaboration at the edge.

The other changes are smaller but are important if you're using Hocuspocus in production:

1. Every core class and hook payload takes a generic Context type now, so the auth/session shape you build in onAuthenticate flows through every other hook with full type safety (defaults to any so existing code doesn't break).
2. Document updates are now processed sequentially per connection through an internal queue, which fixes a correctness bug where async hooks could cause CRDT updates to apply out of order under load.
3. Transaction origins are structured objects now with a source field instead of raw values and there's an isTransactionOrigin() helper for narrowing.
4. Hook payloads use web-standard Request and Headers instead of Node's IncomingMessage.
5. The wire protocol is backward compatible in both directions, so you can roll out servers and providers independently.

If you want to test Hocuspocus: npm install @hocuspocus/server @hocuspocus/provider

Docs at: https://tiptap.dev/docs/hocuspocus

Source at: https://github.com/ueberdosis/hocuspocus

Because running real-time collaboration on Workers or Durable Objects is new in v4, that's the use case we'd most like to hear your questions and feedback on.
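The per-connection sequential queue mentioned in the v4 changes can be illustrated with a small sketch. This is not Hocuspocus code (the server is written in TypeScript); `ConnectionQueue`, `push`, `idle`, and `slow_hook` are hypothetical names, and the sketch only shows the general idea of draining one connection's updates in arrival order even when the async hook takes a variable amount of time.

```python
import asyncio


class ConnectionQueue:
    """Applies updates for one connection strictly in arrival order."""

    def __init__(self, apply_update):
        self._apply_update = apply_update  # hypothetical async hook
        self._queue = asyncio.Queue()
        self._worker = None

    def push(self, update):
        # Enqueue synchronously so arrival order is preserved.
        self._queue.put_nowait(update)
        if self._worker is None or self._worker.done():
            self._worker = asyncio.create_task(self._drain())

    async def _drain(self):
        # Only one update is in flight at a time for this connection.
        while not self._queue.empty():
            update = self._queue.get_nowait()
            await self._apply_update(update)

    async def idle(self):
        # Wait until the current drain task has finished.
        if self._worker is not None:
            await self._worker


async def demo():
    applied = []

    async def slow_hook(update):
        # Earlier updates take longer; run concurrently they would finish last.
        await asyncio.sleep(0.03 - 0.01 * update)
        applied.append(update)

    queue = ConnectionQueue(slow_hook)
    for update in (0, 1, 2):
        queue.push(update)
    await queue.idle()
    return applied


result = asyncio.run(demo())
print(result)
```

Had the three hooks been started concurrently, the shortest sleep would finish first and the updates would land in reverse order; draining through a single worker keeps them in the order they arrived.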

Found: May 20, 2026 ID: 4686

[Testing] Testing distributed systems with AI agents

Found: May 20, 2026 ID: 4685

[Other] GitHub confirms breach of 3,800 repos via malicious VSCode extension

Previous thread in sequence:

GitHub is investigating unauthorized access to their internal repositories - https://news.ycombinator.com/item?id=48201316 - May 2026 (321 comments)

Found: May 20, 2026 ID: 4683

can1357/oh-my-pi

GitHub Trending

[CLI Tool] ⌥ AI Coding agent for the terminal — hash-anchored edits, optimized tool harness, LSP, Python, browser, subagents, and more

Found: May 20, 2026 ID: 4681

[Testing] Show HN: Open-Source Agentic QA Harness with Memory

GitHub - https://github.com/vostride/agent-qa
Live Demos - https://vostride.com/demo/agent-qa

Found: May 20, 2026 ID: 4687

[Testing] Testing MiniMax M2.7 via API on three real ML and coding workflows

Found: May 20, 2026 ID: 4679

[Other] GitHub is investigating unauthorized access to their internal repositories

https://xcancel.com/github/status/2056884788179726685

Found: May 20, 2026 ID: 4682

[CLI Tool] Remove–AI–Watermarks – CLI and library for removing AI watermarks from images

Found: May 19, 2026 ID: 4677

[Other] Show HN: I built a native macOS Markdown viewer 100% with AI coding agents

I built Markdown Viewer because every Markdown app I found was either bloated (VS Code, Obsidian) or too bare-bones. I wanted something that loads instantly, renders Obsidian-style features cleanly, and weighs in at a few megabytes.

Built with Tauri 2 (Rust backend + webview frontend):
- GitHub Flavored Markdown + Obsidian extensions (wikilinks, callouts, emoji, math, Mermaid diagrams)
- Frontmatter rendered as a structured metadata bar above content
- HTML sanitization via ammonia for security
- No heavy dependencies, no Electron

What makes it interesting isn't so much the features as how it was built. Every line of Rust, CSS, and JavaScript was written by AI coding agents (pi.dev/Qwen and Claude Code) without a single human writing code. No hand-holding, no "prompt then copy-paste", just a high-level brief and iterative agent-driven development.

I've been using this project to hone my pi.dev setup, and I am getting somewhere with pi.dev/Qwen3.6 and a small set of extensions. I'm trying to avoid Claude Code/Opus for this project because I want to see what I can do with a local LLM.

Key stats:
- Instant load (no webview overhead, pure rendering)
- ~few MB binary
- Sanitized HTML via ammonia (XSS-safe)
- Open source on GitHub

Open source at https://github.com/rajatarya/mdviewer

Found: May 19, 2026 ID: 4680

[Other] Gemini CLI will stop working from June 18, 2026

Found: May 19, 2026 ID: 4678

Gemini 3.5 Flash

Hacker News (score: 192)

[API/SDK] Gemini 3.5 Flash

https://ai.google.dev/gemini-api/docs/models/gemini-3.5-flash

Found: May 19, 2026 ID: 4672

[CLI Tool] Show HN: LibreOffice-rs – I built a pure-Rust LibreOffice using autoresearch

Hey HN,

I built libreoffice-rs: a pure-Rust, std-only library + CLI for reading, writing, converting, and rendering office documents, with *zero* LibreOffice, Java, or C dependencies.

100x faster... I know, I know.

It supports DOCX, XLSX, PPTX, ODT/ODS/ODP, PDF, Markdown, CSV, HTML, SVG, and more. The CLI is designed to feel familiar:

```bash
cargo install libreoffice-pure

# soffice-style usage
libreoffice-pure --headless --convert-to pdf report.docx
libreoffice-pure --headless --convert-to csv spreadsheet.xlsx

# Markdown extraction
libreoffice-pure docx-to-md report.docx report.md
libreoffice-pure pptx-to-md slides.pptx slides.md

# Render pages as images
libreoffice-pure docx-to-pngs report.docx pages/ --dpi 144
```

Found: May 19, 2026 ID: 4676

[Monitoring/Observability] Show HN: Superlog (YC P26) – Observability that installs itself and fixes bugs

Hey HN, we’re Nico and Arseniy, co-founders of Superlog (https://superlog.sh). We're building a self-installing, self-healing observability tool meant not to be opened. It has a wizard that sets up proper logging daily and an agent that investigates errors and opens PRs.

Super short demo: https://www.youtube.com/watch?v=xFhU9Mk247M.

In our earlier startups, we tried Sentry, Datadog, Grafana, Dash0, and nothing was good enough. Proper telemetry and alerting still require a ton of manual setup. We struggled with adding good logs, so debugging was tough, especially as codebases grow at a faster pace. Meanwhile, the Datadog/Dash0 bill kept climbing, and we still spent engineering hours to learn, configure, and maintain our observability tooling.

With Sentry, we found ourselves flooded by a stream of alerts into our Slack channel. Most were duplicates or lacked context, so alert fatigue and constant interrupts were a real pain. The #ops notification is consistently the worst feeling on a Saturday morning.

Too many times we’ve seen servers run out of memory and disk, and three AWS metrics giving us three different values. Half of the graphs on dashboards are normally empty or outdated, and manually clicking through UIs, especially when the team is small, seems like a huge waste of time.

At some point we realized that solving this problem would be more valuable than the things we had been working on, and we had the expertise to do it, since Arseniy had spent years at Datadog, getting paged during the night to debug production incidents. So we decided to build a platform that would just work: agent-first, MCP-native, zero-setup.

Here’s how Superlog works: we have a wizard that scans your repo and automatically instruments it with well-structured logs, traces and metrics via OpenTelemetry. We make sure to highlight main failure modes, endpoint performance, usage per tenant, and LLM/upstream cost (by callsite, tenant and model).

Errors get fingerprinted and grouped into incidents, so you see one issue, not a thousand duplicates. When you get a notification from Superlog, you see a clear failure summary with its inferred severity and impact upfront.

Then the agent investigates and tries to solve the issue. If it has enough context, it produces a concise and tested PR. If it doesn't, it posts its findings for the investigating team, and automatically pulls in the engineers that could contribute more context based on documentation, previous investigations and Slack threads.

Either way the output is one clean PR per incident, posted in Slack, that you can merge, ignore, or open as a Claude Code session and modify.

Three things we think are different from other observability vendors:

(1) We solve the setup pain. The wizard will instrument everything with native OTel SDKs, respecting the semantic conventions, with proper service and environment tagging. We’re also working on native automatic dashboards and alerts, so that you can see what’s going on at a glance and don’t miss subtle failure modes.

(2) Our telemetry doesn’t decay. The wizard runs daily, and keeps adding logs, alerts and dashboards where they're needed. You don't have to remember to instrument new features. The next time something breaks, the data you need to debug it is already there.

(3) Our goal is to solve alert fatigue. We use agents to merge similar errors and refine the summaries, giving you relevant information upfront. We have a custom evaluation setup that makes sure that our summaries are dense and correct, and that severity and impact are on point. We also give you confidence scores for every LLM-enhanced metric so that wrong guesses don’t get boosted.

Important: Superlog telemetry is vendor-neutral, so you keep all the logs/metrics/traces we install. Pricing is on the site. We're early, so expect rough edges and please tell us when you find them.

You can try it at https://superlog.sh. We'd love to hear what you're using today, what's broken about it, and whether the "one mergeable PR per incident" model sounds useful or terrifying. Especially keen to hear from folks running integration-heavy products, anyone who's rolled their own observability, and anyone who has tried Sentry / Datadog MCPs and given up. Comments and feedback welcome!
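The post does not say how Superlog fingerprints errors, so the following is only a generic sketch of the "fingerprint and group into incidents" idea. The normalization rule (replacing numbers and hex values with a placeholder) and the names `fingerprint` and `group_into_incidents` are assumptions made for illustration.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(error_type: str, message: str) -> str:
    """Hash the error type plus a normalized message (hypothetical scheme)."""
    # Replace hex values and numbers so variable IDs do not split incidents.
    normalized = re.sub(r"0x[0-9a-fA-F]+|\d+", "<n>", message)
    digest = hashlib.sha256(f"{error_type}:{normalized}".encode())
    return digest.hexdigest()[:12]


def group_into_incidents(errors):
    """Bucket (error_type, message) pairs by fingerprint."""
    incidents = defaultdict(list)
    for error_type, message in errors:
        incidents[fingerprint(error_type, message)].append(message)
    return dict(incidents)


errors = [
    ("TimeoutError", "request 1042 to tenant 7 timed out after 30s"),
    ("TimeoutError", "request 2210 to tenant 9 timed out after 30s"),
    ("KeyError", "missing field 'user_id'"),
]
incidents = group_into_incidents(errors)
for key, messages in incidents.items():
    print(key, len(messages))
```

The two timeout errors differ only in request and tenant numbers, so they collapse into one incident while the `KeyError` stays separate.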

Found: May 19, 2026 ID: 4675

[Other] Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks

Hi HN, I'm Antoine Zambelli, AI Director at Texas Instruments.

I built Forge, an open-source reliability layer for self-hosted LLM tool-calling.

What it does:

- Adds domain-and-tool-agnostic guardrails (retry nudges, step enforcement, error recovery, VRAM-aware context management) to local models running on consumer hardware
- Takes an 8B model from ~53% to ~99% on multi-step agentic workflows without changing the model - just the system around it
- Ships with an eval harness and interactive dashboard so you can reproduce every number

I wanted to run a handful of always-on agentic systems for my portfolio, didn't want to pay cloud frontier costs, and immediately hit the compounding math problem on local models. 90% per-step accuracy sounds great, but with a 5-step workflow that's a 40% failure rate. No existing framework seemed to address this mechanical reliability issue - they all seemed tailor-made for cloud frontier.

Demo video: https://youtu.be/MzRgJoJAXGc (side-by-side: same model, same task, with and without Forge guardrails)

The paper (accepted to ACM CAIS '26, presenting May 26-29 in San Jose) covers the peer-reviewed findings across 97 model/backend configurations, 18 scenarios, 50 runs each. Key numbers:

- Ministral 8B with Forge: 99.3%. Claude Sonnet with Forge: 100%. The gap between a free local 8B model on a $600 GPU and a frontier API is less than 1 point.
- The same 8B local model with Forge (99.3%) outperforms Claude Sonnet without guardrails (87.2%) - an 8B model with framework support beats the best result you can get through frontier API alone.
- Error recovery scores 0% for every model tested - local and frontier - without the retry mechanism. Not a capability gap, an architectural absence.

I'm currently using this for my home assistant running on Ministral 14B-Reasoning, and for my locally hosted agentic coding harness (8B managed to contribute to the codebase!).

The guardrail stack has five layers, each independently toggleable. The two that carry the most weight (per ablation study with McNemar's test): retry nudges (24-49 point drops when disabled) and error recovery (~10 point drops, significant for every model tested). Step enforcement is situational - only fires for models with weaker sequencing discipline. Rescue parsing and context compaction showed no significance in the eval but are retained for production workloads where they activate once in a while.

One thing I really didn't expect: the serving backend matters. Same Mistral-Nemo 12B weights produce 7% accuracy on llama-server with native function calling and 83% on Llamafile in prompt mode. A 75-point swing from infrastructure alone. I don't think anyone's published this because standard benchmarks don't control for serving backend.

Another surprise: there's no distinction in current LLM tool-calling between "the tool ran successfully and returned data" and "the tool ran successfully but found nothing." Both return a value, the orchestrator marks the step complete, and bad data cascades downstream. It's the equivalent of HTTP having 200 but no 404. Forge adds this as a new exception class (ToolResolutionError) - the model sees the error and can retry instead of silently passing garbage forward.

Biggest technical challenge was context compaction for memory-constrained hardware. Both Ollama and Llamafile silently fall back to CPU when the model exceeds VRAM - no warning, no error, just 10-100x slower inference. Forge queries nvidia-smi at startup and derives a token budget to prevent this.

How to try it:

- Clone the repo, run the eval harness on a model I haven't tested. If you get interesting results I'll add them to the dashboard.
- Try the proxy server mode - point any OpenAI-compatible client at Forge and it handles guardrails transparently. It's the newest mode and I'd love more eyes on it.
- Dogfooding led me to optimize model parameters in v0.6.0. The harder eval suite (26 scenarios) is designed to raise the ceiling so no one sits at 100%. Several that did on the original suite can't sweep it - including Opus 4.6. Curious if anyone finds scenarios that expose gaps I haven't thought of. Paper numbers based on pre v0.6.0 code.

Background: prior ML publication in unsupervised learning (83 citations). This paper accepted to ACM CAIS '26 - presenting May 26-29.

Repo: https://github.com/antoinezambelli/forge

Paper: https://www.caisconf.org/program/2026/demos/forge-agentic-reliability/ https://github.com/antoinezambelli/forge/blob/main/docs/forge_ieee_preprint.pdf

Dashboard: https://github.com/antoinezambelli/forge/docs/results/dashboard.html
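The compounding problem described in the post is easy to check numerically. The sketch below assumes every step succeeds independently with the same probability, and that a retry is an independent second attempt; both are simplifying assumptions, and the function names are illustrative rather than part of Forge.

```python
def workflow_success(per_step: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    return per_step ** steps


def with_retries(per_step: float, retries: int) -> float:
    """Per-step success when a failed step may be retried independently."""
    return 1 - (1 - per_step) ** (retries + 1)


baseline = workflow_success(0.90, 5)
retried = workflow_success(with_retries(0.90, 1), 5)

print(f"no retry:  {1 - baseline:.1%} workflow failure")
print(f"one retry: {1 - retried:.1%} workflow failure")
```

With 90% per-step accuracy, five steps succeed together only about 59% of the time, which is the roughly 40% failure rate quoted in the post. Under the independence assumption, a single retry per step lifts per-step success to 99% and cuts workflow failure to about 5%, which gives some intuition for why retry-style guardrails matter so much in multi-step runs.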

Found: May 19, 2026 ID: 4673

rtk-ai/rtk

GitHub Trending

[CLI Tool] CLI proxy that reduces LLM token consumption by 60-90% on common dev commands. Single Rust binary, zero dependencies

Found: May 19, 2026 ID: 4668

[API/SDK] Show HN: Hsrs – Type-Safe Haskell Bindings Generator for Rust

Hey everyone! I've been working on hsrs, a type-safe Haskell Bindings Generator for Rust.

I couldn't really find any bindings generator that would create type-safe, rich bindings for Haskell from Rust. Naturally, both languages have rich type systems, so I was amazed that no awesome bindings generator already existed, hence I decided to write my own. hsrs feels very similar to pyo3 and napi-rs, and if you've used those, hsrs will feel right at home.

What's unique about hsrs as opposed to hs-bindgen is that it has type-safe bindings for rich types, like Result, Maybe, etc. while also generating Haskell bindings. The repo contains a minimal example, and more details are available in the haskell discourse: https://discourse.haskell.org/t/ann-hsrs-ergonomic-haskell-bindings-for-rust/14129

Found: May 19, 2026 ID: 4666

[Other] LLMCap – A proxy that hard-stops LLM API calls when you hit a dollar cap

Found: May 19, 2026 ID: 4669

[Code Quality] Sieve – scans Cursor/Claude chat history for leaked API keys

Background: I was using Cursor to set up an OpenAI integration. The agent read my .env file, added the key to the config, and everything worked. What I didn't think about: that key was now sitting in a plaintext SQLite database at ~/Library/ApplicationSupport/Cursor/User/workspaceStorage/..

AI coding tools (Cursor, Claude Code, Copilot, Cline) routinely read .env files as part of normal operation. Every secret they touch gets embedded in their local transcript/state files: unencrypted, outside .gitignore, persisted indefinitely.

Standard secret scanners (gitleaks, detect-secrets) scan git repos. Nobody scans AI transcript stores. That's the gap.

Sieve scans those files locally on your Mac. It flags exposed keys by severity, redacts them in place, and stores fingerprints in Keychain, never plaintext. It covers Cursor, Claude Code, Claude Desktop, Copilot, Cline, Roo Cline, Windsurf, Gemini CLI, and .env files.

Happy to answer questions about how the SQLite parsing works or the detection rules.
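As a rough illustration of the flag-and-redact step, the sketch below applies regex rules to a transcript string. It is not Sieve's implementation: the two patterns in `RULES` are simplified assumptions about common key shapes, the function names are hypothetical, and reading the text out of the SQLite stores is left out entirely.

```python
import re

# Hypothetical detection rules; real scanners use many more patterns.
RULES = {
    "openai-style key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
    "aws access key id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def scan_text(text: str):
    """Return (rule name, matched secret) pairs found in the text."""
    findings = []
    for name, pattern in RULES.items():
        for match in pattern.finditer(text):
            findings.append((name, match.group()))
    return findings


def redact(text: str) -> str:
    """Keep a short prefix of each match and mask the rest."""
    for pattern in RULES.values():
        text = pattern.sub(lambda m: m.group()[:4] + "*" * 8, text)
    return text


# A fake key built at runtime so no real-looking secret sits in the source.
sample = 'OPENAI_API_KEY="sk-' + "a" * 24 + '" and some chat text'
findings = scan_text(sample)
redacted = redact(sample)
print(findings[0][0])
print(redacted)
```

Keeping only a short prefix makes a finding recognizable to its owner without leaving the usable secret on disk.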

Found: May 19, 2026 ID: 4667

[Other] Show HN: Tracecast – open-source generative data apps built on top of Marimo

Hi HN, I'm Malachy, the founder of Tracecast. This project lets you generate interactive data apps on top of your data, using a Cursor-style AI chat. It stitches together Marimo, LangGraph agents, and data warehouse query tools. It has an Apache 2.0 license.

The initial use case that spurred this project was business analytics, specifically generating product usage dashboards.

This project's main inspiration is Marimo, an open source python notebook that can be "queried with SQL, run as a script, and deployed as an app" [1]. The recent release of Marimo Pair [2] demonstrated the power of connecting AI agents like Claude Code to Marimo notebooks directly. This project seeks to build on that work by incorporating a LangGraph agent with two key abilities: (1) the ability to execute queries against a connected data warehouse (such as Snowflake); (2) the ability to write Marimo notebooks.

When prompted, the LangGraph agent will run exploratory data analysis using database query tools. Then, it creates a polished Marimo notebook that's presented to the user in read-only mode. This project intentionally hides the Marimo edit mode. That means that the end user only ever sees a finished, read-only data app. Ease of use and trust in AI output were the main drivers behind this decision.

4 data sources are currently supported: Snowflake, BigQuery, Postgres, and Metabase. The code for the database query tools was derived from Google's open source MCP Toolbox for Databases.

There is currently no support for MCP. Instead, data query tools are hardcoded. This decision was made to ensure high quality AI queries and limit tool bloat.

This is an early stage project, and is configured to only run locally at this time.

[1] https://github.com/marimo-team/marimo
[2] https://news.ycombinator.com/item?id=47678844

Found: May 18, 2026 ID: 4671