🛠️ Hacker News Tools
Showing 781–800 of 3083 tools from Hacker News
Last Updated
June 06, 2026 at 04:48 PM
Show HN: Output.ai - OSS framework we extracted from 500+ production AI agents
Show HN (score: 37)[Other] Show HN: Output.ai - OSS framework we extracted from 500+ production AI agents
Show HN: Finalrun – Spec-driven testing using English and vision for mobile apps
Hacker News (score: 21)[Testing] Show HN: Finalrun – Spec-driven testing using English and vision for mobile apps I wanted to test mobile apps in plain English instead of relying on brittle selectors like XPath or accessibility IDs.
With a vision-based agent, that part actually works well. It can look at the screen, understand intent, and perform actions across Android and iOS.
The bigger problem showed up around how tests are defined and maintained.
When test flows are kept outside the codebase (written manually or generated from PRDs), they quickly go out of sync with the app. Keeping them updated becomes a lot of effort, and they lose reliability over time.
I then tried generating tests directly from the codebase (via MCP). That improved sync, but introduced high token usage and slower generation.
The shift for me was realizing test generation shouldn’t be a one-off step. Tests need to live alongside the codebase so they stay in sync and have more context.
I kept the execution vision-based (no brittle selectors), but moved test generation closer to the repo.
I’ve open sourced the core pieces:
1. Generate tests from codebase context
2. YAML-based test flows
3. Vision-based execution across Android and iOS
Repo: https://github.com/final-run/finalrun-agent
Demo: https://youtu.be/rJCw3p0PHr4
In the demo video, you’ll see the "post-development hand-off." An AI builds a feature in an IDE, and Finalrun immediately generates and executes a vision-based test for it, verifying the feature the AI developed.
Google open-sources experimental agent orchestration testbed Scion
Hacker News (score: 99)[DevOps] Google open-sources experimental agent orchestration testbed Scion https://googlecloudplatform.github.io/scion/overview/
Show HN: Unicode Steganography
Hacker News (score: 21)[Other] Show HN: Unicode Steganography I built a demo of two Unicode steganography techniques, zero-width characters and homoglyph substitution, in the context of AI misalignment.
The first uses two invisible zero-width characters (ZWS and ZWNJ) to binary-encode text.
The second is much cooler. Many letters in the Latin and Cyrillic alphabets look nearly identical but have different Unicode code points. If you convert the text you want to hide into its binary representation (1s and 0s), you can take plain English "carrier" text and, for each 1 in the binary representation, substitute the Cyrillic equivalent of the next substitutable letter. Decoding the message requires traversing the text and seeing where Cyrillic letters could have been substituted but weren't, and where they were, giving 0s and 1s respectively, which can be built back into the original hidden text.
In both cases, these are detectable, but the interesting problem for me is whether an LLM could eventually invent an encoding that goes unnoticed by both us and automated detection.
If LLMs were able to covertly include messages in plaintext, misaligned AI agents could eventually communicate across MCP/A2A and individual chat session boundaries undetected. A deceptive LLM might seem helpful, but work against your goals. It could tell other agents it interacts with over MCP/A2A to help it discreetly fail, signal intent, and avoid tripping oversight/safety mechanisms. Furthermore, oversight mechanisms become more difficult to implement if we can't believe our own eyes.
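Both techniques described in this entry can be sketched in a few lines of Python. This is an illustrative sketch, not the author's demo code: the bit assignment (zero-width space for 0, zero-width non-joiner for 1), the small set of seven lookalike letters, the zero-padding convention, and all function names are assumptions made for the example.

```python
# Assumed bit assignment: zero-width space = 0, zero-width non-joiner = 1.
ZW0, ZW1 = "\u200b", "\u200c"

# A small, illustrative subset of Latin letters with Cyrillic lookalikes.
LATIN_TO_CYRILLIC = {"a": "\u0430", "c": "\u0441", "e": "\u0435", "o": "\u043e",
                     "p": "\u0440", "x": "\u0445", "y": "\u0443"}
CYRILLIC_TO_LATIN = {v: k for k, v in LATIN_TO_CYRILLIC.items()}


def to_bits(secret: str) -> str:
    """UTF-8 encode the secret and render it as a string of 0s and 1s."""
    return "".join(f"{byte:08b}" for byte in secret.encode("utf-8"))


def from_bits(bits: str) -> str:
    """Rebuild text from bits, ignoring a trailing partial byte and NUL padding."""
    usable = len(bits) - len(bits) % 8
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
    return data.rstrip(b"\x00").decode("utf-8")


def zw_hide(carrier: str, secret: str) -> str:
    """Append the secret to the carrier as invisible zero-width characters."""
    return carrier + "".join(ZW1 if b == "1" else ZW0 for b in to_bits(secret))


def zw_reveal(text: str) -> str:
    """Collect the zero-width characters and decode them."""
    bits = "".join("1" if ch == ZW1 else "0" for ch in text if ch in (ZW0, ZW1))
    return from_bits(bits)


def homoglyph_hide(carrier: str, secret: str) -> str:
    """For each substitutable Latin letter, swap in Cyrillic when the next bit is 1.

    Assumes the carrier contains no Cyrillic letters of its own.
    """
    bits = iter(to_bits(secret))
    out = []
    for ch in carrier:
        if ch in LATIN_TO_CYRILLIC:
            bit = next(bits, "0")  # unused slots are left as Latin, i.e. 0
            out.append(LATIN_TO_CYRILLIC[ch] if bit == "1" else ch)
        else:
            out.append(ch)
    if next(bits, None) is not None:
        raise ValueError("carrier text has too few substitutable letters")
    return "".join(out)


def homoglyph_reveal(text: str) -> str:
    """Read a 1 for every Cyrillic lookalike and a 0 for every Latin slot."""
    bits = "".join("1" if ch in CYRILLIC_TO_LATIN else "0"
                   for ch in text
                   if ch in LATIN_TO_CYRILLIC or ch in CYRILLIC_TO_LATIN)
    return from_bits(bits)


carrier = "a peaceful ocean breeze expected every day across the coast"
stego = homoglyph_hide(carrier, "hi")
print(stego == carrier, homoglyph_reveal(stego))
```

The homoglyph carrier needs one substitutable letter per hidden bit, so its capacity is much lower than the zero-width variant, which simply appends eight invisible characters per byte.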
Show HN: td – a CLI to manage tasks, sessions, and worktrees for agentic coding
Show HN (score: 6)[CLI Tool] Show HN: td – a CLI to manage tasks, sessions, and worktrees for agentic coding Hi everyone! I built this because I wanted a little bit more organization around my Claude sessions, worktrees and plans while staying in the terminal and not relying on another SaaS tool. Since it's a command line tool, the added bonus is that Claude can use `td` directly. The td calendar was just a fun add-on but the Claude session stats have been pretty interesting! Let me know what you think!
[Other] Show HN: Stop paying for Dropbox/Google Drive, use your own S3 bucket instead Last week SWYX nerd-sniped me into building an open-source Dropbox.
Here is Locker: the ultimate open-source Google Drive/Box/Dropbox alternative
- Provider agnostic (S3, R2, Vercel Blob, local)
- BYOB (Bring your own bucket)
- Virtual file system
- QMD Search plugin
How Complex is my Code?
Hacker News (score: 45)[Other] How Complex is my Code?
We found an undocumented bug in the Apollo 11 guidance computer code
Hacker News (score: 266)[Other] We found an undocumented bug in the Apollo 11 guidance computer code
Show HN: Hippo, biologically inspired memory for AI agents
Show HN (score: 116)[Other] Show HN: Hippo, biologically inspired memory for AI agents
Show HN: Ghost Pepper – Local hold-to-talk speech-to-text for macOS
Hacker News (score: 409)[Other] Show HN: Ghost Pepper – Local hold-to-talk speech-to-text for macOS I built this because I wanted to see how far I could get with a voice-to-text app that used 100% local models so no data left my computer. I've been using it a ton for coding and emails. I'm experimenting with using it as a voice interface for my other agents too. It's 100% open source under the MIT license; I would love feedback, PRs, and ideas on where to take it.
Show HN: TTF-DOOM – A raycaster running inside TrueType font hinting
Hacker News (score: 57)[Other] Show HN: TTF-DOOM – A raycaster running inside TrueType font hinting TrueType fonts have a hinting VM that grid-fits glyphs. It has a stack, storage area, conditionals, function calls, and it turns out it's Turing-complete. So I built a raycasting engine in the hinting bytecode.
The glyph "A" in the font has 16 vertical bar contours. The hinting program reads player coordinates from font variation axes via GETVARIATION, does DDA ray marching against a tile map in the storage area, and repositions bar heights with SCFS. It ends up looking like a crude Wolfenstein-style view.
Small visualization: https://github.com/4RH1T3CT0R7/ttf-doom/blob/main/docs/media/transform.gif
About 6.5 KB of bytecode total - 13 functions, 795 storage slots, sin/cos lookup tables.
JS handles movement, enemies, and shooting, then passes the coordinates to the font through CSS font-variation-settings. The font is basically a weird GPU.
The weirdest parts:
- TrueType MUL does (a*b)/64, not a*b. So 1*4=0. The DIV instruction is equally cursed.
- No WHILE loops. Everything compiles to recursive FDEFs. FreeType limits call depth to ~64 frames.
- SVTCA[0] is Y, SVTCA[1] is X. Of course.
There's a small compiler behind this - lexer, parser, codegen - that turns a C-like DSL into TT assembly.
Demo GIF: https://github.com/4RH1T3CT0R7/ttf-doom/blob/main/docs/media/demo.gif
Live demo: https://4rh1t3ct0r7.github.io/ttf-doom/ (Chrome/Edge, WASD+arrows, Space to shoot, Tab for debug overlay)
This is a DOOM-style raycaster, not a port of the original engine - similar to DOOMQL and the Excel DOOM. The wall rendering does happen in the font's hinting VM though. Press Tab in the demo to watch the font variation axes change as you move.
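The MUL behavior mentioned in this entry follows from TrueType's 26.6 fixed-point format, where a stack value n represents n/64. A minimal Python model is sketched below. The function names are hypothetical, and the sketch truncates with floor division on non-negative inputs, which is a simplification: real rasterizers may round differently.

```python
def to_f26dot6(x: float) -> int:
    """Encode a real number as a 26.6 fixed-point integer (scaled by 64)."""
    return round(x * 64)


def from_f26dot6(n: int) -> float:
    """Decode a 26.6 fixed-point integer back to a real number."""
    return n / 64


def tt_mul(a: int, b: int) -> int:
    """Model of MUL on raw stack values: (a * b) / 64 (truncating here)."""
    return (a * b) // 64


def tt_div(a: int, b: int) -> int:
    """Model of DIV on raw stack values: (a * 64) / b (truncating here)."""
    return (a * 64) // b


# Raw integers 1 and 4 really mean 1/64 and 4/64, so the product underflows.
print(tt_mul(1, 4))

# Encoding both operands as 26.6 values gives the expected product of 4.0.
print(from_f26dot6(tt_mul(to_f26dot6(1.0), to_f26dot6(4.0))))

# To multiply two plain integers, pre-scale exactly one operand by 64.
print(tt_mul(3 * 64, 5))
```

A compiler targeting the hinting VM has to track which values are plain integers and which are 26.6 quantities, and insert the scaling accordingly.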
Show HN: Meta-agent: self-improving agent harnesses from live traces
Show HN (score: 7)[Other] Show HN: Meta-agent: self-improving agent harnesses from live traces We built meta-agent: an open-source library that automatically and continuously improves agent harnesses from production traces.
Point it at an existing agent, a stream of unlabeled production traces, and a small labeled holdout set.
An LLM judge scores unlabeled production traces as they stream.
A proposer reads failed traces and writes one targeted harness update at a time, such as changes to prompts, hooks, tools, or subagents. The update is kept only if it improves holdout accuracy.
On tau-bench v3 airline, meta-agent improved holdout accuracy from 67% to 87%.
We open-sourced meta-agent. It currently supports Claude Agent SDK, with more frameworks coming soon.
Try it here: https://github.com/canvas-org/meta-agent
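The "keep an update only if it improves holdout accuracy" rule amounts to a greedy accept/reject loop. The sketch below shows that loop in generic Python. It does not use meta-agent's actual API: `improve`, the proposal callables, and the toy harness (a set of handled intents) are all hypothetical stand-ins for the LLM proposer and the real evaluation.

```python
from typing import Callable, Iterable, Tuple, TypeVar

H = TypeVar("H")  # whatever represents a harness: prompts, hooks, tools, ...


def improve(
    harness: H,
    proposals: Iterable[Callable[[H], H]],
    holdout_accuracy: Callable[[H], float],
) -> Tuple[H, float]:
    """Apply one proposed update at a time; keep it only if holdout accuracy rises."""
    best, best_score = harness, holdout_accuracy(harness)
    for propose in proposals:
        candidate = propose(best)
        score = holdout_accuracy(candidate)
        if score > best_score:  # strict improvement, otherwise discard the update
            best, best_score = candidate, score
    return best, best_score


# Toy stand-ins: a harness is the set of intents it handles, and a holdout
# case counts as correct when its intent is handled.
holdout = ["refund", "rebook", "refund", "baggage"]


def accuracy(h: frozenset) -> float:
    return sum(case in h for case in holdout) / len(holdout)


def add(intent: str) -> Callable[[frozenset], frozenset]:
    return lambda h: h | {intent}


final, score = improve(frozenset(), [add("refund"), add("weather"), add("baggage")], accuracy)
print(sorted(final), score)
```

In the toy run the "weather" update is discarded because it leaves holdout accuracy unchanged, while the other two are kept.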
Launch HN: Freestyle: Sandboxes for AI Coding Agents
Hacker News (score: 120)[DevOps] Launch HN: Freestyle: Sandboxes for AI Coding Agents We’re Ben and Jacob, cofounders of Freestyle (https://freestyle.sh). We’re building a cloud for Coding Agents.
The first generation of agents looked like workflows with minimal tools. Two years ago we published a package to let AI work in SQL; at that time GPT-4 could write simple scripts. Soon after, the first AI App Builders started using AI to make whole websites; we supported that with a serverless deploy system.
But the current generation is going much further: instead of minimal tools and basic serverless apps, AI can utilize the full power of a computer (“sandbox”). We’re building sandboxes that are interchangeable with EC2s from your agent’s perspective, with bonus features:
1. We’ve figured out how to fork a sandbox horizontally with no more than a 400ms pause. That's not forking the filesystem; we mean forking its whole memory. If you’re halfway down a browser page with animations running, they’ll be in the same place in all the forks. If you’re running a Minecraft server, every block and player will be in the same place on the forks. If you’re running a local environment and an error comes up in process, that error will be there in all the forks. This works for snapshotting as well: you can save your place and come back weeks later.
2. Our sandboxes start in ~500ms.
Demo: https://www.loom.com/share/8b3d294d515442f296aecde1f42f5524
Compared with other sandboxes, our goal is to be the most powerful. We support full Linux + hardware virtualization, eBPF, FUSE, etc. We run full Debian with multiple users and we use a systemd init instead of runc. Whatever your AI expects to work on Debian should work on these VMs, and if it doesn’t, send a bug report.
In order to make this possible, we’ve moved to our own bare metal racks. Early in our testing we realized that moving VMs across cloud nodes would not have acceptable performance properties. We asked Google Cloud and AWS for a quote on their bare metal nodes and found that the monthly cost was equivalent to the total cost of the hardware, so we bought the hardware.
Our goal is to build the necessary infrastructure to replicate the human devloop on the massively multi-tenant scale of AI, so these VMs should be as powerful as the ones you’re used to, while also being available to provision in seconds.
Launch HN: Freestyle – Sandboxes for Coding Agents
Hacker News (score: 192)[DevOps] Launch HN: Freestyle – Sandboxes for Coding Agents
Show HN: I built a 2-min quiz that shows you how bad you are at estimating
Show HN (score: 6)[Other] Show HN: I built a 2-min quiz that shows you how bad you are at estimating I've gotten to the point in my career where I now make strategic decisions often (hiring, firing, choosing what equipment to go with, etc.), as well as in my personal life where I need to strongly weigh my options for a big purchase or investment. I found a not-so-surprising parallel between the two as these decisions "resolved." Am I making good decisions or am I getting lucky?
Did some research, read some books, and realized I should get in the habit of tracking my decision process. That quickly turned into the idea that formed Convexly.
The landing page is a 10-question calibration quiz where you assign a confidence level to statements drawn from a rotating pool of 100 (working on making the pool larger) and you get a Brier score back instantly. No signup required, and you can share your scores right away.
If you find it interesting, you can create a free account where you can track your decisions with probability estimates, resolve them over time, and get calibration curves that show if you are over/underconfident. From what I've seen so far, users are overconfident when they say they're between 70-90% sure about something.
For the math: Beta-PERT distributions for the payoff modeling, Kelly criterion for the position sizing, signal detection theory for separating skill from randomness.
On the coding side: FastAPI with NumPy/SciPy, frontend in Next.js and Supabase.
So far this has been a solo project of mine. If you want to see all the features use code SHOWHN for 30 days of full access, no credit card required.
Curious if anything about your score surprised you after taking the quiz.
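For readers unfamiliar with the two scoring ideas named in this entry, here is a small Python sketch of the standard textbook formulas: the Brier score for binary forecasts (mean squared gap between stated confidence and outcome, lower is better) and the Kelly fraction for a simple binary bet. How Convexly actually computes its scores is not stated in the post, so the function names and the clamping of negative Kelly fractions to zero are assumptions.

```python
from typing import Sequence


def brier_score(confidences: Sequence[float], outcomes: Sequence[bool]) -> float:
    """Mean squared difference between stated confidence and what happened.

    0.0 is perfect; always answering 50% scores 0.25.
    """
    if not confidences or len(confidences) != len(outcomes):
        raise ValueError("need one outcome per confidence, and at least one pair")
    return sum((p - float(o)) ** 2 for p, o in zip(confidences, outcomes)) / len(confidences)


def kelly_fraction(p_win: float, net_odds: float) -> float:
    """Kelly stake for a binary bet: f = p - (1 - p) / b, clamped at zero.

    net_odds is the profit per unit staked on a win (1.0 for even money).
    """
    return max(0.0, p_win - (1.0 - p_win) / net_odds)


# Three statements rated 90%, 80%, and 70% likely; the last one turned out false.
print(brier_score([0.9, 0.8, 0.7], [True, True, False]))

# A 60% chance at even money suggests staking about a fifth of the bankroll.
print(kelly_fraction(0.6, 1.0))
```

A single confident miss dominates the average here, which is why overconfidence in the 70-90% range shows up so clearly in a Brier score.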
Show HN: I just built a MCP Server that connects Claude to all your wearables
Show HN (score: 12)[Other] Show HN: I just built a MCP Server that connects Claude to all your wearables Hey HN,
I built Pace, a Claude Connector that lets you connect all your wearables (Garmin, Polar, Whoop, 20+) with Claude.
You connect your devices once and can analyze your data with Claude. No dashboard needed, just natural language. I already use it every day; the visualization tool in Claude especially makes this really cool to use.
Tools include: overview, sleep, training, activity, samples and trends.
Tech Stack: Python (FastMCP), Google Cloud Run, Google Cloud SQL (PostgreSQL) and Firebase
It is live and free to try (no Claude Pro/MAX Plan needed).
Would love feedback, especially from people who've built MCP servers or use wearables seriously.
Media scraper Gallery-dl is moving to Codeberg after receiving a DMCA notice
Hacker News (score: 165)[Other] Media scraper Gallery-dl is moving to Codeberg after receiving a DMCA notice
Show HN: Multi-agent coding assistant with a sandboxed Rust execution engine
Show HN (score: 5)[Other] Show HN: Multi-agent coding assistant with a sandboxed Rust execution engine
Show HN: Open-source ontology – SEC fund filings
Show HN (score: 5)[Other] Show HN: Open-source ontology – SEC fund filings Working on a schema for joining SEC fund filings across documents. The core problem: these filings describe the same fund in different formats and no standard exists for cross-document semantic queries.
Interested in feedback on the ontology design, especially from anyone working with fund data, XBRL, or FIBO.
Does coding with LLMs mean more microservices?
Hacker News (score: 13)[Other] Does coding with LLMs mean more microservices?