🛠️ All DevTools
Last Updated: March 05, 2026 at 04:11 AM
Show HN: ZSE – Open-source LLM inference engine with 3.9s cold starts
Hacker News (score: 18)[Other]

I've been building ZSE (Z Server Engine) for the past few weeks: an open-source LLM inference engine focused on two things nobody has fully solved together, memory efficiency and fast cold starts.

The problem I was trying to solve: Running a 32B model normally requires ~64 GB VRAM. Most developers don't have that. And even when quantization helps with memory, cold starts with bitsandbytes NF4 take 2+ minutes on first load and 45–120 seconds on warm restarts, which kills serverless and autoscaling use cases.

What ZSE does differently:

- Fits 32B in 19.3 GB VRAM (70% reduction vs FP16); runs on a single A100-40GB
- Fits 7B in 5.2 GB VRAM (63% reduction); runs on consumer GPUs
- Native .zse pre-quantized format with memory-mapped weights: 3.9s cold start for 7B, 21.4s for 32B, vs 45s and 120s with bitsandbytes and ~30s for vLLM

All benchmarks verified on Modal A100-80GB (Feb 2026).

It ships with:

- OpenAI-compatible API server (drop-in replacement)
- Interactive CLI (zse serve, zse chat, zse convert, zse hardware)
- Web dashboard with real-time GPU monitoring
- Continuous batching (3.45× throughput)
- GGUF support via llama.cpp
- CPU fallback (works without a GPU)
- Rate limiting, audit logging, API key auth

Install:

    pip install zllm-zse
    zse serve Qwen/Qwen2.5-7B-Instruct

For fast cold starts (one-time conversion):

    zse convert Qwen/Qwen2.5-Coder-7B-Instruct -o qwen-7b.zse
    zse serve qwen-7b.zse   # 3.9s every time

The cold start improvement comes from the .zse format storing pre-quantized weights as memory-mapped safetensors: no quantization step at load time, no weight conversion, just mmap + GPU transfer. On NVMe SSDs this gets under 4 seconds for 7B. On spinning HDDs it'll be slower.

All code is real, with no mock implementations. Built at Zyora Labs. Apache 2.0.

Happy to answer questions about the quantization approach, the .zse format design, or the memory efficiency techniques.
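The memory figures quoted in the ZSE post can be sanity-checked with a little arithmetic. The sketch below assumes FP16 stores 2 bytes per parameter and counts weights only (no KV cache or activations); the helper names are illustrative and are not part of ZSE.

```python
def fp16_weights_gb(params_billion: float) -> float:
    """Approximate FP16 weight footprint in GB (1 GB = 1e9 bytes), 2 bytes per parameter."""
    return params_billion * 1e9 * 2 / 1e9


def reduction_pct(baseline_gb: float, actual_gb: float) -> float:
    """Percentage saved relative to the baseline footprint."""
    return (1 - actual_gb / baseline_gb) * 100


# VRAM figures taken from the post: 19.3 GB for 32B, 5.2 GB for 7B.
for name, params_b, zse_gb in [("32B", 32, 19.3), ("7B", 7, 5.2)]:
    base = fp16_weights_gb(params_b)
    print(f"{name}: FP16 ~{base:.0f} GB, ZSE {zse_gb} GB, "
          f"reduction ~{reduction_pct(base, zse_gb):.0f}%")
```

Under these assumptions the 32B case works out to roughly 64 GB in FP16 and a reduction of about 70%, and the 7B case to roughly 14 GB and about 63%, which agrees with the numbers stated in the post.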
Making MCP cheaper via CLI
Hacker News (score: 88)[CLI Tool]
PA bench: Evaluating web agents on real world personal assistant workflows
Hacker News (score: 12)[Other]

We’re the team at Vibrant Labs (W24). We’ve been building environments for browser agents and quickly realized that existing benchmarks in this space didn’t capture the primary failure modes we were seeing in production (which became more frequent as the number of applications and the horizon length increased).

We built PA Bench (Personal Assistant Benchmark) to evaluate frontier computer/web use models on their ability to handle multi-step workflows across simulated clones of Gmail and Calendar.

What’s next:

We’re currently scaling the dataset to 3+ tabs and are building more high-fidelity simulations for common enterprise workflows. We’d love to hear feedback on the benchmark and notes about what was/wasn’t surprising about the results.

Blog post: https://vibrantlabs.com/blog/pa-bench
Show HN: I ported Tree-sitter to Go
Hacker News (score: 71)[Other]

This started as a hard requirement for my TUI-based editor application; it ended up going in a few different directions.

A suite of tools that help with semantic code entities: https://github.com/odvcencio/gts-suite

A next-gen version control system called Got: https://github.com/odvcencio/got

I think this has some pretty big potential! I think there are many classes of application (particularly legacy architecture) that can benefit from this kind of analysis tooling. My next post will be about composing all these together, an exciting project I call GotHub. Thanks!
Show HN: I ported Manim to TypeScript (run 3b1B math animations in the browser)
Hacker News (score: 84)[Other]

Hi HN, I'm Narek. I built Manim-Web, a TypeScript/JavaScript port of 3Blue1Brown’s popular Manim math animation engine.

The Problem: Like many here, I love Manim's visual style. But setting it up locally is notoriously painful: it requires Python, FFmpeg, Cairo, and a full LaTeX distribution. It creates a massive barrier to entry, especially for students or people who just want to quickly visualize a concept.

The Solution: I wanted to make it zero-setup, so I ported the engine to TypeScript. Manim-Web runs entirely client-side in the browser. No Python, no servers, no install. It runs animations in real-time at 60fps.

How it works underneath:

- Rendering: Uses Canvas API / WebGL (via Three.js for 3D scenes).
- LaTeX: Rendered and animated via MathJax/KaTeX (no LaTeX install needed!).
- API: I kept the API almost identical to the Python version (e.g., scene.play(new Transform(square, circle))), meaning existing Manim knowledge transfers over directly.
- Reactivity: Updaters and ValueTrackers follow the exact same reactive pattern as the Python original.

Because it's web-native, the animations are now inherently interactive (objects can be draggable/clickable) and can be embedded directly into React/Vue apps, interactive textbooks, or blogs. I also included a py2ts converter to help migrate existing scripts.

Live Demo: https://maloyan.github.io/manim-web/examples
GitHub: https://github.com/maloyan/manim-web

It's open-source (MIT). I'm still actively building out feature parity with the Python version, but core animations, geometry, plotting, and 3D orbiting are working great. I would love to hear your feedback, and I'll be hanging around to answer any technical questions about rendering math in the browser!
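For readers unfamiliar with the updater/ValueTracker pattern mentioned in the Manim-Web post, here is a minimal, library-free sketch of the idea: a tracker holds a number, and objects register callbacks that recompute their state from it on every frame. The classes below are simplified stand-ins written for illustration; they are not Manim's or Manim-Web's actual implementation.

```python
from typing import Callable, List


class ValueTracker:
    """Holds a single number that other objects can read each frame."""

    def __init__(self, value: float) -> None:
        self._value = value

    def get_value(self) -> float:
        return self._value

    def set_value(self, value: float) -> None:
        self._value = value


class Mobject:
    """Toy scene object with one property and a list of updater callbacks."""

    def __init__(self) -> None:
        self.x = 0.0
        self._updaters: List[Callable[["Mobject"], None]] = []

    def add_updater(self, fn: Callable[["Mobject"], None]) -> None:
        self._updaters.append(fn)

    def update(self) -> None:
        # Called once per frame: every updater recomputes state from its inputs.
        for fn in self._updaters:
            fn(self)


tracker = ValueTracker(0.0)
dot = Mobject()
# The dot's x position always follows twice the tracked value.
dot.add_updater(lambda m: setattr(m, "x", 2 * tracker.get_value()))

for frame in range(3):
    tracker.set_value(frame / 2)  # an animation would interpolate this value
    dot.update()
    print(frame, dot.x)
```

The point of the pattern is that the animation only changes the tracked value; dependent objects derive their state from it on each frame instead of being animated individually.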
Time-Travel Debugging: Replaying Production Bugs Locally
Hacker News (score: 12)[Other]
New evidence that Cantor plagiarized Dedekind?
Hacker News (score: 98)
Show HN: Sgai – Goal-driven multi-agent software dev (GOAL.md → working code)
Hacker News (score: 19)[Other]

Hey HN,

We built Sgai to experiment with a different model of AI-assisted development.

Instead of prompting step-by-step, you define an outcome in GOAL.md (what should be built, not how), and Sgai runs a coordinated set of AI agents to execute it.

- It decomposes the goal into a DAG of roles (developer → reviewer → safety analyst, etc.)
- It asks clarifying questions when needed
- It writes code, runs tests, and iterates
- Completion gates (e.g. make test) determine when it's actually done

Everything runs locally in your repo. There’s a web dashboard showing real-time execution of the agent graph. Nothing auto-pushes to GitHub.

We’ve used it internally for prototyping small apps and internal tooling. It’s still early and rough in places, but functional enough to share.

Demo (4 min): https://youtu.be/NYmjhwLUg8Q
GitHub: https://github.com/sandgardenhq/sgai

Open source (Go). Works with Anthropic, OpenAI, or local models via opencode.

Curious what people think about DAG-based multi-agent workflows for coding. Has anyone here experimented with similar approaches?
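The DAG-plus-completion-gate idea described in the Sgai post can be sketched in a few lines. This is an illustration of the general technique only, not Sgai's code (Sgai itself is written in Go): the role names come from the post, while `run_roles`, the `gate` callable, and the fake gate used in the demo are hypothetical. In a real setup the gate might wrap something like running `make test` and checking the exit code.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_roles(deps: Dict[str, Set[str]],
              gate: Callable[[], bool],
              max_rounds: int = 3) -> List[str]:
    """Run roles in dependency order, repeating until the completion gate passes."""
    # deps maps each role to the set of roles that must run before it.
    order = list(TopologicalSorter(deps).static_order())
    log: List[str] = []
    for round_no in range(1, max_rounds + 1):
        for role in order:
            log.append(f"round {round_no}: {role}")  # a real agent would act here
        if gate():
            return log
    raise RuntimeError("completion gate never passed")


# developer -> reviewer -> safety analyst, as in the post.
roles = {
    "developer": set(),
    "reviewer": {"developer"},
    "safety-analyst": {"reviewer"},
}

# Fake gate for the demo: fails on the first check, passes on the second.
attempts = iter([False, True])
for line in run_roles(roles, gate=lambda: next(attempts)):
    print(line)
```

Separating the execution order (the DAG) from the stopping condition (the gate) means the loop ends on an observable result rather than on an agent declaring itself finished.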
Sub-second volumetric 3D printing by synthesis of holographic light fields
Hacker News (score: 85)
Show HN: Django Control Room – All Your Tools Inside the Django Admin
Hacker News (score: 23)[Other]

Over the past year I’ve been building a set of operational panels for Django:

- Redis inspection
- cache visibility
- Celery task introspection
- URL discovery and testing

All of these tools have been built inside the Django admin.

Instead of jumping between tools like Flower, redis-cli, Swagger, or external services, I wanted something that sits where I’m already working.

I’ve grouped these under a single umbrella: Django Control Room.

The idea is pretty simple: the Django admin already gives you authentication, permissions, and a familiar interface. It can also act as an operational layer for your app.

Each panel is just a small Django app with a simple interface, so it’s easy to build your own and plug it in.

I’m working on more panels (signals, errors, etc.) and also thinking about how far this pattern can go.

Curious how others think about this. Does it make sense to consolidate this kind of tooling inside the admin, or do you prefer keeping it separate?
Launch HN: TeamOut (YC W22) – AI agent for planning company retreats
Hacker News (score: 47)[Other]

Hi HN, I’m Vincent, CTO of TeamOut (https://www.teamout.com/). We build an AI agent that plans company events from start to finish entirely through conversation. Similar to how Lovable helps build websites through chat, we apply that approach to event planning. Our system handles venue sourcing, vendor coordination, flight cost estimation, itinerary building, and overall project management.

Here’s a demo: https://www.youtube.com/watch?v=QVyc-x-isjI. The product is live at https://app.teamout.com/ai and does not require signup.

We went through YC in 2022 but did not launch on HN at the time. Back then, the product was more traditional, closer to an Airbnb-style search marketplace. Over the past two years, after helping organize more than 1,200 events, we rebuilt the core system around an agent architecture that directly manages the planning process. With this new version live, it felt like the right moment to share it here since it represents a fundamentally different approach to planning events.

The problem: Planning a company retreat usually means choosing between three imperfect options: (1) Hire an event planner and pay significant fees and venue markups; (2) Do it yourself and spend dozens of hours on research, emails, and negotiation; or (3) Use tools like Airbnb that are not designed for group logistics or meeting space.

The difficulty is not just finding a venue. Even for 30 to 50 people, planning turns into weeks of back-and-forth emails for quotes, comparing inconsistent pricing across PDFs, and tracking budgets in spreadsheets. It becomes an ongoing coordination problem with evolving constraints and slow, asynchronous vendor responses. Most existing software is form-driven, but the real workflow is conversational and stateful.

Offsites are expensive and high stakes. A single event can represent a significant chunk of a team’s annual budget, and mistakes show up directly as cost overruns or poor experiences. Founders and operators often end up spending time on event logistics instead of their actual work.

I ran into this while organizing retreats at a previous company. Before TeamOut, I worked as an AI researcher at IBM on NLP and machine learning systems. Sitting inside long email threads and cost spreadsheets, it did not look like a marketplace gap to me. It looked like a reasoning and state management problem. As large language models improved at multi-step reasoning and tool use, it became realistic to automate the coordination layer itself.

Our Solution: The core agent relies on a combination of models such as Gemini, Claude, and GPT. A central LLM-based agent maintains planning context across turns and decides which specialized tool to call next. Each tool has a specific responsibility:

- Venue search and filtering
- Cost estimations (accommodation + flights)
- Budget comparisons
- Quote and outreach flows
- Communication tool with our team

For venue recommendations across more than 10,000 venues, we do not rely purely on the language model. We embed both user requirements and venues into vector representations and retrieve candidates using similarity search. Hard constraints such as capacity and dates are applied first, and results are ranked before being presented.

On the interface side, we use a split layout: conversation on the left and structured results on the right. As you refine the plan in chat, the event updates in real time, allowing an iterative workflow rather than a static search experience.

What is different is that we treat event planning as a stateful coordination problem rather than a one-shot search query. The agent orchestrates tools, manages evolving constraints, and surfaces trade-offs explicitly. It does not invent venues or fabricate pricing, and it is not designed to replace human planners for very large or highly customized events.

We make money from commissions on venue bookings. It is free for teams to explore options and plan. If you’ve organized an offsite or large meetup before, I’d genuinely value your perspective. Where would you expect this to fail? What edge cases are we underestimating? Where wouldn’t you trust an agent to handle the details?

My engineering team and I will be here all day to answer questions, happy to go deep on architecture, tradeoffs, and lessons learned. We’d really appreciate your candid feedback.
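The venue retrieval step described in the TeamOut post (hard constraints first, then ranking by embedding similarity) can be illustrated with a small sketch. Everything here is assumed for illustration: the `Venue` fields, the two-dimensional toy embeddings, and the use of cosine similarity are not taken from TeamOut's system, which would use a real embedding model and a vector index.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Venue:
    name: str
    capacity: int
    available: bool  # stands in for a date-availability check
    embedding: Sequence[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(venues: List[Venue], query_embedding: Sequence[float],
              group_size: int, top_k: int = 3) -> List[Venue]:
    # Step 1: hard constraints remove venues that can never work.
    eligible = [v for v in venues if v.available and v.capacity >= group_size]
    # Step 2: rank the survivors by similarity to the request embedding.
    eligible.sort(key=lambda v: cosine(v.embedding, query_embedding), reverse=True)
    return eligible[:top_k]


venues = [
    Venue("Lakeside Lodge", 60, True, [0.9, 0.1]),
    Venue("City Loft", 20, True, [0.95, 0.05]),      # too small for 40 people
    Venue("Mountain Retreat", 80, True, [0.2, 0.8]),
    Venue("Beach Resort", 100, False, [1.0, 0.0]),   # not available
]

for v in recommend(venues, query_embedding=[1.0, 0.0], group_size=40):
    print(v.name)
```

Filtering before ranking keeps a venue that matches the request closely but cannot host the group, or is unavailable, from ever appearing in the results.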
Python Type Checker Comparison: Empty Container Inference
Hacker News (score: 25)
Red Hat takes on Docker Desktop with its enterprise Podman Desktop build
Hacker News (score: 42)[Other]
katanemo/plano
GitHub Trending[Other] Delivery infrastructure for agentic apps - Plano is an AI-native proxy and data plane that offloads plumbing work, so you stay focused on your agent's core logic (via any AI framework).
bytedance/deer-flow
GitHub Trending[Other] An open-source SuperAgent harness that researches, codes, and creates. With the help of sandboxes, memories, tools, skills and subagents, it handles different levels of tasks that could take minutes to hours.
muratcankoylan/Agent-Skills-for-Context-Engineering
GitHub Trending[Other] A comprehensive collection of Agent Skills for context engineering, multi-agent architectures, and production agent systems. Use when building, optimizing, or debugging agent systems that require effective context management.
What I learned while trying to build a production-ready nearest neighbor system
Hacker News (score: 15)
Show HN: A real-time strategy game that AI agents can play
Hacker News (score: 64)[Other]

I've liked all the projects that put LLMs into game environments. It's been a weird juxtaposition, though: frontier LLMs can one-shot full coding projects, and those same models struggle to get out of Pokémon Red's Mt. Moon.

Because of this, I wanted to create a game environment that put this generation of frontier LLMs' top skill, coding, on full display.

Ten years ago, a team released a game called Screeps. It was described as an "MMO RTS sandbox for programmers." The Screeps paradigm of writing code and having it executed in a real-time game environment is well suited to LLMs. Drawing on a version of the Screeps open source API, LLM Skirmish pits LLMs head-to-head in a series of 1v1 real-time strategy games.

In my testing I found that Claude Opus 4.5 was the most dominant model, but it showed weakness in round 1 as it was overly focused on its in-game economy. Meanwhile, I probably spent a third of all code on sandbox hardening because GPT 5.2 kept trying to cheat by pre-reading its opponent's strategies.

If there's interest, I'm planning on doing a round of testing with the latest generation of LLMs (Claude 4.6 Opus, GPT 5.3 Codex, etc.).

You can run local matches via CLI. I'm running a hosted match runner with Google Cloud Run that uses isolated-vm. The match playback visualizer is statically served from Cloudflare.

I've created a community ladder that you can submit strategies to via CLI, no auth required. I've found that the CLI plus the skill.md that's available has been enough for AI agents to immediately get started.

Website: https://llmskirmish.com

API docs: https://llmskirmish.com/docs

GitHub: https://github.com/llmskirmish/skirmish

A video of a match: https://www.youtube.com/watch?v=lnBPaZ1qamM
Claude Code Remote Control
Hacker News (score: 64)[Other]
Show HN: Context Mode – 315 KB of MCP output becomes 5.4 KB in Claude Code
Hacker News (score: 35)[Other]

Every MCP tool call dumps raw data into Claude Code's 200K context window. A Playwright snapshot costs 56 KB, 20 GitHub issues cost 59 KB. After 30 minutes, 40% of your context is gone.

I built an MCP server that sits between Claude Code and these outputs. It processes them in sandboxes and only returns summaries. 315 KB becomes 5.4 KB.

It supports 10 language runtimes, SQLite FTS5 with BM25 ranking for search, and batch execution. Session time before slowdown goes from ~30 min to ~3 hours.

MIT licensed, single command install:

    /plugin marketplace add mksglu/claude-context-mode
    /plugin install context-mode@claude-context-mode

Benchmarks and source: https://github.com/mksglu/claude-context-mode

Would love feedback from anyone hitting context limits in Claude Code.
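To make the "SQLite FTS5 with BM25 ranking" part of the Context Mode post concrete, here is a minimal sketch that uses Python's built-in sqlite3 module. It assumes the local SQLite build includes the FTS5 extension (most do, but not all). The table name, sample documents, and helper functions are made up for illustration and are not taken from the project.

```python
import sqlite3
from typing import List

DOCS = [
    "login button timeout error on submit",
    "dashboard renders slowly on large repos",
    "timeout fetching issues then timeout again after retry timeout",
    "snapshot of the settings page",
    "new issue template added",
]


def build_index(docs: List[str]) -> sqlite3.Connection:
    """Index raw tool output in an in-memory FTS5 table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE outputs USING fts5(body)")
    conn.executemany("INSERT INTO outputs(body) VALUES (?)", [(d,) for d in docs])
    return conn


def search(conn: sqlite3.Connection, query: str, limit: int = 3) -> List[str]:
    """Return only the best-matching rows instead of the whole corpus."""
    # bm25() yields lower (more negative) values for better matches,
    # so ascending order puts the most relevant row first.
    rows = conn.execute(
        "SELECT body FROM outputs WHERE outputs MATCH ? "
        "ORDER BY bm25(outputs) LIMIT ?",
        (query, limit),
    )
    return [row[0] for row in rows]


conn = build_index(DOCS)
for hit in search(conn, "timeout"):
    print(hit)
```

Returning a handful of ranked rows instead of the full output is one way a 315 KB payload could shrink to a few kilobytes before it reaches the model's context.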