Show HN: Lemonade: Run LLMs Locally with GPU and NPU Acceleration
Why?
A local LLM serving stack needs three qualities, and none of the market leaders (Ollama, LM Studio, or llama.cpp on its own) delivers all three:
1. Use the best backend for the user’s hardware, even if that means integrating multiple inference engines (llama.cpp, ONNXRuntime, etc.) or custom builds (e.g., llama.cpp with ROCm betas).
2. Zero friction for both users and developers, from onboarding to app integration to high performance.
3. Commitment to open source principles and to collaborating with the community.
Lemonade Overview:
Simple LLM serving: Lemonade is a drop-in local server that presents an OpenAI-compatible API, so any app or tool that talks to OpenAI’s endpoints will “just work” with Lemonade’s local models (see the sketch after this list).
Performance focus: Powered by llama.cpp (Vulkan and ROCm for GPUs) and ONNXRuntime (Ryzen AI for NPUs and iGPUs), Lemonade squeezes the best out of your PC, with no extra code or hacks needed.
Cross-platform: One-click installer for Windows (with GUI); pip/source install for Linux.
Bring your own models: Supports GGUF and ONNX. Use Gemma, Llama, Qwen, Phi, and others out of the box. Easily manage, pull, and swap models.
Complete SDK: Python API for LLM generation, plus a CLI for benchmarking and testing.
Open source: Apache 2.0 (core server and SDK), no feature gating, no enterprise “gotchas.” All server/API logic and performance code is fully open; some software the NPU depends on is proprietary, but we strive for as much openness as possible (see our GitHub for details). Active collabs with GGML, Hugging Face, and ROCm/TheRock.
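To make the “drop-in OpenAI-compatible” claim concrete, here is a minimal Python sketch that points the standard openai client at a local Lemonade server. The base URL is the one documented below; the model id is a placeholder, since the exact ids depend on which models you have pulled locally.

    from openai import OpenAI

    # Talk to the local Lemonade server instead of api.openai.com.
    # The API key is unused locally, but the client requires a non-empty value.
    client = OpenAI(base_url="http://localhost:8000/api/v1", api_key="lemonade")

    # "my-local-model" is a placeholder; use whatever model id your install lists.
    response = client.chat.completions.create(
        model="my-local-model",
        messages=[{"role": "user", "content": "Say hello from my own hardware."}],
    )
    print(response.choices[0].message.content)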
Get started:
Windows? Download the latest GUI installer from https://lemonade-server.ai/
Linux? Install with pip or from source (https://lemonade-server.ai/)
Docs: https://lemonade-server.ai/docs/
Discord for banter/support/feedback: https://discord.gg/5xXzkMu8Zk
How do you use it?
Launch lemonade-server from the Start menu.
Open http://localhost:8000 in your browser for a web UI with chat, settings, and model management.
Point any OpenAI-compatible app (chatbots, coding assistants, GUIs, etc.) at http://localhost:8000/api/v1 (sketch below).
Use the CLI to run/load/manage models, monitor usage, and tweak settings such as temperature, top-p, and top-k.
Integrate via the Python API for direct access in your own apps or research.
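As a sketch of the “point any OpenAI-compatible app at the endpoint” step, here is a plain HTTP request against the local server. The /chat/completions path and the sampling fields follow the standard OpenAI request shape that Lemonade advertises compatibility with; the model id is again a placeholder.

    import requests

    # Standard OpenAI-style chat completion against the local endpoint.
    # Sampling settings such as temperature and top_p ride along in the body.
    payload = {
        "model": "my-local-model",  # placeholder model id
        "messages": [{"role": "user", "content": "Summarize this repo in one line."}],
        "temperature": 0.7,
        "top_p": 0.9,
    }
    r = requests.post("http://localhost:8000/api/v1/chat/completions", json=payload, timeout=120)
    r.raise_for_status()
    print(r.json()["choices"][0]["message"]["content"])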
Who is it for?
Developers: Integrate LLMs into your apps with standardized APIs and zero device-specific code, using popular tools and frameworks.
LLM enthusiasts: Plug and play with Morphik AI (contextual RAG/PDF Q&A), Open WebUI (modern local chat interfaces), Continue.dev (VS Code AI coding copilot), and many more integrations in progress!
Privacy-focused users: No cloud calls; run everything locally, including advanced multi-modal models if your hardware supports it.
Why does this matter?
Every month, new on-device models (e.g., Qwen3 MoEs and Gemma 3) get closer to the capabilities of cloud LLMs, and we predict a lot of LLM use will move local for cost reasons alone. Keeping your data and AI workflows on your own hardware is finally practical, fast, and private: no vendor lock-in, no ongoing API fees, and no sending your sensitive info to remote servers. Lemonade lowers the friction of running these next-gen models, whether you want to experiment, build, or deploy at the edge.
We would love your feedback! Are you running LLMs on AMD hardware? What’s missing, what’s broken, what would you like to see next? Any pain points from Ollama, LM Studio, or others that you wish we solved? Share your stories, questions, or rants with us.
Links:
Download & Docs: https://lemonade-server.ai/
GitHub: https://github.com/lemonade-sdk/lemonade
Discord: https://discord.gg/5xXzkMu8Zk
Thanks HN!