Show HN: Fine-tuned Llama 3.2 3B to match 70B models for local transcripts

Show HN (score: 14)

Found: September 01, 2025

ID: 1167

Description

Other

I wrote a small local tool to transcribe audio notes (Whisper/Parakeet). Code: https://github.com/bilawalriaz/lazy-notes

I wanted to process raw transcripts locally without relying on OpenRouter. Prompting the base Llama 3.2 3B gave decent but incomplete results, so I tried supervised fine-tuning (SFT): I fine-tuned Llama 3.2 3B to clean up and analyze dictation and emit structured JSON (title, tags, entities, dates, actions).
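As a rough sketch of that output contract, the check below parses a model response and verifies the five fields. The field names come from the post; the field types, the helper name `validate_note`, and the rejection of extra keys are assumptions for illustration, not the tool's actual schema.

```python
import json
from typing import Any

# Hypothetical schema: the five field names come from the post,
# but the expected types are assumptions.
EXPECTED_FIELDS: dict[str, type] = {
    "title": str,
    "tags": list,
    "entities": list,
    "dates": list,
    "actions": list,
}


def validate_note(raw: str) -> tuple[bool, list[str]]:
    """Parse model output and report schema problems (hypothetical helper)."""
    try:
        data: Any = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return False, ["top-level value must be an object"]
    problems: list[str] = []
    for field, expected in EXPECTED_FIELDS.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        elif not isinstance(data[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    # Extra keys are treated as errors so drift away from the schema is visible.
    extra = sorted(set(data) - set(EXPECTED_FIELDS))
    problems.extend(f"unexpected field: {k}" for k in extra)
    return not problems, problems


if __name__ == "__main__":
    sample = ('{"title": "Call plumber", "tags": ["home"], "entities": [], '
              '"dates": ["2025-09-02"], "actions": ["call plumber"]}')
    print(validate_note(sample))
```

A check like this is also a cheap way to measure how often a base model versus a fine-tuned one produces structurally valid JSON.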

Data: 13 real memos → Kimi K2 gold JSON → ~40k synthetic + gold examples, with keys canonicalized. API provider: Chutes.ai (5k req/day).
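Key canonicalization can be sketched as mapping every variant key the teacher emits to one canonical name and a fixed key order. The alias table and the helpers `normalize_key` / `canonicalize` below are hypothetical; the post does not say which key variants occurred or how they were merged.

```python
import re
from typing import Any

# Hypothetical alias table: variant keys a teacher model might emit.
ALIASES = {
    "name": "title",
    "heading": "title",
    "labels": "tags",
    "keywords": "tags",
    "people": "entities",
    "named_entities": "entities",
    "deadlines": "dates",
    "action_items": "actions",
    "todos": "actions",
}
CANONICAL_ORDER = ["title", "tags", "entities", "dates", "actions"]


def normalize_key(key: str) -> str:
    """Convert camelCase, spaces, and dashes to snake_case, then apply aliases."""
    key = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", key.strip())
    key = re.sub(r"[\s\-]+", "_", key).lower()
    return ALIASES.get(key, key)


def canonicalize(record: dict[str, Any]) -> dict[str, Any]:
    """Rename keys, merge list values that collide, and fix the key order."""
    merged: dict[str, Any] = {}
    for key, value in record.items():
        canon = normalize_key(key)
        if canon in merged and isinstance(merged[canon], list) and isinstance(value, list):
            merged[canon] = merged[canon] + value
        else:
            merged.setdefault(canon, value)  # otherwise the first value wins
    # Canonical keys first in a fixed order; any other keys follow alphabetically.
    ordered = {k: merged[k] for k in CANONICAL_ORDER if k in merged}
    ordered.update({k: merged[k] for k in sorted(merged) if k not in ordered})
    return ordered
```

For example, `{"Tags": ["a"], "Title": "x", "keywords": ["b"], "actionItems": ["do"]}` becomes `{"title": "x", "tags": ["a", "b"], "actions": ["do"]}`, so every training target shares one structure.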

Training: RTX 4090 24GB, ~4h, LoRA (r=128, α=128, dropout=0.05), max seq 2048, bs=16, lr=5e-5, cosine schedule, Unsloth. On an RTX 2070 Super 8GB it took ~8h.
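To put these settings in perspective, the sketch below estimates the LoRA trainable-parameter count at r=128 and evaluates a cosine learning-rate schedule peaking at 5e-5. Assumptions: the Llama 3.2 3B shapes (hidden 3072, MLP 8192, 28 layers, 8 KV heads of dim 128) are taken from the public model config as I understand it; LoRA is assumed on all seven attention/MLP projections, since the post does not list target modules; warmup and epoch count are ignored. With α = r = 128, the standard LoRA scaling factor α/r is 1.

```python
import math

# Assumed Llama 3.2 3B shapes (from the public config, to the best of my knowledge).
HIDDEN = 3072
INTERMEDIATE = 8192
LAYERS = 28
KV_DIM = 8 * 128  # 8 key/value heads x head_dim 128

# Assumed LoRA targets: (in_features, out_features) per projection.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(rank: int) -> int:
    """Each adapted d_in x d_out weight gets A (r x d_in) and B (d_out x r)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in TARGETS.values())
    return per_layer * LAYERS


def cosine_lr(step: int, total_steps: int, peak: float = 5e-5) -> float:
    """Cosine decay from `peak` down to 0 over `total_steps` (no warmup)."""
    progress = min(step, total_steps) / total_steps
    return 0.5 * peak * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(f"trainable LoRA params at r=128: {lora_params(128):,}")
    steps = 40_000 // 16  # ~40k examples at batch size 16, one epoch (assumed)
    for s in (0, steps // 2, steps):
        print(s, cosine_lr(s, steps))
```

Under these assumptions the adapters come to 194,510,848 trainable parameters, roughly 6% of the ~3.2B base weights; a smaller rank would scale that count down linearly.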

Inference: merged to GGUF, Q4_K_M (llama.cpp), runs in LM Studio.

Evals (100 samples, scored by GLM 4.5 FP8): overall 5.35 (base 3B) → 8.55 (fine-tuned); completeness 4.12 → 7.62; factual 5.24 → 8.57.
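The scoring step can be sketched as averaging per-dimension judge scores over the evaluation set. The judge response format (a JSON object with `overall`, `completeness`, and `factual` keys on a 0 to 10 scale) and the function names are assumptions; the post only reports the resulting means.

```python
import json
from statistics import mean

METRICS = ("overall", "completeness", "factual")


def aggregate(judge_outputs: list[str]) -> dict[str, float]:
    """Average each metric over judge responses, skipping unparseable ones."""
    scores: dict[str, list[float]] = {m: [] for m in METRICS}
    for raw in judge_outputs:
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # a real harness might re-ask the judge instead
        if not isinstance(parsed, dict):
            continue
        for m in METRICS:
            if isinstance(parsed.get(m), (int, float)):
                scores[m].append(float(parsed[m]))
    return {m: round(mean(v), 2) for m, v in scores.items() if v}


def deltas(base: dict[str, float], tuned: dict[str, float]) -> dict[str, float]:
    """Per-metric improvement of the fine-tuned model over the base model."""
    return {m: round(tuned[m] - base[m], 2) for m in base if m in tuned}
```

Plugging in the reported means, `deltas` gives +3.2 overall, +3.5 completeness, and +3.33 factual.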

Head-to-head (10 samples): fine-tuned 3B ~8.40 vs Hermes-70B 8.18, Mistral-Small-24B 7.90, Gemma-3-12B 7.76, Qwen3-14B 7.62. Teacher Kimi K2: ~8.82.

Why: task specialization + JSON canonicalization reduces variance; the model learns the exact structure/fields.

Lessons: train on completions only; synthetic is fine for narrow tasks; Llama is straightforward to train. Dataset pipeline + training script + evals: https://github.com/bilawalriaz/local-notes-transcribe-llm
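"Train on completions only" means the loss ignores the prompt tokens and is computed only on the response. A minimal sketch, assuming already-tokenized prompt and completion IDs and the PyTorch / Hugging Face convention that a label of -100 is ignored by cross-entropy; the helper names are hypothetical and this is not the author's training script, which may rely on a library helper instead.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss


def build_example(prompt_ids: list[int], completion_ids: list[int],
                  max_len: int = 2048) -> dict[str, list[int]]:
    """Concatenate prompt + completion and mask the prompt out of the labels."""
    input_ids = (prompt_ids + completion_ids)[:max_len]
    labels = ([IGNORE_INDEX] * len(prompt_ids) + completion_ids)[:max_len]
    return {
        "input_ids": input_ids,
        "attention_mask": [1] * len(input_ids),
        "labels": labels,  # the model shifts labels internally for next-token loss
    }


def trained_token_fraction(example: dict[str, list[int]]) -> float:
    """Share of positions that actually contribute to the loss."""
    labels = example["labels"]
    if not labels:
        return 0.0
    return sum(1 for t in labels if t != IGNORE_INDEX) / len(labels)
```

For long transcripts with short JSON outputs, the trained fraction can be small, which is exactly why masking matters: without it, most of the gradient would go toward reproducing the input transcript rather than the structured output.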

More from Show

Show HN: Grapes Studio – HTML-first WYSIWYG website editor with LLM assistant

I’ve been working with @artf (creator of GrapesJS) on Grapes Studio, an HTML-first editor with an LLM assistant on top of GrapesJS.

We’re approaching this differently than the new wave of AI app/site builders, which typically generate full React applications; we think that’s overkill for simple websites. From talking to people using these tools, we’ve seen a lot of issues with build errors and overly complicated pages.

With our approach you can:
- Edit visually via the no-code editor (drag/drop) or ask the LLM to make scoped changes (like “add a section” or “add a new page”).
- Build with straight HTML/CSS.
- Ask AI to import your current site and start building from there instead of doing a total rebuild.

We think there’s a lot of benefit in combining drag-and-drop editing with LLMs, and you can jump straight into the code in the editor if you choose.
- Do you see value in this hybrid model (AI + visual + code editing)?
- What are the biggest blockers you’ve run into with AI-only builders?

Let us know what you think.

Show HN: Nanobot – Turn MCP servers into full AI agents

Today we’re releasing Nanobot, an open-source framework for building AI agents on top of the Model Context Protocol (MCP).

MCP servers are a great way to expose structured tools, but they’re usually just that: collections of functions. Nanobot makes it simple to wrap any MCP server with reasoning, a system prompt, and orchestration so it behaves like a real agent. Even better, Nanobot fully supports MCP-UI, so agents can pass rich interactive components (forms, dashboards, even mini-apps) directly into chat.

A simple example: if you had a Blackjack MCP server with tools like deal, bet, and hit, you could wrap it with Nanobot to create a dealer agent that knows how to explain the game, guide a player, and render an interactive Blackjack table inside chat.

We built this because we wanted agents that go beyond text and function calls into actual interactive experiences, something useful for everything from games to e-commerce to developer tools.

Code is on GitHub: https://github.com/nanobot-ai/nanobot
Live demo (Blackjack): https://blackjack.nanobot.ai

We’d love feedback from this community on the framework, the design, and what you’d like to see next.
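The Blackjack example can be sketched as an agent layer over a tool server. MCP runs over JSON-RPC 2.0, and tools are invoked with the `tools/call` method carrying a tool `name` and `arguments`; everything else here is a hypothetical stand-in, not Nanobot's API: the server is an in-process class rather than a real MCP transport, and a keyword match replaces the LLM reasoning step.

```python
import json
import random
from typing import Any, Callable, Optional


class BlackjackServer:
    """Hypothetical in-process stand-in for a Blackjack MCP server."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.hand: list[int] = []
        self.stake = 0
        self.tools: dict[str, Callable[..., str]] = {
            "deal": self.deal, "bet": self.bet, "hit": self.hit,
        }

    def deal(self) -> str:
        self.hand = [self.rng.randint(1, 11) for _ in range(2)]
        return f"hand={self.hand} total={sum(self.hand)}"

    def bet(self, amount: int) -> str:
        self.stake = amount
        return f"stake={amount}"

    def hit(self) -> str:
        self.hand.append(self.rng.randint(1, 11))
        total = sum(self.hand)
        return f"hand={self.hand} total={total}" + (" bust" if total > 21 else "")

    def handle(self, request: str) -> str:
        """Dispatch a JSON-RPC 2.0 tools/call request to the named tool."""
        req = json.loads(request)
        if req.get("method") != "tools/call":
            error = {"code": -32601, "message": "Method not found"}
            return json.dumps({"jsonrpc": "2.0", "id": req.get("id"), "error": error})
        params = req["params"]
        text = self.tools[params["name"]](**params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}]}
        return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})


class DealerAgent:
    """Hypothetical wrapper: a system prompt plus a policy that picks tool calls."""

    SYSTEM_PROMPT = "You are a friendly Blackjack dealer. Explain each move."

    def __init__(self, server: BlackjackServer) -> None:
        self.server = server
        self.next_id = 0

    def call_tool(self, name: str, arguments: Optional[dict[str, Any]] = None) -> str:
        self.next_id += 1
        request = {"jsonrpc": "2.0", "id": self.next_id, "method": "tools/call",
                   "params": {"name": name, "arguments": arguments or {}}}
        reply = json.loads(self.server.handle(json.dumps(request)))
        return reply["result"]["content"][0]["text"]

    def respond(self, user_message: str) -> str:
        # Keyword matching stands in for the LLM deciding which tool to call.
        msg = user_message.lower()
        if "bet" in msg:
            amount = int("".join(ch for ch in msg if ch.isdigit()) or 10)
            return "Bet placed: " + self.call_tool("bet", {"amount": amount})
        if "hit" in msg:
            return "Another card: " + self.call_tool("hit")
        return "Dealing a new hand: " + self.call_tool("deal")
```

The point of the split is that the server stays a plain collection of functions, while the prompt, the policy, and any UI rendering live in the wrapper.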

Show HN: Claudable – OpenSource Lovable that runs locally with Claude Code

Hey, HN! I’m Aaron. I built an open-source Lovable for Claude Code users.

Platforms like Lovable, Replit Agent, and Bolt require separate API keys and $25+/month subscriptions. But if you’re already subscribed to Claude Pro or Cursor, you can use those plans directly without extra costs.

Claudable runs entirely locally through Claude Code (Cursor CLI is also supported) and provides:
- Instant UI preview (similar to Lovable)
- Web-optimized, production-ready designs
- Direct Git integration
- One-click Vercel deployment
- Zero additional API costs

It’s open source and available today. I’m actively developing it and would love community feedback on what features to prioritize next.

GitHub: https://github.com/opactorai/Claudable

Happy to answer any questions!

Show HN: Memori – Open-Source Memory Engine for AI Agents

Hey HN! I’m Arindam, part of the team behind Memori (https://memori.gibsonai.com/).

Memori adds a stateful memory engine to AI agents, enabling them to stay consistent, recall past work, and improve over time. With Memori, agents don’t lose track of multi-step workflows, repeat tool calls, or forget user preferences. Instead, they build up human-like memory that makes them more reliable and efficient across sessions.

We’ve also put together demo apps (a personal diary assistant, a research agent, and a travel planner) so you can see memory in action.

Current LLMs are stateless: they forget everything between sessions. This leads to repetitive interactions, wasted tokens, and inconsistent results. When building AI agents, this problem gets even worse. Without memory, they can’t recover from failures, coordinate across steps, or apply simple rules like “always write tests.”

We realized that for AI agents to work in production, they need memory. That’s why we built Memori.

Memori uses a multi-agent architecture to capture conversations, analyze them, and decide which memories to keep active. It supports three modes:
- Conscious Mode: short-term memory for recent, essential context.
- Auto Mode: dynamic search across long-term memory.
- Combined Mode: blends both for fast recall and deep retrieval.

Under the hood, Memori is SQL-first. You can use SQLite, PostgreSQL, or MySQL to store memory with built-in full-text search, versioning, and optimization.
This makes it production-ready, extensible, and simple to deploy.

Memori is backed by GibsonAI’s database infrastructure, which supports:
- Instant provisioning
- Autoscaling on demand
- Database branching & versioning
- Query optimization
- Point of recovery

This means memory isn’t just stored: it’s reliable, efficient, and scales with real-world workloads.

We’ve open-sourced Memori under the Apache 2.0 license so anyone can build with it. You can check out the GitHub repo at https://github.com/GibsonAI/memori, explore the docs, and join our community on Discord.

We’d love to hear your thoughts. Please dive into the code, try out the demos, and share feedback; your input will help shape where we take Memori from here.
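The SQL-first design and the three modes can be sketched with Python’s built-in `sqlite3`: the most recent rows act as short-term context (analogous to Conscious Mode), an FTS5 full-text index serves long-term search (analogous to Auto Mode), and a combined query merges both. The class, table, and method names are hypothetical and are not Memori’s API. FTS5 must be compiled into the local SQLite build, so the sketch falls back to a plain `LIKE` scan when it is missing.

```python
import sqlite3


class MemoryStore:
    """Hypothetical SQL-backed agent memory (not Memori's actual API)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE memories (id INTEGER PRIMARY KEY, content TEXT NOT NULL)"
        )
        try:
            # Standalone FTS5 index; rows share the memories table's ids.
            self.db.execute("CREATE VIRTUAL TABLE memories_fts USING fts5(content)")
            self.has_fts = True
        except sqlite3.OperationalError:
            self.has_fts = False  # SQLite built without FTS5

    def remember(self, text: str) -> None:
        cur = self.db.execute("INSERT INTO memories (content) VALUES (?)", (text,))
        if self.has_fts:
            self.db.execute(
                "INSERT INTO memories_fts (rowid, content) VALUES (?, ?)",
                (cur.lastrowid, text),
            )
        self.db.commit()

    def recent(self, n: int = 3) -> list[str]:
        """Short-term context: the n most recent memories, newest first."""
        rows = self.db.execute(
            "SELECT content FROM memories ORDER BY id DESC LIMIT ?", (n,)
        )
        return [r[0] for r in rows]

    def search(self, query: str, n: int = 3) -> list[str]:
        """Long-term recall via full-text search (LIKE fallback without FTS5)."""
        if self.has_fts:
            rows = self.db.execute(
                "SELECT content FROM memories_fts WHERE memories_fts MATCH ? "
                "ORDER BY rank LIMIT ?",
                (query, n),
            )
        else:
            rows = self.db.execute(
                "SELECT content FROM memories WHERE content LIKE ? LIMIT ?",
                (f"%{query}%", n),
            )
        return [r[0] for r in rows]

    def combined(self, query: str, n: int = 3) -> list[str]:
        """Merge short-term and long-term results, dropping duplicates."""
        merged: list[str] = []
        for item in self.recent(n) + self.search(query, n):
            if item not in merged:
                merged.append(item)
        return merged
```

Note that FTS5 `MATCH` takes FTS query syntax, so raw user text with punctuation would need quoting or sanitizing before being passed in.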

Show HN: Ten years of running every day, visualized

Today marks ten years, 3653 consecutive days, of running at least one mile every day under the USRSA rules [1]. To celebrate, I built an interactive dashboard that turns a decade of GPX files into charts you can explore.

Running has truly changed my life: I’ve made lifelong friends, explored beautiful places, and, more importantly, invested in my own health and fitness, the benefits of which I’m starting to see as I get older.

The stack is pretty simple: a Next.js app with a Postgres database that holds all my running data. All the stats are pre-computed and cached in Redis, so I effectively only hit the database once a day, when a new run is ingested. On the frontend, I toyed with the idea of using D3 or existing data viz libraries, but ended up rolling my own using SVGs directly, which gave me more control over the visualizations.

I used the Strava bulk export to pre-populate the database, and I’m using their webhook API for incremental updates. I also tap into OpenWeatherMap and OpenCage to enrich the running data a little bit.

Happy to answer anything about the stack, data pipeline, or how I stayed motivated for 10 years!

[1] https://www.runeveryday.com (Run Streak Association rules: ≥ 1 mile per day)
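At the core of a streak dashboard like this is checking each day’s distance against the 1-mile rule and counting consecutive qualifying days. A minimal sketch, assuming GPX 1.1 track points (`trkpt` elements with `lat`/`lon` attributes) and haversine great-circle distance; the function names are hypothetical, and the author’s actual pipeline (Strava export, Postgres, Redis) is not reproduced here.

```python
import math
import xml.etree.ElementTree as ET
from datetime import date, timedelta

GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}
EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two lat/lon points, in miles."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat, dlon = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def gpx_distance_miles(gpx_text: str) -> float:
    """Sum distances between consecutive track points (segments treated as one path)."""
    root = ET.fromstring(gpx_text)
    pts = [(float(p.get("lat")), float(p.get("lon")))
           for p in root.iterfind(".//gpx:trkpt", GPX_NS)]
    return sum(haversine_miles(*a, *b) for a, b in zip(pts, pts[1:]))


def current_streak(daily_miles: dict[date, float], today: date,
                   minimum: float = 1.0) -> int:
    """Count consecutive days ending today with at least `minimum` miles."""
    streak, day = 0, today
    while daily_miles.get(day, 0.0) >= minimum:
        streak += 1
        day -= timedelta(days=1)
    return streak
```

Because the streak only changes when a new run arrives, a result like this is a natural candidate for the pre-compute-and-cache approach described above.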
