Launch HN: LlamaFarm (YC W22) – Open-source framework for distributed AI
The problem: AI demos die before production. We kept building proof-of-concepts that worked perfectly on our laptops but broke the moment we deployed them: RAG quality degraded, self-hosted models went stale, and the demo that impressed the team couldn't handle real-world data.
Our solution: declarative AI-as-code. One YAML defines models, policies, data, evals, and deploy. Instead of one brittle giant, we orchestrate a Mixture of Experts—many small, specialized models you continuously fine-tune from real usage. With RAG for source-grounded answers, systems get cheaper, faster, and auditable.
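To make "one YAML" concrete, here is a sketch of what such a file might look like. This is illustrative only: the field names below are hypothetical, not LlamaFarm's actual schema (see the repo's quickstart for the real format).

```yaml
# Hypothetical sketch of a declarative AI-as-code config.
# Field names are illustrative, not LlamaFarm's actual schema.
models:
  - name: support-expert        # one small, specialized expert
    base: llama3.2-3b
    finetune:
      from: usage-logs          # continuously fine-tune from real usage
rag:
  sources:
    - ./docs/**/*.pdf           # source-grounded answers
  vector_db: chroma
evals:
  - dataset: golden-questions.jsonl
    min_score: 0.85             # gate releases on eval results
deploy:
  target: docker-compose        # same file for laptop, cloud, or edge
```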
There’s a short demo here: https://www.youtube.com/watch?v=W7MHGyN0MdQ and a more in-depth one at https://www.youtube.com/watch?v=HNnZ4iaOSJ4.
Ultimately, we want to deliver a single, signed bundle—models + retrieval + database + API + tests—that runs anywhere: cloud, edge, or air-gapped. No glue scripts. No surprise egress bills. Your data stays in your runtime.
We believe AI is evolving the way computing did: mainframes gave way to distributed systems, monoliths to microservices, and now giant models are giving way to smaller, better ones. Mixture of Experts is here to stay. Qwen3 is sick. Llama 3.2 runs on phones. Phi-3 fits on edge devices. Domain models beat GPT-5 on specific tasks.
RAG brings specialized data to your model: you don't need a 1T-parameter model that "knows everything"; you need a smart model that can read your data. Fine-tuning costs are collapsing: what cost $100k last year now costs $500. Every company will have custom models.
Data gravity is real: your data wants to stay where it is, whether on-prem, in your AWS account, or on employee laptops.
Bottom line: LlamaFarm turns AI from experiments into repeatable, secure releases, so teams can ship fast.
What we have working today:
- Full RAG pipeline: 15+ document formats, programmatic extraction (no LLM calls needed), vector-database embedding.
- Universal model layer: the same code runs against 25+ providers, with automatic failover and cost-based routing.
- Truly portable: identical behavior from laptop to datacenter to cloud.
- Real deployment: Docker Compose works now, with Kubernetes basics and cloud templates on the way.
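To illustrate what "automatic failover with cost-based routing" means in practice, here is a minimal sketch. This is not LlamaFarm's implementation or API, just the general technique: try providers cheapest-first, and fall back to the next one when a call fails.

```python
# Minimal sketch of cost-based routing with failover.
# Not LlamaFarm's actual API; names here are hypothetical.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Provider:
    name: str
    cost_per_1k_tokens: float
    call: Callable[[str], str]  # raises an exception on failure

def complete(prompt: str, providers: List[Provider]) -> str:
    """Route to the cheapest provider; fall back on failure."""
    last_err: Exception | None = None
    for p in sorted(providers, key=lambda p: p.cost_per_1k_tokens):
        try:
            return p.call(prompt)
        except Exception as e:
            last_err = e  # provider down or rate-limited; try the next one
    raise RuntimeError(f"all providers failed: {last_err}")
```

The cheapest-first sort is what makes the routing cost-based; a real router would also weigh latency, context limits, and per-request policy from the config.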
Check out our readme/quickstart for easy install instructions: https://github.com/llama-farm/llamafarm?tab=readme-ov-file#-...
Or just grab a binary for your platform directly from the latest release: https://github.com/llama-farm/llamafarm/releases/latest
The vision is to be able to run, update, and continuously fine-tune dozens of models across environments with built-in RAG and evaluations, all wrapped in a self-healing runtime. We have an MVP of that today (with a lot more to do!).
We’d love to hear your feedback! Think we’re way off? Spot on? Want us to build something for your specific use case? We’re here for all your comments!