Show HN: Sourcebot – Self-hosted Perplexity for your codebase
Hacker News (score: 23)
We’re Brendan and Michael, the creators of Sourcebot (https://www.sourcebot.dev/), a self-hosted code understanding tool for large codebases. We originally launched on HN 9 months ago with code search (https://news.ycombinator.com/item?id=41711032), and we’re excited to share our newest feature: Ask Sourcebot.
Ask Sourcebot is an agentic search tool that lets you ask complex questions about your entire codebase in natural language, and returns a structured response with inline citations back to your code. Some types of questions you might ask:
- “How does authentication work in this codebase? What library is being used? What providers can a user log in with?” (https://demo.sourcebot.dev/~/chat/cmdpjkrbw000bnn7s8of2dm11)
- “When should I use channels vs. mutexes in Go? Find real usages of both and include them in your answer” (https://demo.sourcebot.dev/~/chat/cmdpiuqhu000bpg7s9hprio4w)
- “How are shards laid out in memory in the Zoekt code search engine?” (https://demo.sourcebot.dev/~/chat/cmdm9nkck000bod7sqy7c1efb)
- "How do I call C from Rust?" (https://demo.sourcebot.dev/~/chat/cmdpjy06g000pnn7ssf4nk60k)
You can try it yourself on our demo site (https://demo.sourcebot.dev/~) or check out our demo video (https://youtu.be/olc2lyUeB-Q).
How is this any different from existing tools like Cursor or Claude Code?
- Sourcebot focuses solely on code understanding. We believe that, more than ever, the main bottleneck development teams face is not writing code, it’s acquiring the context needed to make quality changes that are cohesive within the wider codebase. This is true regardless of whether the author is a human or an LLM.
- As opposed to being in your IDE or terminal, Sourcebot is a web app. This allows us to play to the strengths of the web: rich UX and ubiquitous access. We put a ton of work into taking the best parts of IDEs (code navigation, file explorer, syntax highlighting) and packaging them with a custom UX (rich Markdown rendering, inline citations, @ mentions) that is easily shareable between team members.
- Sourcebot can maintain an up-to-date index of thousands of repos hosted on GitHub, GitLab, Bitbucket, Gerrit, and other hosts. This allows you to ask questions about repositories without checking them out locally. This is especially helpful when ramping up on unfamiliar parts of the codebase or working with systems that are typically spread across multiple repositories, e.g., microservices.
- You can BYOK (Bring Your Own API Key) to any supported reasoning model. We currently support 11 different model providers (like Amazon Bedrock and Google Vertex), and plan to add more.
- Sourcebot is self-hosted, fair source, and free to use.
Under the hood, we expose our existing regular expression search, code navigation, and file reading APIs to an LLM as tool calls. We instruct the LLM via a system prompt to use these tools to gather enough context to answer the user's question, and then to provide a concise, structured response. This includes inline citations, which are just structured data that the LLM can embed into its response and that the client can then identify and render appropriately. We built this on some amazing libraries like the Vercel AI SDK v5, CodeMirror, react-markdown, and Slate.js, among others.
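To make the shape of this loop concrete, here is a minimal TypeScript sketch. It is not Sourcebot's actual implementation, and everything in it is an assumption made for illustration: the tool names (`searchCode`, `readFile`), the in-memory `files` map standing in for the real index, the `@cite{path:start-end}` marker syntax, and the scripted `Model` function that replaces a real LLM call. It only shows the two ideas described above: a model that requests tool calls until it has enough context, and citations embedded as structured markers that the client splits out for rendering.

```typescript
// Hypothetical sketch: names and marker syntax are illustrative, not Sourcebot's API.
type ToolArgs = Record<string, string>;
type Tool = (args: ToolArgs) => string;

// Tiny in-memory "codebase" standing in for a real search index.
const files: Record<string, string> = {
  "auth/login.ts":
    "import { signIn } from 'next-auth';\nexport const login = () => signIn('github');",
  "db/client.ts": "export const db = connect(url);",
};

const tools: Record<string, Tool> = {
  // Regular expression search: returns one "path:line" entry per matching line.
  searchCode: ({ pattern }) => {
    const re = new RegExp(pattern);
    const hits: string[] = [];
    for (const path of Object.keys(files)) {
      files[path].split("\n").forEach((line, i) => {
        if (re.test(line)) hits.push(`${path}:${i + 1}`);
      });
    }
    return hits.join("\n");
  },
  // File reading: returns the file contents or an error string.
  readFile: ({ path }) => files[path] ?? `error: no such file ${path}`,
};

type ModelStep =
  | { kind: "tool_call"; tool: string; args: ToolArgs }
  | { kind: "answer"; text: string };

// A model maps the transcript so far to its next step (a real LLM call in practice).
type Model = (transcript: string[]) => ModelStep;

// Run tool calls until the model answers or the step budget runs out.
function runAgent(model: Model, question: string, maxSteps = 8): string {
  const transcript = [`user: ${question}`];
  for (let i = 0; i < maxSteps; i++) {
    const step = model(transcript);
    if (step.kind === "answer") return step.text;
    const tool = tools[step.tool];
    const result = tool ? tool(step.args) : `error: unknown tool ${step.tool}`;
    transcript.push(`tool ${step.tool}: ${result}`);
  }
  return "Stopped: step budget exhausted.";
}

interface Citation { path: string; startLine: number; endLine: number }
type Segment = { type: "text"; value: string } | ({ type: "citation" } & Citation);

// Client side: split an answer into plain text and citation segments.
function parseAnswer(answer: string): Segment[] {
  const marker = /@cite\{([^:}]+):(\d+)-(\d+)\}/g;
  const segments: Segment[] = [];
  let last = 0;
  let m: RegExpExecArray | null;
  while ((m = marker.exec(answer)) !== null) {
    if (m.index > last) segments.push({ type: "text", value: answer.slice(last, m.index) });
    segments.push({ type: "citation", path: m[1], startLine: Number(m[2]), endLine: Number(m[3]) });
    last = m.index + m[0].length;
  }
  if (last < answer.length) segments.push({ type: "text", value: answer.slice(last) });
  return segments;
}

// Scripted stand-in for an LLM: search once, then answer with a citation.
const scriptedModel: Model = (transcript) =>
  transcript.length === 1
    ? { kind: "tool_call", tool: "searchCode", args: { pattern: "signIn" } }
    : { kind: "answer", text: "Login uses next-auth @cite{auth/login.ts:1-2}." };

const answer = runAgent(scriptedModel, "How does login work?");
const segments = parseAnswer(answer);
```

In this sketch `segments` holds a text segment, a citation segment pointing at `auth/login.ts` lines 1 to 2, and a trailing text segment. A client could render the citation as a link into a code viewer. Keeping citations as plain markers in the model's output means the same parsing step works no matter which model provider produced the text.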
This architecture is intentionally simple. We decided not to introduce any additional techniques like vector embeddings, multi-agent graphs, etc. since we wanted to push the limits of what we could do with what we had on hand. We plan on revisiting our approach as we get user feedback on what works (and what doesn’t).
We are really excited about pushing the envelope of code understanding. Give it a try: https://github.com/sourcebot-dev/sourcebot. Cheers!