Show HN: BloomSearch – Keyword search with hierarchical bloom filters
Show HN (score: 8)
Bloom filters are an _amazing_ data structure that, at a fixed size, tracks potential set membership. That means unlike normal b-tree indexes, they don't grow with the number of unique items in the dataset; the trade-off is a false-positive rate that rises as more items are added.
This makes them great for "needle in a haystack" search (logs, documents) as implementations like VictoriaMetrics and Bing's BitFunnel show. I've used them in the past, but they've never been center-stage in my projects.
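To make the fixed-size property concrete, here is a minimal sketch of a bloom filter in Go. It is not BloomSearch's implementation: the `Bloom` type, the FNV-based double hashing, and the size parameters are all assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Bloom is a fixed-size bit set probed by k hash positions per item.
// Hypothetical type for illustration; not the BloomSearch API.
type Bloom struct {
	bits []uint64
	m    uint64 // number of bits
	k    uint64 // number of probes per item
}

func NewBloom(m, k uint64) *Bloom {
	return &Bloom{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// hashes derives two base hashes; probe i uses h1 + i*h2 (double hashing).
func hashes(s string) (uint64, uint64) {
	h := fnv.New64a()
	h.Write([]byte(s))
	h1 := h.Sum64()
	h.Write([]byte{0xff}) // extend the input to get a second, different hash
	h2 := h.Sum64() | 1   // force odd so the stride is never zero
	return h1, h2
}

// Add sets the k bits for s. The filter never grows, however many items are added.
func (b *Bloom) Add(s string) {
	h1, h2 := hashes(s)
	for i := uint64(0); i < b.k; i++ {
		pos := (h1 + i*h2) % b.m
		b.bits[pos/64] |= 1 << (pos % 64)
	}
}

// MayContain returns false only if s was definitely never added.
// A true result can be a false positive.
func (b *Bloom) MayContain(s string) bool {
	h1, h2 := hashes(s)
	for i := uint64(0); i < b.k; i++ {
		pos := (h1 + i*h2) % b.m
		if b.bits[pos/64]&(1<<(pos%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	b := NewBloom(1<<16, 4)
	for _, tok := range []string{"error", "timeout", "user_id:42"} {
		b.Add(tok)
	}
	fmt.Println("timeout may be present:", b.MayContain("timeout"))
	fmt.Println("filter size in bytes:", len(b.bits)*8)
}
```

An added item is always reported as possibly present, so a negative answer lets a search skip the underlying data entirely. That is what makes the structure useful for needle-in-a-haystack queries.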
I wanted high cardinality keyword search for ANOTHER project... and, well, down the yak-shaving rabbit hole we go!
BloomSearch brings this into an extensible Go package:
- Very memory efficient via bloom filters and streaming row scans
- DataStore and MetaStore interfaces for any backend (can be same or separate)
- Hierarchical pruning via partitions, minmax indexes, and of course bloom filters
- Search by field, token, or field:token with complex combinators
- Disaggregated storage and compute for unbounded ingest and query throughput
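The hierarchical pruning in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not BloomSearch's actual API or file layout: the `block` and `row` types, the toy 256-bit filter, and the key encodings for field, token, and field:token lookups are all hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloom is a toy 256-bit filter with 3 probes, just enough for this sketch.
type bloom [4]uint64

func probes(key string) [3]uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	x := h.Sum64()
	return [3]uint64{x % 256, (x >> 16) % 256, (x >> 32) % 256}
}

func (b *bloom) add(key string) {
	for _, p := range probes(key) {
		b[p/64] |= 1 << (p % 64)
	}
}

func (b *bloom) mayContain(key string) bool {
	for _, p := range probes(key) {
		if b[p/64]&(1<<(p%64)) == 0 {
			return false
		}
	}
	return true
}

// Hypothetical key encodings for the three query shapes.
func fieldKey(f string) string         { return "f\x00" + f }
func tokenKey(t string) string         { return "t\x00" + t }
func fieldTokenKey(f, t string) string { return "ft\x00" + f + "\x00" + t }

type row struct {
	ts     int64
	fields map[string]string // field -> single token, for brevity
}

// block carries a minmax index on ts plus one filter over all of its rows.
type block struct {
	minTS, maxTS int64
	filter       bloom
	rows         []row
}

func newBlock(rows []row) *block {
	b := &block{minTS: rows[0].ts, maxTS: rows[0].ts, rows: rows}
	for _, r := range rows {
		if r.ts < b.minTS {
			b.minTS = r.ts
		}
		if r.ts > b.maxTS {
			b.maxTS = r.ts
		}
		for f, t := range r.fields {
			b.filter.add(fieldKey(f))
			b.filter.add(tokenKey(t))
			b.filter.add(fieldTokenKey(f, t))
		}
	}
	return b
}

// search prunes in three stages and returns the matching rows
// plus the number of blocks whose rows actually had to be scanned.
func search(parts map[string][]*block, part string, from, to int64, field, token string) ([]row, int) {
	var out []row
	scanned := 0
	for _, b := range parts[part] { // 1. partition pruning: other partitions are never touched
		if b.maxTS < from || b.minTS > to { // 2. minmax pruning
			continue
		}
		if !b.filter.mayContain(fieldTokenKey(field, token)) { // 3. bloom pruning
			continue
		}
		scanned++
		for _, r := range b.rows { // exact check removes bloom false positives
			if r.ts >= from && r.ts <= to && r.fields[field] == token {
				out = append(out, r)
			}
		}
	}
	return out, scanned
}

func main() {
	parts := map[string][]*block{
		"2024-01-01": {
			newBlock([]row{{ts: 10, fields: map[string]string{"level": "info"}}}),
			newBlock([]row{{ts: 20, fields: map[string]string{"level": "error"}}}),
		},
		"2024-01-02": {
			newBlock([]row{{ts: 110, fields: map[string]string{"level": "error"}}}),
		},
	}
	hits, scanned := search(parts, "2024-01-01", 0, 100, "level", "error")
	fmt.Println("hits:", len(hits), "blocks scanned:", scanned)
}
```

Each stage is cheaper than the one after it, so most blocks are rejected before any row is read. A field-only or token-only query would probe `fieldKey` or `tokenKey` instead of `fieldTokenKey`.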
And of course, you know I had to make a custom file format ^-^ (FILE_FORMAT.MD)
BloomSearch is optimized for massive concurrency, arbitrary cardinality and dataset size, and super low memory usage. There's still a lot on the table too in terms of size and performance optimizations, but I'm already super pleased with it. With distributed query processing I'm targeting >100B rows/s over large datasets.
I'm also excited to replace our big logging bill with ~$0.003/GB for log storage with infinite retention and guilt-free querying :P
More from Show
Show HN: ccrider - Search and Resume Your Claude Code Sessions – TUI / MCP / CLI
I built a tool that stores your full Claude Code history to let you easily find and resume sessions. It has TUI, CLI and MCP interfaces. It's a single Go binary, and the session history is synced to SQLite each time you use it.
Default mode is the TUI with a session browser and full-text search. Once a session is selected you can browse and search within it, resume it or export to markdown.
The MCP server provides tools to let Claude search back through the session for pre-compact context or pull from prior sessions. I use this constantly.
I've seen elaborate continuity systems to give Claude Code access to history but this simple approach has been very effective.
Installation:
macOS: brew install neilberkman/tap/ccrider
Linux/other: git clone https://github.com/neilberkman/ccrider && cd ccrider && go build
MCP server: claude mcp add --scope user ccrider $(which ccrider) serve-mcp
Source: https://github.com/neilberkman/ccrider
Show HN: OSS sustain guard – Sustainability signals for OSS dependencies
Hi HN, I made OSS Sustain Guard.
After every high-profile OSS incident, I wonder about the packages I rely on right now. I can skim issues/PRs and activity on GitHub, but that doesn’t scale when you have tens or hundreds of dependencies. I built this to surface sustainability signals (maintainer redundancy, activity trends, funding links, etc.) and create awareness. It’s meant to start a respectful conversation, not to judge projects. These are signals, not truth; everything is inferred from public data (internal mirrors/private work won’t show up).
Quick start:
pip install oss-sustain-guard
export GITHUB_TOKEN=...
os4g check
It uses GitHub GraphQL with local caching (no telemetry; token not uploaded/stored), and supports multiple ecosystems (Python/JS/Rust/Go/Java/etc.).
Repo: https://github.com/onukura/oss-sustain-guard
I’d love feedback on metric choices/thresholds and wording that stays respectful. If you have examples where these signals break down, please share.
Show HN: Open database of link metadata for large-scale analysis
Show HN: TinyPDF – 3KB PDF library (70x smaller than jsPDF)
I needed to generate invoices in a Node.js app. jsPDF is 229KB. I only needed text, rectangles, lines, and JPEG images.
So I wrote tinypdf: <400 lines of TypeScript, zero dependencies, 3.3KB minified+gzipped.
What it does:
- Text (Helvetica, colors, alignment)
- Rectangles and lines
- JPEG images
- Multiple pages, custom sizes
What it doesn't do:
- Custom fonts, PNG/SVG, forms, encryption, HTML-to-PDF
That's it. The 95% use case for invoices, receipts, reports, tickets, and labels.
GitHub: https://github.com/Lulzx/tinypdf
npm: npm install tinypdf