Show HN: BloomSearch – Keyword search with hierarchical bloom filters

Show HN (score: 8)
Found: July 13, 2025
ID: 343

Description

Other
Hey HN! I got nerd-sniped by Bloom Filters this weekend, specifically for searching datasets with high "cardinality" (number of unique items).

They're an _amazing_ data structure that, at a fixed size, tracks potential set membership. That means that, unlike normal B-tree indexes, they don't grow with the number of unique items in the dataset (the trade-off is a false-positive rate that rises as more items are added).
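
To make the "fixed size, potential membership" idea concrete, here is a minimal sketch of a bloom filter in Go. It is not BloomSearch's implementation: the type and function names (`Bloom`, `NewBloom`, `probes`) and the choice of FNV-1a with double hashing are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Bloom is a fixed-size bit set; each item sets k bit positions.
// Hypothetical type for illustration, not BloomSearch's own.
type Bloom struct {
	bits []uint64
	m    uint64 // total number of bits
	k    uint64 // probes per item
}

func NewBloom(m, k uint64) *Bloom {
	return &Bloom{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// probes splits one 64-bit FNV-1a hash into two halves for double hashing.
func probes(s string) (h1, h2 uint64) {
	h := fnv.New64a()
	h.Write([]byte(s))
	sum := h.Sum64()
	return sum & 0xffffffff, (sum >> 32) | 1
}

// Add sets the k bits derived from s. The bit array never grows.
func (b *Bloom) Add(s string) {
	h1, h2 := probes(s)
	for i := uint64(0); i < b.k; i++ {
		pos := (h1 + i*h2) % b.m
		b.bits[pos/64] |= uint64(1) << (pos % 64)
	}
}

// MayContain returns false only if s was definitely never added.
// A true result can be a false positive.
func (b *Bloom) MayContain(s string) bool {
	h1, h2 := probes(s)
	for i := uint64(0); i < b.k; i++ {
		pos := (h1 + i*h2) % b.m
		if b.bits[pos/64]&(uint64(1)<<(pos%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	b := NewBloom(1024, 4)
	for _, tok := range []string{"error", "timeout", "user_4821"} {
		b.Add(tok)
	}
	fmt.Println(b.MayContain("error")) // true: added items are never missed
	fmt.Println(len(b.bits) * 8)       // 128 bytes, however many items are added
}
```

A lookup for a token that was never added usually returns false, but can return true; that is why a bloom filter hit still has to be confirmed by scanning the underlying rows.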

This makes them great for "needle in a haystack" search (logs, documents), as implementations like VictoriaMetrics and Bing's BitFunnel show. I've used them in the past, but they've never been center-stage in my projects.

I wanted high cardinality keyword search for ANOTHER project... and, well, down the yak-shaving rabbit hole we go!

BloomSearch brings this into an extensible Go package:

- Very memory efficient via bloom filters and streaming row scans

- DataStore and MetaStore interfaces for any backend (can be same or separate)

- Hierarchical pruning via partitions, minmax indexes, and of course bloom filters

- Search by field, token, or field:token with complex combinators

- Disaggregated storage and compute for unbounded ingest and query throughput
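
The hierarchical pruning in the list above can be sketched as a cascade that runs the cheapest check first and only keeps blocks that pass every level. This is an illustration under stated assumptions, not BloomSearch's actual API or file layout: `blockMeta`, `query`, `candidates`, and the timestamp-based minmax index are all hypothetical, and an exact set stands in for the bloom filter to keep the example short.

```go
package main

import "fmt"

// membership answers "might this block contain the token?".
// A real system would use a bloom filter here; exactSet is a stand-in.
type membership interface{ MayContain(token string) bool }

type exactSet map[string]struct{}

func (s exactSet) MayContain(t string) bool { _, ok := s[t]; return ok }

// blockMeta is hypothetical per-block metadata kept apart from the row data.
type blockMeta struct {
	Partition    string
	MinTS, MaxTS int64
	Tokens       membership
}

// query is a hypothetical single-token search over a time range.
type query struct {
	Partition string
	From, To  int64
	Token     string
}

// candidates returns the indexes of blocks that survive all three levels.
// Only these blocks would need their rows scanned.
func candidates(blocks []blockMeta, q query) []int {
	var out []int
	for i, b := range blocks {
		if b.Partition != q.Partition {
			continue // level 1: partition pruning
		}
		if b.MaxTS < q.From || b.MinTS > q.To {
			continue // level 2: minmax index, ranges do not overlap
		}
		if !b.Tokens.MayContain(q.Token) {
			continue // level 3: membership filter says "definitely absent"
		}
		out = append(out, i)
	}
	return out
}

func main() {
	blocks := []blockMeta{
		{"2025-07-12", 100, 199, exactSet{"error": {}}},
		{"2025-07-13", 200, 299, exactSet{"info": {}}},
		{"2025-07-13", 300, 399, exactSet{"error": {}}},
		{"2025-07-13", 900, 999, exactSet{"error": {}}},
	}
	q := query{Partition: "2025-07-13", From: 250, To: 450, Token: "error"}
	fmt.Println(candidates(blocks, q)) // prints [2]
}
```

Here three of the four blocks are discarded from metadata alone, so only one block's rows are read. With a real bloom filter at level 3, a surviving block may still turn out not to contain the token once its rows are scanned.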

And of course, you know I had to make a custom file format ^-^ (FILE_FORMAT.MD)

BloomSearch is optimized for massive concurrency, arbitrary cardinality and dataset size, and super low memory usage. There's still a lot on the table too in terms of size and performance optimizations, but I'm already super pleased with it. With distributed query processing I'm targeting >100B rows/s over large datasets.

I'm also excited to replace our big logging bill with ~$0.003/GB for log storage, infinite retention, and guilt-free querying :P

More from Show

Show HN: ccrider - Search and Resume Your Claude Code Sessions – TUI / MCP / CLI

I built a tool that stores your full Claude Code history to let you easily find and resume sessions. It has TUI, CLI and MCP interfaces. It's a single Go binary, and the session history is synced to SQLite each time you use it.

Default mode is the TUI with a session browser and full-text search. Once a session is selected you can browse and search within it, resume it or export to markdown.

The MCP server provides tools to let Claude search back through the session for pre-compact context or pull from prior sessions. I use this constantly.

I've seen elaborate continuity systems to give Claude Code access to history but this simple approach has been very effective.

Installation:

macOS: brew install neilberkman/tap/ccrider

Linux/other: git clone https://github.com/neilberkman/ccrider && cd ccrider && go build

MCP server: claude mcp add --scope user ccrider $(which ccrider) serve-mcp

Source: https://github.com/neilberkman/ccrider

Show HN: OSS sustain guard – Sustainability signals for OSS dependencies

Hi HN, I made OSS Sustain Guard.

After every high-profile OSS incident, I wonder about the packages I rely on right now. I can skim issues/PRs and activity on GitHub, but that doesn't scale when you have tens or hundreds of dependencies. I built this to surface sustainability signals (maintainer redundancy, activity trends, funding links, etc.) and create awareness. It's meant to start a respectful conversation, not to judge projects. These are signals, not truth; everything is inferred from public data (internal mirrors/private work won't show up).

Quick start:

pip install oss-sustain-guard
export GITHUB_TOKEN=...
os4g check

It uses GitHub GraphQL with local caching (no telemetry; token not uploaded/stored), and supports multiple ecosystems (Python/JS/Rust/Go/Java/etc.).

Repo: https://github.com/onukura/oss-sustain-guard

I'd love feedback on metric choices/thresholds and wording that stays respectful. If you have examples where these signals break down, please share.

Show HN: Open database of link metadata for large-scale analysis


Show HN: TinyPDF – 3KB PDF library (70x smaller than jsPDF)

I needed to generate invoices in a Node.js app. jsPDF is 229KB. I only needed text, rectangles, lines, and JPEG images.

So I wrote tinypdf: <400 lines of TypeScript, zero dependencies, 3.3KB minified+gzipped.

What it does:

- Text (Helvetica, colors, alignment)

- Rectangles and lines

- JPEG images

- Multiple pages, custom sizes

What it doesn't do:

- Custom fonts, PNG/SVG, forms, encryption, HTML-to-PDF

That's it. The 95% use case for invoices, receipts, reports, tickets, and labels.

GitHub: https://github.com/Lulzx/tinypdf

npm: npm install tinypdf
