Show HN: Tokenflood – simulate arbitrary loads on instruction-tuned LLMs
Hacker News (score: 18)
https://github.com/twerkmeister/tokenflood
=== What is it and what problems does it solve? ===
Tokenflood is a load testing tool for instruction-tuned LLMs that can simulate arbitrary LLM loads in terms of prompt, prefix, and output lengths and requests per second. Instead of first collecting prompt data for different load types, you configure the desired parameters for your load test and you are good to go. It also lets you assess the latency effects of potential prompt parameter changes before spending the time and effort to implement them.
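To make those four parameters concrete, here is a minimal sketch of how such a load could be described and turned into a send schedule. It is not Tokenflood's configuration format or API: `LoadSpec`, `arrival_times`, and `token_budget` are hypothetical names made up for illustration, and evenly spaced arrivals are an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadSpec:
    """Hypothetical description of one load type (not Tokenflood's schema)."""
    requests_per_second: float
    duration_seconds: float
    prompt_tokens: int   # total input length, including the shared prefix
    prefix_tokens: int   # leading tokens that are identical across requests
    output_tokens: int   # number of tokens each request asks the model to generate


def arrival_times(spec: LoadSpec) -> list[float]:
    """Evenly spaced send times, in seconds from the start of the test."""
    total = round(spec.requests_per_second * spec.duration_seconds)
    interval = 1.0 / spec.requests_per_second
    return [i * interval for i in range(total)]


def token_budget(spec: LoadSpec) -> dict[str, int]:
    """Total tokens the run would send and request, split by kind."""
    n = len(arrival_times(spec))
    return {
        "requests": n,
        "shared_prefix_tokens": n * spec.prefix_tokens,
        "unique_prompt_tokens": n * (spec.prompt_tokens - spec.prefix_tokens),
        "output_tokens": n * spec.output_tokens,
    }


# Example: 2 requests per second for 5 seconds, with a mostly shared prompt.
spec = LoadSpec(requests_per_second=2.0, duration_seconds=5.0,
                prompt_tokens=1024, prefix_tokens=768, output_tokens=64)
print(token_budget(spec))
```

Separating the shared prefix from the unique part of the prompt matters because servers with prefix caching may process the two very differently, which is presumably why prefix length is a load parameter in its own right.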
I believe it's really useful for developing latency-sensitive LLM applications and for:
* Load testing self-hosted LLM setups
* Assessing the latency benefit of changes to prompt parameters before implementing those changes
* Assessing latency and intraday variation of latency on hosted LLM services before sending your traffic there
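As an illustration of the last use case, the sketch below groups measured request latencies by hour of day and reports percentiles per hour. It is an assumption about how such results could be analysed, not Tokenflood's own output or reporting code; the function names and the `(hour_of_day, latency_ms)` sample format are hypothetical.

```python
from collections import defaultdict
from statistics import quantiles


def percentile_summary(latencies_ms):
    """Return p50/p90/p99 for a list of at least two latency samples."""
    # n=100 yields 99 cut points; index k-1 is the k-th percentile.
    cuts = quantiles(latencies_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p90": cuts[89], "p99": cuts[98]}


def summarize_by_hour(samples):
    """Group (hour_of_day, latency_ms) pairs and summarise each hour."""
    buckets = defaultdict(list)
    for hour, latency_ms in samples:
        buckets[hour].append(latency_ms)
    # Hours with fewer than two samples are skipped: no meaningful percentiles.
    return {
        hour: percentile_summary(values)
        for hour, values in sorted(buckets.items())
        if len(values) >= 2
    }


# Made-up measurements, for illustration only.
samples = [(9, 100.0), (9, 200.0), (9, 300.0), (14, 400.0), (14, 600.0)]
for hour, stats in summarize_by_hour(samples).items():
    print(hour, stats)
```

Comparing the per-hour p90 or p99 rather than the mean makes peak-hour slowdowns on a hosted service easier to spot.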
=== Why did I build it? ===
Over the course of the past year, part of my work has been helping my clients meet their latency, throughput, and cost targets for LLMs (PTUs, anyone?). That process involved making numerous choices about cloud providers, hardware, inference software, models, configurations, and prompt changes. During that time I found myself running similar tests over and over with a collection of ad hoc scripts. I finally had some time on my hands and wanted to put it all together properly in one tool.
=== What am I looking for? ===
I am sharing this for three reasons: hoping it can make others' work on latency-sensitive LLM applications simpler, learning and improving from feedback, and finding new projects to work on.
So please check it out on GitHub (https://github.com/twerkmeister/tokenflood), comment, and reach out at thomas@werkmeister.me or on LinkedIn (https://www.linkedin.com/in/twerkmeister/) for professional inquiries.
=== Pics ===
Image of the CLI interface: https://github.com/twerkmeister/tokenflood/blob/main/images/...
Result image: https://github.com/twerkmeister/tokenflood/blob/main/images/...