Show HN: Tokenflood – simulate arbitrary loads on instruction-tuned LLMs
Hacker News (score: 18)
https://github.com/twerkmeister/tokenflood
=== What is it and what problems does it solve? ===
Tokenflood is a load testing tool for instruction-tuned LLMs that can simulate arbitrary LLM loads in terms of prompt, prefix, and output lengths and requests per second. Instead of first collecting prompt data for different load types, you configure the desired parameters for your load test and you are good to go. It also lets you assess the latency effects of potential prompt parameter changes before spending the time and effort to implement them.
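To make those parameters concrete, here is a minimal Python sketch of how a synthetic load could be described and generated. It is not Tokenflood's API or config format: `LoadProfile`, `arrival_times`, and `build_prompt` are hypothetical names, and the sketch assumes roughly one token per word, which real tokenizers only approximate.

```python
import random
from dataclasses import dataclass

# Small filler vocabulary for the non-shared part of each prompt (illustrative only).
WORDS = ["red", "blue", "green", "cat", "dog", "tree", "house", "river"]


@dataclass(frozen=True)
class LoadProfile:
    """Hypothetical load description; NOT Tokenflood's actual config format."""
    requests_per_second: float
    prompt_tokens: int   # total input length, including the shared prefix
    prefix_tokens: int   # leading tokens identical across all requests
    output_tokens: int   # number of tokens to request from the model
    duration_s: float

    def __post_init__(self):
        if self.prefix_tokens > self.prompt_tokens:
            raise ValueError("prefix_tokens cannot exceed prompt_tokens")


def arrival_times(profile: LoadProfile) -> list[float]:
    """Evenly spaced request start times, in seconds from the start of the test."""
    interval = 1.0 / profile.requests_per_second
    count = int(profile.duration_s * profile.requests_per_second)
    return [i * interval for i in range(count)]


def build_prompt(profile: LoadProfile, rng: random.Random) -> str:
    """Shared prefix followed by random filler, assuming ~1 token per word."""
    prefix = ["prefix"] * profile.prefix_tokens
    filler_len = profile.prompt_tokens - profile.prefix_tokens
    filler = [rng.choice(WORDS) for _ in range(filler_len)]
    return " ".join(prefix + filler)
```

With `LoadProfile(requests_per_second=2.0, prompt_tokens=100, prefix_tokens=40, output_tokens=20, duration_s=5.0)`, this yields ten start times spaced half a second apart, and every prompt shares the same 40-word prefix. Changing one field at a time is the kind of what-if comparison described above.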
I believe it's really useful for developing latency-sensitive LLM applications and for:
* Load testing self-hosted LLM setups
* Assessing the latency benefit of changes to prompt parameters before implementing those changes
* Assessing latency and intraday variation of latency on hosted LLM services before sending your traffic there
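For the intraday point, one way to look at such measurements is to bucket latencies by hour of day and compare percentiles. The sketch below is only an illustration under assumptions: the `(hour, latency_ms)` sample format and the function names are hypothetical and do not reflect Tokenflood's output format.

```python
import math
from collections import defaultdict


def nearest_rank(sorted_values: list[float], pct: float) -> float:
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def latency_by_hour(samples):
    """Group (hour_of_day, latency_ms) samples and summarize each hour.

    Returns a dict mapping hour -> {"p50": ..., "p95": ...}, ordered by hour.
    """
    buckets = defaultdict(list)
    for hour, latency_ms in samples:
        buckets[hour].append(latency_ms)

    summary = {}
    for hour in sorted(buckets):
        values = sorted(buckets[hour])
        summary[hour] = {
            "p50": nearest_rank(values, 50),
            "p95": nearest_rank(values, 95),
        }
    return summary
```

A large gap between the p95 values of two hours would suggest that the hosted service behaves differently at those times, which is worth knowing before routing traffic there.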
=== Why did I build it? ===
Over the past year, part of my work has been helping my clients meet their latency, throughput, and cost targets for LLMs (PTUs, anyone?). That process involved making numerous choices about cloud providers, hardware, inference software, models, configurations, and prompt changes. During that time I found myself running similar tests over and over with a collection of ad hoc scripts. I finally had some time on my hands and wanted to put them together properly in one tool.
=== What am I looking for? ===
I am sharing this for three reasons: hoping it can make others' work on latency-sensitive LLM applications simpler, learning and improving from feedback, and finding new projects to work on.
So please check it out on GitHub (https://github.com/twerkmeister/tokenflood), comment, and reach out at thomas@werkmeister.me or on LinkedIn (https://www.linkedin.com/in/twerkmeister/) for professional inquiries.
=== Pics ===
Image of the CLI interface: https://github.com/twerkmeister/tokenflood/blob/main/images/...
Result image: https://github.com/twerkmeister/tokenflood/blob/main/images/...
More from Hacker News
Show HN: Tusk Drift – Turn production traffic into API tests
Hi HN! In the past few months my team and I have been working on Tusk Drift, a system that records real API traffic from your service, then replays those requests as deterministic tests. Outbound I/O (databases, HTTP calls, etc.) gets automatically mocked using the recorded data.
Problem we're trying to solve: Writing API tests is tedious, and hand-written mocks drift from reality. We wanted tests that stay realistic because they come from real traffic.
Versus mocking libraries: Tools like VCR/Nock intercept HTTP within your tests. Tusk Drift records full request/response traces externally (HTTP, DB, Redis, etc.) and replays them against your running service, no test code or fixtures to write/maintain.
How it works:
1. Add a lightweight SDK (we currently support Python and Node.js)
2. Record traffic in any environment.
3. Run `tusk run`, the CLI sandboxes your service and serves mocks via Unix socket
We run this in CI on every PR. Also been using it as a test harness for AI coding agents, they can make changes, run `tusk run`, and get immediate feedback without needing live dependencies.
Source: https://github.com/Use-Tusk/tusk-drift-cli
Demo: https://github.com/Use-Tusk/drift-node-demo
Happy to answer questions!
Show HN: Yolobox – Run AI coding agents with full sudo without nuking home dir
Sandbox: Run untrusted AI code safely, fast
Show HN: Bithoven – A high-level, imperative language for Bitcoin Smart Contract
Hey HN! I’m a researcher working on Bitcoin smart contracts, and today I’m releasing Bithoven, a high-level imperative language that compiles to native Bitcoin Script (Legacy, SegWit, and Taproot).
The Goal:
Raw Bitcoin Script is notoriously difficult to reason about. Writing raw Bitcoin Script today feels like writing Assembly in the 1970s. You have to mentally juggle the stack (`OP_SWAP`, `OP_ROT`), manually manage distinct execution branches, and pray you didn't leave a stack item unconsumed (which crashes the script). My goal was to bridge the gap between complex contract logic and raw opcodes, allowing developers to write readable, compile-time-safe code.
Key Features:
- Imperative Syntax: Write logic using familiar if/else and return statements instead of mental stack juggling.
- Type Safety: First-class support for bool, signature, string, and number types to prevent runtime errors.
- Targeted Compilation: Support for Legacy, SegWit, and Taproot compilation targets.
- Native Primitives: Built-in keywords for timelocks (older, after) and cryptography (sha256, checksig).
You can try it in the browser here (runs via WASM): https://bithoven-lang.github.io/bithoven/ide/
Here is an example of a Hashed Time-Locked Contract (HTLC):
    (condition: bool, sig_alice: signature)
    (condition: bool, preimage: string, sig_bob: signature)
    {
        if condition {
            // Relative locktime (Sequence)
            older 1000;
            return checksig (sig_alice, alice_pk);
        } else {
            // Hashlock verification
            verify sha256 sha256 preimage == hash;
            return checksig (sig_bob, bob_pk);
        }
    }
The project is free open source and the academic paper is currently under review. I’d love to hear any feedback. Thanks for checking it out!