laude-institute/terminal-bench

GitHub Trending
Found: August 20, 2025
ID: 962

Description

Other
A benchmark for LLMs on complicated tasks in the terminal

More from GitHub

HKUDS/RAG-Anything

RAG-Anything: All-in-One RAG Framework

zilliztech/claude-context

Code search MCP for Claude Code. Make entire codebase the context for any coding agent.

deepseek-ai/DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Tracer-Cloud/opensre

Build your own AI SRE agents. The open source toolkit for the AI era ✨
