Show HN: How I topped the HuggingFace open LLM leaderboard on two gaming GPUs

Hacker News (score: 324)
Found: March 10, 2026
ID: 3715

Description

I found that duplicating a specific block of 7 middle layers in Qwen2-72B, without modifying any weights, improved performance across every Open LLM Leaderboard benchmark and took the #1 spot. As of 2026, the top 4 models on that leaderboard are still descendants of it.

The weird finding: single-layer duplication does nothing. Too few layers, nothing. Too many, it gets worse. Only circuit-sized blocks of ~7 layers work. This suggests pretraining carves out discrete functional circuits in the layer stack that only work when preserved whole.
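The duplication itself is just index arithmetic on the layer stack: the block is spliced back in right after itself, reusing the same weights. A minimal sketch of the idea, using plain Python lists to stand in for a decoder-layer stack (the `start`/`length` values are illustrative assumptions, not the author's exact recipe; 80 is Qwen2-72B's layer count):

```python
def duplicate_block(layers, start, length):
    """Return a new layer stack where layers[start:start+length] appears
    twice in a row; the second copy shares the original weights
    (nothing is retrained or modified)."""
    block = layers[start:start + length]
    return layers[:start + length] + block + layers[start + length:]

# Illustrative: an 80-layer stack (Qwen2-72B's depth) with a 7-layer
# middle block repeated, giving 87 layers total.
stack = list(range(80))
grown = duplicate_block(stack, start=35, length=7)
print(len(grown))  # 87
```

In a real model the same splice would be applied to the `nn.ModuleList` holding the decoder layers (with the position-index bookkeeping the framework needs), but the indexing logic is exactly this.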

The whole thing was developed on 2x RTX 4090s in my basement. I'm now running current models (GLM-4.7, Qwen3.5, MiniMax M2.5) on a dual GH200 rig (see my other post). Code and new models coming soon.

Happy to answer questions.
