–
Models Tested
–
Fastest tok/s
–
Top Score /100
3
Size Categories
Rankings
Scores are normalized within each size tier — a Small model's #1 rank is against other Small models only
Show:
HumanEval · code
GSM8K · reasoning
MMLU · knowledge
Translation
Large
50B+ parameters
Medium
10–50B parameters
Small
≤10B parameters
Large
50B+ parameters
Medium
10–50B parameters
Small
≤10B parameters
What We Test
Industry-standard benchmarks — no invented metrics
HumanEval
Code quality
Functions are executed against real test cases. Syntax and edge-case logic must pass.
GSM8K
Reasoning
Math word problems and multi-step logic. Verified correct answers only, no partial credit.
MMLU
Knowledge & instructions
Instruction following, format compliance, and multi-domain knowledge checks.
Translation
Multilingual
English ↔ Russian, English ↔ Spanish. Scored via script detection and vocabulary matching.
Throughput
Speed
Tokens/second averaged over short, medium, and long prompts. Ranked per size tier.
Daily Updates
Automated
GitHub Actions runs every 24 hours. Results reflect the current state of each model.
Lexentia Proof