ModelLens

Rankings

Scores are normalized within each size tier, so a Small model's #1 rank is relative to other Small models only.
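To make per-tier normalization concrete, here is a minimal sketch. It assumes min-max scaling to a 0–100 range inside each tier; the page does not state the actual formula, and the function name and sample data are hypothetical.

```python
from collections import defaultdict

def normalize_within_tier(models):
    """Scale raw scores to 0-100 separately inside each size tier.

    `models` is a list of (name, tier, raw_score) tuples.
    Min-max scaling is an assumption; the site may use another formula.
    """
    by_tier = defaultdict(list)
    for name, tier, raw in models:
        by_tier[tier].append((name, raw))

    normalized = {}
    for tier, entries in by_tier.items():
        scores = [raw for _, raw in entries]
        lo, hi = min(scores), max(scores)
        span = hi - lo
        for name, raw in entries:
            # A tier whose scores are all identical gets 100 by convention.
            normalized[name] = 100.0 if span == 0 else 100.0 * (raw - lo) / span
    return normalized

# Hypothetical data for illustration only.
sample = [
    ("small-a", "Small", 40.0),
    ("small-b", "Small", 60.0),
    ("large-a", "Large", 70.0),
    ("large-b", "Large", 90.0),
]
result = normalize_within_tier(sample)
```

In this sketch `small-b` reaches the top of its tier with a raw score lower than either Large model, which is the effect the note above describes.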

HumanEval (code) · GSM8K (reasoning) · MMLU (knowledge) · Translation

Large 50B+ parameters

Medium 10–50B parameters

Small ≤10B parameters

What We Test

Industry-standard benchmarks — no invented metrics

💻

HumanEval

Code quality

Generated functions are executed against real test cases. Code with syntax errors or incorrect edge-case handling fails.
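The execution check can be sketched roughly as follows. This is an illustration under stated assumptions, not the site's harness: the helper name `passes_all_tests`, the test-case format, and the sample candidates are hypothetical, and a real harness would run untrusted code in a sandbox with a timeout rather than calling `exec` directly.

```python
def passes_all_tests(candidate_source, func_name, test_cases):
    """Return True only if the candidate compiles and passes every test case.

    `test_cases` is a list of (args_tuple, expected_result) pairs.
    """
    namespace = {}
    try:
        # A SyntaxError raised here counts as a failure.
        exec(candidate_source, namespace)
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in test_cases)
    except Exception:
        # Missing function, runtime error, or syntax error: no credit.
        return False

# Hypothetical candidates for illustration only.
good = "def add(a, b):\n    return a + b\n"
bad_syntax = "def add(a, b) return a + b"
wrong_edge = "def add(a, b):\n    return a + b if a else 0\n"

tests = [((1, 2), 3), ((0, 5), 5)]
```

Here `wrong_edge` handles the common case but fails when the first argument is zero, so it is rejected along with the candidate that does not parse.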

🧮

GSM8K

Reasoning

Math word problems that require multi-step reasoning. Only verified correct answers count; there is no partial credit.
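One way to apply all-or-nothing grading is to compare the last number in the model's output with the reference answer. The sketch below assumes that convention; the function names and the comma-stripping step are illustrative choices, not the site's documented method.

```python
import re

def final_number(text):
    """Return the last number found in the text as a float, or None."""
    # Naively drop thousands separators so "1,200" reads as 1200.
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(matches[-1]) if matches else None

def score_answer(model_output, reference):
    """Return 1 if the final number equals the reference exactly, else 0."""
    predicted = final_number(model_output)
    if predicted is None:
        return 0
    return 1 if predicted == float(reference) else 0
```

A response with correct intermediate steps but a wrong final number scores 0, which matches the no-partial-credit rule.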

📚

MMLU

Knowledge & instructions

Instruction following, format compliance, and multi-domain knowledge checks.

🌐

Translation

Multilingual

English ↔ Russian, English ↔ Spanish. Scored via script detection and vocabulary matching.
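A rough sketch of how script detection and vocabulary matching could combine for an English to Russian item follows. The 0.8 script threshold, the rule that a wrong-script output scores zero, and all function names are assumptions made for illustration; the page does not specify them.

```python
import re

def cyrillic_ratio(text):
    """Fraction of alphabetic characters that fall in the Cyrillic block."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    cyrillic = sum(1 for ch in letters if "\u0400" <= ch <= "\u04FF")
    return cyrillic / len(letters)

def vocabulary_overlap(text, expected_words):
    """Fraction of expected words that appear as whole tokens in the output."""
    if not expected_words:
        return 0.0
    tokens = set(re.findall(r"\w+", text.lower()))
    hits = sum(1 for word in expected_words if word.lower() in tokens)
    return hits / len(expected_words)

def score_russian_translation(text, expected_words, min_script_ratio=0.8):
    """Return 0.0 for wrong-script output, else the vocabulary overlap."""
    if cyrillic_ratio(text) < min_script_ratio:
        return 0.0
    return vocabulary_overlap(text, expected_words)
```

Checking the script first catches outputs that simply echo the English source, and the vocabulary check then rewards outputs that contain the expected target-language words.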

⚡

Throughput

Speed

Tokens/second averaged over short, medium, and long prompts. Ranked per size tier.
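The averaging step might look like the sketch below. It assumes an unweighted mean of per-prompt rates; dividing total tokens by total time is another reasonable reading and gives a different number. The function names and timings are hypothetical.

```python
def tokens_per_second(token_count, elapsed_seconds):
    """Throughput for a single generation run."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return token_count / elapsed_seconds

def average_throughput(runs):
    """Unweighted mean of tok/s over (token_count, elapsed_seconds) pairs."""
    rates = [tokens_per_second(tokens, seconds) for tokens, seconds in runs]
    return sum(rates) / len(rates)

# Hypothetical short, medium, and long prompt runs: 50, 40, and 30 tok/s.
runs = [(50, 1.0), (200, 5.0), (900, 30.0)]
mean_rate = average_throughput(runs)
```

With these made-up timings the mean is 40.0 tok/s, so each prompt length contributes equally regardless of how many tokens it produced.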

🔄

Daily Updates

Automated

A GitHub Actions workflow runs every 24 hours, so results reflect the current state of each model.

Latest AI News
