Model rankings

How the EU-hosted models compare.

An indicative leaderboard of every model in the Pryvan catalogue, with a composite score built from seven capability dimensions. Every model is hosted inside the EU, and we rank them so you can choose with your eyes open.

Overall leaderboard

Composite score · 0–100

DeepSeek (top pick) · 81.4

Qwen · 80.1

Mistral · 78.7

Llama · 76.7

Rankings by capability

General knowledge

MMLU

Broad factual and academic knowledge across domains.

DeepSeek · 88

Reasoning

GPQA

Multi-step logical and graduate-level problem solving.

DeepSeek · 59

Coding

HumanEval / LiveCodeBench

Code generation, review and agentic software tasks.

Mistral · 92

Math

MATH

Arithmetic, algebra and competition-level mathematics.

DeepSeek · 90

Multilingual

EU-language eval

Quality across European and other languages.

Mistral · 90

Long context

Long-context recall

Recall and reasoning over very large inputs.

Qwen · 88

Efficiency

Cost / performance

Quality per unit of compute and overall cost-effectiveness.

Mistral · 90

How we score

Scores run from 0 to 100 and are composited from public benchmarks (MMLU, GPQA, HumanEval, MATH and others) plus a qualitative read on EU-language and long-context behaviour. They are indicative, not a guarantee of performance on your data, and they shift as new model versions ship. Use them to build a shortlist, then test the candidates on your own workload.

One arrow. One direction. Forward.

Bring AI into your business, without giving up your data.

Join the waitlist. We're onboarding GDPR-sensitive SMEs across Europe.