AI Responsibility

HIRE Leaderboard

The HIRE Benchmark measures how accurately AI models evaluate job candidates. Scores range from 0 to 1, with higher being better. This authoritative ranking helps organizations choose the most effective AI models for candidate evaluation.
#    Model                     Score
1    GPT-4.5 Preview           0.86
2    Gemini 2.5 Flash (New)    0.86
3    Gemini 2.5 Pro            0.84
4    Sonnet 3.7 Thinking       0.82
5    GPT 4.1 Mini              0.82
6    GPT 4.1                   0.81
7    Gemini 2.0 Flash          0.80
8    Qwen 3 32B (New)          0.80
9    Sonnet 3.7                0.79
10   Sonnet 3.5                0.78
11   Llama 4 Maverick          0.77
12   GPT-4o Mini               0.75
13   Llama 4 Scout             0.73
14   GPT-4o                    0.71
15   Llama 3.3 70b             0.69
16   GPT 4.1 Nano              0.59

Performance Visualization