HIRE Leaderboard
The HIRE Benchmark measures how accurately AI models evaluate job candidates. Scores range from 0 to 1; higher is better. The ranking is intended to help organizations compare AI models for candidate evaluation.
# | Model | Score |
---|---|---|
1 | GPT-4.5 Preview | 0.86 |
2 | Gemini 2.5 Flash | 0.86 |
3 | Gemini 2.5 Pro | 0.84 |
4 | Sonnet 3.7 Thinking | 0.82 |
5 | GPT 4.1 Mini | 0.82 |
6 | GPT 4.1 | 0.81 |
7 | Gemini 2.0 Flash | 0.80 |
8 | Qwen 3 32B | 0.80 |
9 | Sonnet 3.7 | 0.79 |
10 | Sonnet 3.5 | 0.78 |
11 | Llama 4 Maverick | 0.77 |
12 | GPT-4o Mini | 0.75 |
13 | Llama 4 Scout | 0.73 |
14 | GPT-4o | 0.71 |
15 | Llama 3.3 70b | 0.69 |
16 | GPT 4.1 Nano | 0.59 |
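
Several models in the table share the same published score (for example, the two entries at 0.86), so their relative order cannot be determined from the two-decimal values alone. The sketch below is one way a reader might re-rank the table so that equal scores share a rank. It is only an illustration: the `SCORES` list is copied from the table, the function name `competition_ranks` is hypothetical, and the use of standard competition ranking (1, 1, 3, ...) is an assumption, not a description of how the HIRE Benchmark itself orders ties.

```python
from itertools import groupby

# Scores copied from the leaderboard table, as published (two decimals).
SCORES = [
    ("GPT-4.5 Preview", 0.86),
    ("Gemini 2.5 Flash", 0.86),
    ("Gemini 2.5 Pro", 0.84),
    ("Sonnet 3.7 Thinking", 0.82),
    ("GPT 4.1 Mini", 0.82),
    ("GPT 4.1", 0.81),
    ("Gemini 2.0 Flash", 0.80),
    ("Qwen 3 32B", 0.80),
    ("Sonnet 3.7", 0.79),
    ("Sonnet 3.5", 0.78),
    ("Llama 4 Maverick", 0.77),
    ("GPT-4o Mini", 0.75),
    ("Llama 4 Scout", 0.73),
    ("GPT-4o", 0.71),
    ("Llama 3.3 70b", 0.69),
    ("GPT 4.1 Nano", 0.59),
]


def competition_ranks(scores):
    """Return (rank, model, score) tuples in which equal scores share a rank.

    Hypothetical helper: uses standard competition ranking, so a tie for
    first place yields ranks 1, 1, 3 rather than 1, 2, 3.
    """
    # sorted() is stable, so tied models keep the order they had in the input.
    ordered = sorted(scores, key=lambda item: item[1], reverse=True)
    ranked = []
    position = 1
    for score, group in groupby(ordered, key=lambda item: item[1]):
        members = list(group)
        for model, _ in members:
            ranked.append((position, model, score))
        # Skip ahead by the size of the tie group.
        position += len(members)
    return ranked


if __name__ == "__main__":
    for rank, model, score in competition_ranks(SCORES):
        print(f"{rank:>2}  {model:<22} {score:.2f}")
```

Under this convention the tied pairs at 0.86, 0.82, and 0.80 each share a rank, and the rank that follows each pair is skipped.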