Human Creativity Benchmark
1.5M+ verified creative experts shape the tools of tomorrow. Defined by Contra Labs, the frontier human data and evaluation lab for creative AI.
Best Performing Models
Hierarchical Bradley-Terry leaderboard with partial pooling across the image studies. Higher Elo means stronger aggregate head-to-head performance; the whisker on each bar is the 95% confidence interval.
- 1 GPT Image 2 1110
- 2 Recraft V4.1 Pro 1062
- 3 Ideogram v4 1018
- 4 Gemini 3.1 Flash Image Preview 1012
- 5 Seedream 5.0 Lite 1011
- 6 Gemini 3 Pro Image Preview 1009
- 7 FLUX.2 [pro] 979
- 8 Krea 2 Large 964
- 9 FLUX.2 [max] 954
- 10 Midjourney V8.1 950
- 11 Grok Imagine 1.0 944

- 1 Sora v2.1 1088
- 2 Gen-3 Alpha 1042
- 3 Luma Dream Machine 2.0 1019
- 4 Kling 1.5 Pro 1006
- 5 Veo 3.1 987

- 1 Claude Design v2 1130
- 2 Antigravity Studio 1070
- 3 v0 by Vercel 1040
- 4 GPT Web Builder 4.5 1000
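For readers curious how head-to-head votes can become Elo-style scores like those above, here is a minimal sketch of a plain Bradley-Terry fit. It is a simplification, not Contra Labs' pipeline: it leaves out the hierarchical partial pooling and the 95% confidence intervals, and the 1000-point anchor and 400-point logistic scale are conventional assumptions that the benchmark does not state. The function names and the toy votes are hypothetical.

```python
import math
from collections import defaultdict


def fit_bradley_terry(votes, iters=500, tol=1e-10):
    """Fit plain Bradley-Terry strengths from (winner, loser) pairs.

    Uses the classic minorization-maximization update. Assumes every model
    has at least one win and that all models are linked by comparisons.
    """
    wins = defaultdict(int)
    pair_counts = defaultdict(int)  # unordered pair -> number of comparisons
    for winner, loser in votes:
        if winner == loser:
            continue  # a model cannot be compared with itself
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
    models = sorted({m for pair in pair_counts for m in pair})
    if any(wins[m] == 0 for m in models):
        raise ValueError("each model needs at least one win for a finite fit")

    strength = {m: 1.0 for m in models}
    for _ in range(iters):
        updated = {}
        for i in models:
            denom = sum(
                pair_counts.get(frozenset((i, j)), 0) / (strength[i] + strength[j])
                for j in models
                if j != i
            )
            updated[i] = wins[i] / denom
        # Rescale so the geometric mean strength is 1 (fixes the free scale).
        log_mean = sum(math.log(v) for v in updated.values()) / len(updated)
        updated = {m: v / math.exp(log_mean) for m, v in updated.items()}
        delta = max(abs(updated[m] - strength[m]) for m in models)
        strength = updated
        if delta < tol:
            break
    return strength


def to_elo(strength, anchor=1000.0, scale=400.0):
    """Map strengths to an Elo-like scale (anchor and scale are assumptions)."""
    return {m: anchor + scale * math.log10(s) for m, s in strength.items()}


def win_probability(rating_a, rating_b, scale=400.0):
    """Probability that A beats B under the assumed logistic Elo scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Toy data: hypothetical model "A" wins 3 of 4 votes against "B".
votes = [("A", "B")] * 3 + [("B", "A")]
ratings = to_elo(fit_bradley_terry(votes))
print(ratings, win_probability(ratings["A"], ratings["B"]))
```

In the toy data, A wins three of four votes, so the fitted strength ratio is 3, the rating gap is 400 · log10(3), or roughly 191 points, and the implied win probability is 75%. If the leaderboard used the same conventional scale, the 48-point gap between 1110 and 1062 would correspond to a win probability of about 57%, though that mapping is an assumption here.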
Benchmark methodology
Each model output is scored by 3+ professional creative evaluators from Contra's network based on the following categories:
Visual quality & aesthetics
Perceptual quality, composition, color balance, absence of artifacts.
Scale: 1 (poor) → 5 (exceptional)
Prompt adherence & accuracy
Fidelity to the prompt's requested subject, action, or style.
Scale: 1 (not aligned) → 5 (perfect alignment)
Originality & creativity
Novelty of concept, non-derivative style, imaginative value.
Scale: 1 (generic/derivative) → 5 (highly original)
Utility & applied fit
Usability in a real creative context (brand, design, storytelling) and production readiness.
Scale: 1 (unusable) → 5 (production-ready)
Motion realism (video only)
Smoothness, physics consistency, natural movement, fluid transitions.
Scale: 1 (broken/jittery) → 5 (lifelike & fluid)
Expert opinions help shape smarter tools
Real-world creative professionals on Contra are earning 26x more per project than on other online marketplaces. Take a look at an actual head-to-head vote:
"Close-up of a salmon burger with spring onion, arugula, and homemade dill mayonnaise on a golden brioche bun; crispy edges, glistening mayonnaise; soft candlelight, warm neutral tones, highly detailed food photography style."
Frequently asked questions
What is the Human Creativity Benchmark?
The Human Creativity Benchmark is the new standard set by the results of Creative Arena voting, where Contra’s network of vetted creative professionals evaluates AI-generated outputs. It reflects how real experts judge creativity, style, and brand fit across text, visuals, audio, and user flows.
Who gets to vote and what do we evaluate?
Only verified creative professionals on Contra can participate. Contra is a commission-free network of vetted creative experts, including designers, writers, marketers, and other professionals with real-world experience. You will review AI outputs across multiple formats and provide feedback on originality, brand alignment, and overall quality.
How does my vote shape the future of AI?
Your input helps shape Creative Human Data, which is used to train and refine generative models so they better reflect human taste and creative standards. The result is better tools for creatives, raising the floor for everyone. Stay current and shape smarter tools.









