Creative intelligence.

Human Creativity Benchmark

1.5M+ verified creative experts shape the tools of tomorrow. Defined by Contra Labs, the frontier human data and evaluation lab for creative AI.


06/17/2026 FIELD NOTE: Introducing Design Crit - we taught AI to judge design like a designer
06/03/2026 BATTLE: Ideogram v4 won 47.9% of typography matchups
06/01/2026 PROFILE: Gemini reliably edits, but can it keep the rest of the image still?
05/29/2026 BATTLE: Cursor took 60% of head-to-heads. Claude Code took 63% of client meetings

CREATIVE ARENA

Best Performing Models

Hierarchical Bradley-Terry leaderboard with partial pooling across the image studies. Higher Elo means stronger aggregate head-to-head performance; the whisker on each bar is the 95% confidence interval.

1 GPT Image 2 1110
2 Recraft V4.1 Pro 1062
3 Ideogram v4 1018
4 Gemini 3.1 Flash Image Preview 1012
5 Seedream 5.0 Lite 1011
6 Gemini 3 Pro Image Preview 1009
7 FLUX.2 [pro] 979
8 Krea 2 Large 964
9 FLUX.2 [max] 954
10 Midjourney V8.1 950
11 Grok Imagine 1.0 944

1 Sora v2.1 1088
2 Gen-3 Alpha 1042
3 Luma Dream Machine 2.0 1019
4 Kling 1.5 Pro 1006
5 Veo 3.1 987

1 Claude Design v2 1130
2 Antigravity Studio 1070
3 v0 by Vercel 1040
4 GPT Web Builder 4.5 1000
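
The leaderboard description above mentions a Bradley-Terry model reported on an Elo scale. As a rough illustration of how head-to-head votes can turn into ratings like these, here is a minimal sketch of a plain (non-hierarchical, no partial pooling) Bradley-Terry fit. The vote data, the model names, the 1000-point baseline, and the 400-point logistic scale are all assumptions made for the example and are not taken from the benchmark itself.

```python
import math
from collections import defaultdict


def fit_bradley_terry(matchups, iterations=200):
    """Fit plain Bradley-Terry strengths from (winner, loser) pairs.

    Uses the classic minorization-maximization update. Assumes every model
    wins at least once and that no model is matched against itself.
    """
    wins = defaultdict(int)
    pair_counts = defaultdict(int)
    for winner, loser in matchups:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
    models = sorted({m for pair in pair_counts for m in pair})
    if any(wins[m] == 0 for m in models):
        raise ValueError("each model needs at least one win for this sketch")

    strength = {m: 1.0 for m in models}
    for _ in range(iterations):
        updated = {}
        for m in models:
            denom = 0.0
            for pair, n in pair_counts.items():
                if m in pair:
                    (other,) = pair - {m}
                    denom += n / (strength[m] + strength[other])
            updated[m] = wins[m] / denom
        # Rescale so the geometric mean strength is 1 (ratings centre on the base).
        log_mean = sum(math.log(s) for s in updated.values()) / len(models)
        strength = {m: s / math.exp(log_mean) for m, s in updated.items()}
    return strength


def to_elo(strength, base=1000.0, scale=400.0):
    """Map strengths to an Elo-like scale (base and scale are assumed values)."""
    return {m: base + scale * math.log10(s) for m, s in strength.items()}


def expected_win_probability(rating_a, rating_b, scale=400.0):
    """Probability that A beats B under the assumed logistic Elo scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Hypothetical head-to-head votes as (winner, loser) pairs.
votes = (
    [("model_a", "model_b")] * 3 + [("model_b", "model_a")]
    + [("model_b", "model_c")] * 2 + [("model_c", "model_b")]
    + [("model_a", "model_c")] * 3 + [("model_c", "model_a")]
)
ratings = to_elo(fit_bradley_terry(votes))
for name, rating in sorted(ratings.items(), key=lambda item: -item[1]):
    print(f"{name}: {rating:.0f}")
```

If the published ratings happened to use the same 400-point scale (an assumption, not something the page states), a 48-point gap such as 1110 versus 1062 would correspond to roughly a 57% expected win rate via `expected_win_probability(1110, 1062)`. A hierarchical variant would additionally share information across studies, which this sketch does not attempt.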

METHODS & STANDARDS

Benchmark methodology

Each model output is scored by at least three professional creative evaluators from Contra's network on the following categories:

Visual quality & aesthetics

Perceptual quality, composition, color balance, absence of artifacts.

Scale: 1 (poor) → 5 (exceptional)

Prompt adherence & accuracy

Fidelity to the prompt's requested subject, action, or style.

Scale: 1 (not aligned) → 5 (perfect alignment)

Originality & creativity

Novelty of concept, non-derivative style, imaginative value.

Scale: 1 (generic/derivative) → 5 (highly original)

Utility & applied fit

Usability in a real creative context (brand, design, storytelling) and production readiness.

Scale: 1 (unusable) → 5 (production-ready)

Motion realism (video only)

Smoothness, physics consistency, natural movement, fluid transitions.

Scale: 1 (broken/jittery) → 5 (lifelike & fluid)
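
The rubric above can be sketched as a small aggregation step. The page does not say how the evaluators' scores are combined, so the per-category averaging below is an assumption, as are the category keys and function name, which are hypothetical labels for the five categories listed.

```python
from statistics import mean

# Hypothetical keys for the categories described in the rubric.
CORE_CATEGORIES = [
    "visual_quality",
    "prompt_adherence",
    "originality",
    "utility",
]
VIDEO_ONLY_CATEGORIES = ["motion_realism"]
MIN_EVALUATORS = 3  # the rubric calls for 3+ evaluators per output


def aggregate_scores(ratings, is_video=False):
    """Average 1-5 scores per category across evaluators.

    ratings: one dict per evaluator, mapping category key -> integer score.
    Averaging is an assumed aggregation rule, not the benchmark's stated one.
    """
    if len(ratings) < MIN_EVALUATORS:
        raise ValueError(f"need at least {MIN_EVALUATORS} evaluators")
    categories = CORE_CATEGORIES + (VIDEO_ONLY_CATEGORIES if is_video else [])
    result = {}
    for category in categories:
        scores = [evaluator[category] for evaluator in ratings]
        if any(not 1 <= score <= 5 for score in scores):
            raise ValueError(f"{category} scores must be between 1 and 5")
        result[category] = mean(scores)
    return result


# Hypothetical ratings from three evaluators for one image output.
example_ratings = [
    {"visual_quality": 4, "prompt_adherence": 5, "originality": 3, "utility": 4},
    {"visual_quality": 4, "prompt_adherence": 4, "originality": 3, "utility": 5},
    {"visual_quality": 5, "prompt_adherence": 5, "originality": 3, "utility": 4},
]
image_summary = aggregate_scores(example_ratings)
print(image_summary)
```

Passing `is_video=True` makes the motion realism score required, mirroring the "video only" note in the rubric.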

CREATIVE HUMAN DATA

Work with the industry’s top creative minds

Shape the future of creative AI with real human taste, all backed by the commission-free network trusted by top creatives. Powered by Contra's network of 1.5M+ creative experts.

Designers, Writers, Marketers, Engineers, Social Media Experts, Video Editors & Animators, Music & Audio Engineers

Request partnership

1.5M+ creative experts

400+ Skills and tools represented

$250M+ verified expert earnings

CREATIVE ARENA

Expert opinions help shape smarter tools

Real-world creative professionals on Contra are earning 26x more per project than on other online marketplaces. Take a look at an actual head-to-head vote:

Winner

Runner Up

Evaluation Prompt

"Close-up of a salmon burger with spring onion, arugula, and homemade dill mayonnaise on a golden brioche bun; crispy edges, glistening mayonnaise; soft candlelight, warm neutral tones, highly detailed food photography style."

HELP & SUPPORT

Frequently asked questions

What is the Human Creativity Benchmark?

The Human Creativity Benchmark is the new standard set by the results of Creative Arena voting, where Contra’s network of vetted creative professionals evaluates AI-generated outputs. It reflects how real experts judge creativity, style, and brand fit across text, visuals, audio, and user flows.

Who gets to vote and what do we evaluate?

Participation is powered by a commission-free network of vetted creative experts—including designers, writers, marketers, and other professionals with real-world experience. Only verified creative professionals on Contra can participate. You will review AI outputs across multiple formats and provide feedback on originality, brand alignment, and overall quality.

How does my vote shape the future of AI?

Your input feeds into Creative Human Data, which is used to train and refine generative models so they better reflect human taste and creative standards. The result is better tools for creatives, raising the floor for everyone. Stay current and shape smarter tools.

PARTNERSHIP

The creative layer powering next-gen AI

Shape the future of creative AI with real human taste, all backed by the commission-free network trusted by top creatives.

Request partnership →
