Research Online Results

All charts and visualisations are created using Lava Metrics.

Marketing AI Performance Leaderboard - June 2025 Results

🔗 See the full prompts used for the Research Online tests!

Research Online Scores by LLM

🔗 See full results dashboard

What are the overall results?

🏆 Research Online Winner: Gemini: 2.5-Flash-Preview

❌ Research Online Loser: Qwen: qwen-max

Individual Test Winners and Losers

Losers by Category ❌

🔗 See full results dashboard

Winners by Category 🏆

🔗 See full results dashboard

Test Ranking (Best → Worst)

Tests are ranked by their average 'research online' score across all LLMs, ordered from highest to lowest.

Buyer Persona Development

Industry Overview Report

Content Gap Analysis

Competitor Analysis

Market Opportunities and Threats
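
The ranking rule described above (average each test's score across all LLMs, then sort from highest to lowest) can be sketched as follows. This is a minimal illustration, not the leaderboard's actual pipeline: the test names, model names, and score values are hypothetical placeholders, and the 0 to 1 score scale is an assumption.

```python
from statistics import mean

# Hypothetical placeholder data: test name -> {LLM name -> score}.
# These are NOT real leaderboard results; the 0-1 scale is an assumption.
scores = {
    "test_a": {"model_x": 0.8, "model_y": 0.6},
    "test_b": {"model_x": 0.5, "model_y": 0.7},
    "test_c": {"model_x": 0.9, "model_y": 0.9},
}


def rank_tests(scores_by_test):
    """Return (test, average score) pairs ordered best to worst."""
    # Average each test's score over every LLM that took it.
    averages = {
        test: mean(per_llm.values())
        for test, per_llm in scores_by_test.items()
    }
    # Sort by the average, highest first.
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


ranking = rank_tests(scores)
for test, avg in ranking:
    print(f"{test}: {avg:.2f}")
```

Averaging per test rather than per model is what makes this a ranking of the tests themselves: a test near the top is one that the LLMs, taken together, handled well.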

FAQs

What does this Leaderboard represent?

We have designed tests that simulate a marketer’s interaction with native platform UIs (e.g., ChatGPT, Gemini) across several marketing domains:

Copywriting: Generating ad copy, email subject lines, and social media posts.

Internal Data Analysis: Interpreting sample CRM data to identify trends and insights.

Strategic Planning: Creating marketing plans based on given scenarios.

Online Research: Gathering information from the web to support marketing decisions.

🔗 Read more about the methodology here

How were the tests scored?

Each test output is evaluated by specialised AI “judges.”

Judges are themselves AI agents configured with specific evaluation criteria.

They parse the Test Answer, compare it against expected outcomes or benchmarks, and score on multiple dimensions (e.g., factual correctness, tone, format).

Final scores are normalised and aggregated to produce a single value per test.

🔗 Read more about the methodology here
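
To make the "normalized and aggregated" step concrete, here is a minimal sketch of one way it could work. The page does not state the actual formula, so everything here is an assumption: each dimension is scored from zero up to a known maximum, normalisation divides by that maximum, and aggregation is an unweighted mean. The function names and example values are hypothetical.

```python
def normalize(raw, max_score):
    """Scale a raw judge score onto [0, 1].

    Assumes the dimension is scored from 0 up to max_score.
    """
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    # Clamp so out-of-range judge outputs cannot escape [0, 1].
    return min(max(raw / max_score, 0.0), 1.0)


def aggregate(dimension_scores):
    """Combine per-dimension scores into a single value for one test.

    dimension_scores maps a dimension name to a (raw, max_score) pair.
    Uses an unweighted mean, which is an assumption; the real
    leaderboard may weight dimensions differently.
    """
    if not dimension_scores:
        raise ValueError("at least one dimension is required")
    normalized = [normalize(raw, mx) for raw, mx in dimension_scores.values()]
    return sum(normalized) / len(normalized)


# Hypothetical judge output for one test answer (placeholder values).
example = {
    "factual_correctness": (8, 10),
    "tone": (4, 5),
    "format": (3, 5),
}
test_score = aggregate(example)
print(f"{test_score:.3f}")
```

Normalising before averaging keeps a dimension scored out of 10 from outweighing one scored out of 5.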

Where can I see the full results?

https://benchmark.lavametrics.app/superset/dashboard/p/pRdMQdKrY32/
