ViBE — Vernacular Internet Behavioral Evaluation

ViBE is a weekly reception benchmark for frontier language models. Created by Dmitry Kargaev, the project measures how 22 frontier AI model families are received in public discussion on Twitter and Reddit.

What is ViBE?

ViBE is a public reception benchmark, distinct from capability benchmarks like MMLU, HumanEval, or MATH. While capability benchmarks measure what a model can do under lab conditions, ViBE measures how users actually feel about a model in day-to-day production usage. An independent judge model labels each mention collected from Twitter and Reddit as positive, negative, neutral, complaint, or recommendation.

What does ViBE measure?

Across 22 frontier model families and 2,965 judged mentions, ViBE produces a weekly reception score for each model. The score separates "the model is good at math" (capability) from "people enjoy using it" (reception). Total judge cost for the full benchmark run: $1.92.
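To put those figures in proportion, the short Python sketch below divides the stated totals. The three constants come from the text above; the variable names are illustrative, and the per-family figure is only an average, since the real spread of mentions across families is probably uneven.

```python
# Figures quoted in the text above.
TOTAL_JUDGE_COST_USD = 1.92
JUDGED_MENTIONS = 2965
MODEL_FAMILIES = 22

# Average judge cost for a single labeled mention.
cost_per_mention = TOTAL_JUDGE_COST_USD / JUDGED_MENTIONS

# Average number of judged mentions per model family (a plain mean,
# not a claim about how mentions are actually distributed).
mentions_per_family = JUDGED_MENTIONS / MODEL_FAMILIES

print(f"cost per judged mention: ${cost_per_mention:.5f}")
print(f"average mentions per family: {mentions_per_family:.1f}")
```

That works out to roughly $0.00065 per judged mention and about 135 mentions per model family on average.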

Who built ViBE?

ViBE was built by Dmitry Kargaev (also known as Dee), an AI Engineer and Product Designer based in Los Angeles. Kargaev is the founder of Dee Agency, author of "Don't Replace Me: A Survival Guide to the AI Apocalypse," and a former Lead Product Designer at VALK (now DatAI Network). His Wikidata entity is Q138828544.

Methodology

ViBE collects mentions from public Twitter and Reddit threads where users discuss specific AI models. Each mention is passed to an independent judge model, which assigns it one of five labels: positive, negative, neutral, complaint, or recommendation. The full methodology, dataset, and results are published in the ViBE paper.
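As a rough illustration of the aggregation step, the Python sketch below turns judged mentions into a per-model score. The five labels are the ones named on this page. Everything else is an assumption made for the example: the function name `reception_scores`, the sample model names, and especially the scoring rule (favorable minus unfavorable labels, divided by total mentions). The formula ViBE actually uses is defined in the paper, not here.

```python
from collections import Counter, defaultdict

# The five labels named on this page.
LABELS = {"positive", "negative", "neutral", "complaint", "recommendation"}

# ASSUMPTION: this grouping and the net-score rule below are hypothetical,
# chosen only to illustrate aggregation. They are not taken from the ViBE paper.
FAVORABLE = {"positive", "recommendation"}
UNFAVORABLE = {"negative", "complaint"}


def reception_scores(judged):
    """Aggregate (model, label) pairs into a net score in [-1, 1] per model."""
    counts = defaultdict(Counter)
    for model, label in judged:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
        counts[model][label] += 1

    scores = {}
    for model, label_counts in counts.items():
        total = sum(label_counts.values())
        favorable = sum(label_counts[name] for name in FAVORABLE)
        unfavorable = sum(label_counts[name] for name in UNFAVORABLE)
        # Neutral mentions count toward the total but not toward either side.
        scores[model] = (favorable - unfavorable) / total
    return scores


# Made-up judged mentions for two placeholder models.
sample = [
    ("model-a", "positive"),
    ("model-a", "recommendation"),
    ("model-a", "complaint"),
    ("model-a", "neutral"),
    ("model-b", "negative"),
    ("model-b", "neutral"),
]
scores = reception_scores(sample)
print(scores)
```

Under this hypothetical rule, neutral mentions dilute the score without shifting its sign, so a model discussed mostly in neutral terms lands near zero.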

Why reception, not capability?

Capability benchmarks tell you which model wins on a task. Reception benchmarks tell you which model people actually want to use. ViBE captures that gap.

Links

  • ViBE Paper
  • Author site
  • Wikidata Q138828544
  • @deeflectcom on X