Will Smith eating spaghetti and other weird AI benchmarks that took off in 2024

When a company releases a new AI video generator, it’s not long before someone uses it to make a video of actor Will Smith eating spaghetti.

It’s become something of a meme as well as a benchmark: seeing whether a new video generator can realistically render Smith slurping down a bowl of noodles. Smith himself parodied the trend in an Instagram post in February.

Will Smith and pasta is but one of several bizarre “unofficial” benchmarks to take the AI community by storm in 2024. A 16-year-old developer built an app that gives AI control over Minecraft and tests its ability to design structures. Elsewhere, a British programmer created a platform where AIs play games like Pictionary and Connect 4 against each other.

It’s not like there aren’t more academic tests of an AI’s performance. So why did the weirder ones blow up?

Image Credits: Paul Calcraft

For one, many of the industry-standard AI benchmarks don’t tell the average person very much. Companies often cite their AI’s ability to answer questions on Math Olympiad exams, or figure out plausible solutions to PhD-level problems. Yet most people (yours truly included) use chatbots for things like responding to emails and basic research.

Crowdsourced industry measures aren’t necessarily better or more informative.

Take, for example, Chatbot Arena, a public benchmark many AI enthusiasts and developers follow obsessively. Chatbot Arena lets anyone on the web rate how well AI performs on particular tasks, like creating a web app or generating an image. But raters tend not to be representative (most come from AI and tech industry circles), and they cast their votes based on personal, hard-to-pin-down preferences.

The Chatbot Arena interface. Image Credits: LMSYS

Ethan Mollick, a professor of management at Wharton, recently pointed out in a post on X another problem with many AI industry benchmarks: they don’t compare a system’s performance to that of the average person.

“The fact that there are not 30 different benchmarks from different organizations in medicine, in law, in advice quality, and so on is a real shame, as people are using systems for these things, regardless,” Mollick wrote.

Weird AI benchmarks like Connect 4, Minecraft, and Will Smith eating spaghetti are most certainly not empirical, or even all that generalizable. Just because an AI nails the Will Smith test doesn’t mean it’ll generate, say, a burger well.

Note the typo; there’s no such model as Claude 3.6 Sonnet. Image Credits: Adonis Singh

One expert I spoke to about AI benchmarks suggested that the AI community focus on the downstream impacts of AI instead of its ability in narrow domains. That’s sensible. But I have a feeling that weird benchmarks aren’t going away anytime soon. Not only are they entertaining (who doesn’t like watching AI build Minecraft castles?), but they’re easy to understand. And as my colleague Max Zeff wrote about recently, the industry continues to grapple with distilling a technology as complex as AI into digestible marketing.

The only question in my mind is, which odd new benchmarks will go viral in 2025?
