Benchmark Model - Search News

OpenAI Life Science Benchmark Reveals AI Passes Only 1 in 3 Scientific Research Tasks

AI life science benchmark LifeSciBench, published June 17 by OpenAI with 173 PhD scientists, shows frontier models clear only ...

VentureBeat

LiveBench is an open LLM benchmark that uses contamination-free test data and objective scoring

A team of Abacus.AI, New York University, ...

The Next Web

Fable 5 was beating GPT 5.5 on every major benchmark. Then the US government pulled it offline.

Anthropic's Fable 5 led every major AI benchmark over OpenAI's GPT 5.5 before a US export control directive forced it offline three days after launch.

SiliconANGLE

OpenAI details o3 reasoning model with record-breaking benchmark scores

OpenAI today detailed o3, its new flagship large language model for reasoning tasks. The model’s introduction caps off a 12-day product announcement series that started with the launch of a new ...

Microsoft’s multi-agent AI system tops Anthropic’s Mythos on cybersecurity benchmark

Microsoft's new vulnerability-scanning system, codenamed MDASH, scored 88.45% on the CyberGym benchmark, surpassing single-model systems from Anthropic and OpenAI by using more than 100 specialized AI ...

VentureBeat

Microsoft’s GRIN-MoE AI model takes on coding and math, beating competitors in key benchmarks

Microsoft has unveiled a groundbreaking artificial intelligence model, ...

TechCrunch

Meta’s vanilla Maverick AI model ranks below rivals on a popular chat benchmark

Earlier this week, Meta landed in hot water for using an experimental, unreleased version of its Llama 4 Maverick model to achieve a high score on a crowdsourced benchmark, LM Arena. The incident ...
