Using Benchmarks Measuring

Researchers develop new LiveBench benchmark for measuring AI models’ response accuracy

A group of researchers has developed a new benchmark, dubbed LiveBench, to ease the task of evaluating large language models’ question-answering capabilities. The researchers released the benchmark on ...

TV Tech

WunderKIND Ads Releases First Measurement Benchmarks For Programmatic CTV Pause Ads

As the pioneer for delivering CTV Pause Ads programmatically, WunderKIND Ads works with OpenGlass.TV to programmatically ...

SiliconANGLE

MLCommons releases new AILuminate benchmark for measuring AI model safety

MLCommons today released AILuminate, a new benchmark test for evaluating the safety of large language models. Launched in 2020, MLCommons is an industry consortium backed by several dozen tech firms.

Seeking Alpha

OpenAI introduces new benchmark to measure expert-level scientific reasoning

OpenAI (OPENAI) has introduced a new benchmark, FrontierScience, which is used to measure expert-level scientific reasoning across the fields of biology, chemistry and physics. The new benchmark ...

Business Wire

Simbian Announces Industry’s First Benchmark to Comprehensively Measure LLM Performance in Security Operations Centers

New “AI SOC LLM Leaderboard” Uniquely Measures LLMs in Realistic IT Environment to Give SOC Teams and Vendors Guidance to Pick the Best LLM for Their Organization Simbian's industry-first benchmark ...

Ars Technica

There’s a new benchmark in town for measuring performance on Windows 95 PCs

If you’re still using a computer you bought during the Clinton administration, interesting news: Crystal Dew World, developers of apps like CrystalDiskInfo and CrystalDiskMark, have released an update ...

Business Wire

MLCommons Launches AILuminate, First-of-Its-Kind Benchmark to Measure the Safety of Large Language Models

SAN FRANCISCO--(BUSINESS WIRE)--MLCommons today released AILuminate, a first-of-its-kind safety test for large language models (LLMs). The v1.0 benchmark – which provides a series of safety grades for ...

Business Insider

Holafly and TeleSemana.com launch the Holafly Global eSIM Index 2026, the first comprehensive benchmark measuring eSIM readiness across 50 markets

DUBLIN, May 13, 2026 (GLOBE NEWSWIRE) -- Holafly, the global leader in travel eSIMs, today announced the launch of the Holafly Global eSIM Index 2026—a first-of-its-kind study that evaluates the ...

TechCrunch

Why most AI benchmarks tell us so little

On Tuesday, startup Anthropic released a family of generative AI models that it claims achieve best-in-class performance. Just a few days later, rival Inflection AI unveiled a model that it asserts ...

SD Times

Beyond Benchmarks: Measuring the True Cost of AI-Generated Code

Value stream management involves people in the organization to examine workflows and other processes to ensure they are deriving the maximum value from their efforts while eliminating waste — of ...
