Using Benchmarks for Measuring

Researchers develop new LiveBench benchmark for measuring AI models’ response accuracy

A group of researchers has developed a new benchmark, dubbed LiveBench, to ease the task of evaluating large language models’ question-answering capabilities. The researchers released the benchmark on ...

MSN

Returns? Many enterprises lack benchmarks for measuring success of AI: Wedbush

As enterprises actively pursue the deployment of artificial intelligence tools, many of these businesses have not created ...

SiliconANGLE

MLCommons releases new AILuminate benchmark for measuring AI model safety

MLCommons today released AILuminate, a new benchmark test for evaluating the safety of large language models. Launched in 2020, MLCommons is an industry consortium backed by several dozen tech firms.

VentureBeat

LiveBench is an open LLM benchmark that uses contamination-free test data and objective scoring

A team of Abacus.AI, New York University, ...

HotHardware

Geekbench AI Cross-Platform Benchmark Preview: Measuring AI Throughput

The Geekbench suite of system benchmarks has its limitations, but it presents a reasonable impression of overall performance for a wide variety of productivity, content creation, and ...

TechCrunch

Why most AI benchmarks tell us so little

On Tuesday, startup Anthropic released a family of generative AI models that it claims achieve best-in-class performance. Just a few days later, rival Inflection AI unveiled a model that it asserts ...

MIT Technology Review

The way we measure progress in AI is terrible

Many of the most popular benchmarks for AI models are outdated or poorly designed. Every time a new AI model is released, it’s typically touted as acing its performance against a series of benchmarks.

Ars Technica

There’s a new benchmark in town for measuring performance on Windows 95 PCs

If you’re still using a computer you bought during the Clinton administration, interesting news: Crystal Dew World, developers of apps like CrystalDiskInfo and CrystalDiskMark, have released an update ...

SD Times

Beyond Benchmarks: Measuring the True Cost of AI-Generated Code

Value stream management has people across the organization examine workflows and other processes to ensure they derive the maximum value from their efforts while eliminating waste, of ...
