New World Benchmark - Search News

HackerRank Introduces New Benchmark to Assess Advanced AI Models

Industry Leader Known for Software Development Skills Expertise Introduces Real-World Benchmark of AI Software Development Capabilities CUPERTINO, Calif., Feb. 11, 2025 (GLOBE NEWSWIRE) -- HackerRank, ...

MIT Technology Review

These new AI benchmarks could help make models less biased

They could offer a more nuanced way to measure AI’s bias and its understanding of the world. New AI benchmarks could help developers reduce bias in AI models, potentially making them fairer and less ...

Nasdaq

CrowdStrike and Meta Deliver New Benchmarks for the Evaluation of AI Performance in Cybersecurity

New benchmarks define how LLMs should be tested in the SOC – measuring real threats, workflows, and outcomes to help defenders Cyber defenders face an overwhelming challenge from the influx of ...

VentureBeat

Google DeepMind researchers introduce new benchmark to improve LLM factuality, reduce hallucinations

Hallucinations, or factually inaccurate responses, continue to plague large language models (LLMs). Models falter particularly when they are given more complex tasks and when users are looking for ...

Forbes

Beyond The Llama Drama: 4 New Benchmarks For Large Language Models

Forbes contributors publish independent expert analyses and insights. AI researcher working with the UN and others to drive social change. Apr 13, 2025, 07:56pm EDT The April 2025 drama around Llama's ...

Seeking Alpha

OpenAI introduces new benchmark to measure expert-level scientific reasoning

OpenAI (OPENAI) has introduced a new benchmark, FrontierScience, which is used to measure expert-level scientific reasoning across the fields of biology, chemistry and physics. The new benchmark ...

InfoWorld

Why benchmarks are key to AI progress

Researchers are racing to develop more challenging, interpretable, and fair assessments of AI models that reflect real-world use cases. The stakes are high. Benchmarks are often reduced to leaderboard ...

Some results have been hidden because they may be inaccessible to you

Show inaccessible results