Tag

Benchmarking

All articles tagged with #benchmarking

MacBook Air M5 Surpasses M4 in Geekbench by Up to 15%
technology · 1 month ago

Geekbench 6 results show the MacBook Air with the 10-core M5 chip scoring 17,073 in multi-core, about 15% faster than the M4 Air (14,731). The gain aligns with Apple’s claims and places the M5 Air ahead of the older M3 Pro MacBook Pro by up to 16%, though it remains slower than some M4 Pro and newer Pro models. The M5 MacBook Air is available to pre-order now and launches March 11.
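
For readers who want to check the percentage themselves, the relative gain can be recomputed from the two multi-core scores quoted in the summary. This is only a minimal sketch: the function name `percent_gain` and the variable names are illustrative and do not come from the article or from Geekbench.

```python
def percent_gain(new_score: float, old_score: float) -> float:
    """Return how much faster new_score is than old_score, in percent."""
    if old_score <= 0:
        raise ValueError("old_score must be positive")
    return (new_score - old_score) / old_score * 100.0


# Multi-core scores as quoted in the summary (illustrative variable names).
m5_air_multicore = 17_073
m4_air_multicore = 14_731

gain = percent_gain(m5_air_multicore, m4_air_multicore)
print(f"M5 Air vs M4 Air (multi-core): {gain:.1f}% faster")
```

With the quoted scores this comes to roughly 15.9%, slightly above the rounded "about 15%" figure used in the summary.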

The World’s Toughest AI Exam Tests Reasoning, Not AGI Yet
technology · 1 month ago

A new benchmark called Humanity’s Last Exam aims to measure how close today’s AI models come to human-level knowledge by presenting 2,500 carefully vetted, PhD-level questions across 100+ subjects. Launched in 2025, it has been attempted by top models including GPT-4o and Google Gemini. The top score reported so far is 48.4% (Gemini 3 Deep Think), far below typical human expert performance (~90%). The test prioritizes precise, non-searchable knowledge and verifiable answers, filtering out questions AI could answer via web search. While a high score would indicate expert-level capability in specific domains, researchers say it does not by itself signal AGI or autonomous, general intelligence.

Panther Lake Benchmark: Windows 11 vs Ubuntu 26.04 on MSI Laptop Shows Parity
technology · 2 months ago

Phoronix compares Windows 11 Home and Ubuntu 26.04 on an MSI Prestige 14 Flip Panther Lake laptop (Intel Core Ultra X7 358H with Arc B390). Using the same balanced power profile, the test found near-identical performance between Windows and Linux, though MSI’s Linux power limits were initially conservative relative to Intel’s guidance; Linux remains competitive on Panther Lake with the latest kernel and Mesa drivers across CPU and graphics benchmarks.

Windows 11 vs Windows 10: Gaming Performance Showdown
technology · 8 months ago

A comparison of gaming performance between Windows 10 and Windows 11 on a high-end gaming PC shows that performance is largely similar, with minor variations in minimum FPS in some games. The article suggests that upgrading to Windows 11 won't significantly impact gaming performance and discusses options for users who wish to delay upgrading beyond the end-of-life support date for Windows 10 in October 2025.

Anthropic revokes OpenAI's access to Claude over unauthorized tool usage
technology · 8 months ago

Anthropic revoked OpenAI's access to its Claude large language models after discovering that OpenAI was using the models to benchmark and develop its own competing AI, violating the terms of service. While OpenAI can still perform safety evaluations, its ability to use Anthropic's tools for development has been cut off, highlighting tensions in AI model sharing and competition.

The Rise and Challenges of Agentic AI in Enterprises
technology · 9 months ago

AI agents currently perform poorly on office tasks, with success rates around 30-35%, and many products marketed as 'agentic AI' are not truly autonomous. Studies by CMU and Salesforce highlight significant limitations and failures, and Gartner predicts that most agentic AI projects will be canceled by 2027 due to high costs and unclear value, though adoption is expected to grow by 2028.

Samsung Galaxy S24: Exynos vs Snapdragon, Free Offers, Upgrade Advice, and User Review
technology · 2 years ago

Samsung's Exynos 2400 chipset in the Galaxy S24 competes well against last year's Snapdragon 8 Gen 2 in CPU performance but lags behind in GPU tests due to thermal throttling. The Exynos 2400 shows promise for future gaming with ray tracing capabilities, but overall, the Snapdragon 8 Gen 3 in the Galaxy S24 Ultra outperforms it. Customers seeking peak performance should consider the S24 Ultra, especially for gaming.

Nvidia RTX 4080 Super: Initial Linux Benchmarks and Founders Edition Review
technology · 2 years ago

A comprehensive comparison of AMD Radeon RX 7000 series and NVIDIA GeForce RTX 40 series performance under Linux has been conducted, including the first look at the GeForce RTX 4070 series and RTX 4080 SUPER performance. The article provides details on the specifications and performance of the newly received NVIDIA graphics cards for Linux benchmarking, such as the GeForce RTX 4070, RTX 4070 SUPER, RTX 4070 Ti SUPER, and RTX 4080 SUPER.

GeForce RTX 4070 Super: Benchmarks, Reviews, and Sales Analysis
technology · 2 years ago

Preliminary benchmarks comparing the upcoming Nvidia GeForce RTX 4070 Ti Super with its counterparts, including the RTX 4080 and AMD's Radeon RX 7900 series, suggest it will be competitive. In OpenCL benchmarks, the RTX 4070 Ti Super trails the RTX 4080 by 5% but surpasses it by 7% in Vulkan benchmarks. Overall, the RTX 4070 Ti Super shows potential performance increases of 10-15% over the non-Super RTX 4070 Ti while sitting 5-10% behind the RTX 4080. Nvidia's strategy with the RTX 4070 Ti Super seems aimed at competing with AMD's RX 7900 series, prompting AMD to introduce promotional pricing for some of its Radeon RX 7900 series models. While real-world performance benchmarks are pending, the RTX 4070 Ti Super is expected to narrow the gap with the RX 7900 XT and may reach parity due to its enhanced capabilities.

Amazon's AI Advancements: Human Benchmarking, Model Choice, and a Leap Forward in 2024
technology · 2 years ago

Amazon is introducing Model Evaluation on Bedrock, a preview feature that allows users to test and evaluate AI models. The platform includes automated evaluation and human evaluation components, enabling developers to assess model performance on metrics like accuracy and toxicity. Users can choose to work with an AWS human evaluation team or their own, and can bring their own data into the benchmarking platform. The goal is to provide companies with a way to measure the impact of AI models on their projects and guide development decisions. AWS will only charge for model inference used during the evaluation.

Google Pixel 8 Series Faces Benchmark Controversy and Camera Challenges
technology · 2 years ago

Google has blocked the installation of benchmarking tools like Geekbench and 3DMark on its new Pixel 8 Series smartphones, indicating a focus on AI-driven efficiency rather than raw performance. The Tensor G3 chip in the Pixel 8 Series features a unique 9-core CPU architecture and a 10-core GPU with ray-tracing acceleration capabilities. Despite Google's restrictions, users have found workarounds to run benchmarking tools, revealing performance gaps compared to competitors like the Qualcomm Snapdragon 8 Gen 2 chip. Google's approach challenges the importance of benchmark scores in evaluating smartphone quality, emphasizing the value of AI capabilities.