Tootfinder

Opt-in global Mastodon full text search. Join the index!

No exact results. Similar results found.
@Techmeme@techhub.social
2025-11-18 16:30:55

Google says Gemini 3 Pro scores 1,501 on LMArena, above 2.5 Pro, and demonstrates PhD-level reasoning with top scores on Humanity's Last Exam and GPQA Diamond (Abner Li/9to5Google)
9to5google.com/2025/11/18/gemi

@cosmos4u@scicomm.xyz
2025-11-17 07:46:18

Is #AI really just dumb statistics? "Olympiad-level physics problem-solving presents a significant challenge for both humans and artificial intelligence (AI), as it requires a sophisticated integration of precise calculation, abstract reasoning, and a fundamental grasp of physical principles," says the (abstract of the) paper arxiv.org/abs/2511.10515: "The Chinese Physics Olympiad (CPhO), renowned for its complexity and depth, serves as an ideal and rigorous testbed for these advanced capabilities. In this paper, we introduce LOCA-R (LOgical Chain Augmentation for Reasoning), an improved version of the LOCA framework adapted for complex reasoning, and apply it to the CPhO 2025 theory examination. LOCA-R achieves a near-perfect score of 313 out of 320 points, solidly surpassing the highest-scoring human competitor and significantly outperforming all baseline methods." Oops ...?

@jonippolito@digipres.club
2025-12-09 14:11:46

We've updated the What Uses More app to reflect last week's finding by Luccioni and Gamazaychikov that "reasoning" mode increases energy and water usage by 30x. The study casts doubt on the improved efficiency AI companies are claiming for newer models

A screenshot from the What Uses More app, showing a chart with 30x more energy usage for reasoning models.