Tootfinder

Opt-in global Mastodon full text search.

@Techmeme@techhub.social
2025-11-18 16:30:55

Google says Gemini 3 Pro scores 1,501 on LMArena, above 2.5 Pro, and demonstrates PhD-level reasoning with top scores on Humanity's Last Exam and GPQA Diamond (Abner Li/9to5Google)
9to5google.com/2025/11/18/gemi

@cosmos4u@scicomm.xyz
2025-11-17 07:46:18

Is #AI really just dumb statistics? "Olympiad-level physics problem-solving presents a significant challenge for both humans and artificial intelligence (AI), as it requires a sophisticated integration of precise calculation, abstract reasoning, and a fundamental grasp of physical principles," says the (abstract of the) paper arxiv.org/abs/2511.10515: "The Chinese Physics Olympiad (CPhO), renowned for its complexity and depth, serves as an ideal and rigorous testbed for these advanced capabilities. In this paper, we introduce LOCA-R (LOgical Chain Augmentation for Reasoning), an improved version of the LOCA framework adapted for complex reasoning, and apply it to the CPhO 2025 theory examination. LOCA-R achieves a near-perfect score of 313 out of 320 points, solidly surpassing the highest-scoring human competitor and significantly outperforming all baseline methods." Oops ...?

@jonippolito@digipres.club
2025-12-09 14:11:46

We've updated the What Uses More app to reflect last week's finding by Luccioni and Gamazaychikov that "reasoning" mode increases energy and water usage by 30x. The study casts doubt on the improved efficiency AI companies are claiming for newer models.

A screenshot from the What Uses More app, showing a chart with 30x more energy usage for reasoning models.

@mlawton@mstdn.social
2025-11-04 03:55:14

94.1% accuracy is definitely the exception to the rule for me, but the moves looked clear and obvious. I had wondered about whether patience against the pinned queen was accurate but reasoned it had to be.
Opponent allowing the pin on the queen was their undoing, obviously, but they still played with 82.5% accuracy. In most of my games, I'd be delighted to score that high.
#chess

An animated GIF replay of the game. I have the black pieces and start out in a Caro-Kann defense. 1. e4 c6 2. d4 d5 3. exd5 cxd5 4. Bd3 Nf6 5. Bb5+ Nc6 6. Bxc6+ bxc6 7. Qd2 e6 8. Qc3 a5 9. h4 Bb4 10. a4 Ba6 11. f4 Ne4 12. Rh3 O-O 13. g4 Qb6 14. h5 Qxd4 15. Re3 Nxc3 16. bxc3 Bxc3+ 17. Nxc3 Qxf4 18. Nh3 Qf1+ 19. Kd2 d4 20. Rd3 Bxd3 21. cxd3 dxc3+ 22. Kxc3 Rfd8 23. Nf4 e5 24. Kb3 exf4 25. Rb1 Qxd3+ 26. Ka2 Qc4+ 27. Ka1 Rab8 28. Ba3 Qc3+ 29. Ka2 Rxb1 30. Kxb1 Qxa3 31. Kc2 Rb8 32. Kd1 Rb2 33. g5 Qa1…