Tootfinder

Opt-in global Mastodon full text search. Join the index!

@Techmeme@techhub.social
2025-11-18 16:30:55

Google says Gemini 3 Pro scores 1,501 on LMArena, above 2.5 Pro, and demonstrates PhD-level reasoning with top scores on Humanity's Last Exam and GPQA Diamond (Abner Li/9to5Google)
9to5google.com/2025/11/18/gemi

@cosmos4u@scicomm.xyz
2025-11-17 07:46:18

Is #AI really just dumb statistics? "Olympiad-level physics problem-solving presents a significant challenge for both humans and artificial intelligence (AI), as it requires a sophisticated integration of precise calculation, abstract reasoning, and a fundamental grasp of physical principles," says the (abstract of the) paper arxiv.org/abs/2511.10515: "The Chinese Physics Olympiad (CPhO), renowned for its complexity and depth, serves as an ideal and rigorous testbed for these advanced capabilities. In this paper, we introduce LOCA-R (LOgical Chain Augmentation for Reasoning), an improved version of the LOCA framework adapted for complex reasoning, and apply it to the CPhO 2025 theory examination. LOCA-R achieves a near-perfect score of 313 out of 320 points, solidly surpassing the highest-scoring human competitor and significantly outperforming all baseline methods." Oops ...?

@arXiv_csCL_bot@mastoxiv.page
2025-10-03 10:45:51

What MLLMs Learn about When they Learn about Multimodal Reasoning: Perception, Reasoning, or their Integration?
Jiwan Chung, Neel Joshi, Pratyusha Sharma, Youngjae Yu, Vibhav Vineet
arxiv.org/abs/2510.01719

@arXiv_csAI_bot@mastoxiv.page
2025-10-09 09:56:51

Revisiting the Uniform Information Density Hypothesis in LLM Reasoning Traces
Minju Gwak, Guijin Son, Jaehyung Kim
arxiv.org/abs/2510.06953

@jonippolito@digipres.club
2025-12-09 14:11:46

We've updated the What Uses More app to reflect last week's finding by Luccioni and Gamazaychikov that "reasoning" mode increases energy and water usage by 30x. The study casts doubt on the improved efficiency AI companies are claiming for newer models.

A screenshot from the What Uses More app, showing a chart with 30x more energy usage for reasoning models.
@arXiv_csIR_bot@mastoxiv.page
2025-10-14 10:33:38

Comparative Explanations via Counterfactual Reasoning in Recommendations
Yi Yu, Zhenxing Hu
arxiv.org/abs/2510.10920 arxiv.org/pdf/2510.109…

@arXiv_statML_bot@mastoxiv.page
2025-10-07 10:51:32

Embracing Discrete Search: A Reasonable Approach to Causal Structure Learning
Marcel Wienöbst, Leonard Henckel, Sebastian Weichwald
arxiv.org/abs/2510.04970

@arXiv_csCV_bot@mastoxiv.page
2025-10-03 10:05:31

MedQ-Bench: Evaluating and Exploring Medical Image Quality Assessment Abilities in MLLMs
Jiyao Liu, Jinjie Wei, Wanying Qu, Chenglong Ma, Junzhi Ning, Yunheng Li, Ying Chen, Xinzhe Luo, Pengcheng Chen, Xin Gao, Ming Hu, Huihui Xu, Xin Wang, Shujian Gao, Dingkang Yang, Zhongying Deng, Jin Ye, Lihao Liu, Junjun He, Ningsheng Xu
arxiv…

@mlawton@mstdn.social
2025-11-04 03:55:14

94.1% accuracy is definitely the exception to the rule for me, but the moves looked clear and obvious. I had wondered whether patience against the pinned queen was accurate, but reasoned it had to be.
Opponent allowing the pin on the queen was their undoing, obviously, but they still played with 82.5% accuracy. In most of my games, I'd be delighted to score that high.
#chess

An animated GIF replay of the game. I have the black pieces and start out in a Caro-Kann defense. 1. e4 c6 2. d4 d5 3. exd5 cxd5 4. Bd3 Nf6 5. Bb5+ Nc6 6. Bxc6+ bxc6 7. Qd2 e6 8. Qc3 a5 9. h4 Bb4 10. a4 Ba6 11. f4 Ne4 12. Rh3 O-O 13. g4 Qb6 14. h5 Qxd4 15. Re3 Nxc3 16. bxc3 Bxc3+ 17. Nxc3 Qxf4 18. Nh3 Qf1+ 19. Kd2 d4 20. Rd3 Bxd3 21. cxd3 dxc3+ 22. Kxc3 Rfd8 23. Nf4 e5 24. Kb3 exf4 25. Rb1 Qxd3+ 26. Ka2 Qc4+ 27. Ka1 Rab8 28. Ba3 Qc3+ 29. Ka2 Rxb1 30. Kxb1 Qxa3 31. Kc2 Rb8 32. Kd1 Rb2 33. g5 Qa1…
@seeingwithsound@mas.to
2025-09-30 18:36:36

The illusion of readiness: Stress testing large frontier models on multimodal medical benchmarks #AI