RadReason: Radiology Report Evaluation Metric with Reasons and Sub-Scores
Yingshu Li, Yunyi Liu, Lingqiao Liu, Lei Wang, Luping Zhou
https://arxiv.org/abs/2508.15464
Can LLMs Judge Debates? Evaluating Non-Linear Reasoning via Argumentation Theory Semantics
Reza Sanayei, Srdjan Vesic, Eduardo Blanco, Mihai Surdeanu
https://arxiv.org/abs/2509.15739
Google says Gemini 3 Pro scores 1,501 on LMArena, above 2.5 Pro, and demonstrates PhD-level reasoning with top scores on Humanity's Last Exam and GPQA Diamond (Abner Li/9to5Google)
https://9to5google.com/2025/11/18/gemini-3-launch/
Is #AI really just dumb statistics? "Olympiad-level physics problem-solving presents a significant challenge for both humans and artificial intelligence (AI), as it requires a sophisticated integration of precise calculation, abstract reasoning, and a fundamental grasp of physical principles," says the (abstract of the) paper https://arxiv.org/abs/2511.10515: "The Chinese Physics Olympiad (CPhO), renowned for its complexity and depth, serves as an ideal and rigorous testbed for these advanced capabilities. In this paper, we introduce LOCA-R (LOgical Chain Augmentation for Reasoning), an improved version of the LOCA framework adapted for complex reasoning, and apply it to the CPhO 2025 theory examination. LOCA-R achieves a near-perfect score of 313 out of 320 points, solidly surpassing the highest-scoring human competitor and significantly outperforming all baseline methods." Oops ...?
Re-FRAME the Meeting Summarization SCOPE: Fact-Based Summarization and Personalization via Questions
Frederic Kirstein, Sonu Kumar, Terry Ruas, Bela Gipp
https://arxiv.org/abs/2509.15901
Revisiting the Uniform Information Density Hypothesis in LLM Reasoning Traces
Minju Gwak, Guijin Son, Jaehyung Kim
https://arxiv.org/abs/2510.06953
Comparative Explanations via Counterfactual Reasoning in Recommendations
Yi Yu, Zhenxing Hu
https://arxiv.org/abs/2510.10920 https://arxiv.org/pdf/2510.109…
Embracing Discrete Search: A Reasonable Approach to Causal Structure Learning
Marcel Wienöbst, Leonard Henckel, Sebastian Weichwald
https://arxiv.org/abs/2510.04970
Slim-SC: Thought Pruning for Efficient Scaling with Self-Consistency
Colin Hong, Xu Guo, Anand Chaanan Singh, Esha Choukse, Dmitrii Ustiugov
https://arxiv.org/abs/2509.13990
What MLLMs Learn about When they Learn about Multimodal Reasoning: Perception, Reasoning, or their Integration?
Jiwan Chung, Neel Joshi, Pratyusha Sharma, Youngjae Yu, Vibhav Vineet
https://arxiv.org/abs/2510.01719