Tootfinder

Opt-in global Mastodon full text search. Join the index!

No exact results. Similar results found.
@arXiv_csCV_bot@mastoxiv.page
2025-09-18 10:26:31

Cinéaste: A Fine-grained Contextual Movie Question Answering Benchmark
Nisarg A. Shah, Amir Ziai, Chaitanya Ekanadham, Vishal M. Patel
arxiv.org/abs/2509.14227

@arXiv_csCL_bot@mastoxiv.page
2025-09-19 10:32:21

A Multi-To-One Interview Paradigm for Efficient MLLM Evaluation
Ye Shen, Junying Wang, Farong Wen, Yijin Guo, Qi Jia, Zicheng Zhang, Guangtao Zhai
arxiv.org/abs/2509.14886

@arXiv_csRO_bot@mastoxiv.page
2025-09-16 11:52:57

ParaEQsA: Parallel and Asynchronous Embodied Questions Scheduling and Answering
Haisheng Wang, Weiming Zhi
arxiv.org/abs/2509.11663

@arXiv_quantph_bot@mastoxiv.page
2025-09-18 09:54:11

Rare Event Simulation of Quantum Error-Correcting Circuits
Carolyn Mayer, Anand Ganti, Uzoma Onunkwo, Tzvetan Metodi, Benjamin Anker, Jacek Skryzalin
arxiv.org/abs/2509.13678

@unchartedworlds@scicomm.xyz
2025-09-14 09:09:54
Content warning: LLM training frameworks, interesting

Interesting explanation of LLM training frameworks and the incentives for confident guessing.
"The authors examined ten major AI benchmarks, including those used by Google, OpenAI and also the top leaderboards that rank AI models. This revealed that nine benchmarks use binary grading systems that award zero points for AIs expressing uncertainty.
" ... When an AI system says “I don’t know”, it receives the same score as giving completely wrong information. The optimal strategy under such evaluation becomes clear: always guess. ...
"More sophisticated approaches like active learning, where AI systems ask clarifying questions to reduce uncertainty, can improve accuracy but further multiply computational requirements. ...
"Users want systems that provide confident answers to any question. Evaluation benchmarks reward systems that guess rather than express uncertainty. Computational costs favour fast, overconfident responses over slow, uncertain ones."
=
My comment: "Fast, overconfident responses" sounds a bit similar to "bullshit", does it not?
#ChatGPT #LLMs #SoCalledAI
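
The incentive described in the post above can be made concrete with a small expected-score calculation. Below is a minimal Python sketch, not taken from the quoted article: the function names, the 0/1 scoring, and the wrong-answer penalty of -1 are illustrative assumptions. It shows that under binary grading a guess is never worse than answering "I don't know", so always guessing is the optimal policy, whereas a penalty for wrong answers makes abstaining the better choice whenever the model's chance of being right is below 0.5.

"""Toy model of benchmark scoring incentives (illustrative assumptions only)."""

def expected_score_binary(p_correct: float, abstain: bool) -> float:
    """Expected score under binary 0/1 grading: abstentions score 0, like wrong answers."""
    if abstain:
        return 0.0
    return p_correct  # 1 * p_correct + 0 * (1 - p_correct)

def expected_score_penalized(p_correct: float, abstain: bool,
                             wrong_penalty: float = -1.0) -> float:
    """Hypothetical alternative: wrong answers cost points, abstaining is free."""
    if abstain:
        return 0.0
    return p_correct + (1 - p_correct) * wrong_penalty

if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5, 0.9):
        print(f"p={p:.1f}  binary: guess={expected_score_binary(p, False):.2f} "
              f"abstain={expected_score_binary(p, True):.2f} | "
              f"penalized: guess={expected_score_penalized(p, False):+.2f} "
              f"abstain={expected_score_penalized(p, True):+.2f}")
    # Under binary grading, guessing weakly dominates abstaining for every p,
    # so the best strategy is to always guess, regardless of confidence.
    # With a wrong-answer penalty, abstaining wins whenever p < 0.5.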

@arXiv_csCL_bot@mastoxiv.page
2025-09-17 10:11:00

HistoryBankQA: Multilingual Temporal Question Answering on Historical Events
Biswadip Mandal, Anant Khandelwal, Manish Gupta
arxiv.org/abs/2509.12720

@arXiv_csCV_bot@mastoxiv.page
2025-10-15 10:50:41

VQArt-Bench: A semantically rich VQA Benchmark for Art and Cultural Heritage
A. Alfarano (University of Zurich, Max Planck Society), L. Venturoli (University of Zurich, Max Planck Society), D. Negueruela del Castillo (University of Zurich, Max Planck Society)
arxiv.org/abs/2510.12750

@arXiv_csCL_bot@mastoxiv.page
2025-09-16 12:10:57

HalluDetect: Detecting, Mitigating, and Benchmarking Hallucinations in Conversational Systems
Spandan Anaokar, Shrey Ganatra, Harshvivek Kashid, Swapnil Bhattacharyya, Shruti Nair, Reshma Sekhar, Siddharth Manohar, Rahul Hemrajani, Pushpak Bhattacharyya
arxiv.org/abs/2509.11619

@arXiv_csCL_bot@mastoxiv.page
2025-10-13 10:31:50

ReTraceQA: Evaluating Reasoning Traces of Small Language Models in Commonsense Question Answering
Francesco Maria Molfese, Luca Moroni, Ciro Porcaro, Simone Conia, Roberto Navigli
arxiv.org/abs/2510.09351

@arXiv_csCL_bot@mastoxiv.page
2025-09-12 09:41:09

Agentic LLMs for Question Answering over Tabular Data
Rishit Tyagi, Mohit Gupta, Rahul Bouri
arxiv.org/abs/2509.09234