Answer Matching Outperforms Multiple Choice for Language Model Evaluation
Nikhil Chandak, Shashwat Goel, Ameya Prabhu, Moritz Hardt, Jonas Geiping
https://arxiv.org/abs/2507.02856
OntoRAG: Enhancing Question-Answering through Automated Ontology Derivation from Unstructured Knowledge Bases
Yash Tiwari, Owais Ahmad Lone, Mayukha Pal
https://arxiv.org/abs/2506.00664
Optimizing Question Semantic Space for Dynamic Retrieval-Augmented Multi-hop Question Answering
Linhao Ye, Lang Yu, Zhikai Lei, Qin Chen, Jie Zhou, Liang He
https://arxiv.org/abs/2506.00491
This https://arxiv.org/abs/2505.19028 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCV_…
Self-ensemble: Mitigating Confidence Distortion for Large Language Models
Zicheng Xu, Guanchu Wang, Guangyao Zheng, Yu-Neng Chuang, Alexander Szalay, Xia Hu, Vladimir Braverman
https://arxiv.org/abs/2506.01951
CoP: Agentic Red-teaming for Large Language Models using Composition of Principles
Chen Xiong, Pin-Yu Chen, Tsung-Yi Ho
https://arxiv.org/abs/2506.00781
iQUEST: An Iterative Question-Guided Framework for Knowledge Base Question Answering
Shuai Wang, Yinan Yu
https://arxiv.org/abs/2506.01784
This https://arxiv.org/abs/2505.18492 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csAI_…
SynapseRoute: An Auto-Route Switching Framework on Dual-State Large Language Model
Wencheng Zhang, Shiqin Qiao, Lingjie Luo, Yinfeng Li, Chuanyang Zheng, Qian Xu, Meng Li, Yong Gui, Yijun He, Jianing Qiu, Jindong Hong, Jiankai Sun
https://arxiv.org/abs/2507.02822
Do Language Models Mirror Human Confidence? Exploring Psychological Insights to Address Overconfidence in LLMs
Chenjun Xu, Bingbing Wen, Bin Han, Robert Wolfe, Lucy Lu Wang, Bill Howe
https://arxiv.org/abs/2506.00582