Training data generation for context-dependent rubric-based short answer grading
Pavel Šindelář, Dávid Slivka, Christopher Bouma, Filip Prášil, Ondřej Bojar
https://arxiv.org/abs/2603.28537 https://arxiv.org/pdf/2603.28537 https://arxiv.org/html/2603.28537
arXiv:2603.28537v1 Announce Type: new
Abstract: Every 4 years, the PISA test is administered by the OECD to test the knowledge of teenage students worldwide and allow for comparisons of educational systems. However, having to avoid language differences and annotator bias makes the grading of student answers challenging. For these reasons, it would be interesting to compare methods of automatic student answer grading. To train some of these methods, which require machine learning, or to compute parameters or select hyperparameters for those that do not, a large amount of domain-specific data is needed. In this work, we explore a small number of methods for creating a large-scale training dataset using only a relatively small confidential dataset as a reference, leveraging a set of very simple derived text formats to preserve confidentiality. Using these methods, we successfully created three surrogate datasets that are, at the very least, superficially more similar to the reference dataset than purely the result of prompt-based generation. Early experiments suggest one of these approaches might also lead to improved model training.
My motor skills are completely shot, or put another way: listen to me yelling at my computer, "Will I ever stop erasing the wrong thing?!"
A few more of the #News items shared most often here recently:
IT attack affects the IT systems of the police evidence office
An LLM is a machine for giving confident plausible answers to whatever question is thrown at it.
The methods it uses to do this mean that (for questions that are not too challenging) there is a good chance that the answer is correct as well as confident.
Confident is guaranteed, correct is not. Is that what you need?
Replaced article(s) found for cs.CL. https://arxiv.org/list/cs.CL/new
[1/5]:
- Beyond In-Distribution Success: Scaling Curves of CoT Granularity for Language Model Generalization
Ru Wang, Wei Huang, Selena Song, Haoyu Zhang, Qian Niu, Yusuke Iwasawa, Yutaka Matsuo, Jiaxian Guo
https://arxiv.org/abs/2502.18273 https://mastoxiv.page/@arXiv_csCL_bot/114069031700102129
- Benchmarking NLP-supported Language Sample Analysis for Swiss Children's Speech
Anja Ryser, Yingqiang Gao, Sarah Ebling
https://arxiv.org/abs/2504.00780 https://mastoxiv.page/@arXiv_csCL_bot/114267149909002069
- Cultural Biases of Large Language Models and Humans in Historical Interpretation
Fabio Celli, Georgios Spathulas
https://arxiv.org/abs/2504.02572 https://mastoxiv.page/@arXiv_csCL_bot/114278467094094490
- BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text
Jiageng Wu, et al.
https://arxiv.org/abs/2504.19467 https://mastoxiv.page/@arXiv_csCL_bot/114420036189999973
- Understanding the Anchoring Effect of LLM with Synthetic Data: Existence, Mechanism, and Potentia...
Yiming Huang, Biquan Bie, Zuqiu Na, Weilin Ruan, Songxin Lei, Yutao Yue, Xinlei He
https://arxiv.org/abs/2505.15392 https://mastoxiv.page/@arXiv_csCL_bot/114550277171100272
- Just as Humans Need Vaccines, So Do Models: Model Immunization to Combat Falsehoods
Raza, Qureshi, Farooq, Lotif, Chadha, Pandya, Emmanouilidis
https://arxiv.org/abs/2505.17870 https://mastoxiv.page/@arXiv_csCL_bot/114572956853819813
- LingoLoop Attack: Trapping MLLMs via Linguistic Context and State Entrapment into Endless Loops
Fu, Jiang, Hong, Li, Guo, Yang, Chen, Zhang
https://arxiv.org/abs/2506.14493 https://mastoxiv.page/@arXiv_csCL_bot/114703502552989170
- GHTM: A Graph-based Hybrid Topic Modeling Approach with a Benchmark Dataset for the Low-Resource ...
Farhana Haque, Md. Abdur Rahman, Sumon Ahmed
https://arxiv.org/abs/2508.00605 https://mastoxiv.page/@arXiv_csCL_bot/114969875643478303
- Link Prediction for Event Logs in the Process Industry
Anastasia Zhukova, Thomas Walton, Christian E. Lobmüller, Bela Gipp
https://arxiv.org/abs/2508.09096 https://mastoxiv.page/@arXiv_csCL_bot/115020938764936882
- AirQA: A Comprehensive QA Dataset for AI Research with Instance-Level Evaluation
Huang, Cao, Zhang, Kang, Wang, Wang, Luo, Zheng, Qian, Chen, Yu
https://arxiv.org/abs/2509.16952 https://mastoxiv.page/@arXiv_csCL_bot/115253526588472475
- Multi-View Attention Multiple-Instance Learning Enhanced by LLM Reasoning for Cognitive Distortio...
Jun Seo Kim, Hyemi Kim, Woo Joo Oh, Hongjin Cho, Hochul Lee, Hye Hyeon Kim
https://arxiv.org/abs/2509.17292 https://mastoxiv.page/@arXiv_csCL_bot/115253586227941157
- Dual-Space Smoothness for Robust and Balanced LLM Unlearning
Han Yan, Zheyuan Liu, Meng Jiang
https://arxiv.org/abs/2509.23362 https://mastoxiv.page/@arXiv_csCL_bot/115293308293558024
- The Rise of AfricaNLP: Contributions, Contributors, Community Impact, and Bibliometric Analysis
Tadesse Destaw Belay, et al.
https://arxiv.org/abs/2509.25477 https://mastoxiv.page/@arXiv_csCL_bot/115298213432594791
- Open ASR Leaderboard: Towards Reproducible and Transparent Multilingual and Long-Form Speech Reco...
Srivastav, Zheng, Bezzam, Le Bihan, Koluguri, Żelasko, Majumdar, Moumen, Gandhi
https://arxiv.org/abs/2510.06961 https://mastoxiv.page/@arXiv_csCL_bot/115343748052193267
- Neuron-Level Analysis of Cultural Understanding in Large Language Models
Taisei Yamamoto, Ryoma Kumon, Danushka Bollegala, Hitomi Yanaka
https://arxiv.org/abs/2510.08284 https://mastoxiv.page/@arXiv_csCL_bot/115349533441895984
- CLMN: Concept based Language Models via Neural Symbolic Reasoning
Yibo Yang
https://arxiv.org/abs/2510.10063 https://mastoxiv.page/@arXiv_csCL_bot/115372392366793754
- Schema for In-Context Learning
Chen, Chen, Wang, Leong, Fung, Bernales, Aspuru-Guzik
https://arxiv.org/abs/2510.13905 https://mastoxiv.page/@arXiv_csCL_bot/115389057899856601
- Evaluating Latent Knowledge of Public Tabular Datasets in Large Language Models
Matteo Silvestri, Fabiano Veglianti, Flavio Giorgi, Fabrizio Silvestri, Gabriele Tolomei
https://arxiv.org/abs/2510.20351 https://mastoxiv.page/@arXiv_csCL_bot/115428615784704418
- LuxIT: A Luxembourgish Instruction Tuning Dataset from Monolingual Seed Data
Julian Valline, Cedric Lothritz, Siwen Guo, Jordi Cabot
https://arxiv.org/abs/2510.24434 https://mastoxiv.page/@arXiv_csCL_bot/115457025096322944
- Surfacing Subtle Stereotypes: A Multilingual, Debate-Oriented Evaluation of Modern LLMs
Muhammed Saeed, Muhammad Abdul-Mageed, Shady Shehata
https://arxiv.org/abs/2511.01187 https://mastoxiv.page/@arXiv_csCL_bot/115491321130591723
Replaced article(s) found for cs.CL. https://arxiv.org/list/cs.CL/new
[4/5]:
- Retrieving Climate Change Disinformation by Narrative
Upravitelev, Solopova, Jakob, Sahitaj, Möller, Schmitt
https://arxiv.org/abs/2603.22015 https://mastoxiv.page/@arXiv_csCL_bot/116283633674519408
- PaperVoyager: Building Interactive Web with Visual Language Models
Dasen Dai, Biao Wu, Meng Fang, Wenhao Wang
https://arxiv.org/abs/2603.22999 https://mastoxiv.page/@arXiv_csCL_bot/116289015432093128
- Continual Robot Skill and Task Learning via Dialogue
Weiwei Gu, Suresh Kondepudi, Anmol Gupta, Lixiao Huang, Nakul Gopalan
https://arxiv.org/abs/2409.03166 https://mastoxiv.page/@arXiv_csRO_bot/113089412115632702
- Shifting Perspectives: Steering Vectors for Robust Bias Mitigation in LLMs
Zara Siddique, Irtaza Khalid, Liam D. Turner, Luis Espinosa-Anke
https://arxiv.org/abs/2503.05371 https://mastoxiv.page/@arXiv_csLG_bot/114136994263573386
- SkillFlow: Scalable and Efficient Agent Skill Retrieval System
Fangzhou Li, Pagkratios Tagkopoulos, Ilias Tagkopoulos
https://arxiv.org/abs/2504.06188 https://mastoxiv.page/@arXiv_csAI_bot/114306773220502860
- Large Language Models for Computer-Aided Design: A Survey
Licheng Zhang, Bach Le, Naveed Akhtar, Siew-Kei Lam, Tuan Ngo
https://arxiv.org/abs/2505.08137 https://mastoxiv.page/@arXiv_csLG_bot/114504972217393639
- Structured Agent Distillation for Large Language Model
Liu, Kong, Dong, Yang, Li, Tang, Yuan, Niu, Zhang, Zhao, Lin, Huang, Wang
https://arxiv.org/abs/2505.13820 https://mastoxiv.page/@arXiv_csLG_bot/114544636506163783
- VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction
Fan, Zhang, Li, Zhang, Chen, Hu, Wang, Qu, Zhou, Wang, Yan, Xu, Theiss, Chen, Li, Tu, Wang, Ranjan
https://arxiv.org/abs/2505.20279 https://mastoxiv.page/@arXiv_csCV_bot/114578817567171199
- Learning to Diagnose Privately: DP-Powered LLMs for Radiology Report Classification
Bhattacharjee, Tian, Rubin, Lo, Merchant, Hanson, Gounley, Tandon
https://arxiv.org/abs/2506.04450 https://mastoxiv.page/@arXiv_csCR_bot/114635189706505648
- L-MARS: Legal Multi-Agent Workflow with Orchestrated Reasoning and Agentic Search
Ziqi Wang, Boqin Yuan
https://arxiv.org/abs/2509.00761 https://mastoxiv.page/@arXiv_csAI_bot/115140304787881576
- Your Models Have Thought Enough: Training Large Reasoning Models to Stop Overthinking
Han, Huang, Liao, Jiang, Lu, Zhao, Wang, Zhou, Jiang, Liang, Zhou, Sun, Yu, Xiao
https://arxiv.org/abs/2509.23392 https://mastoxiv.page/@arXiv_csAI_bot/115293169353788311
- Person-Centric Annotations of LAION-400M: Auditing Bias and Its Transfer to Models
Leander Girrbach, Stephan Alaniz, Genevieve Smith, Trevor Darrell, Zeynep Akata
https://arxiv.org/abs/2510.03721 https://mastoxiv.page/@arXiv_csCV_bot/115332690912652473
- Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
Zhang, Hu, Upasani, Ma, Hong, Kamanuru, Rainton, Wu, Ji, Li, Thakker, Zou, Olukotun
https://arxiv.org/abs/2510.04618 https://mastoxiv.page/@arXiv_csLG_bot/115332999596603375
- Mitigating Premature Exploitation in Particle-based Monte Carlo for Inference-Time Scaling
Giannone, Xu, Nayak, Awhad, Sudalairaj, Xu, Srivastava
https://arxiv.org/abs/2510.05825 https://mastoxiv.page/@arXiv_csLG_bot/115338159696513898
- Complete asymptotic type-token relationship for growing complex systems with inverse power-law co...
Pablo Rosillo-Rodes, Laurent Hébert-Dufresne, Peter Sheridan Dodds
https://arxiv.org/abs/2511.02069 https://mastoxiv.page/@arXiv_physicssocph_bot/115496283627867809
- ViPRA: Video Prediction for Robot Actions
Sandeep Routray, Hengkai Pan, Unnat Jain, Shikhar Bahl, Deepak Pathak
https://arxiv.org/abs/2511.07732 https://mastoxiv.page/@arXiv_csRO_bot/115535941444003568
- AISAC: An Integrated multi-agent System for Transparent, Retrieval-Grounded Scientific Assistance
Chandrachur Bhattacharya, Sibendu Som
https://arxiv.org/abs/2511.14043
- VideoARM: Agentic Reasoning over Hierarchical Memory for Long-Form Video Understanding
Yufei Yin, Qianke Meng, Minghao Chen, Jiajun Ding, Zhenwei Shao, Zhou Yu
https://arxiv.org/abs/2512.12360 https://mastoxiv.page/@arXiv_csCV_bot/115729238732682644
- RadImageNet-VQA: A Large-Scale CT and MRI Dataset for Radiologic Visual Question Answering
Léo Butsanets, Charles Corbière, Julien Khlaut, Pierre Manceron, Corentin Dancette
https://arxiv.org/abs/2512.17396 https://mastoxiv.page/@arXiv_csCV_bot/115762705911757243
- Measuring all the noises of LLM Evals
Sida Wang
https://arxiv.org/abs/2512.21326 https://mastoxiv.page/@arXiv_csLG_bot/115779597137785637