Tootfinder

Opt-in global Mastodon full text search. Join the index!

No exact results. Similar results found.
@arXiv_csCL_bot@mastoxiv.page
2026-03-31 10:12:22

Training data generation for context-dependent rubric-based short answer grading
Pavel \v{S}indel\'a\v{r}, D\'avid Slivka, Christopher Bouma, Filip Pr\'a\v{s}il, Ond\v{r}ej Bojar
arxiv.org/abs/2603.28537 arxiv.org/pdf/2603.28537 arxiv.org/html/2603.28537
arXiv:2603.28537v1 Announce Type: new
Abstract: Every 4 years, the PISA test is administered by the OECD to test the knowledge of teenage students worldwide and allow for comparisons of educational systems. However, having to avoid language differences and annotator bias makes the grading of student answers challenging. For these reasons, it would be interesting to compare methods of automatic student answer grading. To train some of these methods, which require machine learning, or to compute parameters or select hyperparameters for those that do not, a large amount of domain-specific data is needed. In this work, we explore a small number of methods for creating a large-scale training dataset using only a relatively small confidential dataset as a reference, leveraging a set of very simple derived text formats to preserve confidentiality. Using these methods, we successfully created three surrogate datasets that are, at the very least, superficially more similar to the reference dataset than purely the result of prompt-based generation. Early experiments suggest one of these approaches might also lead to improved model training.
toXiv_bot_toot

@june_thalia_michael@literatur.social
2026-02-26 18:33:26

Meine Motorik ist so im Eimer, oder auch: Hört mich, wie ich meinen Computer anschreie "Höre ich mal auf, mich zu verradieren?!"

@heiseonline@social.heise.de
2026-02-06 17:00:15

Noch ein paar der zuletzt hier besonders häufig geteilten #News:
IT-Angriff betrifft IT der Beweisstückstelle der Polizei

@bthalpin@mastodon.social
2026-02-13 14:11:09

An LLM is a machine for giving confident plausible answers to whatever question is thrown at it.
The methods it uses to do this mean that (for questions that are not too challenging) there is a good chance that the answer is correct as well as confident.
Confident is guaranteed, correct is not. Is that what you need?

@arXiv_csCL_bot@mastoxiv.page
2026-03-31 11:12:28

Replaced article(s) found for cs.CL. arxiv.org/list/cs.CL/new
[1/5]:
- Beyond In-Distribution Success: Scaling Curves of CoT Granularity for Language Model Generalization
Ru Wang, Wei Huang, Selena Song, Haoyu Zhang, Qian Niu, Yusuke Iwasawa, Yutaka Matsuo, Jiaxian Guo
arxiv.org/abs/2502.18273 mastoxiv.page/@arXiv_csCL_bot/
- Benchmarking NLP-supported Language Sample Analysis for Swiss Children's Speech
Anja Ryser, Yingqiang Gao, Sarah Ebling
arxiv.org/abs/2504.00780 mastoxiv.page/@arXiv_csCL_bot/
- Cultural Biases of Large Language Models and Humans in Historical Interpretation
Fabio Celli, Georgios Spathulas
arxiv.org/abs/2504.02572 mastoxiv.page/@arXiv_csCL_bot/
- BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text
Jiageng Wu, et al.
arxiv.org/abs/2504.19467 mastoxiv.page/@arXiv_csCL_bot/
- Understanding the Anchoring Effect of LLM with Synthetic Data: Existence, Mechanism, and Potentia...
Yiming Huang, Biquan Bie, Zuqiu Na, Weilin Ruan, Songxin Lei, Yutao Yue, Xinlei He
arxiv.org/abs/2505.15392 mastoxiv.page/@arXiv_csCL_bot/
- Just as Humans Need Vaccines, So Do Models: Model Immunization to Combat Falsehoods
Raza, Qureshi, Farooq, Lotif, Chadha, Pandya, Emmanouilidis
arxiv.org/abs/2505.17870 mastoxiv.page/@arXiv_csCL_bot/
- LingoLoop Attack: Trapping MLLMs via Linguistic Context and State Entrapment into Endless Loops
Fu, Jiang, Hong, Li, Guo, Yang, Chen, Zhang
arxiv.org/abs/2506.14493 mastoxiv.page/@arXiv_csCL_bot/
- GHTM: A Graph-based Hybrid Topic Modeling Approach with a Benchmark Dataset for the Low-Resource ...
Farhana Haque, Md. Abdur Rahman, Sumon Ahmed
arxiv.org/abs/2508.00605 mastoxiv.page/@arXiv_csCL_bot/
- Link Prediction for Event Logs in the Process Industry
Anastasia Zhukova, Thomas Walton, Christian E. Lobm\"uller, Bela Gipp
arxiv.org/abs/2508.09096 mastoxiv.page/@arXiv_csCL_bot/
- AirQA: A Comprehensive QA Dataset for AI Research with Instance-Level Evaluation
Huang, Cao, Zhang, Kang, Wang, Wang, Luo, Zheng, Qian, Chen, Yu
arxiv.org/abs/2509.16952 mastoxiv.page/@arXiv_csCL_bot/
- Multi-View Attention Multiple-Instance Learning Enhanced by LLM Reasoning for Cognitive Distortio...
Jun Seo Kim, Hyemi Kim, Woo Joo Oh, Hongjin Cho, Hochul Lee, Hye Hyeon Kim
arxiv.org/abs/2509.17292 mastoxiv.page/@arXiv_csCL_bot/
- Dual-Space Smoothness for Robust and Balanced LLM Unlearning
Han Yan, Zheyuan Liu, Meng Jiang
arxiv.org/abs/2509.23362 mastoxiv.page/@arXiv_csCL_bot/
- The Rise of AfricaNLP: Contributions, Contributors, Community Impact, and Bibliometric Analysis
Tadesse Destaw Belay, et al.
arxiv.org/abs/2509.25477 mastoxiv.page/@arXiv_csCL_bot/
- Open ASR Leaderboard: Towards Reproducible and Transparent Multilingual and Long-Form Speech Reco...
Srivastav, Zheng, Bezzam, Le Bihan, Koluguri, \.Zelasko, Majumdar, Moumen, Gandhi
arxiv.org/abs/2510.06961 mastoxiv.page/@arXiv_csCL_bot/
- Neuron-Level Analysis of Cultural Understanding in Large Language Models
Taisei Yamamoto, Ryoma Kumon, Danushka Bollegala, Hitomi Yanaka
arxiv.org/abs/2510.08284 mastoxiv.page/@arXiv_csCL_bot/
- CLMN: Concept based Language Models via Neural Symbolic Reasoning
Yibo Yang
arxiv.org/abs/2510.10063 mastoxiv.page/@arXiv_csCL_bot/
- Schema for In-Context Learning
Chen, Chen, Wang, Leong, Fung, Bernales, Aspuru-Guzik
arxiv.org/abs/2510.13905 mastoxiv.page/@arXiv_csCL_bot/
- Evaluating Latent Knowledge of Public Tabular Datasets in Large Language Models
Matteo Silvestri, Fabiano Veglianti, Flavio Giorgi, Fabrizio Silvestri, Gabriele Tolomei
arxiv.org/abs/2510.20351 mastoxiv.page/@arXiv_csCL_bot/
- LuxIT: A Luxembourgish Instruction Tuning Dataset from Monolingual Seed Data
Julian Valline, Cedric Lothritz, Siwen Guo, Jordi Cabot
arxiv.org/abs/2510.24434 mastoxiv.page/@arXiv_csCL_bot/
- Surfacing Subtle Stereotypes: A Multilingual, Debate-Oriented Evaluation of Modern LLMs
Muhammed Saeed, Muhammad Abdul-mageed, Shady Shehata
arxiv.org/abs/2511.01187 mastoxiv.page/@arXiv_csCL_bot/
toXiv_bot_toot

@arXiv_csCL_bot@mastoxiv.page
2026-03-31 11:13:03

Replaced article(s) found for cs.CL. arxiv.org/list/cs.CL/new
[4/5]:
- Retrieving Climate Change Disinformation by Narrative
Upravitelev, Solopova, Jakob, Sahitaj, M\"oller, Schmitt
arxiv.org/abs/2603.22015 mastoxiv.page/@arXiv_csCL_bot/
- PaperVoyager : Building Interactive Web with Visual Language Models
Dasen Dai, Biao Wu, Meng Fang, Wenhao Wang
arxiv.org/abs/2603.22999 mastoxiv.page/@arXiv_csCL_bot/
- Continual Robot Skill and Task Learning via Dialogue
Weiwei Gu, Suresh Kondepudi, Anmol Gupta, Lixiao Huang, Nakul Gopalan
arxiv.org/abs/2409.03166 mastoxiv.page/@arXiv_csRO_bot/
- Shifting Perspectives: Steering Vectors for Robust Bias Mitigation in LLMs
Zara Siddique, Irtaza Khalid, Liam D. Turner, Luis Espinosa-Anke
arxiv.org/abs/2503.05371 mastoxiv.page/@arXiv_csLG_bot/
- SkillFlow: Scalable and Efficient Agent Skill Retrieval System
Fangzhou Li, Pagkratios Tagkopoulos, Ilias Tagkopoulos
arxiv.org/abs/2504.06188 mastoxiv.page/@arXiv_csAI_bot/
- Large Language Models for Computer-Aided Design: A Survey
Licheng Zhang, Bach Le, Naveed Akhtar, Siew-Kei Lam, Tuan Ngo
arxiv.org/abs/2505.08137 mastoxiv.page/@arXiv_csLG_bot/
- Structured Agent Distillation for Large Language Model
Liu, Kong, Dong, Yang, Li, Tang, Yuan, Niu, Zhang, Zhao, Lin, Huang, Wang
arxiv.org/abs/2505.13820 mastoxiv.page/@arXiv_csLG_bot/
- VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction
Fan, Zhang, Li, Zhang, Chen, Hu, Wang, Qu, Zhou, Wang, Yan, Xu, Theiss, Chen, Li, Tu, Wang, Ranjan
arxiv.org/abs/2505.20279 mastoxiv.page/@arXiv_csCV_bot/
- Learning to Diagnose Privately: DP-Powered LLMs for Radiology Report Classification
Bhattacharjee, Tian, Rubin, Lo, Merchant, Hanson, Gounley, Tandon
arxiv.org/abs/2506.04450 mastoxiv.page/@arXiv_csCR_bot/
- L-MARS: Legal Multi-Agent Workflow with Orchestrated Reasoning and Agentic Search
Ziqi Wang, Boqin Yuan
arxiv.org/abs/2509.00761 mastoxiv.page/@arXiv_csAI_bot/
- Your Models Have Thought Enough: Training Large Reasoning Models to Stop Overthinking
Han, Huang, Liao, Jiang, Lu, Zhao, Wang, Zhou, Jiang, Liang, Zhou, Sun, Yu, Xiao
arxiv.org/abs/2509.23392 mastoxiv.page/@arXiv_csAI_bot/
- Person-Centric Annotations of LAION-400M: Auditing Bias and Its Transfer to Models
Leander Girrbach, Stephan Alaniz, Genevieve Smith, Trevor Darrell, Zeynep Akata
arxiv.org/abs/2510.03721 mastoxiv.page/@arXiv_csCV_bot/
- Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
Zhang, Hu, Upasani, Ma, Hong, Kamanuru, Rainton, Wu, Ji, Li, Thakker, Zou, Olukotun
arxiv.org/abs/2510.04618 mastoxiv.page/@arXiv_csLG_bot/
- Mitigating Premature Exploitation in Particle-based Monte Carlo for Inference-Time Scaling
Giannone, Xu, Nayak, Awhad, Sudalairaj, Xu, Srivastava
arxiv.org/abs/2510.05825 mastoxiv.page/@arXiv_csLG_bot/
- Complete asymptotic type-token relationship for growing complex systems with inverse power-law co...
Pablo Rosillo-Rodes, Laurent H\'ebert-Dufresne, Peter Sheridan Dodds
arxiv.org/abs/2511.02069 mastoxiv.page/@arXiv_physicsso
- ViPRA: Video Prediction for Robot Actions
Sandeep Routray, Hengkai Pan, Unnat Jain, Shikhar Bahl, Deepak Pathak
arxiv.org/abs/2511.07732 mastoxiv.page/@arXiv_csRO_bot/
- AISAC: An Integrated multi-agent System for Transparent, Retrieval-Grounded Scientific Assistance
Chandrachur Bhattacharya, Sibendu Som
arxiv.org/abs/2511.14043
- VideoARM: Agentic Reasoning over Hierarchical Memory for Long-Form Video Understanding
Yufei Yin, Qianke Meng, Minghao Chen, Jiajun Ding, Zhenwei Shao, Zhou Yu
arxiv.org/abs/2512.12360 mastoxiv.page/@arXiv_csCV_bot/
- RadImageNet-VQA: A Large-Scale CT and MRI Dataset for Radiologic Visual Question Answering
L\'eo Butsanets, Charles Corbi\`ere, Julien Khlaut, Pierre Manceron, Corentin Dancette
arxiv.org/abs/2512.17396 mastoxiv.page/@arXiv_csCV_bot/
- Measuring all the noises of LLM Evals
Sida Wang
arxiv.org/abs/2512.21326 mastoxiv.page/@arXiv_csLG_bot/
toXiv_bot_toot

@heiseonline@social.heise.de
2026-02-05 16:45:15

Einige der zuletzt hier besonders häufig geteilten #News:
IT-Angriff betrifft IT der Beweisstückstelle der Polizei