Tootfinder

Opt-in global Mastodon full-text search.

@arXiv_csAI_bot@mastoxiv.page
2025-08-19 10:51:40

HeroBench: A Benchmark for Long-Horizon Planning and Structured Reasoning in Virtual Worlds
Petr Anokhin, Roman Khalikov, Stefan Rebrikov, Viktor Volkov, Artyom Sorokin, Vincent Bissonnette
arxiv.org/abs/2508.12782

@arXiv_csCL_bot@mastoxiv.page
2025-08-20 10:01:00

Unintended Misalignment from Agentic Fine-Tuning: Risks and Mitigation
Dongyoon Hahm, Taywon Min, Woogyeol Jin, Kimin Lee
arxiv.org/abs/2508.14031

@arXiv_csSE_bot@mastoxiv.page
2025-09-16 09:50:27

Rethinking Technology Stack Selection with AI Coding Proficiency
Xiaoyu Zhang, Weipeng Jiang, Juan Zhai, Shiqing Ma, Qingshuang Bao, Chenhao Lin, Chao Shen, Tianlin Li, Yang Liu
arxiv.org/abs/2509.11132

@arXiv_csCL_bot@mastoxiv.page
2025-07-14 09:58:12

The AI Language Proficiency Monitor -- Tracking the Progress of LLMs on Multilingual Benchmarks
David Pomerenke, Jonas Nothnagel, Simon Ostermann
arxiv.org/abs/2507.08538

@rossng@indieweb.social
2025-07-16 18:46:43

My guide to the CEFR levels of language proficiency

@arXiv_csSE_bot@mastoxiv.page
2025-07-17 09:34:10

SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?
Xinyi He, Qian Liu, Mingzhe Du, Lin Yan, Zhijie Fan, Yiming Huang, Zejian Yuan, Zejun Ma
arxiv.org/abs/2507.12415

@arXiv_physicssocph_bot@mastoxiv.page
2025-06-25 08:12:20

How trust networks shape students' opinions about the proficiency of artificially intelligent assistants
Yutong Bu, Andrew Melatos, Robin Evans
arxiv.org/abs/2506.19655

@arXiv_statME_bot@mastoxiv.page
2025-08-13 09:26:52

Analytics of Adaptive Online Testing in Practice Over a Decade
Hideo Hirose
arxiv.org/abs/2508.08643 arxiv.org/pdf/2508.08643

@arXiv_csCY_bot@mastoxiv.page
2025-08-26 10:55:06

Detecting Struggling Student Programmers using Proficiency Taxonomies
Noga Schwartz, Roy Fairstein, Avi Segal, Kobi Gal
arxiv.org/abs/2508.17353

@arXiv_astrophHE_bot@mastoxiv.page
2025-07-14 09:43:12

Prospects for sub-GeV astrophysical neutrino detection with IceCube
Per Arne Sevle Myhr (for the IceCube Collaboration), Gwenhaël de Wasseige (for the IceCube Collaboration)
arxiv.org/abs/2507.08569

@light@noc.social
2025-09-08 15:51:33

petition.parliament.uk/petitio
>Keep 5-year ILR terms to Hong Kong British National (Overseas) visas
Debate:

@arXiv_csAI_bot@mastoxiv.page
2025-08-06 10:14:10

VQA support to Arabic Language Learning Educational Tool
Khaled Bachir Delassi (LIM Lab, Amar Telidji University, Laghouat, Algeria), Lakhdar Zeggane (LIM Lab, Amar Telidji University, Laghouat, Algeria), Hadda Cherroun (LIM Lab, Amar Telidji University, Laghouat, Algeria), Abdelhamid Haouhat (LIM Lab, Amar Telidji University, Laghouat, Algeria), Kaoutar Bouzouad (Computer Science Dept., USTHB, Algiers, Algeria)

@arXiv_csDB_bot@mastoxiv.page
2025-07-10 08:18:51

Interactive Text-to-SQL via Expected Information Gain for Disambiguation
Luyu Qiu, Jianing Li, Chi Su, Lei Chen
arxiv.org/abs/2507.06467

@arXiv_csSE_bot@mastoxiv.page
2025-09-16 10:42:47

Do Code Semantics Help? A Comprehensive Study on Execution Trace-Based Information for Code Large Language Models
Jian Wang, Xiaofei Xie, Qiang Hu, Shangqing Liu, Yi Li
arxiv.org/abs/2509.11686

@askesis@qoto.org
2025-07-01 10:58:46

# Philosophical test fails ChatGPT: AI coherence isn’t enough to prove human mind
The research reveals that #ChatGPT does exhibit proficiency in basic coherence building. It maintains consistent dictional and intentional lines by reusing phrases and aligning responses with contextual topics. It also demonstrates some ability to construct rational coherence by offering logically consistent replies…

@arXiv_csRO_bot@mastoxiv.page
2025-07-01 11:44:23

PAC Bench: Do Foundation Models Understand Prerequisites for Executing Manipulation Policies?
Atharva Gundawar, Som Sagar, Ransalu Senanayake
arxiv.org/abs/2506.23725

@arXiv_csLG_bot@mastoxiv.page
2025-08-26 12:25:46

CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics
Weida Wang, Dongchen Huang, Jiatong Li, Tengchao Yang, Ziyang Zheng, Di Zhang, Dong Han, Benteng Chen, Binzhao Luo, Zhiyu Liu, Kunling Liu, Zhiyuan Gao, Shiqi Geng, Wei Ma, Jiaming Su, Xin Li, Shuchen Pu, Yuhan Shui, Qianjia Cheng, Zhihao Dou, Dongfei Cui, Changyong He, Jin Zeng, Zeke Xie, Mao Su, Dongzhan Zhou, Yuqiang Li, Wanli Ouyang, Lei Bai, Yunqi Cai, Xi Dai, Shufei Zhang, Jinguang Cheng, Zh…

@arXiv_csCL_bot@mastoxiv.page
2025-09-12 10:00:09

All for One: LLMs Solve Mental Math at the Last Token With Information Transferred From Other Tokens
Siddarth Mamidanna, Daking Rai, Ziyu Yao, Yilun Zhou
arxiv.org/abs/2509.09650

@arXiv_eessAS_bot@mastoxiv.page
2025-07-09 09:41:52

ContextASR-Bench: A Massive Contextual Speech Recognition Benchmark
He Wang, Linhan Ma, Dake Guo, Xiong Wang, Lei Xie, Jin Xu, Junyang Lin
arxiv.org/abs/2507.05727

@arXiv_csSI_bot@mastoxiv.page
2025-06-30 08:23:50

The Missing Link: Joint Legal Citation Prediction using Heterogeneous Graph Enrichment
Lorenz Wendlinger, Simon Alexander Nonn, Abdullah Al Zubaer, Michael Granitzer
arxiv.org/abs/2506.22165

@arXiv_mathNA_bot@mastoxiv.page
2025-07-29 10:58:41

Enhancing Complex Injection Mold Design Validation Using Multicombined RV Environments
J. M. Mercado-Colmenero, D. F. Garcia-Molina, B. Gutierrez-Jimenez, C. Martin-Donate
arxiv.org/abs/2507.20732

@arXiv_csIR_bot@mastoxiv.page
2025-08-26 09:25:46

Demographically-Inspired Query Variants Using an LLM
Marwah Alaofi, Nicola Ferro, Paul Thomas, Falk Scholer, Mark Sanderson
arxiv.org/abs/2508.17644

@arXiv_csAI_bot@mastoxiv.page
2025-08-06 09:54:50

ToolVQA: A Dataset for Multi-step Reasoning VQA with External Tools
Shaofeng Yin, Ting Lei, Yang Liu
arxiv.org/abs/2508.03284 arxiv.org/pdf…

@arXiv_csAR_bot@mastoxiv.page
2025-08-21 07:30:59

MAHL: Multi-Agent LLM-Guided Hierarchical Chiplet Design with Adaptive Debugging
Jinwei Tang, Jiayin Qin, Nuo Xu, Pragnya Sudershan Nalla, Yu Cao, Yang (Katie) Zhao, Caiwen Ding
arxiv.org/abs/2508.14053

@arXiv_csPF_bot@mastoxiv.page
2025-08-26 07:48:06

H2EAL: Hybrid-Bonding Architecture with Hybrid Sparse Attention for Efficient Long-Context LLM Inference
Zizhuo Fu, Xiaotian Guo, Wenxuan Zeng, Shuzhang Zhong, Yadong Zhang, Peiyu Chen, Runsheng Wang, Le Ye, Meng Li
arxiv.org/abs/2508.16653

@arXiv_statME_bot@mastoxiv.page
2025-07-04 08:49:01

On the analysis of sequential designs without a specified number of observations
Anna Klimova, Tamás Rudas
arxiv.org/abs/2507.02580

@arXiv_csCY_bot@mastoxiv.page
2025-06-24 10:38:00

Optimizing Mastery Learning by Fast-Forwarding Over-Practice Steps
Meng Xia, Robin Schmucker, Conrad Borchers, Vincent Aleven
arxiv.org/abs/2506.17577

@arXiv_csCL_bot@mastoxiv.page
2025-07-31 09:54:01

Investigating Hallucination in Conversations for Low Resource Languages
Amit Das, Md. Najib Hasan, Souvika Sarkar, Zheng Zhang, Fatemeh Jamshidi, Tathagata Bhattacharya, Nilanjana Raychawdhury, Dongji Feng, Vinija Jain, Aman Chadha
arxiv.org/abs/2507.22720

@arXiv_csAI_bot@mastoxiv.page
2025-08-26 10:10:27

Modular Embedding Recomposition for Incremental Learning
Aniello Panariello, Emanuele Frascaroli, Pietro Buzzega, Lorenzo Bonicelli, Angelo Porrello, Simone Calderara
arxiv.org/abs/2508.16463

@arXiv_csSE_bot@mastoxiv.page
2025-08-26 10:16:36

modelSolver: A Symbolic Model-Driven Solver for Power Network Simulation and Monitoring
Izudin Dzafic, Rabih A. Jabr
arxiv.org/abs/2508.17882

@arXiv_csAI_bot@mastoxiv.page
2025-08-25 09:21:30

Modular Embedding Recomposition for Incremental Learning
Aniello Panariello, Emanuele Frascaroli, Pietro Buzzega, Lorenzo Bonicelli, Angelo Porrello, Simone Calderara
arxiv.org/abs/2508.16463

@arXiv_csSE_bot@mastoxiv.page
2025-08-25 09:35:30

AetherCode: Evaluating LLMs' Ability to Win In Premier Programming Competitions
Zihan Wang, Jiaze Chen, Zhicheng Liu, Markus Mak, Yidi Du, Geonsik Moon, Luoqi Xu, Aaron Tua, Kunshuo Peng, Jiayi Lu, Mingfei Xia, Boqian Zou, Chenyang Ran, Guang Tian, Shoutai Zhu, Yeheng Duan, Zhenghui Kang, Zhenxing Lin, Shangshu Li, Qiang Luo, Qingshen Long, Zhiyong Chen, Yihan Xiao, Yurong Wu, Daoguang Zan, Yuyi Fu, Mingxuan Wang, Ming Ding

@arXiv_csAI_bot@mastoxiv.page
2025-08-27 10:09:53

Who Is Lagging Behind: Profiling Student Behaviors with Graph-Level Encoding in Curriculum-Based Online Learning Systems
Qian Xiao, Conn Breathnach, Ioana Ghergulescu, Conor O'Sullivan, Keith Johnston, Vincent Wade
arxiv.org/abs/2508.18925

@arXiv_csSE_bot@mastoxiv.page
2025-08-26 10:37:06

A Large-Scale Study on Developer Engagement and Expertise in Configurable Software System Projects
Karolina M. Milano, Wesley K. G. Assunção, Bruno B. P. Cafeo
arxiv.org/abs/2508.18070

@arXiv_csSE_bot@mastoxiv.page
2025-07-24 09:18:50

How Do Code Smells Affect Skill Growth in Scratch Novice Programmers?
Ricardo Hidalgo Aragón, Jesús M. González-Barahona, Gregorio Robles
arxiv.org/abs/2507.17314