Tootfinder

@arXiv_csCL_bot@mastoxiv.page
2025-08-15 10:14:42

When Explainability Meets Privacy: An Investigation at the Intersection of Post-hoc Explainability and Differential Privacy in the Context of Natural Language Processing
Mahdi Dhaini, Stephen Meisenbacher, Ege Erdogan, Florian Matthes, Gjergji Kasneci
https://arxiv.org/abs/2508.10482

When Explainability Meets Privacy: An Investigation at the Intersection of Post-hoc Explainability and Differential Privacy in the Context of Natural Language Processing
In the study of trustworthy Natural Language Processing (NLP), a number of important research fields have emerged, including that of explainability and privacy. While research interest in both explainable and privacy-preserving NLP has increased considerably in recent years, there remains a lack of investigation at the intersection of the two. This leaves a considerable gap in understanding of whether achieving both explainability and privacy is possible, or whether t…

@arXiv_csRO_bot@mastoxiv.page
2025-08-14 09:02:32

Interpretable Robot Control via Structured Behavior Trees and Large Language Models
Ingrid Maéva Chekam, Ines Pastor-Martinez, Ali Tourani, Jose Andres Millan-Romera, Laura Ribeiro, Pedro Miguel Bastos Soares, Holger Voos, Jose Luis Sanchez-Lopez
https://arxiv.org/abs/2508.09621

Interpretable Robot Control via Structured Behavior Trees and Large Language Models
As intelligent robots become more integrated into human environments, there is a growing need for intuitive and reliable Human-Robot Interaction (HRI) interfaces that are adaptable and more natural to interact with. Traditional robot control methods often require users to adapt to interfaces or memorize predefined commands, limiting usability in dynamic, unstructured environments. This paper presents a novel framework that bridges natural language understanding and robotic execution by combinin…

@arXiv_csAI_bot@mastoxiv.page
2025-09-15 08:49:51

GAMA: A General Anonymizing Multi-Agent System for Privacy Preservation Enhanced by Domain Rules and Disproof Method
Hailong Yang, Renhuo Zhao, Guanjin Wang, Zhaohong Deng
https://arxiv.org/abs/2509.10018

GAMA: A General Anonymizing Multi-Agent System for Privacy Preservation Enhanced by Domain Rules and Disproof Method
With the rapid advancement of Large Language Models (LLMs), LLM-based agents exhibit exceptional abilities in understanding and generating natural language, facilitating human-like collaboration and information transmission in LLM-based Multi-Agent Systems (MAS). High-performance LLMs are often hosted on remote servers in public spaces. When tasks involve private data, MAS cannot securely utilize these LLMs without implementing privacy-preserving mechanisms. To address this challenge, we propose a…

@arXiv_csMM_bot@mastoxiv.page
2025-07-15 09:12:51

LayLens: Improving Deepfake Understanding through Simplified Explanations
Abhijeet Narang, Parul Gupta, Liuyijia Su, Abhinav Dhall
https://arxiv.org/abs/2507.10066

LayLens: Improving Deepfake Understanding through Simplified Explanations
This demonstration paper presents LayLens, a tool aimed at making deepfake understanding easier for users of all educational backgrounds. While prior works often rely on outputs containing technical jargon, LayLens bridges the gap between model reasoning and human understanding through a three-stage pipeline: (1) explainable deepfake detection using a state-of-the-art forgery localization model, (2) natural language simplification of technical explanations using a vision-language model…

@arXiv_csSE_bot@mastoxiv.page
2025-09-15 09:22:11

Developer-LLM Conversations: An Empirical Study of Interactions and Generated Code Quality
Suzhen Zhong, Ying Zou, Bram Adams
https://arxiv.org/abs/2509.10402

Developer-LLM Conversations: An Empirical Study of Interactions and Generated Code Quality
Large Language Models (LLMs) are becoming integral to modern software development workflows, assisting developers with code generation, API explanation, and iterative problem-solving through natural language conversations. Despite widespread adoption, there is limited understanding of how developers interact with LLMs in practice and how these conversational dynamics influence task outcomes, code quality, and software engineering workflows. To address this, we leverage CodeChat, a large dataset…

@arXiv_csSI_bot@mastoxiv.page
2025-08-14 08:09:12

CS-Agent: LLM-based Community Search via Dual-agent Collaboration
Jiahao Hua, Long Yuan, Qingshuai Feng, Qiang Fang, Shan Huang
https://arxiv.org/abs/2508.09549

CS-Agent: LLM-based Community Search via Dual-agent Collaboration
Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language processing tasks, yet their application to graph structure analysis, particularly in community search, remains underexplored. Community search, a fundamental task in graph analysis, aims to identify groups of nodes with dense interconnections, which is crucial for understanding the macroscopic structure of graphs. In this paper, we propose GraphCS, a comprehensive benchmark designed to evaluate the perfor…

@arXiv_csCV_bot@mastoxiv.page
2025-10-13 10:24:40

MomentSeg: Moment-Centric Sampling for Enhanced Video Pixel Understanding
Ming Dai, Sen Yang, Boqiang Duan, Wankou Yang, Jingdong Wang
https://arxiv.org/abs/2510.09274

MomentSeg: Moment-Centric Sampling for Enhanced Video Pixel Understanding
Referring Video Object Segmentation (RefVOS) seeks to segment target objects in videos guided by natural language descriptions, demanding both temporal reasoning and fine-grained visual comprehension. Existing sampling strategies for LLM-based approaches typically rely on either handcrafted heuristics or external keyframe models. The former often overlooks essential temporal cues, while the latter increases system complexity. To address this, we propose a unified framework that jointly optimize…

@arXiv_csSD_bot@mastoxiv.page
2025-09-15 08:00:11

VStyle: A Benchmark for Voice Style Adaptation with Spoken Instructions
Jun Zhan, Mingyang Han, Yuxuan Xie, Chen Wang, Dong Zhang, Kexin Huang, Haoxiang Shi, DongXiao Wang, Tengtao Song, Qinyuan Cheng, Shimin Li, Jun Song, Xipeng Qiu, Bo Zheng
https://arxiv.org/abs/2509.09716

VStyle: A Benchmark for Voice Style Adaptation with Spoken Instructions
Spoken language models (SLMs) have emerged as a unified paradigm for speech understanding and generation, enabling natural human-machine interaction. However, while most progress has focused on semantic accuracy and instruction following, the ability of SLMs to adapt their speaking style based on spoken instructions has received limited attention. We introduce Voice Style Adaptation (VSA), a new task that examines whether SLMs can modify their speaking style, such as timbre, prosody, or persona…

@arXiv_csCL_bot@mastoxiv.page
2025-08-15 10:12:22

Layer-Wise Perturbations via Sparse Autoencoders for Adversarial Text Generation
Huizhen Shu, Xuying Li, Qirui Wang, Yuji Kosuga, Mengqiu Tian, Zhuo Li
https://arxiv.org/abs/2508.10404

Layer-Wise Perturbations via Sparse Autoencoders for Adversarial Text Generation
With the rapid proliferation of Natural Language Processing (NLP), especially Large Language Models (LLMs), generating adversarial examples to jailbreak LLMs remains a key challenge for understanding model vulnerabilities and improving robustness. In this context, we propose a new black-box attack method that leverages the interpretability of large models. We introduce the Sparse Feature Perturbation Framework (SFPF), a novel approach for adversarial text generation that utilizes sparse autoenc…

@arXiv_csIR_bot@mastoxiv.page
2025-08-11 09:24:30

ITDR: An Instruction Tuning Dataset for Enhancing Large Language Models in Recommendations
Zekun Liu, Xiaowen Huang, Jitao Sang
https://arxiv.org/abs/2508.05667

ITDR: An Instruction Tuning Dataset for Enhancing Large Language Models in Recommendations
Large language models (LLMs) have demonstrated outstanding performance in natural language processing tasks. However, in the field of recommendation systems, due to the structural differences between user behavior data and natural language, LLMs struggle to effectively model the associations between user preferences and items. Although prompt-based methods can generate recommendation results, their inadequate understanding of recommendation tasks leads to constrained performance. To address thi…

@arXiv_qbioOT_bot@mastoxiv.page
2025-10-14 08:58:18

Isotropy and Geometry of Pretrained Protein LMs
Sheikh Azizul Hakim, Kowshic Roy, M Saifur Rahman
https://arxiv.org/abs/2510.10655 https://arxiv.org/pdf/25…

Isotropy and Geometry of Pretrained Protein LMs
Large pretrained language models have transformed natural language processing, and their adaptation to protein sequences -- viewed as strings of amino acid characters -- has advanced protein analysis. However, the distinct properties of proteins, such as variable sequence lengths and lack of word-sentence analogs, necessitate a deeper understanding of protein language models (LMs). We investigate the isotropy of protein LM embedding spaces using average pairwise cosine similarity and the IsoSco…

@arXiv_qbioQM_bot@mastoxiv.page
2025-08-13 09:06:22

Language Models Can Understand Spectra: A Multimodal Model for Molecular Structure Elucidation
Yunyue Su, Jiahui Chen, Zao Jiang, Zhenyi Zhong, Liang Wang, Qiang Liu
https://arxiv.org/abs/2508.08441

Language Models Can Understand Spectra: A Multimodal Model for Molecular Structure Elucidation
Structure elucidation is a fundamental technique for understanding the microscopic composition of matter and is widely applied across various disciplines in the natural sciences and engineering. However, existing methods often rely heavily on prior databases or known structural information, making it difficult to resolve unknown structures. In addition, complex structures typically require the joint analysis of multiple spectroscopic modalities. This process heavily depends on expert domain kno…

@arXiv_csSD_bot@mastoxiv.page
2025-10-14 11:16:28

Unify Variables in Neural Scaling Laws for General Audio Representations via Embedding Effective Rank
Xuyao Deng, Yanjie Sun, Yong Dou, Kele Xu
https://arxiv.org/abs/2510.10948 …

Unify Variables in Neural Scaling Laws for General Audio Representations via Embedding Effective Rank
Scaling laws have profoundly shaped our understanding of model performance in computer vision and natural language processing, yet their application to general audio representation learning remains underexplored. A key challenge lies in the multifactorial nature of general audio representation: representation quality is jointly influenced by variables such as audio length, embedding dimensionality, model depth, model architecture, data volume, etc., many of which are difficult to isolate or expr…

@arXiv_csHC_bot@mastoxiv.page
2025-08-08 09:27:22

AI Conversational Tutors in Foreign Language Learning: A Mixed-Methods Evaluation Study
Nikolaos Avouris
https://arxiv.org/abs/2508.05156 https://arxiv.org…

AI Conversational Tutors in Foreign Language Learning: A Mixed-Methods Evaluation Study
This paper focuses on AI tutors in foreign language learning, a field of application that has developed considerably, especially in recent years, as great advances in real-time natural language understanding and processing have been achieved. These tutors attempt to address needs for improving language skills (speaking or communicative competence, understanding). In this paper, a mixed-methods empirical study on the use of different kinds of state-of-the-art AI tutors for langua…

@arXiv_csCV_bot@mastoxiv.page
2025-08-12 12:48:13

ReferSplat: Referring Segmentation in 3D Gaussian Splatting
Shuting He, Guangquan Jie, Changshuo Wang, Yun Zhou, Shuming Hu, Guanbin Li, Henghui Ding
https://arxiv.org/abs/2508.08252

ReferSplat: Referring Segmentation in 3D Gaussian Splatting
We introduce Referring 3D Gaussian Splatting Segmentation (R3DGS), a new task that aims to segment target objects in a 3D Gaussian scene based on natural language descriptions, which often contain spatial relationships or object attributes. This task requires the model to identify newly described objects that may be occluded or not directly visible in a novel view, posing a significant challenge for 3D multi-modal understanding. Developing this capability is crucial for advancing embodied AI. T…

@arXiv_csCL_bot@mastoxiv.page
2025-09-12 09:56:59

Fluent but Unfeeling: The Emotional Blind Spots of Language Models
Bangzhao Shu, Isha Joshi, Melissa Karnaze, Anh C. Pham, Ishita Kakkar, Sindhu Kothe, Arpine Hovasapian, Mai ElSherief
https://arxiv.org/abs/2509.09593

Fluent but Unfeeling: The Emotional Blind Spots of Language Models
The versatility of Large Language Models (LLMs) in natural language understanding has made them increasingly popular in mental health research. While many studies explore LLMs' capabilities in emotion recognition, a critical gap remains in evaluating whether LLMs align with human emotions at a fine-grained level. Existing research typically focuses on classifying emotions into predefined, limited categories, overlooking more nuanced expressions. To address this gap, we introduce EXPRESS, a benc…

@arXiv_csAI_bot@mastoxiv.page
2025-10-10 10:39:09

QAgent: A modular Search Agent with Interactive Query Understanding
Yi Jiang, Lei Shen, Lujie Niu, Sendong Zhao, Wenbo Su, Bo Zheng
https://arxiv.org/abs/2510.08383

QAgent: A modular Search Agent with Interactive Query Understanding
Large language models (LLMs) excel at natural language tasks but are limited by their static parametric knowledge, especially in knowledge-intensive tasks. Retrieval-augmented generation (RAG) mitigates this by integrating external information. However, (1) traditional RAG struggles with complex query understanding, and (2) even search agents trained with reinforcement learning (RL), despite their promise, still face generalization and deployment challenges. To address these limitations, we prop…

@arXiv_csCY_bot@mastoxiv.page
2025-08-07 07:32:33

Health Insurance Coverage Rule Interpretation Corpus: Law, Policy, and Medical Guidance for Health Insurance Coverage Understanding
Mike Gartner
https://arxiv.org/abs/2508.03718

Health Insurance Coverage Rule Interpretation Corpus: Law, Policy, and Medical Guidance for Health Insurance Coverage Understanding
U.S. health insurance is complex, and inadequate understanding and limited access to justice have dire implications for the most vulnerable. Advances in natural language processing present an opportunity to support efficient, case-specific understanding, and to improve access to justice and healthcare. Yet existing corpora lack context necessary for assessing even simple cases. We collect and release a corpus of reputable legal and medical text related to U.S. health insurance. We also introduc…

@seeingwithsound@mas.to
2025-10-01 14:05:54

Artificial phantasia: Evidence for propositional reasoning-based mental imagery in large language models https://arxiv.org/abs/2509.23108 on the representation of visual imagery in humans; more information in the Bluesky thread

Artificial Phantasia: Evidence for Propositional Reasoning-Based Mental Imagery in Large Language Models
This study offers a novel approach for benchmarking complex cognitive behavior in artificial systems. Almost universally, Large Language Models (LLMs) perform best on tasks which may be included in their training data and can be accomplished solely using natural language, limiting our understanding of their emergent sophisticated cognitive capacities. In this work, we created dozens of novel items of a classic mental imagery task from cognitive psychology. A task which, traditionally, cognitive…

@arXiv_csCV_bot@mastoxiv.page
2025-09-12 10:14:49

Visual Grounding from Event Cameras
Lingdong Kong, Dongyue Lu, Ao Liang, Rong Li, Yuhao Dong, Tianshuai Hu, Lai Xing Ng, Wei Tsang Ooi, Benoit R. Cottereau
https://arxiv.org/abs/2509.09584

Visual Grounding from Event Cameras
Event cameras capture changes in brightness with microsecond precision and remain reliable under motion blur and challenging illumination, offering clear advantages for modeling highly dynamic scenes. Yet, their integration with natural language understanding has received little attention, leaving a gap in multimodal perception. To address this, we introduce Talk2Event, the first large-scale benchmark for language-driven object grounding using event data. Built on real-world driving scenarios, …

@arXiv_csLG_bot@mastoxiv.page
2025-10-02 11:09:11

Augmenting LLMs for General Time Series Understanding and Prediction
Felix Parker, Nimeesha Chan, Chi Zhang, Kimia Ghobadi
https://arxiv.org/abs/2510.01111

Augmenting LLMs for General Time Series Understanding and Prediction
Time series data is fundamental to decision-making in many crucial domains including healthcare, finance, and environmental science. However, analyzing this data often requires incorporating unstructured contextual information, answering domain-specific questions, and generating natural language explanations -- capabilities that traditional time series models lack due to their inability to process text. While Large Language Models (LLMs) excel at contextual reasoning and knowledge integration, …

@arXiv_csAI_bot@mastoxiv.page
2025-10-08 07:30:39

Structured Cognition for Behavioral Intelligence in Large Language Model Agents: Preliminary Study
Myung Ho Kim
https://arxiv.org/abs/2510.05107 https://ar…

Structured Cognition for Behavioral Intelligence in Large Language Model Agents: Preliminary Study
Large language models have advanced natural language understanding and generation, yet their use as autonomous agents raises architectural challenges for multi-step tasks. Existing frameworks often intertwine inference, memory, and control in a single prompt, which can reduce coherence and predictability. The Structured Cognitive Loop (SCL) is introduced as an alternative architecture that separates these functions. In SCL, the language model is dedicated to inference, memory is maintained exte…

@arXiv_csCL_bot@mastoxiv.page
2025-10-07 12:21:42

COLE: a Comprehensive Benchmark for French Language Understanding Evaluation
David Beauchemin, Yan Tremblay, Mohamed Amine Youssef, Richard Khoury
https://arxiv.org/abs/2510.05046

COLE: a Comprehensive Benchmark for French Language Understanding Evaluation
To address the need for a more comprehensive evaluation of French Natural Language Understanding (NLU), we introduce COLE, a new benchmark composed of 23 diverse tasks covering a broad range of NLU capabilities, including sentiment analysis, paraphrase detection, grammatical judgment, and reasoning, with a particular focus on linguistic phenomena relevant to the French language. We benchmark 94 large language models (LLMs), providing an extensive analysis of the current state of French NLU. Our r…

@arXiv_qbioQM_bot@mastoxiv.page
2025-08-08 07:59:42

Understanding protein function with a multimodal retrieval-augmented foundation model
Timothy Fei Truong Jr, Tristan Bepler
https://arxiv.org/abs/2508.04724

Understanding protein function with a multimodal retrieval-augmented foundation model
Protein language models (PLMs) learn probability distributions over natural protein sequences. By learning from hundreds of millions of natural protein sequences, protein understanding and design capabilities emerge. Recent works have shown that scaling these models improves structure prediction, but does not seem to improve mutation understanding and representation quality for protein function prediction. We introduce PoET-2, a multimodal, retrieval-augmented protein foundation model that inco…

@arXiv_csSE_bot@mastoxiv.page
2025-08-11 09:17:00

Position: Intelligent Coding Systems Should Write Programs with Justifications
Xiangzhe Xu, Shiwei Feng, Zian Su, Chengpeng Wang, Xiangyu Zhang
https://arxiv.org/abs/2508.06017 …

Position: Intelligent Coding Systems Should Write Programs with Justifications
Intelligent coding systems are transforming software development by enabling users to specify code behavior in natural language. However, the opaque decision-making of AI-driven coders raises trust and usability concerns, particularly for non-expert users who cannot inspect low-level implementations. We argue that these systems should not only generate code but also produce clear, consistent justifications that bridge model reasoning and user understanding. To this end, we identify two critical…

@arXiv_csRO_bot@mastoxiv.page
2025-09-30 13:00:01

Prompting Robot Teams with Natural Language
Nicolas Pfitzer, Eduardo Sebastián, Ajay Shankar, Amanda Prorok
https://arxiv.org/abs/2509.24575 https://…

Prompting Robot Teams with Natural Language
This paper presents a framework towards prompting multi-robot teams with high-level tasks using natural language expressions. Our objective is to use the reasoning capabilities demonstrated by recent language models in understanding and decomposing human expressions of intent, and repurpose these for multi-robot collaboration and decision-making. The key challenge is that an individual's behavior in a collective can be hard to specify and interpret, and must continuously adapt to actions from o…

@arXiv_csCL_bot@mastoxiv.page
2025-09-08 10:12:30

Memorization ≠ Understanding: Do Large Language Models Have the Ability of Scenario Cognition?
Boxiang Ma, Ru Li, Yuanlong Wang, Hongye Tan, Xiaoli Li
https://arxiv.org/abs/2509.04866

Memorization ≠ Understanding: Do Large Language Models Have the Ability of Scenario Cognition?
Driven by vast and diverse textual data, large language models (LLMs) have demonstrated impressive performance across numerous natural language processing (NLP) tasks. Yet, a critical question persists: does their generalization arise from mere memorization of training data or from deep semantic understanding? To investigate this, we propose a bi-perspective evaluation framework to assess LLMs' scenario cognition - the ability to link semantic scenario elements with their arguments in context. …

@arXiv_csCR_bot@mastoxiv.page
2025-09-19 08:50:11

Beyond Data Privacy: New Privacy Risks for Large Language Models
Yuntao Du, Zitao Li, Ninghui Li, Bolin Ding
https://arxiv.org/abs/2509.14278 https://arxiv…

Beyond Data Privacy: New Privacy Risks for Large Language Models
Large Language Models (LLMs) have achieved remarkable progress in natural language understanding, reasoning, and autonomous decision-making. However, these advancements have also come with significant privacy concerns. While significant research has focused on mitigating the data privacy risks of LLMs during various stages of model training, less attention has been paid to new threats emerging from their deployment. The integration of LLMs into widely used applications and the weaponization of …

@arXiv_csAI_bot@mastoxiv.page
2025-10-10 10:18:29

VoiceAgentBench: Are Voice Assistants ready for agentic tasks?
Dhruv Jain, Harshit Shukla, Gautam Rajeev, Ashish Kulkarni, Chandra Khatri, Shubham Agarwal
https://arxiv.org/abs/2510.07978

VoiceAgentBench: Are Voice Assistants ready for agentic tasks?
Large-scale Speech Language Models (SpeechLMs) have enabled voice assistants capable of understanding natural spoken queries and performing complex tasks. However, existing speech benchmarks primarily focus on isolated capabilities such as transcription, or question-answering, and do not systematically evaluate agentic scenarios encompassing multilingual and cultural understanding, as well as adversarial robustness. To address this, we introduce VoiceAgentBench, a comprehensive benchmark design…

@arXiv_csDB_bot@mastoxiv.page
2025-10-01 07:43:46

From NL2SQL to NL2GeoSQL: GeoSQL-Eval for automated evaluation of LLMs on PostGIS queries
Shuyang Hou, Haoyue Jiao, Ziqi Liu, Lutong Xie, Guanyu Chen, Shaowen Wu, Xuefeng Guan, Huayi Wu
https://arxiv.org/abs/2509.25264

From NL2SQL to NL2GeoSQL: GeoSQL-Eval for automated evaluation of LLMs on PostGIS queries
In recent years, large language models (LLMs) have achieved remarkable progress in natural language understanding and structured query generation (NL2SQL). However, extending these advances to GeoSQL tasks in the PostGIS environment remains challenging due to the complexity of spatial functions, geometric data types, and execution semantics. Existing evaluations primarily focus on general relational databases or Google Earth Engine code generation, leaving a lack of systematic benchmarks tailor…

@arXiv_csCL_bot@mastoxiv.page
2025-09-11 09:21:23

Verbalized Algorithms
Supriya Lall, Christian Farrell, Hari Pathanjaly, Marko Pavic, Sarvesh Chezhian, Masataro Asai
https://arxiv.org/abs/2509.08150

Verbalized Algorithms
Instead of querying LLMs in a one-shot manner and hoping to get the right answer for a reasoning task, we propose a paradigm we call verbalized algorithms (VAs), which leverage classical algorithms with established theoretical understanding. VAs decompose a task into simple elementary operations on natural language strings that they should be able to answer reliably, and limit the scope of LLMs to only those simple tasks. For example, for sorting a series of natural language strings, \em…

@arXiv_csHC_bot@mastoxiv.page
2025-08-05 11:27:10

Understanding User Preferences for Interaction Styles in Conversational Recommender Systems: The Predictive Role of System Qualities, User Experience, and Traits
Raj Mahmud, Shlomo Berkovsky, Mukesh Prasad, A. Baki Kocaballi
https://arxiv.org/abs/2508.02328

Understanding User Preferences for Interaction Styles in Conversational Recommender Systems: The Predictive Role of System Qualities, User Experience, and Traits
Conversational Recommender Systems (CRSs) deliver personalised recommendations through multi-turn natural language dialogue and increasingly support both task-oriented and exploratory interactions. Yet, the factors shaping user interaction preferences remain underexplored. In this within-subjects study (N = 139), participants experienced two scripted CRS dialogues, rated their experiences, and indicated the importance of eight system qualities. Logistic regression revealed that preference f…

@arXiv_csCV_bot@mastoxiv.page
2025-09-01 09:48:12

HCCM: Hierarchical Cross-Granularity Contrastive and Matching Learning for Natural Language-Guided Drones
Hao Ruan, Jinliang Lin, Yingxin Lai, Zhiming Luo, Shaozi Li
https://arxiv.org/abs/2508.21539

HCCM: Hierarchical Cross-Granularity Contrastive and Matching Learning for Natural Language-Guided Drones
Natural Language-Guided Drones (NLGD) provide a novel paradigm for tasks such as target matching and navigation. However, the wide field of view and complex compositional semantics in drone scenarios pose challenges for vision-language understanding. Mainstream Vision-Language Models (VLMs) emphasize global alignment while lacking fine-grained semantics, and existing hierarchical methods depend on precise entity partitioning and strict containment, limiting effectiveness in dynamic environments…

@arXiv_csIT_bot@mastoxiv.page
2025-09-26 07:50:41

On Theoretical Interpretations of Concept-Based In-Context Learning
Huaze Tang, Tianren Peng, Shao-lun Huang
https://arxiv.org/abs/2509.20882 https://arxiv…

On Theoretical Interpretations of Concept-Based In-Context Learning
In-Context Learning (ICL) has emerged as an important new paradigm in natural language processing and large language model (LLM) applications. However, the theoretical understanding of the ICL mechanism remains limited. This paper aims to investigate this issue by studying a particular ICL approach, called concept-based ICL (CB-ICL). In particular, we propose theoretical analyses on applying CB-ICL to ICL tasks, which explains why and when the CB-ICL performs well for predicting query labels in…

@arXiv_eessIV_bot@mastoxiv.page
2025-07-30 08:17:01

Querying GI Endoscopy Images: A VQA Approach
Gaurav Parajuli
https://arxiv.org/abs/2507.21165 https://arxiv.org/pdf/2507.21165

Querying GI Endoscopy Images: A VQA Approach
VQA (Visual Question Answering) combines Natural Language Processing (NLP) with image understanding to answer questions about a given image. It has enormous potential for the development of medical diagnostic AI systems. Such a system can help clinicians diagnose gastro-intestinal (GI) diseases accurately and efficiently. Although many of the multimodal LLMs available today have excellent VQA capabilities in the general domain, they perform very poorly for VQA tasks in specialized domains such …

@arXiv_csCL_bot@mastoxiv.page
2025-10-09 10:39:11

Search-R3: Unifying Reasoning and Embedding Generation in Large Language Models
Yuntao Gui, James Cheng
https://arxiv.org/abs/2510.07048 https://arxiv.org/…

Search-R3: Unifying Reasoning and Embedding Generation in Large Language Models
Despite their remarkable natural language understanding capabilities, Large Language Models (LLMs) have been underutilized for retrieval tasks. We present Search-R3, a novel framework that addresses this limitation by adapting LLMs to generate search embeddings as a direct output of their reasoning process. Our approach exploits LLMs' chain-of-thought capabilities, allowing them to produce more effective embeddings by reasoning step-by-step through complex semantic analyses. We implement this t…

@arXiv_csLG_bot@mastoxiv.page
2025-09-25 10:51:12

Video models are zero-shot learners and reasoners
Thaddäus Wiedemer, Yuxuan Li, Paul Vicol, Shixiang Shane Gu, Nick Matarese, Kevin Swersky, Been Kim, Priyank Jaini, Robert Geirhos
https://arxiv.org/abs/2509.20328

Video models are zero-shot learners and reasoners
The remarkable zero-shot capabilities of Large Language Models (LLMs) have propelled natural language processing from task-specific models to unified, generalist foundation models. This transformation emerged from simple primitives: large, generative models trained on web-scale data. Curiously, the same primitives apply to today's generative video models. Could video models be on a trajectory towards general-purpose vision understanding, much like LLMs developed general-purpose language underst…

@arXiv_csAI_bot@mastoxiv.page
2025-10-10 08:24:19

Evaluation of LLMs for Process Model Analysis and Optimization
Akhil Kumar, Jianliang Leon Zhao, Om Dobariya
https://arxiv.org/abs/2510.07489 https://arxiv…

Evaluation of LLMs for Process Model Analysis and Optimization
In this paper, we report our experience with several LLMs for their ability to understand a process model in an interactive, conversational style, find syntactical and logical errors in it, and reason with it in depth through a natural language (NL) interface. Our findings show that a vanilla, untrained LLM like ChatGPT (model o3) in a zero-shot setting is effective in understanding BPMN process models from images and answering queries about them intelligently at syntactic, logic, and semantic …

@arXiv_csHC_bot@mastoxiv.page
2025-09-30 11:08:51

Exploring Similarity between Neural and LLM Trajectories in Language Processing
Xin Xiao, Kaiwen Wei, Jiang Zhong, Dongshuo Yin, Yu Tian, Xuekai Wei, Mingliang Zhou
https://arxiv.org/abs/2509.24307

@arXiv_csCY_bot@mastoxiv.page
2025-09-18 09:53:11

Interleaving Natural Language Prompting with Code Editing for Solving Programming Tasks with Generative AI Models
Victor-Alexandru Pădurean, Paul Denny, Andrew Luxton-Reilly, Alkis Gotovos, Adish Singla
https://arxiv.org/abs/2509.14088

Interleaving Natural Language Prompting with Code Editing for Solving Programming Tasks with Generative AI Models
Nowadays, computing students often rely on both natural-language prompting and manual code editing to solve programming tasks. Yet we still lack a clear understanding of how these two modes are combined in practice, and how their usage varies with task complexity and student ability. In this paper, we investigate this through a large-scale study in an introductory programming course, collecting 13,305 interactions from 355 students during a three-day laboratory activity. Our analysis shows that…

@arXiv_csRO_bot@mastoxiv.page
2025-09-30 12:57:41

DynaMIC: Dynamic Multimodal In-Context Learning Enabled Embodied Robot Counterfactual Resistance Ability
Tianqiang Yan, Ziqiao Lin, Sicheng Wang, Tianwei Zhang, Zhenglong Sun
https://arxiv.org/abs/2509.24413

DynaMIC: Dynamic Multimodal In-Context Learning Enabled Embodied Robot Counterfactual Resistance Ability
The emergence of large pre-trained models based on natural language has breathed new life into robotics development. Extensive research has integrated large models with robots, using the powerful semantic understanding and generation capabilities of large models to gradually facilitate robot control through natural language instructions. However, we found that robots that strictly adhere to human instructions, especially those containing misleading information, may encounter errors during t…

@arXiv_csCV_bot@mastoxiv.page
2025-09-29 11:22:57

JanusVLN: Decoupling Semantics and Spatiality with Dual Implicit Memory for Vision-Language Navigation
Shuang Zeng, Dekang Qi, Xinyuan Chang, Feng Xiong, Shichao Xie, Xiaolong Wu, Shiyi Liang, Mu Xu, Xing Wei
https://arxiv.org/abs/2509.22548

JanusVLN: Decoupling Semantics and Spatiality with Dual Implicit Memory for Vision-Language Navigation
Vision-and-Language Navigation requires an embodied agent to navigate through unseen environments, guided by natural language instructions and a continuous video stream. Recent advances in VLN have been driven by the powerful semantic understanding of Multimodal Large Language Models. However, these methods typically rely on explicit semantic memory, such as building textual cognitive maps or storing historical visual frames. This type of method suffers from spatial information loss, computatio…

@arXiv_csCL_bot@mastoxiv.page
2025-10-07 12:20:22

A Set of Quebec-French Corpus of Regional Expressions and Terms
David Beauchemin, Yan Tremblay, Mohamed Amine Youssef, Richard Khoury
https://arxiv.org/abs/2510.05026

A Set of Quebec-French Corpus of Regional Expressions and Terms
The tasks of idiom understanding and dialect understanding are both well-established benchmarks in natural language processing. In this paper, we propose combining them, and using regional idioms as a test of dialect understanding. Towards this end, we propose two new benchmark datasets for the Quebec dialect of French: QFrCoRE, which contains 4,633 instances of idiomatic phrases, and QFrCoRT, which comprises 171 regional instances of idiomatic words. We explain how to construct these corpora, …

@arXiv_csAI_bot@mastoxiv.page
2025-09-05 08:22:51

Towards a Neurosymbolic Reasoning System Grounded in Schematic Representations
François Olivier, Zied Bouraoui
https://arxiv.org/abs/2509.03644

Towards a Neurosymbolic Reasoning System Grounded in Schematic Representations
Despite significant progress in natural language understanding, Large Language Models (LLMs) remain error-prone when performing logical reasoning, often lacking the robust mental representations that enable human-like comprehension. We introduce a prototype neurosymbolic system, Embodied-LM, that grounds understanding and logical reasoning in schematic representations based on image schemas, recurring patterns derived from sensorimotor experience that structure human cognition. Our system operat…

@arXiv_csSE_bot@mastoxiv.page
2025-07-24 08:30:20

Evaluating Uncertainty and Quality of Visual Language Action-enabled Robots
Pablo Valle, Chengjie Lu, Shaukat Ali, Aitor Arrieta
https://arxiv.org/abs/2507.17049

Evaluating Uncertainty and Quality of Visual Language Action-enabled Robots
Visual Language Action (VLA) models are a multi-modal class of Artificial Intelligence (AI) systems that integrate visual perception, natural language understanding, and action planning to enable agents to interpret their environment, comprehend instructions, and perform embodied tasks autonomously. Recently, significant progress has been made to advance this field. These kinds of models are typically evaluated through task success rates, which fail to capture the quality of task execution and …

@arXiv_csRO_bot@mastoxiv.page
2025-08-27 09:41:12

DELIVER: A System for LLM-Guided Coordinated Multi-Robot Pickup and Delivery using Voronoi-Based Relay Planning
Alkesh K. Srivastava, Jared Michael Levin, Alexander Derrico, Philip Dames
https://arxiv.org/abs/2508.19114

DELIVER: A System for LLM-Guided Coordinated Multi-Robot Pickup and Delivery using Voronoi-Based Relay Planning
We present DELIVER (Directed Execution of Language-instructed Item Via Engineered Relay), a fully integrated framework for cooperative multi-robot pickup and delivery driven by natural language commands. DELIVER unifies natural language understanding, spatial decomposition, relay planning, and motion execution to enable scalable, collision-free coordination in real-world settings. Given a spoken or written instruction, a lightweight instance of LLaMA3 interprets the command to extract pickup an…

@arXiv_csAI_bot@mastoxiv.page
2025-09-01 09:16:32

Integrating Large Language Models with Network Optimization for Interactive and Explainable Supply Chain Planning: A Real-World Case Study
Saravanan Venkatachalam
https://arxiv.org/abs/2508.21622

Integrating Large Language Models with Network Optimization for Interactive and Explainable Supply Chain Planning: A Real-World Case Study
This paper presents an integrated framework that combines traditional network optimization models with large language models (LLMs) to deliver interactive, explainable, and role-aware decision support for supply chain planning. The proposed system bridges the gap between complex operations research outputs and business stakeholder understanding by generating natural language summaries, contextual visualizations, and tailored key performance indicators (KPIs). The core optimization model address…

@arXiv_csSE_bot@mastoxiv.page
2025-08-04 09:32:01

Can User Feedback Help Issue Detection? An Empirical Study on a One-billion-user Online Service System
Shuyao Jiang, Jiazhen Gu, Wujie Zheng, Yangfan Zhou, Michael R. Lyu
https://arxiv.org/abs/2508.00593

Can User Feedback Help Issue Detection? An Empirical Study on a One-billion-user Online Service System
Background: It has long been suggested that user feedback, typically written in natural language by end-users, can help issue detection. However, for large-scale online service systems that receive a tremendous amount of feedback, it remains a challenging task to identify severe issues from user feedback. Aims: To develop a better feedback-based issue detection approach, it is crucial first to gain a comprehensive understanding of the characteristics of user feedback in real production systems.…

@arXiv_csCV_bot@mastoxiv.page
2025-09-05 10:25:11

AnomalyLMM: Bridging Generative Knowledge and Discriminative Retrieval for Text-Based Person Anomaly Search
Hao Ju, Hu Zhang, Zhedong Zheng
https://arxiv.org/abs/2509.04376

AnomalyLMM: Bridging Generative Knowledge and Discriminative Retrieval for Text-Based Person Anomaly Search
With growing public safety demands, text-based person anomaly search has emerged as a critical task, aiming to retrieve individuals with abnormal behaviors via natural language descriptions. Unlike conventional person search, this task presents two unique challenges: (1) fine-grained cross-modal alignment between textual anomalies and visual behaviors, and (2) anomaly recognition under sparse real-world samples. While Large Multi-modal Models (LMMs) excel in multi-modal understanding, their pot…

@arXiv_csHC_bot@mastoxiv.page
2025-09-23 09:53:00

Controlled Yet Natural: A Hybrid BDI-LLM Conversational Agent for Child Helpline Training
Mohammed Al Owayyed, Adarsh Denga, Willem-Paul Brinkman
https://arxiv.org/abs/2509.16784

Controlled Yet Natural: A Hybrid BDI-LLM Conversational Agent for Child Helpline Training
Child helpline training often relies on human-led roleplay, which is both time- and resource-consuming. To address this, rule-based interactive agent simulations have been proposed to provide a structured training experience for new counsellors. However, these agents might suffer from limited language understanding and response variety. To overcome these limitations, we present a hybrid interactive agent that integrates Large Language Models (LLMs) into a rule-based Belief-Desire-Intention (BDI…

@arXiv_csCL_bot@mastoxiv.page
2025-09-08 10:10:00

Evaluating NL2SQL via SQL2NL
Mohammadtaher Safarzadeh, Afshin Oroojlooyjadid, Dan Roth
https://arxiv.org/abs/2509.04657 https://arxiv.org/pdf/2509.04657

Evaluating NL2SQL via SQL2NL
Robust evaluation in the presence of linguistic variation is key to understanding the generalization capabilities of Natural Language to SQL (NL2SQL) models, yet existing benchmarks rarely address this factor in a systematic or controlled manner. We propose a novel schema-aligned paraphrasing framework that leverages SQL-to-NL (SQL2NL) to automatically generate semantically equivalent, lexically diverse queries while maintaining alignment with the original schema and intent. This enables the fi…

@arXiv_csCL_bot@mastoxiv.page
2025-10-06 08:33:39

KAME: Tandem Architecture for Enhancing Knowledge in Real-Time Speech-to-Speech Conversational AI
So Kuroki, Yotaro Kubo, Takuya Akiba, Yujin Tang
https://arxiv.org/abs/2510.02327

KAME: Tandem Architecture for Enhancing Knowledge in Real-Time Speech-to-Speech Conversational AI
Real-time speech-to-speech (S2S) models excel at generating natural, low-latency conversational responses but often lack deep knowledge and semantic understanding. Conversely, cascaded systems combining automatic speech recognition, a text-based Large Language Model (LLM), and text-to-speech synthesis offer superior knowledge representation at the cost of high latency, which disrupts the flow of natural interaction. This paper introduces a novel hybrid architecture that bridges the gap between …

@arXiv_csCL_bot@mastoxiv.page
2025-08-08 10:04:22

How Do LLMs Persuade? Linear Probes Can Uncover Persuasion Dynamics in Multi-Turn Conversations
Brandon Jaipersaud, David Krueger, Ekdeep Singh Lubana
https://arxiv.org/abs/2508.05625

How Do LLMs Persuade? Linear Probes Can Uncover Persuasion Dynamics in Multi-Turn Conversations
Large Language Models (LLMs) have started to demonstrate the ability to persuade humans, yet our understanding of how this dynamic transpires is limited. Recent work has used linear probes, lightweight tools for analyzing model representations, to study various LLM skills such as the ability to model user sentiment and political perspective. Motivated by this, we apply probes to study persuasion dynamics in natural, multi-turn conversations. We leverage insights from cognitive science to train …

@arXiv_csRO_bot@mastoxiv.page
2025-07-16 07:46:41

Vision Language Action Models in Robotic Manipulation: A Systematic Review
Muhayy Ud Din, Waseem Akram, Lyes Saad Saoud, Jan Rosell, Irfan Hussain
https://arxiv.org/abs/2507.10672

Vision Language Action Models in Robotic Manipulation: A Systematic Review
Vision Language Action (VLA) models represent a transformative shift in robotics, with the aim of unifying visual perception, natural language understanding, and embodied control within a single learning framework. This review presents a comprehensive and forward-looking synthesis of the VLA paradigm, with a particular emphasis on robotic manipulation and instruction-driven autonomy. We comprehensively analyze 102 VLA models, 26 foundational datasets, and 12 simulation platforms that collective…

@arXiv_csCV_bot@mastoxiv.page
2025-10-02 10:53:41

PhraseStereo: The First Open-Vocabulary Stereo Image Segmentation Dataset
Thomas Campagnolo, Ezio Malis, Philippe Martinet, Gaetan Bahl
https://arxiv.org/abs/2510.00818

PhraseStereo: The First Open-Vocabulary Stereo Image Segmentation Dataset
Understanding how natural language phrases correspond to specific regions in images is a key challenge in multimodal semantic segmentation. Recent advances in phrase grounding are largely limited to single-view images, neglecting the rich geometric cues available in stereo vision. For this, we introduce PhraseStereo, the first novel dataset that brings phrase-region segmentation to stereo image pairs. PhraseStereo builds upon the PhraseCut dataset by leveraging GenStereo to generate accurate ri…

@arXiv_csCL_bot@mastoxiv.page
2025-09-05 10:22:31

Measuring Bias or Measuring the Task: Understanding the Brittle Nature of LLM Gender Biases
Bufan Gao, Elisa Kreiss
https://arxiv.org/abs/2509.04373

Measuring Bias or Measuring the Task: Understanding the Brittle Nature of LLM Gender Biases
As LLMs are increasingly applied in socially impactful settings, concerns about gender bias have prompted growing efforts both to measure and mitigate such bias. These efforts often rely on evaluation tasks that differ from natural language distributions, as they typically involve carefully constructed task prompts that overtly or covertly signal the presence of gender bias-related content. In this paper, we examine how signaling the evaluative purpose of a task impacts measured gender bias in …

@arXiv_csSE_bot@mastoxiv.page
2025-09-23 10:50:10

Clotho: Measuring Task-Specific Pre-Generation Test Adequacy for LLM Inputs
Juyeon Yoon, Somin Kim, Robert Feldt, Shin Yoo
https://arxiv.org/abs/2509.17314

Clotho: Measuring Task-Specific Pre-Generation Test Adequacy for LLM Inputs
Software increasingly relies on the emergent capabilities of Large Language Models (LLMs), from natural language understanding to program analysis and generation. Yet testing them on specific tasks remains difficult and costly: many prompts lack ground truth, forcing reliance on human judgment, while existing uncertainty and adequacy measures typically require full inference. A key challenge is to assess input adequacy in a way that reflects the demands of the task, ideally before even generati…

@arXiv_csCL_bot@mastoxiv.page
2025-08-26 12:13:26

From BERT to LLMs: Comparing and Understanding Chinese Classifier Prediction in Language Models
Ziqi Zhang, Jianfei Ma, Emmanuele Chersoni, Jieshun You, Zhaoxin Feng
https://arxiv.org/abs/2508.18253

From BERT to LLMs: Comparing and Understanding Chinese Classifier Prediction in Language Models
Classifiers are an important and defining feature of the Chinese language, and their correct prediction is key to numerous educational applications. Yet, whether the most popular Large Language Models (LLMs) possess proper knowledge of Chinese classifiers is an issue that has largely remained unexplored in the Natural Language Processing (NLP) literature. To address such a question, we employ various masking strategies to evaluate the LLMs' intrinsic ability, the contribution of different sent…

@arXiv_csAI_bot@mastoxiv.page
2025-08-21 07:31:19

Automated Optimization Modeling through Expert-Guided Large Language Model Reasoning
Beinuo Yang, Qishen Zhou, Junyi Li, Xingchen Su, Simon Hu
https://arxiv.org/abs/2508.14410

Automated Optimization Modeling through Expert-Guided Large Language Model Reasoning
Optimization Modeling (OM) is essential for solving complex decision-making problems. However, the process remains time-consuming and error-prone, heavily relying on domain experts. While Large Language Models (LLMs) show promise in addressing these challenges through their natural language understanding and reasoning capabilities, current approaches face three critical limitations: high benchmark labeling error rates reaching up to 42%, narrow evaluation scope that only considers optimal valu…

@arXiv_csCL_bot@mastoxiv.page
2025-09-05 10:07:31

SelfAug: Mitigating Catastrophic Forgetting in Retrieval-Augmented Generation via Distribution Self-Alignment
Yuqing Huang, Rongyang Zhang, Qimeng Wang, Chengqiang Lu, Yan Gao, Yi Wu, Yao Hu, Xuyang Zhi, Guiquan Liu, Xin Li, Hao Wang, Enhong Chen
https://arxiv.org/abs/2509.03934

SelfAug: Mitigating Catastrophic Forgetting in Retrieval-Augmented Generation via Distribution Self-Alignment
Recent advancements in large language models (LLMs) have revolutionized natural language processing through their remarkable capabilities in understanding and executing diverse tasks. While supervised fine-tuning, particularly in Retrieval-Augmented Generation (RAG) scenarios, effectively enhances task-specific performance, it often leads to catastrophic forgetting, where models lose their previously acquired knowledge and general capabilities. Existing solutions either require access to genera…

@arXiv_csRO_bot@mastoxiv.page
2025-09-18 10:13:21

Language Conditioning Improves Accuracy of Aircraft Goal Prediction in Untowered Airspace
Sundhar Vinodh Sangeetha, Chih-Yuan Chiu, Sarah H. Q. Li, Shreyas Kousik
https://arxiv.org/abs/2509.14063

Language Conditioning Improves Accuracy of Aircraft Goal Prediction in Untowered Airspace
Autonomous aircraft must safely operate in untowered airspace, where coordination relies on voice-based communication among human pilots. Safe operation requires an aircraft to predict the intent, and corresponding goal location, of other aircraft. This paper introduces a multimodal framework for aircraft goal prediction that integrates natural language understanding with spatial reasoning to improve autonomous decision-making in such environments. We leverage automatic speech recognition and l…

@arXiv_csAI_bot@mastoxiv.page
2025-08-19 10:50:00

GTool: Graph Enhanced Tool Planning with Large Language Model
Wenjie Chen, Wenbin Li, Di Yao, Xuying Meng, Chang Gong, Jingping Bi
https://arxiv.org/abs/2508.12725

GTool: Graph Enhanced Tool Planning with Large Language Model
Tool planning with large language models (LLMs), referring to selecting, organizing, and preparing the tools necessary to complete a user request, bridges the gap between natural language understanding and task execution. However, current works treat different tools as isolated components and fail to leverage the inherent dependencies of tools, leading to invalid planning results. Since tool dependencies are often incomplete, it becomes challenging for LLMs to accurately identify the appropriat…

@arXiv_csHC_bot@mastoxiv.page
2025-09-16 07:41:46

Vibe Coding for UX Design: Understanding UX Professionals' Perceptions of AI-Assisted Design and Development
Jie Li, Youyang Hou, Laura Lin, Ruihao Zhu, Hancheng Cao, Abdallah El Ali
https://arxiv.org/abs/2509.10652

Vibe Coding for UX Design: Understanding UX Professionals' Perceptions of AI-Assisted Design and Development
Generative AI is reshaping UX design practices through "vibe coding," where UX professionals express intent in natural language and AI translates it into functional prototypes and code. Despite rapid adoption, little research has examined how vibe coding reconfigures UX workflows and collaboration. Drawing on interviews with 20 UX professionals across enterprises, startups, and academia, we show how vibe coding follows a four-stage workflow of ideation, AI generation, debugging, and review. Thi…

@arXiv_csCL_bot@mastoxiv.page
2025-08-26 12:11:26

Better Language Model-Based Judging Reward Modeling through Scaling Comprehension Boundaries
Meiling Ning, Zhongbao Zhang, Junda Ye, Jiabao Guo, Qingyuan Guan
https://arxiv.org/abs/2508.18212

Better Language Model-Based Judging Reward Modeling through Scaling Comprehension Boundaries
The emergence of LM-based judging reward modeling, represented by generative reward models, has successfully made reinforcement learning from AI feedback (RLAIF) efficient and scalable. To further advance this paradigm, we propose a core insight: this form of reward modeling shares fundamental formal consistency with natural language inference (NLI), a core task in natural language understanding. This reframed perspective points to a key path for building superior reward models: scaling the mod…

@arXiv_csCL_bot@mastoxiv.page
2025-07-25 10:06:32

AraTable: Benchmarking LLMs' Reasoning and Understanding of Arabic Tabular Data
Rana Alshaikh, Israa Alghanmi, Shelan Jeawak
https://arxiv.org/abs/2507.18442

AraTable: Benchmarking LLMs' Reasoning and Understanding of Arabic Tabular Data
The cognitive and reasoning abilities of large language models (LLMs) have enabled remarkable progress in natural language processing. However, their performance in interpreting structured data, especially in tabular formats, remains limited. Although benchmarks for English tabular data are widely available, Arabic is still underrepresented because of the limited availability of public resources and its unique language features. To address this gap, we present AraTable, a novel and comprehensiv…

@arXiv_csCL_bot@mastoxiv.page
2025-08-27 10:22:03

Retrieval-Augmented Generation for Natural Language Art Provenance Searches in the Getty Provenance Index
Mathew Henrickson
https://arxiv.org/abs/2508.19093

Retrieval-Augmented Generation for Natural Language Art Provenance Searches in the Getty Provenance Index
This research presents a Retrieval-Augmented Generation (RAG) framework for art provenance studies, focusing on the Getty Provenance Index. Provenance research establishes the ownership history of artworks, which is essential for verifying authenticity, supporting restitution and legal claims, and understanding the cultural and historical context of art objects. The process is complicated by fragmented, multilingual archival data that hinders efficient retrieval. Current search portals require …

@arXiv_csCV_bot@mastoxiv.page
2025-09-17 11:00:10

ResidualViT for Efficient Temporally Dense Video Encoding
Mattia Soldan, Fabian Caba Heilbron, Bernard Ghanem, Josef Sivic, Bryan Russell
https://arxiv.org/abs/2509.13255

ResidualViT for Efficient Temporally Dense Video Encoding
Several video understanding tasks, such as natural language temporal video grounding, temporal activity localization, and audio description generation, require "temporally dense" reasoning over frames sampled at high temporal resolution. However, computing frame-level features for these tasks is computationally expensive given the temporal resolution requirements. In this paper, we make three contributions to reduce the cost of computing features for temporally dense tasks. First, we introduce …

@arXiv_csSE_bot@mastoxiv.page
2025-07-17 09:25:30

MERA Code: A Unified Framework for Evaluating Code Generation Across Tasks
Artem Chervyakov, Alexander Kharitonov, Pavel Zadorozhny, Adamenko Pavel, Rodion Levichev, Dmitrii Vorobev, Dmitrii Salikhov, Aidar Valeev, Alena Pestova, Maria Dziuba, Ilseyar Alimova, Artem Zavgorodnev, Aleksandr Medvedev, Stanislav Moiseev, Elena Bruches, Daniil Grebenkin, Roman Derunets, Vikulov Vladimir, Anton Emelyanov, Dmitrii Babaev, Vladimir V. Ivanov, Valentin Malykh, Alena Fenogenova

MERA Code: A Unified Framework for Evaluating Code Generation Across Tasks
Advancements in LLMs have enhanced task automation in software engineering; however, current evaluations primarily focus on natural language tasks, overlooking code quality. Most benchmarks prioritize high-level reasoning over executable code and real-world performance, leaving gaps in understanding true capabilities and risks associated with these models in production. To address this issue, we propose MERA Code, a new addition to the MERA benchmark family, specifically focused on evaluating c…

@arXiv_csCL_bot@mastoxiv.page
2025-09-19 10:36:21

Can maiBERT Speak for Maithili?
Sumit Yadav, Raju Kumar Yadav, Utsav Maskey, Gautam Siddharth Kashyap, Md Azizul Hoque, Ganesh Gautam
https://arxiv.org/abs/2509.15048

Can maiBERT Speak for Maithili?
Natural Language Understanding (NLU) for low-resource languages remains a major challenge in NLP due to the scarcity of high-quality data and language-specific models. Maithili, despite being spoken by millions, lacks adequate computational resources, limiting its inclusion in digital and AI-driven applications. To address this gap, we introduce maiBERT, a BERT-based language model pre-trained specifically for Maithili using the Masked Language Modeling (MLM) technique. Our model is trained on a…

@arXiv_csCV_bot@mastoxiv.page
2025-09-19 10:31:41

Understand Before You Generate: Self-Guided Training for Autoregressive Image Generation
Xiaoyu Yue, Zidong Wang, Yuqing Wang, Wenlong Zhang, Xihui Liu, Wanli Ouyang, Lei Bai, Luping Zhou
https://arxiv.org/abs/2509.15185

Understand Before You Generate: Self-Guided Training for Autoregressive Image Generation
Recent studies have demonstrated the importance of high-quality visual representations in image generation and have highlighted the limitations of generative models in image understanding. As a generative paradigm originally designed for natural language, autoregressive models face similar challenges. In this work, we present the first systematic investigation into the mechanisms of applying the next-token prediction paradigm to the visual domain. We identify three key properties that hinder th…

@arXiv_csCL_bot@mastoxiv.page
2025-08-18 09:17:30

MobQA: A Benchmark Dataset for Semantic Understanding of Human Mobility Data through Question Answering
Hikaru Asano, Hiroki Ouchi, Akira Kasuga, Ryo Yonetani
https://arxiv.org/abs/2508.11163

MobQA: A Benchmark Dataset for Semantic Understanding of Human Mobility Data through Question Answering
This paper presents MobQA, a benchmark dataset designed to evaluate the semantic understanding capabilities of large language models (LLMs) for human mobility data through natural language question answering. While existing models excel at predicting human movement patterns, it remains unclear how much they can interpret the underlying reasons or semantic meaning of those patterns. MobQA provides a comprehensive evaluation framework for LLMs to answer questions about diverse human GPS traje…

@arXiv_csAI_bot@mastoxiv.page
2025-08-20 09:18:40

LOOP: A Plug-and-Play Neuro-Symbolic Framework for Enhancing Planning in Autonomous Systems
Ronit Virwani, Ruchika Suryawanshi
https://arxiv.org/abs/2508.13371

LOOP: A Plug-and-Play Neuro-Symbolic Framework for Enhancing Planning in Autonomous Systems
Planning is one of the most critical tasks in autonomous systems, where even a small error can lead to major failures or million-dollar losses. Current state-of-the-art neural planning approaches struggle with complex domains, producing plans with missing preconditions, inconsistent goals, and hallucinations. While classical planners provide logical guarantees, they lack the flexibility and natural language understanding capabilities needed for modern autonomous systems. Existing neuro-symbolic…

@arXiv_csCL_bot@mastoxiv.page
2025-09-26 10:12:41

Who's Laughing Now? An Overview of Computational Humour Generation and Explanation
Tyler Loakman, William Thorne, Chenghua Lin
https://arxiv.org/abs/2509.21175

Who's Laughing Now? An Overview of Computational Humour Generation and Explanation
The creation and perception of humour is a fundamental human trait, positioning its computational understanding as one of the most challenging tasks in natural language processing (NLP). As an abstract, creative, and frequently context-dependent construct, humour requires extensive reasoning to understand and create, making it a pertinent task for assessing the common-sense knowledge and reasoning abilities of modern large language models (LLMs). In this work, we survey the landscape of computa…

@arXiv_csCL_bot@mastoxiv.page
2025-08-22 09:55:41

TComQA: Extracting Temporal Commonsense from Text
Lekshmi R Nair, Arun Sankar, Koninika Pal
https://arxiv.org/abs/2508.15274

TComQA: Extracting Temporal Commonsense from Text
Understanding events necessitates grasping their temporal context, which is often not explicitly stated in natural language. For example, it is not a trivial task for a machine to infer that a museum tour may last for a few hours, but cannot take months. Recent studies indicate that even advanced large language models (LLMs) struggle in generating text that requires reasoning with temporal commonsense due to its infrequent explicit mention in text. Therefore, automatically mining temporal commo…

@arXiv_csCL_bot@mastoxiv.page
2025-09-18 09:48:31

Implementing a Logical Inference System for Japanese Comparatives
Yosuke Mikami, Daiki Matsuoka, Hitomi Yanaka
https://arxiv.org/abs/2509.13734

Implementing a Logical Inference System for Japanese Comparatives
Natural Language Inference (NLI) involving comparatives is challenging because it requires understanding quantities and comparative relations expressed by sentences. While some approaches leverage Large Language Models (LLMs), we focus on logic-based approaches grounded in compositional semantics, which are promising for robust handling of numerical and logical expressions. Previous studies along these lines have proposed logical inference systems for English comparatives. However, it has been …

@arXiv_csCL_bot@mastoxiv.page
2025-09-19 10:28:41

LLM Agents at the Roundtable: A Multi-Perspective and Dialectical Reasoning Framework for Essay Scoring
Jinhee Jang, Ayoung Moon, Minkyoung Jung, YoungBin Kim, Seung Jin Lee
https://arxiv.org/abs/2509.14834

LLM Agents at the Roundtable: A Multi-Perspective and Dialectical Reasoning Framework for Essay Scoring
The emergence of large language models (LLMs) has brought a new paradigm to automated essay scoring (AES), a long-standing and practical application of natural language processing in education. However, achieving human-level multi-perspective understanding and judgment remains a challenge. In this work, we propose Roundtable Essay Scoring (RES), a multi-agent evaluation framework designed to perform precise and human-aligned scoring under a zero-shot setting. RES constructs evaluator agents bas…

@arXiv_csCL_bot@mastoxiv.page
2025-08-19 11:50:20

MuDRiC: Multi-Dialect Reasoning for Arabic Commonsense Validation
Kareem Elozeiri, Mervat Abassy, Preslav Nakov, Yuxia Wang
https://arxiv.org/abs/2508.13130

MuDRiC: Multi-Dialect Reasoning for Arabic Commonsense Validation
Commonsense validation evaluates whether a sentence aligns with everyday human understanding, a critical capability for developing robust natural language understanding systems. While substantial progress has been made in English, the task remains underexplored in Arabic, particularly given its rich linguistic diversity. Existing Arabic resources have primarily focused on Modern Standard Arabic (MSA), leaving regional dialects underrepresented despite their prevalence in spoken contexts. To bri…

@arXiv_csCL_bot@mastoxiv.page
2025-07-17 10:10:50

Beyond Single Models: Enhancing LLM Detection of Ambiguity in Requests through Debate
Ana Davila, Jacinto Colan, Yasuhisa Hasegawa
https://arxiv.org/abs/2507.12370

Beyond Single Models: Enhancing LLM Detection of Ambiguity in Requests through Debate
Large Language Models (LLMs) have demonstrated significant capabilities in understanding and generating human language, contributing to more natural interactions with complex systems. However, they face challenges such as ambiguity in user requests processed by LLMs. To address these challenges, this paper introduces and evaluates a multi-agent debate framework designed to enhance detection and resolution capabilities beyond single models. The framework consists of three LLM architectures (Llam…

@arXiv_csCL_bot@mastoxiv.page
2025-07-17 08:34:00

ExpliCIT-QA: Explainable Code-Based Image Table Question Answering
Maximiliano Hormazábal Lagos, Álvaro Bueno Sáez, Pedro Alonso Doval, Jorge Alcalde Vesteiro, Héctor Cerezo-Costas
https://arxiv.org/abs/2507.11694

ExpliCIT-QA: Explainable Code-Based Image Table Question Answering
We present ExpliCIT-QA, a system that extends our previous MRT approach for tabular question answering into a multimodal pipeline capable of handling complex table images and providing explainable answers. ExpliCIT-QA follows a modular design, consisting of: (1) Multimodal Table Understanding, which uses a Chain-of-Thought approach to extract and transform content from table images; (2) Language-based Reasoning, where a step-by-step explanation in natural language is generated to solve the prob…
