Tootfinder

@arXiv_csDB_bot@mastoxiv.page
2025-08-12 09:10:23

Towards General-Purpose Data Discovery: A Programming Languages Approach
Andrew Kang, Yashnil Saha, Sainyam Galhotra
https://arxiv.org/abs/2508.08074 https://

Towards General-Purpose Data Discovery: A Programming Languages Approach
Efficient and effective data discovery is critical for many modern applications in machine learning and data science. One major bottleneck to the development of a general-purpose data discovery tool is the absence of an expressive formal language, and corresponding implementation, for characterizing and solving generic discovery queries. To this end, we present TQL, a domain-specific language for data discovery well-designed to leverage and exploit the results of programming languages research …

@arXiv_csPF_bot@mastoxiv.page
2025-08-13 08:03:22

Maximizing GPU Efficiency via Optimal Adapter Caching: An Analytical Approach for Multi-Tenant LLM Serving
Ferran Agullo, Joan Oliveras, Chen Wang, Alberto Gutierrez-Torre, Olivier Tardieu, Alaa Youssef, Jordi Torres, Josep Ll. Berral
https://arxiv.org/abs/2508.08343

Maximizing GPU Efficiency via Optimal Adapter Caching: An Analytical Approach for Multi-Tenant LLM Serving
Serving LLM adapters has gained significant attention as an effective approach to adapt general-purpose language models to diverse, task-specific use cases. However, serving a wide range of adapters introduces several substantial overheads, leading to performance degradation and challenges in optimal placement. To address these challenges, we present an analytical, AI-driven pipeline that accurately determines the optimal allocation of adapters in single-node setups. This allocation maximiz…

@arXiv_csSE_bot@mastoxiv.page
2025-08-12 10:35:43

PyVeritas: On Verifying Python via LLM-Based Transpilation and Bounded Model Checking for C
Pedro Orvalho, Marta Kwiatkowska
https://arxiv.org/abs/2508.08171 https://

PyVeritas: On Verifying Python via LLM-Based Transpilation and Bounded Model Checking for C
Python has become the dominant language for general-purpose programming, yet it lacks robust tools for formal verification. In contrast, programmers working in languages such as C benefit from mature model checkers, for example CBMC, which enable exhaustive symbolic reasoning and fault localisation. The inherent complexity of Python, coupled with the verbosity and low-level nature of existing transpilers (e.g., Cython), has historically limited the applicability of formal verification to Pytho…

@arXiv_csRO_bot@mastoxiv.page
2025-06-09 08:41:22

Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning
Sheng Chen, Peiyu He, Jiaxin Hu, Ziyang Liu, Yansheng Wang, Tao Xu, Chi Zhang, Chongchong Zhang, Chao An, Shiyu Cai, Duo Cao, Kangping Chen, Shuai Chu, Tianwei Chu, Mingdi Dan, Min Du, Weiwei Fang, Pengyou Fu, Junkai Hu, Xiaowei Jiang, Zhaodi Jiang, Fuxuan Li, Jun Li, Minghui Li, Mingyao Li, Yanchang Li, Zhibin Li, Guangming Liu, Kairui Liu, Lihao Liu, Weizhi Liu, Xiaoshun Liu, Yufei Liu, Yunfei Liu, Qiang…

Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning
Modern robot navigation systems encounter difficulties in diverse and complex indoor environments. Traditional approaches rely on multiple modules with small models or rule-based systems and thus lack adaptability to new environments. To address this, we developed Astra, a comprehensive dual-model architecture, Astra-Global and Astra-Local, for mobile robot navigation. Astra-Global, a multimodal LLM, processes vision and language inputs to perform self and goal localization using a hybrid topol…

@simon_brooke@mastodon.scot
2025-08-08 08:07:09

I'm trying to write a general purpose Inspector UI object for #Clojure
so you have a function
`(inspect o)`
which, when evaluated, throws up a window showing in a sensible form the value of `o`.
Obviously, though, if `o` is lazy, you don't want the inspector to explore it all.
LazySeq implements an interface IPending, which isn't documented. Are all lazy …

clojure/src/jvm/clojure/lang/IPending.java at master · clojure/clojure
The Clojure programming language. Contribute to clojure/clojure development by creating an account on GitHub.

@arXiv_csCL_bot@mastoxiv.page
2025-06-10 19:06:51

This https://arxiv.org/abs/2506.06266 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCL_…

Cartridges: Lightweight and general-purpose long context representations via self-study
Large language models are often used to answer queries grounded in large text corpora (e.g. codebases, legal documents, or chat histories) by placing the entire corpus in the context window and leveraging in-context learning (ICL). Although current models support contexts of 100K-1M tokens, this setup is costly to serve because the memory consumption of the KV cache scales with input length. We explore an alternative: training a smaller KV cache offline on each corpus. At inference time, we loa…

@arXiv_csAI_bot@mastoxiv.page
2025-07-01 11:38:33

AI Risk-Management Standards Profile for General-Purpose AI (GPAI) and Foundation Models
Anthony M. Barrett, Jessica Newman, Brandie Nonnecke, Nada Madkour, Dan Hendrycks, Evan R. Murphy, Krystal Jackson, Deepika Raman
https://arxiv.org/abs/2506.23949

AI Risk-Management Standards Profile for General-Purpose AI (GPAI) and Foundation Models
Increasingly multi-purpose AI models, such as cutting-edge large language models or other 'general-purpose AI' (GPAI) models, 'foundation models,' generative AI models, and 'frontier models' (typically all referred to hereafter with the umbrella term 'GPAI/foundation models' except where greater specificity is needed), can provide many beneficial capabilities but also risks of adverse events with profound consequences. This document provides risk-management practices or controls for identifying…

@arXiv_csAR_bot@mastoxiv.page
2025-06-10 07:22:02

ProtocolLLM: RTL Benchmark for SystemVerilog Generation of Communication Protocols
Arnav Sheth, Ivaxi Sheth, Mario Fritz
https://arxiv.org/abs/2506.07945 h…

ProtocolLLM: RTL Benchmark for SystemVerilog Generation of Communication Protocols
Recent advances in Large Language Models (LLMs) have shown promising capabilities in generating code for general-purpose programming languages. In contrast, their applicability for hardware description languages, particularly for generating synthesizable and functionally correct designs, remains significantly underexplored. HDLs such as SystemVerilog are logic-oriented and demand strict adherence to timing semantics, concurrency, and synthesizability constraints. Moreover, HDL-based design flow…

@arXiv_physicsedph_bot@mastoxiv.page
2025-06-10 09:32:02

Teaching Astronomy with Large Language Models
Yuan-Sen Ting, Teaghan O'Briain
https://arxiv.org/abs/2506.06921 https://arxiv.org/…

Teaching Astronomy with Large Language Models
We present a study of LLM integration in final-year undergraduate astronomy education, examining how students develop AI literacy through structured guidance and documentation requirements. We developed AstroTutor, a domain-specific astronomy tutoring system enhanced with curated arXiv content, and deployed it alongside general-purpose LLMs in the course. Students documented their AI usage through homework reflections and post-course surveys. We analyzed student evolution in AI interaction stra…

@arXiv_eessAS_bot@mastoxiv.page
2025-07-04 09:16:11

DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment
Ke-Han Lu, Zhehuai Chen, Szu-Wei Fu, Chao-Han Huck Yang, Sung-Feng Huang, Chih-Kai Yang, Chee-En Yu, Chun-Wei Chen, Wei-Chih Chen, Chien-yu Huang, Yi-Cheng Lin, Yu-Xiang Lin, Chi-An Fu, Chun-Yi Kuan, Wenze Ren, Xuanjun Chen, Wei-Ping Huang, En-Pei Hu, Tzu-Quan Lin, Yuan-Kuei Wu, Kuan-Po Huang, Hsiao-Ying Huang, Huang-Cheng Chou, Kai-Wei Chang, Cheng-Han Chiang, Boris Ginsburg, Yu…

DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment
We introduce DeSTA2.5-Audio, a general-purpose Large Audio Language Model (LALM) designed for robust auditory perception and instruction-following, without requiring task-specific audio instruction-tuning. Recent LALMs typically augment Large Language Models (LLMs) with auditory capabilities by training on large-scale, manually curated or LLM-synthesized audio-instruction datasets. However, these approaches have often suffered from the catastrophic forgetting of the LLM's original language abil…

@arXiv_csCY_bot@mastoxiv.page
2025-06-03 07:20:20

SafeCOMM: What about Safety Alignment in Fine-Tuned Telecom Large Language Models?
Aladin Djuhera, Swanand Ravindra Kadhe, Farhan Ahmed, Syed Zawad, Holger Boche, Walid Saad
https://arxiv.org/abs/2506.00062

SafeCOMM: What about Safety Alignment in Fine-Tuned Telecom Large Language Models?
Fine-tuning large language models (LLMs) for telecom tasks and datasets is a common practice to adapt general-purpose models to the telecom domain. However, little attention has been paid to how this process may compromise model safety. Recent research has shown that even benign fine-tuning can degrade the safety alignment of LLMs, causing them to respond to harmful or unethical user queries. In this paper, we investigate this issue for telecom-tuned LLMs using three representative datasets fea…

@arXiv_csCV_bot@mastoxiv.page
2025-07-03 10:31:10

Kwai Keye-VL Technical Report
Kwai Keye Team, Biao Yang, Bin Wen, Changyi Liu, Chenglong Chu, Chengru Song, Chongling Rao, Chuan Yi, Da Li, Dunju Zang, Fan Yang, Guorui Zhou, Hao Peng, Haojie Ding, Jiaming Huang, Jiangxia Cao, Jiankang Chen, Jingyun Hua, Jin Ouyang, Kaibing Chen, Kaiyu Jiang, Kaiyu Tang, Kun Gai, Shengnan Zhang, Siyang Mao, Sui Huang, Tianke Zhang, Tingting Gao, Wei Chen, Wei Yuan, Xiangyu Wu, Xiao Hu, Xingyu Lu, Yang Zhou, Yi-Fan Zhang, Yiping Yang, Yulong Chen, Zhenh…

Kwai Keye-VL Technical Report
While Multimodal Large Language Models (MLLMs) demonstrate remarkable capabilities on static images, they often fall short in comprehending dynamic, information-dense short-form videos, a dominant medium in today's digital landscape. To bridge this gap, we introduce Kwai Keye-VL, an 8-billion-parameter multimodal foundation model engineered for leading-edge performance in short-video understanding while maintaining robust general-purpose vision-language abilities. The development of Ke…

@arXiv_qbioMN_bot@mastoxiv.page
2025-07-08 09:49:50

Reconstructing Biological Pathways by Applying Selective Incremental Learning to (Very) Small Language Models
Pranta Saha, Joyce Reimer, Brook Byrns, Connor Burbridge, Neeraj Dhar, Jeffrey Chen, Steven Rayan, Gordon Broderick
https://arxiv.org/abs/2507.04432

Reconstructing Biological Pathways by Applying Selective Incremental Learning to (Very) Small Language Models
The use of generative artificial intelligence (AI) models is becoming ubiquitous in many fields. Though progress continues to be made, general purpose large language AI models (LLM) show a tendency to deliver creative answers, often called "hallucinations", which have slowed their application in the medical and biomedical fields where accuracy is paramount. We propose that the design and use of much smaller, domain and even task-specific LM may be a more rational and appropriate use of this tec…

@arXiv_csCL_bot@mastoxiv.page
2025-07-29 11:43:01

On The Role of Pretrained Language Models in General-Purpose Text Embeddings: A Survey
Meishan Zhang, Xin Zhang, Xinping Zhao, Shouzheng Huang, Baotian Hu, Min Zhang
https://arxiv.org/abs/2507.20783

On The Role of Pretrained Language Models in General-Purpose Text Embeddings: A Survey
Text embeddings have attracted growing interest due to their effectiveness across a wide range of natural language processing (NLP) tasks, such as retrieval, classification, clustering, bitext mining, and summarization. With the emergence of pretrained language models (PLMs), general-purpose text embeddings (GPTE) have gained significant traction for their ability to produce rich, transferable representations. The general architecture of GPTE typically leverages PLMs to derive dense text repres…

@arXiv_csRO_bot@mastoxiv.page
2025-06-10 17:26:00

This https://arxiv.org/abs/2505.21652 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csRO_…

PartInstruct: Part-level Instruction Following for Fine-grained Robot Manipulation
Fine-grained robot manipulation, such as lifting and rotating a bottle to display the label on the cap, requires robust reasoning about object parts and their relationships with intended tasks. Despite recent advances in training general-purpose robot manipulation policies guided by language instructions, there is a notable lack of large-scale datasets for fine-grained manipulation tasks with part-level instructions and diverse 3D object instances annotated with part-level labels. In this work,…

@arXiv_csCR_bot@mastoxiv.page
2025-07-31 09:19:41

SAEL: Leveraging Large Language Models with Adaptive Mixture-of-Experts for Smart Contract Vulnerability Detection
Lei Yu, Shiqi Cheng, Zhirong Huang, Jingyuan Zhang, Chenjie Shen, Junyi Lu, Li Yang, Fengjun Zhang, Jiajia Ma
https://arxiv.org/abs/2507.22371

SAEL: Leveraging Large Language Models with Adaptive Mixture-of-Experts for Smart Contract Vulnerability Detection
With the increasing security issues in blockchain, smart contract vulnerability detection has become a research focus. Existing vulnerability detection methods have their limitations: 1) Static analysis methods struggle with complex scenarios. 2) Methods based on specialized pre-trained models perform well on specific datasets but have limited generalization capabilities. In contrast, general-purpose Large Language Models (LLMs) demonstrate impressive ability in adapting to new vulnerability pa…

@arXiv_csIR_bot@mastoxiv.page
2025-06-03 16:10:12

This https://arxiv.org/abs/2410.21801 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csIR_…

PerSRV: Personalized Sticker Retrieval with Vision-Language Model
Instant Messaging is a popular means for daily communication, allowing users to send text and stickers. As the saying goes, "a picture is worth a thousand words", so developing an effective sticker retrieval technique is crucial for enhancing user experience. However, existing sticker retrieval methods rely on labeled data to interpret stickers, and general-purpose Vision-Language Models (VLMs) often struggle to capture the unique semantics of stickers. Additionally, relevant-based sticker retr…

@arXiv_csPL_bot@mastoxiv.page
2025-05-29 07:20:53

An instance of FreeCHR with refined operational semantics
Sascha Rechenberger, Thom Frühwirth
https://arxiv.org/abs/2505.22155 https://

An instance of FreeCHR with refined operational semantics
Constraint Handling Rules (CHR) is a rule-based programming language which is typically embedded into a general-purpose language. There exists a plethora of implementations of CHR for numerous host languages. However, the existing implementations often reinvent the way to embed CHR, which impedes maintenance and weakens assertions of correctness. To formalize and thereby unify the embedding of CHR into arbitrary host languages, we introduced the framework FreeCHR and proved it to be a valid rep…

@arXiv_physicschemph_bot@mastoxiv.page
2025-07-03 09:07:50

A Large Language Model for Chemistry and Retrosynthesis Predictions
Yueqing Zhang, Wentao Liu, Yan Zhang, Danyang Xiong, Jihang Zhai, Hao Hao, YuCheng Gu, HaiBo Yang, Shuanhu Gao, Lianrui Hu, Aimin Zhou, Xiao He
https://arxiv.org/abs/2507.01444

A Large Language Model for Chemistry and Retrosynthesis Predictions
Large language models (LLM) have achieved impressive progress across a broad range of general-purpose tasks, but their effectiveness in chemistry remains limited due to scarce domain-specific datasets and the demand for precise symbolic and structural reasoning. Here we introduce ECNU-ChemGPT (named after East China Normal University), a chemistry-specialized LLM engineered for deep chemical knowledge understanding and accurate retrosynthetic route planning. Our approach is distinguished by four …

@arXiv_condmatmtrlsci_bot@mastoxiv.page
2025-07-31 09:44:21

aLLoyM: A large language model for alloy phase diagram prediction
Yuna Oikawa, Guillaume Deffrennes, Taichi Abe, Ryo Tamura, Koji Tsuda
https://arxiv.org/abs/2507.22558 https://…

aLLoyM: A large language model for alloy phase diagram prediction
Large Language Models (LLMs) are general-purpose tools with wide-ranging applications, including in materials science. In this work, we introduce aLLoyM, a fine-tuned LLM specifically trained on alloy compositions, temperatures, and their corresponding phase information. To develop aLLoyM, we curated question-and-answer (Q&A) pairs for binary and ternary phase diagrams using the open-source Computational Phase Diagram Database (CPDDB) and assessments based on CALPHAD (CALculation of PHAse Diagr…

@arXiv_csNI_bot@mastoxiv.page
2025-06-30 08:54:30

Concept-Level AI for Telecom: Moving Beyond Large Language Models
Viswanath Kumarskandpriya, Abdulhalim Dandoush, Abbas Bradai, Ali Belgacem
https://arxiv.org/abs/2506.22359

Concept-Level AI for Telecom: Moving Beyond Large Language Models
The telecommunications and networking domain stands at the precipice of a transformative era, driven by the necessity to manage increasingly complex, hierarchical, multi administrative domains (i.e., several operators on the same path) and multilingual systems. Recent research has demonstrated that Large Language Models (LLMs), with their exceptional general-purpose text analysis and code generation capabilities, can be effectively applied to certain telecom problems (e.g., auto-configuration o…

@cdarwin@c.im
2025-07-22 03:50:37

For the first time AI systems crossed the gold-medal scoring threshold at the International Mathematical Olympiad for high-school students.
Both Google and OpenAI's models solved five out of six problems,
-- achieving the result using general-purpose “reasoning” models that processed mathematical concepts using natural language, in contrast to the previous approaches used by AI firms.
OpenAI’s breakthrough was achieved with a new experimental model centered on massively …

@arXiv_csCL_bot@mastoxiv.page
2025-08-06 09:59:50

RooseBERT: A New Deal For Political Language Modelling
Deborah Dore, Elena Cabrio, Serena Villata
https://arxiv.org/abs/2508.03250 https://arxiv.org/pdf/25…

RooseBERT: A New Deal For Political Language Modelling
The increasing amount of political debates and politics-related discussions calls for the definition of novel computational methods to automatically analyse such content with the final goal of lightening up political deliberation to citizens. However, the specificity of the political language and the argumentative form of these debates (employing hidden communication strategies and leveraging implicit arguments) make this task very challenging, even for current general-purpose pre-trained Langu…

@arXiv_eessIV_bot@mastoxiv.page
2025-06-24 09:54:09

Taming Vision-Language Models for Medical Image Analysis: A Comprehensive Review
Haoneng Lin, Cheng Xu, Jing Qin
https://arxiv.org/abs/2506.18378 https://

Taming Vision-Language Models for Medical Image Analysis: A Comprehensive Review
Modern Vision-Language Models (VLMs) exhibit unprecedented capabilities in cross-modal semantic understanding between visual and textual modalities. Given the intrinsic need for multi-modal integration in clinical applications, VLMs have emerged as a promising solution for a wide range of medical image analysis tasks. However, adapting general-purpose VLMs to the medical domain poses numerous challenges, such as large domain gaps, complicated pathological variations, and diversity and uniqueness of…

@arXiv_csSD_bot@mastoxiv.page
2025-07-21 08:49:00

OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder
Shikhar Bharadwaj, Samuele Cornell, Kwanghee Choi, Satoru Fukayama, Hye-jin Shim, Soham Deshmukh, Shinji Watanabe
https://arxiv.org/abs/2507.14129

OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder
Masked token prediction has emerged as a powerful pre-training objective across language, vision, and speech, offering the potential to unify these diverse modalities through a single pre-training task. However, its application for general audio understanding remains underexplored, with BEATs being the only notable example. BEATs has seen limited modifications due to the absence of open-source pre-training code. Furthermore, BEATs was trained only on AudioSet, restricting its broader downstream…

@teledyn@mstdn.ca
2025-07-24 19:03:26

This just occurred to me (too much sun and gin lemonade could be a factor): English is a funny language and when they say Artificial they mean Automated, and when they say Intelligence they don't mean smarts, they mean covertly gathering intel from prospective enemies!
Hence #ArtificialIntelligence, often promoted to General.
The purpose of any system is what it does, not what it consistently fails to do.

@arXiv_csAI_bot@mastoxiv.page
2025-06-03 18:04:31

This https://arxiv.org/abs/2505.07453 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csAI_…

How well do LLMs reason over tabular data, really?
Large Language Models (LLMs) excel in natural language tasks, but less is known about their reasoning capabilities over tabular data. Prior analyses devise evaluation strategies that poorly reflect an LLM's realistic performance on tabular queries. Moreover, we have a limited understanding of the robustness of LLMs towards realistic variations in tabular inputs. Therefore, we ask: Can general-purpose LLMs reason over tabular data, really?, and focus on two questions 1) are tabular reasoning cap…

@arXiv_csDB_bot@mastoxiv.page
2025-07-08 09:58:31

The Case for Instance-Optimized LLMs in OLAP Databases
Bardia Mohammadi, Laurent Bindschaedler
https://arxiv.org/abs/2507.04967 https://

The Case for Instance-Optimized LLMs in OLAP Databases
Large Language Models (LLMs) can enhance analytics systems with powerful data summarization, cleaning, and semantic transformation capabilities. However, deploying LLMs at scale -- processing millions to billions of rows -- remains prohibitively expensive in computation and memory. We present IOLM-DB, a novel system that makes LLM-enhanced database queries practical through query-specific model optimization. Instead of using general-purpose LLMs, IOLM-DB generates lightweight, specialized model…

@arXiv_csRO_bot@mastoxiv.page
2025-07-09 07:36:02

A Careful Examination of Large Behavior Models for Multitask Dexterous Manipulation
TRI LBM Team, Jose Barreiros, Andrew Beaulieu, Aditya Bhat, Rick Cory, Eric Cousineau, Hongkai Dai, Ching-Hsin Fang, Kunimatsu Hashimoto, Muhammad Zubair Irshad, Masha Itkina, Naveen Kuppuswamy, Kuan-Hui Lee, Katherine Liu, Dale McConachie, Ian McMahon, Haruki Nishimura, Calder Phillips-Grafflin, Charles Richter, Paarth Shah, Krishnan Srinivasan, Blake Wulfe, Chen Xu, Mengchao Zhang, Alex Alspach, Maya …

A Careful Examination of Large Behavior Models for Multitask Dexterous Manipulation
Robot manipulation has seen tremendous progress in recent years, with imitation learning policies enabling successful performance of dexterous and hard-to-model tasks. Concurrently, scaling data and model size has led to the development of capable language and vision foundation models, motivating large-scale efforts to create general-purpose robot foundation models. While these models have garnered significant enthusiasm and investment, meaningful evaluation of real-world performance remains a …

@arXiv_csSE_bot@mastoxiv.page
2025-07-23 08:03:52

AlgoTune: Can Language Models Speed Up General-Purpose Numerical Programs?
Ori Press, Brandon Amos, Haoyu Zhao, Yikai Wu, Samuel K. Ainsworth, Dominik Krupke, Patrick Kidger, Touqir Sajed, Bartolomeo Stellato, Jisun Park, Nathanael Bosch, Eli Meril, Albert Steppi, Arman Zharmagambetov, Fangzhao Zhang, David Perez-Pineiro, Alberto Mercurio, Ni Zhan, Talor Abramovich, Kilian Lieret, Hanlin Zhang, Shirley Huang, Matthias Bethge, Ofir Press

AlgoTune: Can Language Models Speed Up General-Purpose Numerical Programs?
Despite progress in language model (LM) capabilities, evaluations have thus far focused on models' performance on tasks that humans have previously solved, including in programming (Jimenez et al., 2024) and mathematics (Glazer et al., 2024). We therefore propose testing models' ability to design and implement algorithms in an open-ended benchmark: We task LMs with writing code that efficiently solves computationally challenging problems in computer science, physics, and mathematics. Our AlgoTu…

@arXiv_csCL_bot@mastoxiv.page
2025-08-07 10:29:04

Hop, Skip, and Overthink: Diagnosing Why Reasoning Models Fumble during Multi-Hop Analysis
Anushka Yadav, Isha Nalawade, Srujana Pillarichety, Yashwanth Babu, Reshmi Ghosh, Samyadeep Basu, Wenlong Zhao, Ali Nasaeh, Sriram Balasubramanian, Soundararajan Srinivasan
https://arxiv.org/abs/2508.04699

Hop, Skip, and Overthink: Diagnosing Why Reasoning Models Fumble during Multi-Hop Analysis
The emergence of reasoning models and their integration into practical AI chat bots has led to breakthroughs in solving advanced math, deep search, and extractive question answering problems that require a complex and multi-step thought process. Yet, a complete understanding of why these models hallucinate more than general purpose language models is missing. In this investigative study, we systematically explore reasoning failures of contemporary language models on multi-hop question answering…

@arXiv_csAI_bot@mastoxiv.page
2025-07-04 07:31:41

Reasoning on a Budget: A Survey of Adaptive and Controllable Test-Time Compute in LLMs
Mohammad Ali Alomrani, Yingxue Zhang, Derek Li, Qianyi Sun, Soumyasundar Pal, Zhanguang Zhang, Yaochen Hu, Rohan Deepak Ajwani, Antonios Valkanas, Raika Karimi, Peng Cheng, Yunzhou Wang, Pengyi Liao, Hanrui Huang, Bin Wang, Jianye Hao, Mark Coates
https://

Reasoning on a Budget: A Survey of Adaptive and Controllable Test-Time Compute in LLMs
Large language models (LLMs) have rapidly progressed into general-purpose agents capable of solving a broad spectrum of tasks. However, current models remain inefficient at reasoning: they apply fixed inference-time compute regardless of task complexity, often overthinking simple problems while underthinking hard ones. This survey presents a comprehensive review of efficient test-time compute (TTC) strategies, which aim to improve the computational efficiency of LLM reasoning. We introduce a tw…

@arXiv_csRO_bot@mastoxiv.page
2025-08-07 09:23:04

Open Scene Graphs for Open-World Object-Goal Navigation
Joel Loo, Zhanxin Wu, David Hsu
https://arxiv.org/abs/2508.04678 https://arxiv.org/pdf/2508.04678…

Open Scene Graphs for Open-World Object-Goal Navigation
How can we build general-purpose robot systems for open-world semantic navigation, e.g., searching a novel environment for a target object specified in natural language? To tackle this challenge, we introduce OSG Navigator, a modular system composed of foundation models, for open-world Object-Goal Navigation (ObjectNav). Foundation models provide enormous semantic knowledge about the world, but struggle to organise and maintain spatial information effectively at scale. Key to OSG Navigator is t…

@arXiv_csCY_bot@mastoxiv.page
2025-07-16 08:27:11

Exploring User Security and Privacy Attitudes and Concerns Toward the Use of General-Purpose LLM Chatbots for Mental Health
Jabari Kwesi, Jiaxun Cao, Riya Manchanda, Pardis Emami-Naeini
https://arxiv.org/abs/2507.10695

Exploring User Security and Privacy Attitudes and Concerns Toward the Use of General-Purpose LLM Chatbots for Mental Health
Individuals are increasingly relying on large language model (LLM)-enabled conversational agents for emotional support. While prior research has examined privacy and security issues in chatbots specifically designed for mental health purposes, these chatbots are overwhelmingly "rule-based" offerings that do not leverage generative AI. Little empirical research currently measures users' privacy and security concerns, attitudes, and expectations when using general-purpose LLM-enabled chatbots to …

@arXiv_eessAS_bot@mastoxiv.page
2025-07-08 10:02:40

Long-Context Modeling Networks for Monaural Speech Enhancement: A Comparative Study
Qiquan Zhang, Moran Chen, Zeyang Song, Hexin Liu, Xiangyu Zhang, Haizhou Li
https://arxiv.org/abs/2507.04368

Long-Context Modeling Networks for Monaural Speech Enhancement: A Comparative Study
Advanced long-context modeling backbone networks, such as Transformer, Conformer, and Mamba, have demonstrated state-of-the-art performance in speech enhancement. However, a systematic and comprehensive comparative study of these backbones within a unified speech enhancement framework remains lacking. In addition, xLSTM, a more recent and efficient variant of LSTM, has shown promising results in language modeling and as a general-purpose vision backbone. In this paper, we investigate the capabi…

@arXiv_csPL_bot@mastoxiv.page
2025-05-28 07:21:02

Thread and Memory-Safe Programming with CLASS
Luís Caires (Instituto Superior Técnico)
https://arxiv.org/abs/2505.20848 https://

Thread and Memory-Safe Programming with CLASS
CLASS is a proof-of-concept general purpose linear programming language, flexibly supporting realistic concurrent programming idioms, and featuring an expressive linear type system ensuring that programs (1) never misuse or leak stateful resources or memory, (2) never deadlock, and (3) always terminate. The design of CLASS and the strong static guarantees of its type system originate in its Linear Logic and proposition-as-types foundations. However, instead of focusing on its theoretical found…

@arXiv_csIR_bot@mastoxiv.page
2025-06-24 11:00:20

Context-Aware Scientific Knowledge Extraction on Linked Open Data using Large Language Models
Sajratul Y. Rubaiat, Hasan M. Jamil
https://arxiv.org/abs/2506.17580

Context-Aware Scientific Knowledge Extraction on Linked Open Data using Large Language Models
The exponential growth of scientific literature challenges researchers extracting and synthesizing knowledge. Traditional search engines return many sources without direct, detailed answers, while general-purpose LLMs may offer concise responses that lack depth or omit current information. LLMs with search capabilities are also limited by context window, yielding short, incomplete answers. This paper introduces WISE (Workflow for Intelligent Scientific Knowledge Extraction), a system addressing…

@arXiv_csRO_bot@mastoxiv.page
2025-06-24 09:26:50

General-Purpose Robotic Navigation via LVLM-Orchestrated Perception, Reasoning, and Acting
Bernard Lange, Anil Yildiz, Mansur Arief, Shehryar Khattak, Mykel Kochenderfer, Georgios Georgakis
https://arxiv.org/abs/2506.17462

General-Purpose Robotic Navigation via LVLM-Orchestrated Perception, Reasoning, and Acting
Developing general-purpose navigation policies for unknown environments remains a core challenge in robotics. Most existing systems rely on task-specific neural networks and fixed data flows, limiting generalizability. Large Vision-Language Models (LVLMs) offer a promising alternative by embedding human-like knowledge suitable for reasoning and planning. Yet, prior LVLM-robot integrations typically depend on pre-mapped spaces, hard-coded representations, and myopic exploration. We introduce the…

@arXiv_csCR_bot@mastoxiv.page
2025-06-24 11:48:30

Shrinking the Generation-Verification Gap with Weak Verifiers
Jon Saad-Falcon, E. Kelly Buchanan, Mayee F. Chen, Tzu-Heng Huang, Brendan McLaughlin, Tanvir Bhathal, Shang Zhu, Ben Athiwaratkun, Frederic Sala, Scott Linderman, Azalia Mirhoseini, Christopher Ré
https://arxiv.org/abs/2506.18203

Verifiers can improve language model capabilities by scoring and ranking responses from generated candidates. Currently, high-quality verifiers are either unscalable (e.g., humans) or limited in utility (e.g., tools like Lean). While LM judges and reward models have become broadly useful as general-purpose verifiers, a significant performance gap remains between them and oracle verifiers (verifiers with perfect accuracy). To help close this gap, we introduce Weaver, a framework for designing a …

@arXiv_eessIV_bot@mastoxiv.page
2025-06-24 08:20:49

Can Common VLMs Rival Medical VLMs? Evaluation and Strategic Insights
Yuan Zhong, Ruinan Jin, Xiaoxiao Li, Qi Dou
https://arxiv.org/abs/2506.17337 https://…

Medical vision-language models (VLMs) leverage large-scale pretraining for diverse imaging tasks but require substantial computational and data resources. Meanwhile, common or general-purpose VLMs (e.g., CLIP, LLaVA), though not trained for medical use, show promise with fine-tuning. This raises a key question: Can efficient fine-tuned common VLMs rival generalist medical VLMs for solving specific medical imaging tasks? This study systematically evaluates common and medical VLMs across disease …

@arXiv_csSE_bot@mastoxiv.page
2025-07-24 08:36:09

Can LLMs Write CI? A Study on Automatic Generation of GitHub Actions Configurations
Taher A. Ghaleb, Dulina Rathnayake
https://arxiv.org/abs/2507.17165 htt…

Continuous Integration (CI) services, such as GitHub Actions, require developers to write YAML-based configurations, which can be tedious and error-prone. Despite the increasing use of Large Language Models (LLMs) to automate software engineering tasks, their ability to generate CI configurations remains underexplored. This paper presents a preliminary study evaluating six LLMs for generating GitHub Actions configurations from natural language descriptions. We assess three general-purpose found…

@arXiv_csPL_bot@mastoxiv.page
2025-06-18 08:33:25

Optimized Execution of FreeCHR
Sascha Rechenberger, Thom Frühwirth
https://arxiv.org/abs/2506.14485 https://arxiv.org/pdf/2506…

Constraint Handling Rules (CHR) is a rule-based programming language that rewrites collections of constraints. It is typically embedded into a general-purpose language. There exists a plethora of implementations for numerous host languages. However, the existing implementations often re-invent the method of embedding, which impedes maintenance and weakens assertions of correctness. To formalize and thereby unify the embedding of a ground subset of CHR into arbitrary host languages, we introduce…

@arXiv_csCL_bot@mastoxiv.page
2025-08-01 10:14:51

Text-to-SQL Task-oriented Dialogue Ontology Construction
Renato Vukovic, Carel van Niekerk, Michael Heck, Benjamin Ruppik, Hsien-Chin Lin, Shutong Feng, Nurul Lubis, Milica Gasic
https://arxiv.org/abs/2507.23358

Large language models (LLMs) are widely used as general-purpose knowledge sources, but they rely on parametric knowledge, limiting explainability and trustworthiness. In task-oriented dialogue (TOD) systems, this separation is explicit, using an external database structured by an explicit ontology to ensure explainability and controllability. However, building such ontologies requires manual labels or supervised training. We introduce TeQoDO: a Text-to-SQL task-oriented Dialogue Ontology constr…

@arXiv_csAI_bot@mastoxiv.page
2025-06-24 11:56:30

Programming by Backprop: LLMs Acquire Reusable Algorithmic Abstractions During Code Training
Jonathan Cook, Silvia Sapora, Arash Ahmadian, Akbir Khan, Tim Rocktäschel, Jakob Foerster, Laura Ruis
https://arxiv.org/abs/2506.18777

Training large language models (LLMs) on source code significantly enhances their general-purpose reasoning abilities, but the mechanisms underlying this generalisation are poorly understood. In this paper, we propose Programming by Backprop (PBB) as a potential driver of this effect - teaching a model to evaluate a program for inputs by training on its source code alone, without ever seeing I/O examples. To explore this idea, we finetune LLMs on two sets of programs representing simple maths p…

@arXiv_csCL_bot@mastoxiv.page
2025-08-01 10:14:21

MUST-RAG: MUSical Text Question Answering with Retrieval Augmented Generation
Daeyong Kwon, SeungHeon Doh, Juhan Nam
https://arxiv.org/abs/2507.23334 https://

Recent advancements in Large language models (LLMs) have demonstrated remarkable capabilities across diverse domains. While they exhibit strong zero-shot performance on various tasks, LLMs' effectiveness in music-related applications remains limited due to the relatively small proportion of music-specific knowledge in their training data. To address this limitation, we propose MusT-RAG, a comprehensive framework based on Retrieval Augmented Generation (RAG) to adapt general-purpose LLMs for tex…

@arXiv_csSE_bot@mastoxiv.page
2025-07-22 10:56:00

Investigating the Role of LLMs Hyperparameter Tuning and Prompt Engineering to Support Domain Modeling
Vladyslav Bulhakov, Giordano d'Aloisio, Claudio Di Sipio, Antinisca Di Marco, Davide Di Ruscio
https://arxiv.org/abs/2507.14735

The introduction of large language models (LLMs) has enhanced automation in software engineering tasks, including in Model Driven Engineering (MDE). However, using general-purpose LLMs for domain modeling has its limitations. One approach is to adopt fine-tuned models, but this requires significant computational resources and can lead to issues like catastrophic forgetting. This paper explores how hyperparameter tuning and prompt engineering can improve the accuracy of the Llama 3.1 model for…

@arXiv_csAI_bot@mastoxiv.page
2025-07-25 07:30:41

I2I-STRADA -- Information to Insights via Structured Reasoning Agent for Data Analysis
SaiBarath Sundar, Pranav Satheesan, Udayaadithya Avadhanam
https://arxiv.org/abs/2507.17874

Recent advances in agentic systems for data analysis have emphasized automation of insight generation through multi-agent frameworks and orchestration layers. While these systems effectively manage tasks like query translation, data transformation, and visualization, they often overlook the structured reasoning process underlying analytical thinking. Reasoning large language models (LLMs) used for multi-step problem solving are trained as general-purpose problem solvers. As a result, their rea…

@arXiv_csCL_bot@mastoxiv.page
2025-07-29 08:40:31

MCIF: Multimodal Crosslingual Instruction-Following Benchmark from Scientific Talks
Sara Papi, Maike Züfle, Marco Gaido, Beatrice Savoldi, Danni Liu, Ioannis Douros, Luisa Bentivogli, Jan Niehues
https://arxiv.org/abs/2507.19634

Recent advances in large language models have catalyzed the development of multimodal LLMs (MLLMs) that integrate text, speech, and vision within unified frameworks. As MLLMs evolve from narrow, monolingual, task-specific systems to general-purpose instruction-following models, a key frontier lies in evaluating their multilingual and multimodal capabilities over both long and short contexts. However, existing benchmarks fall short in evaluating these dimensions jointly: they are often limited t…

@arXiv_csCL_bot@mastoxiv.page
2025-06-23 12:12:50

Fine-Tuning Lowers Safety and Disrupts Evaluation Consistency
Kathleen C. Fraser, Hillary Dawkins, Isar Nejadgholi, Svetlana Kiritchenko
https://arxiv.org/abs/2506.17209

Fine-tuning a general-purpose large language model (LLM) for a specific domain or task has become a routine procedure for ordinary users. However, fine-tuning is known to remove the safety alignment features of the model, even when the fine-tuning data does not contain any harmful content. We consider this to be a critical failure mode of LLMs due to the widespread uptake of fine-tuning, combined with the benign nature of the "attack". Most well-intentioned developers are likely unaware that th…

@arXiv_csCL_bot@mastoxiv.page
2025-07-25 10:14:52

TRPrompt: Bootstrapping Query-Aware Prompt Optimization from Textual Rewards
Andreea Nica, Ivan Zakazov, Nicolas Mario Baldwin, Saibo Geng, Robert West
https://arxiv.org/abs/2507.18618

Prompt optimization improves the reasoning abilities of large language models (LLMs) without requiring parameter updates to the target model. Following heuristic-based "Think step by step" approaches, the field has evolved in two main directions: while one group of methods uses textual feedback to elicit improved prompts from general-purpose LLMs in a training-free way, a concurrent line of research relies on numerical rewards to train a special prompt model, tailored for providing optimal prom…

@arXiv_csRO_bot@mastoxiv.page
2025-07-16 09:30:31

Versatile and Generalizable Manipulation via Goal-Conditioned Reinforcement Learning with Grounded Object Detection
Huiyi Wang, Fahim Shahriar, Alireza Azimi, Gautham Vasan, Rupam Mahmood, Colin Bellinger
https://arxiv.org/abs/2507.10814

General-purpose robotic manipulation, including reach and grasp, is essential for deployment into households and workspaces involving diverse and evolving tasks. Recent advances propose using large pre-trained models, such as Large Language Models and object detectors, to boost robotic perception in reinforcement learning. These models, trained on large datasets via self-supervised learning, can process text prompts and identify diverse objects in scenes, an invaluable skill in RL where learnin…

@arXiv_csCL_bot@mastoxiv.page
2025-07-24 08:18:49

Leveraging Synthetic Data for Question Answering with Multilingual LLMs in the Agricultural Domain
Rishemjit Kaur, Arshdeep Singh Bhankhar, Surangika Ranathunga, Jashanpreet Singh Salh, Sudhir Rajput, Vidhi, Kashish Mahendra, Bhavika Berwal, Ritesh Kumar
https://arxiv.org/abs/2507.16974…

Enabling farmers to access accurate agriculture-related information in their native languages in a timely manner is crucial for the success of the agriculture field. Although large language models (LLMs) can be used to implement Question Answering (QA) systems, simply using publicly available general-purpose LLMs in agriculture typically offers generic advisories, lacking precision in local and multilingual contexts due to insufficient domain-specific training and scarcity of high-quality, regio…
