Tootfinder

@arXiv_csAI_bot@mastoxiv.page
2025-10-15 09:22:32

EmboMatrix: A Scalable Training-Ground for Embodied Decision-Making
Zixing Lei, Sheng Yin, Yichen Xiong, Yuanzhuo Ding, Wenhao Huang, Yuxi Wei, Qingyao Xu, Yiming Li, Weixin Li, Yunhong Wang, Siheng Chen
https://arxiv.org/abs/2510.12072

Embodied decision-making enables agents to translate high-level goals into executable actions through continuous interactions within the physical world, forming a cornerstone of general-purpose embodied intelligence. Large language models (LLMs), with their general decision-making capabilities, offer a promising path to realize this potential; however, LLMs trained solely on language lack exposure to physical environments, limiting their true embodied understanding. To bridge this gap, we propo…

@arXiv_csRO_bot@mastoxiv.page
2025-10-15 10:10:51

Reflection-Based Task Adaptation for Self-Improving VLA
Baicheng Li, Dong Wu, Zike Yan, Xinchen Liu, Zecui Zeng, Lusong Li, Hongbin Zha
https://arxiv.org/abs/2510.12710 https://…

Pre-trained Vision-Language-Action (VLA) models represent a major leap towards general-purpose robots, yet efficiently adapting them to novel, specific tasks in-situ remains a significant hurdle. While reinforcement learning (RL) is a promising avenue for such adaptation, the process often suffers from low efficiency, hindering rapid task mastery. We introduce Reflective Self-Adaptation, a framework for rapid, autonomous task adaptation without human intervention. Our framework establishes a se…

@arXiv_csIT_bot@mastoxiv.page
2025-10-15 07:42:01

FedLoDrop: Federated LoRA with Dropout for Generalized LLM Fine-tuning
Sijing Xie, Dingzhu Wen, Changsheng You, Qimei Chen, Mehdi Bennis, Kaibin Huang
https://arxiv.org/abs/2510.12078

Fine-tuning (FT) large language models (LLMs) is crucial for adapting general-purpose models to specific tasks, enhancing accuracy and relevance with minimal resources. To further enhance generalization ability while reducing training costs, this paper proposes Federated LoRA with Dropout (FedLoDrop), a new framework that applies dropout to the rows and columns of the trainable matrix in Federated LoRA. A generalization error bound and convergence analysis under sparsity regularization are obta…

@arXiv_csIR_bot@mastoxiv.page
2025-10-14 11:02:49

From Reasoning LLMs to BERT: A Two-Stage Distillation Framework for Search Relevance
Runze Xia, Yupeng Ji, Yuxi Zhou, Haodong Liu, Teng Zhang, Piji Li
https://arxiv.org/abs/2510.11056

Query-service relevance prediction in e-commerce search systems faces strict latency requirements that prevent the direct application of Large Language Models (LLMs). To bridge this gap, we propose a two-stage reasoning distillation framework to transfer reasoning capabilities from a powerful teacher LLM to a lightweight, deployment-friendly student model. In the first stage, we address the limitations of general-purpose LLMs by constructing a domain-adapted teacher model. This is achieved thro…

@arXiv_csCV_bot@mastoxiv.page
2025-10-13 10:36:20

PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs
Zixin Zhang, Kanghao Chen, Xingwang Lin, Lutao Jiang, Xu Zheng, Yuanhuiyi Lyu, Litao Guo, Yinchuan Li, Ying-Cong Chen
https://arxiv.org/abs/2510.09507

The ability to use, understand, and create tools is a hallmark of human intelligence, enabling sophisticated interaction with the physical world. For any general-purpose intelligent agent to achieve true versatility, it must also master these fundamental skills. While modern Multimodal Large Language Models (MLLMs) leverage their extensive common knowledge for high-level planning in embodied AI and in downstream Vision-Language-Action (VLA) models, the extent of their true understanding of phys…

@arXiv_csCL_bot@mastoxiv.page
2025-10-13 10:32:50

Understanding the Effects of Domain Finetuning on LLMs
Eshaan Tanwar, Deepak Nathani, William Yang Wang, Tanmoy Chakraborty
https://arxiv.org/abs/2510.09359 https://

Large Language Models (LLMs) fine-tuned for specific domains exhibit strong performance; however, the underlying mechanisms by which this fine-tuning reshapes their parametric space are not well understood. Prior works primarily focus on auto-regressive or general-purpose instruct models, leaving domain-specialised LLMs under-explored. We present the first systematic study of domain-specific fine-tuning in large medical language models. Our analysis reveals that fine-tuning modifies only a smal…

@arXiv_csPL_bot@mastoxiv.page
2025-09-26 08:18:01

Dual-Language General-Purpose Self-Hosted Visual Language and new Textual Programming Language for Applications
Mahmoud Samir Fayed
https://arxiv.org/abs/2509.20426 https://

Most visual programming languages (VPLs) are domain-specific, with few general-purpose VPLs like Programming Without Coding Technology (PWCT). These general-purpose VPLs are developed using textual programming languages and improving them requires textual programming. In this thesis, we designed and developed PWCT2, a dual-language (Arabic/English), general-purpose, self-hosting visual programming language. Before doing so, we specifically designed a textual programming language called Ring for…

@arXiv_csRO_bot@mastoxiv.page
2025-10-10 09:43:19

IntentionVLA: Generalizable and Efficient Embodied Intention Reasoning for Human-Robot Interaction
Yandu Chen, Kefan Gu, Yuqing Wen, Yucheng Zhao, Tiancai Wang, Liqiang Nie
https://arxiv.org/abs/2510.07778

Vision-Language-Action (VLA) models leverage pretrained vision-language models (VLMs) to couple perception with robotic control, offering a promising path toward general-purpose embodied intelligence. However, current SOTA VLAs are primarily pretrained on multimodal tasks with limited relevance to embodied scenarios, and then finetuned to map explicit instructions to actions. Consequently, due to the lack of reasoning-intensive pretraining and reasoning-guided manipulation, these models are una…

@arXiv_csLG_bot@mastoxiv.page
2025-09-25 10:51:12

Video models are zero-shot learners and reasoners
Thaddäus Wiedemer, Yuxuan Li, Paul Vicol, Shixiang Shane Gu, Nick Matarese, Kevin Swersky, Been Kim, Priyank Jaini, Robert Geirhos
https://arxiv.org/abs/2509.20328

The remarkable zero-shot capabilities of Large Language Models (LLMs) have propelled natural language processing from task-specific models to unified, generalist foundation models. This transformation emerged from simple primitives: large, generative models trained on web-scale data. Curiously, the same primitives apply to today's generative video models. Could video models be on a trajectory towards general-purpose vision understanding, much like LLMs developed general-purpose language underst…

@arXiv_csSE_bot@mastoxiv.page
2025-09-30 09:53:51

SolContractEval: A Benchmark for Evaluating Contract-Level Solidity Code Generation
Zhifan Ye, Jiachi Chen, Zhenzhe Shao, Lingfeng Bao, Xiaohu Yang, Zhongxin Liu
https://arxiv.org/abs/2509.23824

The rise of blockchain has brought smart contracts into mainstream use, creating a demand for smart contract generation tools. While large language models (LLMs) excel at generating code in general-purpose languages, their effectiveness on Solidity, the primary language for smart contracts, remains underexplored. Solidity constitutes only a small portion of typical LLM training data and differs from general-purpose languages in its version-sensitive syntax and limited flexibility. These factors…

@arXiv_csAI_bot@mastoxiv.page
2025-10-10 07:33:48

ProSEA: Problem Solving via Exploration Agents
William Nguyen, Vinh Luong, Christopher Nguyen
https://arxiv.org/abs/2510.07423 https://arxiv.org/pdf/2510.0…

Large language models (LLMs) have empowered AI agents to tackle increasingly complex tasks. However, most existing agents remain limited to static planning and brittle interactions, falling short of true collaboration or adaptive reasoning. We introduce ProSEA, a modular, general-purpose multi-agent framework designed for iterative problem solving through exploration and plan evolution. ProSEA features a hierarchical architecture in which a Manager Agent orchestrates domain-specialized Expert A…

@arXiv_csCL_bot@mastoxiv.page
2025-10-07 12:05:42

ModernBERT + ColBERT: Enhancing biomedical RAG through an advanced re-ranking retriever
Eduardo Martínez Rivera, Filippo Menolascina
https://arxiv.org/abs/2510.04757 https…

Retrieval-Augmented Generation (RAG) is a powerful technique for enriching Large Language Models (LLMs) with external knowledge, allowing for factually grounded responses, a critical requirement in high-stakes domains such as healthcare. However, the efficacy of RAG systems is fundamentally restricted by the performance of their retrieval module, since irrelevant or semantically misaligned documents directly compromise the accuracy of the final generated response. General-purpose dense retrieve…

@arXiv_csCE_bot@mastoxiv.page
2025-09-16 08:32:46

Large language model-empowered next-generation computer-aided engineering
Jiachen Guo, Chanwook Park, Dong Qian, Thomas J. R. Hughes, Wing Kam Liu
https://arxiv.org/abs/2509.11447

Software development has entered a new era where large language models (LLMs) now serve as general-purpose reasoning engines, enabling natural language interaction and transformative applications across diverse domains. This paradigm is now extending into computer-aided engineering (CAE). Recent applications of LLMs in CAE have successfully automated routine tasks, including CAD model generation and FEM simulations. Nevertheless, these contributions, which primarily serve to reduce manual labor…

@arXiv_mathph_bot@mastoxiv.page
2025-09-29 08:55:08

Factorization Algebras for Linearized Gravity
Filip Dul
https://arxiv.org/abs/2509.21458 https://arxiv.org/pdf/2509.21458

The purpose of this work is to bring gravitational theories into play within the quickly developing framework of factorization algebras. We fit the causal structure of Lorentzian manifolds into categorical language, and in the globally hyperbolic case discover a convenient equivalence of coverages. Then, we show how both perturbative general relativity and perturbative conformal gravity define Batalin-Vilkovisky classical field theories. Finally, we describe how the observables of linearized ge…

@arXiv_csSE_bot@mastoxiv.page
2025-10-01 10:25:47

RANGER -- Repository-Level Agent for Graph-Enhanced Retrieval
Pratik Shah, Rajat Ghosh, Aryan Singhal, Debojyoti Dutta
https://arxiv.org/abs/2509.25257 https://

General-purpose automated software engineering (ASE) includes tasks such as code completion, retrieval, repair, QA, and summarization. These tasks require a code retrieval system that can handle specific queries about code entities, or code entity queries (for example, locating a specific class or retrieving the dependencies of a function), as well as general queries without explicit code entities, or natural language queries (for example, describing a task and retrieving the corresponding code…

@arXiv_eessSP_bot@mastoxiv.page
2025-09-16 11:29:36

RadarLLM: Adapting Pretrained Large Language Models for Marine Radar Target Detection with Preference-aware Loss
Qiying Hu
https://arxiv.org/abs/2509.12089 https://

Recent advances in pre-trained large language models (LLMs) have demonstrated their capacities to capture universal knowledge, making them promising general-purpose optimization solvers for wireless signal processing. Motivated by these findings, we take the first step towards fine-tuning pre-trained LLMs for the effective analysis of radar signal features in marine target detection tasks. Nevertheless, directly fine-tuning pre-trained LLMs on marine target detection tasks tends to suffer from …

@arXiv_csCV_bot@mastoxiv.page
2025-09-25 07:40:32

iFinder: Structured Zero-Shot Vision-Based LLM Grounding for Dash-Cam Video Reasoning
Manyi Yao, Bingbing Zhuang, Sparsh Garg, Amit Roy-Chowdhury, Christian Shelton, Manmohan Chandraker, Abhishek Aich
https://arxiv.org/abs/2509.19552

Grounding large language models (LLMs) in domain-specific tasks like post-hoc dash-cam driving video analysis is challenging due to their general-purpose training and lack of structured inductive biases. As vision is often the sole modality available for such analysis (i.e., no LiDAR, GPS, etc.), existing video-based vision-language models (V-VLMs) struggle with spatial reasoning, causal inference, and explainability of events in the input video. To this end, we introduce iFinder, a structured …

@arXiv_csCR_bot@mastoxiv.page
2025-09-17 09:52:30

Bridging Threat Models and Detections: Formal Verification via CADP
Dumitru-Bogdan Prelipcean (Bitdefender, Iași, Romania, Alexandru Ioan Cuza University, Iași, Romania, LACL, Université Paris-Est Créteil, France), Cătălin Dima (LACL, Université Paris-Est Créteil, France)
https://arxiv.org/abs/2509.13035

Threat detection systems rely on rule-based logic to identify adversarial behaviors, yet the conformance of these rules to high-level threat models is rarely verified formally. We present a formal verification framework that models both detection logic and attack trees as labeled transition systems (LTSs), enabling automated conformance checking via bisimulation and weak trace inclusion. Detection rules specified in the Generic Threat Detection Language (GTDL, a general-purpose detection langua…

@arXiv_csCL_bot@mastoxiv.page
2025-09-25 10:50:22

Language Models that Think, Chat Better
Adithya Bhaskar, Xi Ye, Danqi Chen
https://arxiv.org/abs/2509.20357 https://arxiv.org/pdf/2509.20357

Reinforcement learning with verifiable rewards (RLVR) improves language model reasoning by using rule-based rewards in verifiable domains such as mathematics and code. However, RLVR leads to limited generalization for open-ended tasks -- such as writing outline essays or making meal plans -- where humans reason routinely. This paper shows that the RLVR paradigm is effective beyond verifiable domains, and introduces RL with Model-rewarded Thinking (RLMT) for general-purpose chat …

@arXiv_csCV_bot@mastoxiv.page
2025-09-19 10:20:21

EchoVLM: Dynamic Mixture-of-Experts Vision-Language Model for Universal Ultrasound Intelligence
Chaoyin She, Ruifang Lu, Lida Chen, Wei Wang, Qinghua Huang
https://arxiv.org/abs/2509.14977

Ultrasound imaging has become the preferred imaging modality for early cancer screening due to its advantages of non-ionizing radiation, low cost, and real-time imaging capabilities. However, conventional ultrasound diagnosis heavily relies on physician expertise, presenting challenges of high subjectivity and low diagnostic efficiency. Vision-language models (VLMs) offer promising solutions for this issue, but existing general-purpose models demonstrate limited knowledge in ultrasound medical …

@arXiv_csAI_bot@mastoxiv.page
2025-09-29 10:29:27

InfiMed-Foundation: Pioneering Advanced Multimodal Medical Models with Compute-Efficient Pre-Training and Multi-Stage Fine-Tuning
Guanghao Zhu, Zhitian Hou, Zeyu Liu, Zhijie Sang, Congkai Xie, Hongxia Yang
https://arxiv.org/abs/2509.22261

Multimodal large language models (MLLMs) have shown remarkable potential in various domains, yet their application in the medical field is hindered by several challenges. General-purpose MLLMs often lack the specialized knowledge required for medical tasks, leading to uncertain or hallucinatory responses. Knowledge distillation from advanced models struggles to capture domain-specific expertise in radiology and pharmacology. Additionally, the computational cost of continual pretraining with lar…

@arXiv_csSE_bot@mastoxiv.page
2025-10-01 10:43:27

R-Log: Incentivizing Log Analysis Capability in LLMs via Reasoning-based Reinforcement Learning
Yilun Liu, Ziang Chen, Song Xu, Minggui He, Shimin Tao, Weibin Meng, Yuming Xie, Tao Han, Chunguang Zhao, Jingzhou Du, Daimeng Wei, Shenglin Zhang, Yongqian Sun
https://arxiv.org/abs/2509.25987

The growing complexity of log data in modern software systems has prompted the use of Large Language Models (LLMs) for automated log analysis. Current approaches typically rely on direct supervised fine-tuning (SFT) on log-label pairs. However, this exacerbates the domain discrepancy between general-purpose LLMs and specialized log data, causing overfitting. Furthermore, SFT's imbalanced loss computation often allows lengthy contexts to overwhelm critical, concise details in model answers, lead…

@arXiv_csCL_bot@mastoxiv.page
2025-09-19 10:33:21

Patent Language Model Pretraining with ModernBERT
Amirhossein Yousefiramandi, Ciaran Cooney
https://arxiv.org/abs/2509.14926 https://arxiv.org/pdf/2509.149…

Transformer-based language models such as BERT have become foundational in NLP, yet their performance degrades in specialized domains like patents, which contain long, technical, and legally structured text. Prior approaches to patent NLP have primarily relied on fine-tuning general-purpose models or domain-adapted variants pretrained with limited data. In this work, we pretrain 3 domain-specific masked language models for patents, using the ModernBERT architecture and a curated corpus of over …

@arXiv_csIR_bot@mastoxiv.page
2025-09-23 09:26:10

Comparing RAG and GraphRAG for Page-Level Retrieval Question Answering on Math Textbook
Eason Chen, Chuangji Li, Shizhuo Li, Conrad Borchers, Zimo Xiao, Chloe Qianhui Zhao, Jionghao Lin, Kenneth R. Koedinger
https://arxiv.org/abs/2509.16780

Technology-enhanced learning environments often help students retrieve relevant learning content for questions arising during self-paced study. Large language models (LLMs) have emerged as novel aids for information retrieval during learning. While LLMs are effective for general-purpose question-answering, they typically lack alignment with the domain knowledge of specific course materials such as textbooks and slides. We investigate Retrieval-Augmented Generation (RAG) and GraphRAG, a knowledg…

@arXiv_csCE_bot@mastoxiv.page
2025-09-23 07:33:27

Toward Engineering AGI: Benchmarking the Engineering Design Capabilities of LLMs
Xingang Guo, Yaxin Li, Xiangyi Kong, Yilan Jiang, Xiayu Zhao, Zhihua Gong, Yufan Zhang, Daixuan Li, Tianle Sang, Beixiao Zhu, Gregory Jun, Yingbing Huang, Yiqi Liu, Yuqi Xue, Rahul Dev Kundu, Qi Jian Lim, Yizhou Zhao, Luke Alexander Granger, Mohamed Badr Younis, Darioush Keivan, Nippun Sabharwal, Shreyanka Sinha, Prakhar Agarwal, Kojo Vandyck, Hanlin Mai, Zichen Wang, Aditya Venkatesh, Ayush Barik, Jiankun…

Today, industry pioneers dream of developing general-purpose AI engineers capable of designing and building humanity's most ambitious projects--from starships that will carry us to distant worlds to Dyson spheres that harness stellar energy. Yet engineering design represents a fundamentally different challenge for large language models (LLMs) compared to traditional textbook-style problem solving or factual question answering. Real-world engineering design demands the synthesis of domain knowle…

@arXiv_csAI_bot@mastoxiv.page
2025-09-29 10:24:37

The Thinking Spectrum: An Empirical Study of Tunable Reasoning in LLMs through Model Merging
Xiaochong Lan, Yu Zheng, Shiteng Cao, Yong Li
https://arxiv.org/abs/2509.22034 https…

The growing demand for large language models (LLMs) with tunable reasoning capabilities in many real-world applications highlights a critical need for methods that can efficiently produce a spectrum of models balancing reasoning depth and computational cost. Model merging has emerged as a promising, training-free technique to address this challenge by arithmetically combining the weights of a general-purpose model with a specialized reasoning model. While various merging techniques exist, their…

@arXiv_csCV_bot@mastoxiv.page
2025-09-23 13:11:51

UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning
Ye Liu, Zongyang Ma, Junfu Pu, Zhongang Qi, Yang Wu, Ying Shan, Chang Wen Chen
https://arxiv.org/abs/2509.18094

Recent advances in Large Multi-modal Models (LMMs) have demonstrated their remarkable success as general-purpose multi-modal assistants, with particular focuses on holistic image- and video-language understanding. Conversely, less attention has been given to scaling fine-grained pixel-level understanding capabilities, where the models are expected to realize pixel-level alignment between visual signals and language semantics. Some previous studies have applied LMMs to related tasks such as regi…

@arXiv_csCL_bot@mastoxiv.page
2025-09-22 10:11:11

UniGist: Towards General and Hardware-aligned Sequence-level Long Context Compression
Chenlong Deng, Zhisong Zhang, Kelong Mao, Shuaiyi Li, Tianqing Fang, Hongming Zhang, Haitao Mi, Dong Yu, Zhicheng Dou
https://arxiv.org/abs/2509.15763

Large language models are increasingly capable of handling long-context inputs, but the memory overhead of key-value (KV) cache remains a major bottleneck for general-purpose deployment. While various compression strategies have been explored, sequence-level compression, which drops the full KV caches for certain tokens, is particularly challenging as it can lead to the loss of important contextual information. To address this, we introduce UniGist, a sequence-level long-context compression fra…

@arXiv_csSE_bot@mastoxiv.page
2025-09-16 10:44:26

From Evaluation to Enhancement: Large Language Models for Zero-Knowledge Proof Code Generation
Zhantong Xue, Pingchuan Ma, Zhaoyu Wang, Shuai Wang
https://arxiv.org/abs/2509.11708

Zero-knowledge proofs (ZKPs) are increasingly deployed in domains such as privacy-preserving authentication, blockchain scalability, and secure finance. However, authoring ZK programs remains challenging: unlike mainstream programming, ZK development requires reasoning about finite field arithmetic, constraint systems, and gadgets, making it knowledge-intensive and error-prone. While large language models (LLMs) have demonstrated strong code generation capabilities in general-purpose languages,…

@arXiv_csAI_bot@mastoxiv.page
2025-09-29 09:48:47

ProRe: A Proactive Reward System for GUI Agents via Reasoner-Actor Collaboration
Gaole Dai, Shiqi Jiang, Ting Cao, Yuqing Yang, Yuanchun Li, Rui Tan, Mo Li, Lili Qiu
https://arxiv.org/abs/2509.21823

Reward is critical to the evaluation and training of large language models (LLMs). However, existing rule-based or model-based reward methods struggle to generalize to GUI agents, where access to ground-truth trajectories or application databases is often unavailable, and static trajectory-based LLM-as-a-Judge approaches suffer from limited accuracy. To address these challenges, we propose ProRe, a proactive reward system that leverages a general-purpose reasoner and domain-specific evaluator a…

@arXiv_csCL_bot@mastoxiv.page
2025-09-16 12:21:37

Is 'Hope' a person or an idea? A pilot benchmark for NER: comparing traditional NLP tools and large language models on ambiguous entities
Payam Latifi
https://arxiv.org/abs/2509.12098

This pilot study presents a small-scale but carefully annotated benchmark of Named Entity Recognition (NER) performance across six systems: three non-LLM NLP tools (NLTK, spaCy, Stanza) and three general-purpose large language models (LLMs: Gemini-1.5-flash, DeepSeek-V3, Qwen-3-4B). The dataset contains 119 tokens covering five entity types (PERSON, LOCATION, ORGANIZATION, DATE, TIME). We evaluated each system's output against the manually annotated gold standard dataset using F1-score. The res…

@arXiv_csCV_bot@mastoxiv.page
2025-09-23 13:10:51

Beyond Diagnosis: Evaluating Multimodal LLMs for Pathology Localization in Chest Radiographs
Advait Gosai, Arun Kavishwar, Stephanie L. McNamara, Soujanya Samineni, Renato Umeton, Alexander Chowdhury, William Lotter
https://arxiv.org/abs/2509.18015

Recent work has shown promising performance of frontier large language models (LLMs) and their multimodal counterparts in medical quizzes and diagnostic tasks, highlighting their potential for broad clinical utility given their accessible, general-purpose nature. However, beyond diagnosis, a fundamental aspect of medical image interpretation is the ability to localize pathological findings. Evaluating localization not only has clinical and educational relevance but also provides insight into a …

@arXiv_csIR_bot@mastoxiv.page
2025-09-16 08:29:36

DSRAG: A Domain-Specific Retrieval Framework Based on Document-derived Multimodal Knowledge Graph
Mengzheng Yang, Yanfei Ren, David Osei Opoku, Ruochang Li, Peng Ren, Chunxiao Xing
https://arxiv.org/abs/2509.10467

Current general-purpose large language models (LLMs) commonly exhibit knowledge hallucination and insufficient domain-specific adaptability in domain-specific tasks, limiting their effectiveness in specialized question answering scenarios. Retrieval-augmented generation (RAG) effectively tackles these challenges by integrating external knowledge to enhance accuracy and relevance. However, traditional RAG still faces limitations in domain knowledge accuracy and context modeling. To enhance domain…

@arXiv_csSE_bot@mastoxiv.page
2025-09-23 09:20:10

RelRepair: Enhancing Automated Program Repair by Retrieving Relevant Code
Shunyu Liu, Guangdong Bai, Mark Utting, Guowei Yang
https://arxiv.org/abs/2509.16701 https://

Automated Program Repair (APR) has emerged as a promising paradigm for reducing debugging time and improving the overall efficiency of software development. Recent advances in Large Language Models (LLMs) have demonstrated their potential for automated bug fixing and other software engineering tasks. Nevertheless, the general-purpose nature of LLM pre-training means these models often lack the capacity to perform project-specific repairs, which require understanding of domain-specific identifie…

@arXiv_csCV_bot@mastoxiv.page
2025-09-18 10:25:41

MARS2 2025 Challenge on Multimodal Reasoning: Datasets, Methods, Results, Discussion, and Outlook
Peng Xu, Shengwu Xiong, Jiajun Zhang, Yaxiong Chen, Bowen Zhou, Chen Change Loy, David A. Clifton, Kyoung Mu Lee, Luc Van Gool, Ruiming He, Ruilin Yao, Xinwei Long, Jirui Huang, Kai Tian, Sa Yang, Yihua Shao, Jin Feng, Yue Zhong, Jiakai Zhou, Cheng Tang, Tianyu Zou, Yifang Zhang, Junming Liang, Guoyou Li, Zhaoxiang Wang, Qiang Zhou, Yichen Zhao, Shili Xiong, Hyeongjin Nam, Jaerin Lee, Jaey…

This paper reviews the MARS2 2025 Challenge on Multimodal Reasoning. We aim to bring together different approaches in multimodal machine learning and LLMs via a large benchmark. We hope it better allows researchers to follow the state-of-the-art in this very dynamic area. Meanwhile, a growing number of testbeds have boosted the evolution of general-purpose large language models. Thus, this year's MARS2 focuses on real-world and specialized scenarios to broaden the multimodal reasoning applicati…

@arXiv_csCL_bot@mastoxiv.page
2025-09-17 10:39:20

Scaling Agents via Continual Pre-training
Liangcai Su, Zhen Zhang, Guangyu Li, Zhuo Chen, Chenxi Wang, Maojia Song, Xinyu Wang, Kuan Li, Jialong Wu, Xuanzhong Chen, Zile Qiao, Zhongwang Zhang, Huifeng Yin, Shihao Cai, Runnan Fang, Zhengwei Tao, Wenbiao Yin, Chenxiong Qian, Yong Jiang, Pengjun Xie, Fei Huang, Jingren Zhou
https://arxiv.org/…

Large language models (LLMs) have evolved into agentic systems capable of autonomous tool use and multi-step reasoning for complex problem-solving. However, post-training approaches building upon general-purpose foundation models consistently underperform in agentic tasks, particularly in open-source implementations. We identify the root cause: the absence of robust agentic foundation models forces models during post-training to simultaneously learn diverse agentic behaviors while aligning them…
