
2025-06-10 18:54:40
This https://arxiv.org/abs/2505.19912 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCL_…
TGRPO: Fine-tuning Vision-Language-Action Model via Trajectory-wise Group Relative Policy Optimization
Zengjue Chen, Runliang Niu, He Kong, Qi Wang
https://arxiv.org/abs/2506.08440
May I have your Attention? Breaking Fine-Tuning based Prompt Injection Defenses using Architecture-Aware Attacks
Nishit V. Pandya, Andrey Labunets, Sicun Gao, Earlence Fernandes
https://arxiv.org/abs/2507.07417
A Study on the Fine-Tuning Performance of Universal Machine-Learned Interatomic Potentials (U-MLIPs)
Xiaoqing Liu, Kehan Zeng, Yangshuai Wang, Teng Zhao
https://arxiv.org/abs/2506.07401
This https://arxiv.org/abs/2506.02308 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
DogFit: Domain-guided Fine-tuning for Efficient Transfer Learning of Diffusion Models
Yara Bahram, Mohammadhadi Shateri, Eric Granger
https://arxiv.org/abs/2508.05685
Reinforcement Fine-Tuning for Reasoning towards Multi-Step Multi-Source Search in Large Language Models
Wentao Shi, Yiqing Shen
https://arxiv.org/abs/2506.08352
Symbiosis: Multi-Adapter Inference and Fine-Tuning
Saransh Gupta, Umesh Deshpande, Travis Janssen, Swami Sundararaman
https://arxiv.org/abs/2507.03220
Narrowing the Gap: Supervised Fine-Tuning of Open-Source LLMs as a Viable Alternative to Proprietary Models for Pedagogical Tools
Lorenzo Lee Solano, Charles Koutcheme, Juho Leinonen, Alexandra Vassar, Jake Renzella
https://arxiv.org/abs/2507.05305
DMFI: Dual-Modality Fine-Tuning and Inference Framework for LLM-Based Insider Threat Detection
Kaichuan Kong, Dongjie Liu, Xiaobo Jin, Guanggang Geng, Zhiying Li, Jian Weng
https://arxiv.org/abs/2508.05694
Mitigating Catastrophic Forgetting with Adaptive Transformer Block Expansion in Federated Fine-Tuning
Yujia Huo, Jianchun Liu, Hongli Xu, Zhenguo Ma, Shilong Wang, Liusheng Huang
https://arxiv.org/abs/2506.05977
KeyKnowledgeRAG (K^2RAG): An Enhanced RAG method for improved LLM question-answering capabilities
Hruday Markondapatnaikuni, Basem Suleiman, Abdelkarim Erradi, Shijing Chen
https://arxiv.org/abs/2507.07695
Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning
Ziyang Wang, Jaehong Yoon, Shoubin Yu, Md Mohaiminul Islam, Gedas Bertasius, Mohit Bansal
https://arxiv.org/abs/2507.06485
DrugMCTS: a drug repurposing framework combining multi-agent, RAG and Monte Carlo Tree Search
Zerui Yang, Yuwei Wan, Yinqiao Li, Yudai Matsuda, Tong Xie, Linqi Song
https://arxiv.org/abs/2507.07426
Noise Consistency Regularization for Improved Subject-Driven Image Synthesis
Yao Ni, Song Wen, Piotr Koniusz, Anoop Cherian
https://arxiv.org/abs/2506.06483
This https://arxiv.org/abs/2408.09568 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csSE_…
Label-Efficient Chest X-ray Diagnosis via Partial CLIP Adaptation
Heet Nitinkumar Dalsania
https://arxiv.org/abs/2507.07254
This https://arxiv.org/abs/2506.06105 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
Low-Resource Domain Adaptation for Speech LLMs via Text-Only Fine-Tuning
Yangui Fang, Jing Peng, Xu Li, Yu Xi, Chengwei Zhang, Guohui Zhong, Kai Yu
https://arxiv.org/abs/2506.05671
Efficient Fireworks Algorithm Equipped with an Explosion Mechanism based on Student's T-distribution
Cen Shipeng, Tan Ying
https://arxiv.org/abs/2506.08484
This https://arxiv.org/abs/2504.04178 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csIR_…
Integrating Pathology Foundation Models and Spatial Transcriptomics for Cellular Decomposition from Histology Images
Yutong Sun, Sichen Zhu, Peng Qiu
https://arxiv.org/abs/2507.07013
Impact of First-order Electroweak Phase Transition on QCD Axion
Dipendu Bhandari, Soumen Kumar Manna, Arunansu Sil
https://arxiv.org/abs/2507.05353
Learning the Topic, Not the Language: How LLMs Classify Online Immigration Discourse Across Languages
Andrea Nasuto, Stefano Maria Iacus, Francisco Rowe, Devika Jain
https://arxiv.org/abs/2508.06435
MagicGUI: A Foundational Mobile GUI Agent with Scalable Data Pipeline and Reinforcement Fine-tuning
Liujian Tang, Shaokang Dong, Yijia Huang, Minqi Xiang, Hongtao Ruan, Bin Wang, Shuo Li, Zhihui Cao, Hailiang Pang, Heng Kong, He Yang, Mingxu Chai, Zhilin Gao, Xingyu Liu, Yingnan Fu, Jiaming Liu, Tao Gui, Xuanjing Huang, Yu-Gang Jiang, Qi Zhang, Kang Wang, Yunke Zhang, Yuran Wang
TuneShield: Mitigating Toxicity in Conversational AI while Fine-tuning on Untrusted Data
Aravind Cheruvu, Shravya Kanchi, Sifat Muhammad Abdullah, Nicholas Kong, Daphne Yao, Murtuza Jadliwala, Bimal Viswanath
https://arxiv.org/abs/2507.05660
Day 7
✅ 24 test suites, 153 tests passing.
Solid coverage across service and controller layers in my modular monorepo. Strict typing (TypeScript), full DTO validation, and realistic mocks across complex relations (TypeORM).
Next: fine-tuning error handling & exploring e2e strategies.
https://write.tyolabs.com/?p=25
Replaced article(s) found for cs.DC. https://arxiv.org/list/cs.DC/new
[1/1]:
- Fine-tuning Multimodal Transformers on Edge: A Parallel Split Learning Approach
Timo Fudala, Vasileios Tsouvalas, Nirvana Meratnia
CO-RFT: Efficient Fine-Tuning of Vision-Language-Action Models through Chunked Offline Reinforcement Learning
Dongchi Huang, Zhirui Fang, Tianle Zhang, Yihang Li, Lin Zhao, Chunhe Xia
https://arxiv.org/abs/2508.02219
On the Inherent Privacy of Zeroth Order Projected Gradient Descent
Devansh Gupta, Meisam Razaviyayn, Vatsal Sharan
https://arxiv.org/abs/2507.05610
Position: Simulating Society Requires Simulating Thought
Chance Jiajie Li, Jiayi Wu, Zhenze Mo, Ao Qu, Yuhan Tang, Kaiya Ivy Zhao, Yulu Gan, Jie Fan, Jiangbo Yu, Jinhua Zhao, Paul Liang, Luis Alonso, Kent Larson
https://arxiv.org/abs/2506.06958
Movable Antenna Enhanced Federated Fine-Tuning of Large Language Models via Hybrid Client Selection Optimization
Yang Zhao, Yue Xiu, Chengxiao Dai, Ning Wei, Dusit Niyato
https://arxiv.org/abs/2506.00011
CLEP-DG: Contrastive Learning for Speech Emotion Domain Generalization via Soft Prompt Tuning
Jiacheng Shi, Yanfu Zhang, Ye Gao
https://arxiv.org/abs/2507.04048
Fine-tuning physics-informed neural networks for cavity flows using coordinate transformation
Ryuta Takao, Satoshi Ii
https://arxiv.org/abs/2508.01122
Online Fine-Tuning of Carbon Emission Predictions using Real-Time Recurrent Learning for State Space Models
Julian Lemmel, Manuel Kranzl, Adam Lamine, Philipp Neubauer, Radu Grosu, Sophie Neubauer
https://arxiv.org/abs/2508.00804
Why LLM Safety Guardrails Collapse After Fine-tuning: A Similarity Analysis Between Alignment and Fine-tuning Datasets
Lei Hsiung, Tianyu Pang, Yung-Chen Tang, Linyue Song, Tsung-Yi Ho, Pin-Yu Chen, Yaoqing Yang
https://arxiv.org/abs/2506.05346
This https://arxiv.org/abs/2506.05673 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
Gradient Inversion Attacks on Parameter-Efficient Fine-Tuning
Hasin Us Sami, Swapneel Sen, Amit K. Roy-Chowdhury, Srikanth V. Krishnamurthy, Basak Guler
https://arxiv.org/abs/2506.04453
A foundation model to predict and capture human cognition.
https://www.nature.com/articles/s41586-025-09215-4
And the reaction and criticism:
Fine-Tuning LLMs For ‘Good’ Behavior Makes Them More Likely To Say No https://www.404media.co/fine-tuning-llms-cognitive-bias-decision-making-study/
GradOT: Training-free Gradient-preserving Offsite-tuning for Large Language Models
Kai Yao, Zhaorui Tan, Penglei Gao, Lichun Li, Kaixin Wu, Yinggui Wang, Yuan Zhao, Yixin Ji, Wei Wang, Jianke Zhu
https://arxiv.org/abs/2507.04455
Hear-Your-Click: Interactive Video-to-Audio Generation via Object-aware Contrastive Audio-Visual Fine-tuning
Yingshan Liang, Keyu Fan, Zhicheng Du, Yiran Wang, Qingyang Shi, Xinyu Zhang, Jiasheng Lu, Peiwu Qin
https://arxiv.org/abs/2507.04959
Understanding Overadaptation in Supervised Fine-Tuning: The Role of Ensemble Methods
Yifan Hao, Xingyuan Pan, Hanning Zhang, Chenlu Ye, Rui Pan, Tong Zhang
https://arxiv.org/abs/2506.01901
This https://arxiv.org/abs/2506.02916 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csIR_…
VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation
Yuansheng Ni, Ping Nie, Kai Zou, Xiang Yue, Wenhu Chen
https://arxiv.org/abs/2506.03930
This https://arxiv.org/abs/2505.01997 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
Analysis and Optimized CXL-Attached Memory Allocation for Long-Context LLM Fine-Tuning
Yong-Cheng Liaw, Shuo-Han Chen
https://arxiv.org/abs/2507.03305
This https://arxiv.org/abs/2505.22942 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCL_…
FedShield-LLM: A Secure and Scalable Federated Fine-Tuned Large Language Model
Md Jueal Mia, M. Hadi Amini
https://arxiv.org/abs/2506.05640
Approaching Dialogue State Tracking via Aligning Speech Encoders and LLMs
Šimon Sedláček, Bolaji Yusuf, Ján Švec, Pradyoth Hegde, Santosh Kesiraju, Oldřich Plchot, Jan Černocký
https://arxiv.org/abs/2506.08633
This https://arxiv.org/abs/2505.23868 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
TTS-CtrlNet: Time varying emotion aligned text-to-speech generation with ControlNet
Jaeseok Jeong, Yuna Lee, Mingi Kwon, Youngjung Uh
https://arxiv.org/abs/2507.04349
Post-training for Efficient Communication via Convention Formation
Yilun Hua, Evan Wang, Yoav Artzi
https://arxiv.org/abs/2508.06482
This https://arxiv.org/abs/2505.00347 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
Your Agent Can Defend Itself against Backdoor Attacks
Li Changjiang, Liang Jiacheng, Cao Bochuan, Chen Jinghui, Wang Ting
https://arxiv.org/abs/2506.08336
Multi-Agent Debate Strategies to Enhance Requirements Engineering with Large Language Models
Marc Oriol, Quim Motger, Jordi Marco, Xavier Franch
https://arxiv.org/abs/2507.05981
Are You Listening to Me? Fine-Tuning Chatbots for Empathetic Dialogue
Paulo Ricardo Knob, Leonardo Scholler, Juliano Rigatti, Soraia Raupp Musse
https://arxiv.org/abs/2507.02537
Enhancing Food-Domain Question Answering with a Multimodal Knowledge Graph: Hybrid QA Generation and Diversity Analysis
Srihari K B, Pushpak Bhattacharyya
https://arxiv.org/abs/2507.06571
This https://arxiv.org/abs/2506.01790 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
DiWA: Diffusion Policy Adaptation with World Models
Akshay L Chandra, Iman Nematollahi, Chenguang Huang, Tim Welschehold, Wolfram Burgard, Abhinav Valada
https://arxiv.org/abs/2508.03645
This https://arxiv.org/abs/2506.05394 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCR_…
This https://arxiv.org/abs/2410.00527 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_ees…
Memory-Efficient Split Federated Learning for LLM Fine-Tuning on Heterogeneous Mobile Devices
Xiaopei Chen, Liang Li, Fei Ji, Wen Wu
https://arxiv.org/abs/2506.02940
Replaced article(s) found for cs.LG. https://arxiv.org/list/cs.LG/new
[4/5]:
- Fine-tuning Multimodal Transformers on Edge: A Parallel Split Learning Approach
Timo Fudala, Vasileios Tsouvalas, Nirvana Meratnia
Checklist Engineering Empowers Multilingual LLM Judges
Mohammad Ghiasvand Mohammadkhani, Hamid Beigy
https://arxiv.org/abs/2507.06774
This https://arxiv.org/abs/2505.21920 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCV_…
Fine-Tuning Text-to-Speech Diffusion Models Using Reinforcement Learning with Human Feedback
Jingyi Chen, Ju Seung Byun, Micha Elsner, Pichao Wang, Andrew Perrault
https://arxiv.org/abs/2508.03123
SafeCOMM: What about Safety Alignment in Fine-Tuned Telecom Large Language Models?
Aladin Djuhera, Swanand Ravindra Kadhe, Farhan Ahmed, Syed Zawad, Holger Boche, Walid Saad
https://arxiv.org/abs/2506.00062
Text-to-LoRA: Instant Transformer Adaption
Rujikorn Charakorn, Edoardo Cetin, Yujin Tang, Robert Tjarko Lange
https://arxiv.org/abs/2506.06105
WGLE: Backdoor-free and Multi-bit Black-box Watermarking for Graph Neural Networks
Tingzhi Li, Xuefeng Liu
https://arxiv.org/abs/2506.08602
Whispers of Many Shores: Cultural Alignment through Collaborative Cultural Expertise
Shuai Feng, Wei-Chuang Chan, Srishti Chouhan, Junior Francisco Garcia Ayala, Srujananjali Medicherla, Kyle Clark, Mingwei Shi
https://arxiv.org/abs/2506.00242
EXPO: Stable Reinforcement Learning with Expressive Policies
Perry Dong, Qiyang Li, Dorsa Sadigh, Chelsea Finn
https://arxiv.org/abs/2507.07986 https://arxiv.org/pdf/2507.07986 https://arxiv.org/html/2507.07986
arXiv:2507.07986v1 Announce Type: new
Abstract: We study the problem of training and fine-tuning expressive policies with online reinforcement learning (RL) given an offline dataset. Training expressive policy classes with online RL presents a unique challenge of stable value maximization. Unlike simpler Gaussian policies commonly used in online RL, expressive policies like diffusion and flow-matching policies are parameterized by a long denoising chain, which hinders stable gradient propagation from actions to policy parameters when optimizing against some value function. Our key insight is that we can address stable value maximization by avoiding direct optimization over value with the expressive policy and instead construct an on-the-fly RL policy to maximize Q-value. We propose Expressive Policy Optimization (EXPO), a sample-efficient online RL algorithm that utilizes an on-the-fly policy to maximize value with two parameterized policies -- a larger expressive base policy trained with a stable imitation learning objective and a lightweight Gaussian edit policy that edits the actions sampled from the base policy toward a higher value distribution. The on-the-fly policy optimizes the actions from the base policy with the learned edit policy and chooses the value-maximizing action from the base and edited actions for both sampling and temporal-difference (TD) backup. Our approach yields up to 2-3x improvement in sample efficiency on average over prior methods both in the setting of fine-tuning a pretrained policy given offline data and in leveraging offline data to train online.
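The on-the-fly selection rule described in the abstract can be sketched as follows. This is a toy illustration, not the paper's implementation: the scalar action space and the `q_value`, `base_policy`, and `edit_policy` stand-ins are all assumptions replacing the learned critic, diffusion base policy, and Gaussian edit network.

```python
# Toy sketch of EXPO's on-the-fly action selection (hypothetical stand-ins
# for the learned networks; scalar actions for simplicity).
import random

def q_value(state, action):
    # Stand-in for the learned critic: a toy quadratic preferring a = -state.
    return -(action + state) ** 2

def base_policy(state, n=8):
    # Stand-in for the expressive base policy (e.g. a diffusion policy)
    # trained by imitation: draws n candidate actions.
    return [random.gauss(0.0, 1.0) for _ in range(n)]

def edit_policy(state, action):
    # Stand-in for the lightweight Gaussian edit policy: a small
    # perturbation of the base action.
    return action + random.gauss(0.0, 0.1)

def expo_act(state):
    # On-the-fly policy: sample from the base policy, edit each action,
    # then return the Q-maximizing action among base and edited candidates
    # (used for both sampling and the TD backup in the paper).
    candidates = []
    for a in base_policy(state):
        candidates.append(a)
        candidates.append(edit_policy(state, a))
    return max(candidates, key=lambda a: q_value(state, a))

action = expo_act(state=0.5)
```

The point of the construction is that gradients never flow through the long denoising chain at value-maximization time; only the small edit policy is optimized against the critic.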
RecRankerEval: A Flexible and Extensible Framework for Top-k LLM-based Recommendation
Zeyuan Meng, Zixuan Yi, Iadh Ounis
https://arxiv.org/abs/2507.05880
EcoLoRA: Communication-Efficient Federated Fine-Tuning of Large Language Models
Han Liu, Ruoyao Wen, Srijith Nair, Jia Liu, Wenjing Lou, Chongjie Zhang, William Yeoh, Yevgeniy Vorobeychik, Ning Zhang
https://arxiv.org/abs/2506.02001
EXPOTION: Facial Expression and Motion Control for Multimodal Music Generation
Fathinah Izzati, Xinyue Li, Gus Xia
https://arxiv.org/abs/2507.04955
Training-free Generation of Temporally Consistent Rewards from VLMs
Yinuo Zhao, Jiale Yuan, Zhiyuan Xu, Xiaoshuai Hao, Xinyi Zhang, Kun Wu, Zhengping Che, Chi Harold Liu, Jian Tang
https://arxiv.org/abs/2507.04789
AdS: Adapter-state Sharing Framework for Multimodal Sarcasm Detection
Soumyadeep Jana, Sahil Danayak, Sanasam Ranbir Singh
https://arxiv.org/abs/2507.04508
GORACS: Group-level Optimal Transport-guided Coreset Selection for LLM-based Recommender Systems
Tiehua Mei, Hengrui Chen, Peng Yu, Jiaqing Liang, Deqing Yang
https://arxiv.org/abs/2506.04015
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
Yana Wei, Liang Zhao, Jianjian Sun, Kangheng Lin, Jisheng Yin, Jingcheng Hu, Yinmin Zhang, En Yu, Haoran Lv, Zejia Weng, Jia Wang, Chunrui Han, Yuang Peng, Qi Han, Zheng Ge, Xiangyu Zhang, Daxin Jiang, Vishal M. Patel
https://arxiv.org/abs/2507…
Impact of Fine-Tuning Methods on Memorization in Large Language Models
Jie Hou, Chuxiong Wu, Lannan Luo, Qiang Zeng
https://arxiv.org/abs/2507.00258
This https://arxiv.org/abs/2505.24200 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csSD_…
Enhancing Safety of Foundation Models for Visual Navigation through Collision Avoidance via Repulsive Estimation
Joonkyung Kim, Joonyeol Sim, Woojun Kim, Katia Sycara, Changjoo Nam
https://arxiv.org/abs/2506.03834
SU-ESRGAN: Semantic and Uncertainty-Aware ESRGAN for Super-Resolution of Satellite and Drone Imagery with Fine-Tuning for Cross Domain Evaluation
Prerana Ramkumar
https://arxiv.org/abs/2508.00750
Fine-Tuning ASR for Stuttered Speech: Personalized vs. Generalized Approaches
Dena Mujtaba, Nihar Mahapatra
https://arxiv.org/abs/2506.00853
MLLM-Fabric: Multimodal Large Language Model-Driven Robotic Framework for Fabric Sorting and Selection
Liman Wang, Hanyang Zhong, Tianyuan Wang, Shan Luo, Jihong Zhu
https://arxiv.org/abs/2507.04351
Sentinel: SOTA model to protect against prompt injections
Dror Ivry, Oran Nahum
https://arxiv.org/abs/2506.05446
Navigating Sparse Molecular Data with Stein Diffusion Guidance
Van Khoa Nguyen, Lionel Blondé, Alexandros Kalousis
https://arxiv.org/abs/2507.05482
TR-PTS: Task-Relevant Parameter and Token Selection for Efficient Tuning
Siqi Luo, Haoran Yang, Yi Xin, Mingyang Yi, Guangyang Wu, Guangtao Zhai, Xiaohong Liu
https://arxiv.org/abs/2507.22872
Dynamic Mixture of Progressive Parameter-Efficient Expert Library for Lifelong Robot Learning
Yuheng Lei, Sitong Mao, Shunbo Zhou, Hongyuan Zhang, Xuelong Li, Ping Luo
https://arxiv.org/abs/2506.05985
Learning to Diagnose Privately: DP-Powered LLMs for Radiology Report Classification
Payel Bhattacharjee, Fengwei Tian, Ravi Tandon, Joseph Lo, Heidi Hanson, Geoffrey Rubin, Nirav Merchant, John Gounley
https://arxiv.org/abs/2506.04450
Corrector Sampling in Language Models
Itai Gat, Neta Shaul, Uriel Singer, Yaron Lipman
https://arxiv.org/abs/2506.06215
Fine-Tuning Lowers Safety and Disrupts Evaluation Consistency
Kathleen C. Fraser, Hillary Dawkins, Isar Nejadgholi, Svetlana Kiritchenko
https://arxiv.org/abs/2506.17209
SDD: Self-Degraded Defense against Malicious Fine-tuning
Zixuan Chen, Weikai Lu, Xin Lin, Ziqian Zeng
https://arxiv.org/abs/2507.21182
InfoSteer: Steering Information Utility in Language Model Post-Training
Chunyuan Deng, Ruidi Chang, Hanjie Chen
https://arxiv.org/abs/2507.05158
This https://arxiv.org/abs/2505.03793 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
CoSteer: Collaborative Decoding-Time Personalization via Local Delta Steering
Hang Lv, Sheng Liang, Hao Wang, Hongchao Gu, Yaxiong Wu, Wei Guo, Defu Lian, Yong Liu, Enhong Chen
https://arxiv.org/abs/2507.04756
SecureT2I: No More Unauthorized Manipulation on AI Generated Images from Prompts
Xiaodong Wu, Xiangman Li, Qi Li, Jianbing Ni, Rongxing Lu
https://arxiv.org/abs/2507.03636
MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs
Purbesh Mitra, Sennur Ulukus
https://arxiv.org/abs/2507.02851
Knowledge-Aware Self-Correction in Language Models via Structured Memory Graphs
Swayamjit Saha
https://arxiv.org/abs/2507.04625