
2025-06-10 18:54:40
This https://arxiv.org/abs/2505.19912 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCL_…
TGRPO: Fine-tuning Vision-Language-Action Model via Trajectory-wise Group Relative Policy Optimization
Zengjue Chen, Runliang Niu, He Kong, Qi Wang
https://arxiv.org/abs/2506.08440
May I have your Attention? Breaking Fine-Tuning based Prompt Injection Defenses using Architecture-Aware Attacks
Nishit V. Pandya, Andrey Labunets, Sicun Gao, Earlence Fernandes
https://arxiv.org/abs/2507.07417
A Study on the Fine-Tuning Performance of Universal Machine-Learned Interatomic Potentials (U-MLIPs)
Xiaoqing Liu, Kehan Zeng, Yangshuai Wang, Teng Zhao
https://arxiv.org/abs/2506.07401
This https://arxiv.org/abs/2506.02308 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
DogFit: Domain-guided Fine-tuning for Efficient Transfer Learning of Diffusion Models
Yara Bahram, Mohammadhadi Shateri, Eric Granger
https://arxiv.org/abs/2508.05685
Reinforcement Fine-Tuning for Reasoning towards Multi-Step Multi-Source Search in Large Language Models
Wentao Shi, Yiqing Shen
https://arxiv.org/abs/2506.08352
Symbiosis: Multi-Adapter Inference and Fine-Tuning
Saransh Gupta, Umesh Deshpande, Travis Janssen, Swami Sundararaman
https://arxiv.org/abs/2507.03220
Narrowing the Gap: Supervised Fine-Tuning of Open-Source LLMs as a Viable Alternative to Proprietary Models for Pedagogical Tools
Lorenzo Lee Solano, Charles Koutcheme, Juho Leinonen, Alexandra Vassar, Jake Renzella
https://arxiv.org/abs/2507.05305
DMFI: Dual-Modality Fine-Tuning and Inference Framework for LLM-Based Insider Threat Detection
Kaichuan Kong, Dongjie Liu, Xiaobo Jin, Guanggang Geng, Zhiying Li, Jian Weng
https://arxiv.org/abs/2508.05694
Mitigating Catastrophic Forgetting with Adaptive Transformer Block Expansion in Federated Fine-Tuning
Yujia Huo, Jianchun Liu, Hongli Xu, Zhenguo Ma, Shilong Wang, Liusheng Huang
https://arxiv.org/abs/2506.05977
KeyKnowledgeRAG (K^2RAG): An Enhanced RAG method for improved LLM question-answering capabilities
Hruday Markondapatnaikuni, Basem Suleiman, Abdelkarim Erradi, Shijing Chen
https://arxiv.org/abs/2507.07695
Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning
Ziyang Wang, Jaehong Yoon, Shoubin Yu, Md Mohaiminul Islam, Gedas Bertasius, Mohit Bansal
https://arxiv.org/abs/2507.06485
DrugMCTS: a drug repurposing framework combining multi-agent, RAG and Monte Carlo Tree Search
Zerui Yang, Yuwei Wan, Yinqiao Li, Yudai Matsuda, Tong Xie, Linqi Song
https://arxiv.org/abs/2507.07426
Noise Consistency Regularization for Improved Subject-Driven Image Synthesis
Yao Ni, Song Wen, Piotr Koniusz, Anoop Cherian
https://arxiv.org/abs/2506.06483
This https://arxiv.org/abs/2408.09568 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csSE_…
Label-Efficient Chest X-ray Diagnosis via Partial CLIP Adaptation
Heet Nitinkumar Dalsania
https://arxiv.org/abs/2507.07254
This https://arxiv.org/abs/2506.06105 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
Low-Resource Domain Adaptation for Speech LLMs via Text-Only Fine-Tuning
Yangui Fang, Jing Peng, Xu Li, Yu Xi, Chengwei Zhang, Guohui Zhong, Kai Yu
https://arxiv.org/abs/2506.05671
Efficient Fireworks Algorithm Equipped with an Explosion Mechanism based on Student's T-distribution
Cen Shipeng, Tan Ying
https://arxiv.org/abs/2506.08484
This https://arxiv.org/abs/2504.04178 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csIR_…
Integrating Pathology Foundation Models and Spatial Transcriptomics for Cellular Decomposition from Histology Images
Yutong Sun, Sichen Zhu, Peng Qiu
https://arxiv.org/abs/2507.07013
Impact of First-order Electroweak Phase Transition on QCD Axion
Dipendu Bhandari, Soumen Kumar Manna, Arunansu Sil
https://arxiv.org/abs/2507.05353
Learning the Topic, Not the Language: How LLMs Classify Online Immigration Discourse Across Languages
Andrea Nasuto, Stefano Maria Iacus, Francisco Rowe, Devika Jain
https://arxiv.org/abs/2508.06435
MagicGUI: A Foundational Mobile GUI Agent with Scalable Data Pipeline and Reinforcement Fine-tuning
Liujian Tang, Shaokang Dong, Yijia Huang, Minqi Xiang, Hongtao Ruan, Bin Wang, Shuo Li, Zhihui Cao, Hailiang Pang, Heng Kong, He Yang, Mingxu Chai, Zhilin Gao, Xingyu Liu, Yingnan Fu, Jiaming Liu, Tao Gui, Xuanjing Huang, Yu-Gang Jiang, Qi Zhang, Kang Wang, Yunke Zhang, Yuran Wang
TuneShield: Mitigating Toxicity in Conversational AI while Fine-tuning on Untrusted Data
Aravind Cheruvu, Shravya Kanchi, Sifat Muhammad Abdullah, Nicholas Kong, Daphne Yao, Murtuza Jadliwala, Bimal Viswanath
https://arxiv.org/abs/2507.05660
Day 7
✅ 24 test suites, 153 tests passing.
Solid coverage across service and controller layers in my modular monorepo. Strict typing (TypeScript), full DTO validation, and realistic mocks across complex relations (TypeORM).
Next: fine-tuning error handling & exploring e2e strategies.
https://write.tyolabs.com/?p=25
Replaced article(s) found for cs.DC. https://arxiv.org/list/cs.DC/new
[1/1]:
- Fine-tuning Multimodal Transformers on Edge: A Parallel Split Learning Approach
Timo Fudala, Vasileios Tsouvalas, Nirvana Meratnia
CO-RFT: Efficient Fine-Tuning of Vision-Language-Action Models through Chunked Offline Reinforcement Learning
Dongchi Huang, Zhirui Fang, Tianle Zhang, Yihang Li, Lin Zhao, Chunhe Xia
https://arxiv.org/abs/2508.02219
On the Inherent Privacy of Zeroth Order Projected Gradient Descent
Devansh Gupta, Meisam Razaviyayn, Vatsal Sharan
https://arxiv.org/abs/2507.05610
Position: Simulating Society Requires Simulating Thought
Chance Jiajie Li, Jiayi Wu, Zhenze Mo, Ao Qu, Yuhan Tang, Kaiya Ivy Zhao, Yulu Gan, Jie Fan, Jiangbo Yu, Jinhua Zhao, Paul Liang, Luis Alonso, Kent Larson
https://arxiv.org/abs/2506.06958
Movable Antenna Enhanced Federated Fine-Tuning of Large Language Models via Hybrid Client Selection Optimization
Yang Zhao, Yue Xiu, Chengxiao Dai, Ning Wei, Dusit Niyato
https://arxiv.org/abs/2506.00011
CLEP-DG: Contrastive Learning for Speech Emotion Domain Generalization via Soft Prompt Tuning
Jiacheng Shi, Yanfu Zhang, Ye Gao
https://arxiv.org/abs/2507.04048
Fine-tuning physics-informed neural networks for cavity flows using coordinate transformation
Ryuta Takao, Satoshi Ii
https://arxiv.org/abs/2508.01122
Online Fine-Tuning of Carbon Emission Predictions using Real-Time Recurrent Learning for State Space Models
Julian Lemmel, Manuel Kranzl, Adam Lamine, Philipp Neubauer, Radu Grosu, Sophie Neubauer
https://arxiv.org/abs/2508.00804
Why LLM Safety Guardrails Collapse After Fine-tuning: A Similarity Analysis Between Alignment and Fine-tuning Datasets
Lei Hsiung, Tianyu Pang, Yung-Chen Tang, Linyue Song, Tsung-Yi Ho, Pin-Yu Chen, Yaoqing Yang
https://arxiv.org/abs/2506.05346
This https://arxiv.org/abs/2506.05673 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
Gradient Inversion Attacks on Parameter-Efficient Fine-Tuning
Hasin Us Sami, Swapneel Sen, Amit K. Roy-Chowdhury, Srikanth V. Krishnamurthy, Basak Guler
https://arxiv.org/abs/2506.04453
A foundation model to predict and capture human cognition.
https://www.nature.com/articles/s41586-025-09215-4
And the reaction and criticism:
Fine-Tuning LLMs For ‘Good’ Behavior Makes Them More Likely To Say No https://www.404media.co/fine-tuning-llms-cognitive-bias-decision-making-study/
GradOT: Training-free Gradient-preserving Offsite-tuning for Large Language Models
Kai Yao, Zhaorui Tan, Penglei Gao, Lichun Li, Kaixin Wu, Yinggui Wang, Yuan Zhao, Yixin Ji, Wei Wang, Jianke Zhu
https://arxiv.org/abs/2507.04455
Hear-Your-Click: Interactive Video-to-Audio Generation via Object-aware Contrastive Audio-Visual Fine-tuning
Yingshan Liang, Keyu Fan, Zhicheng Du, Yiran Wang, Qingyang Shi, Xinyu Zhang, Jiasheng Lu, Peiwu Qin
https://arxiv.org/abs/2507.04959
Understanding Overadaptation in Supervised Fine-Tuning: The Role of Ensemble Methods
Yifan Hao, Xingyuan Pan, Hanning Zhang, Chenlu Ye, Rui Pan, Tong Zhang
https://arxiv.org/abs/2506.01901
This https://arxiv.org/abs/2506.02916 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csIR_…
VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation
Yuansheng Ni, Ping Nie, Kai Zou, Xiang Yue, Wenhu Chen
https://arxiv.org/abs/2506.03930
This https://arxiv.org/abs/2505.01997 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
Analysis and Optimized CXL-Attached Memory Allocation for Long-Context LLM Fine-Tuning
Yong-Cheng Liaw, Shuo-Han Chen
https://arxiv.org/abs/2507.03305
This https://arxiv.org/abs/2505.22942 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCL_…
FedShield-LLM: A Secure and Scalable Federated Fine-Tuned Large Language Model
Md Jueal Mia, M. Hadi Amini
https://arxiv.org/abs/2506.05640
Approaching Dialogue State Tracking via Aligning Speech Encoders and LLMs
Šimon Sedláček, Bolaji Yusuf, Ján Švec, Pradyoth Hegde, Santosh Kesiraju, Oldřich Plchot, Jan Černocký
https://arxiv.org/abs/2506.08633
This https://arxiv.org/abs/2505.23868 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
TTS-CtrlNet: Time varying emotion aligned text-to-speech generation with ControlNet
Jaeseok Jeong, Yuna Lee, Mingi Kwon, Youngjung Uh
https://arxiv.org/abs/2507.04349
Post-training for Efficient Communication via Convention Formation
Yilun Hua, Evan Wang, Yoav Artzi
https://arxiv.org/abs/2508.06482
This https://arxiv.org/abs/2505.00347 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
Your Agent Can Defend Itself against Backdoor Attacks
Li Changjiang, Liang Jiacheng, Cao Bochuan, Chen Jinghui, Wang Ting
https://arxiv.org/abs/2506.08336
Multi-Agent Debate Strategies to Enhance Requirements Engineering with Large Language Models
Marc Oriol, Quim Motger, Jordi Marco, Xavier Franch
https://arxiv.org/abs/2507.05981
Are You Listening to Me? Fine-Tuning Chatbots for Empathetic Dialogue
Paulo Ricardo Knob, Leonardo Scholler, Juliano Rigatti, Soraia Raupp Musse
https://arxiv.org/abs/2507.02537
Enhancing Food-Domain Question Answering with a Multimodal Knowledge Graph: Hybrid QA Generation and Diversity Analysis
Srihari K B, Pushpak Bhattacharyya
https://arxiv.org/abs/2507.06571
This https://arxiv.org/abs/2506.01790 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
DiWA: Diffusion Policy Adaptation with World Models
Akshay L Chandra, Iman Nematollahi, Chenguang Huang, Tim Welschehold, Wolfram Burgard, Abhinav Valada
https://arxiv.org/abs/2508.03645
This https://arxiv.org/abs/2506.05394 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCR_…
This https://arxiv.org/abs/2410.00527 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_ees…
Memory-Efficient Split Federated Learning for LLM Fine-Tuning on Heterogeneous Mobile Devices
Xiaopei Chen, Liang Li, Fei Ji, Wen Wu
https://arxiv.org/abs/2506.02940
Replaced article(s) found for cs.LG. https://arxiv.org/list/cs.LG/new
[4/5]:
- Fine-tuning Multimodal Transformers on Edge: A Parallel Split Learning Approach
Timo Fudala, Vasileios Tsouvalas, Nirvana Meratnia
Checklist Engineering Empowers Multilingual LLM Judges
Mohammad Ghiasvand Mohammadkhani, Hamid Beigy
https://arxiv.org/abs/2507.06774
This https://arxiv.org/abs/2505.21920 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCV_…
Fine-Tuning Text-to-Speech Diffusion Models Using Reinforcement Learning with Human Feedback
Jingyi Chen, Ju Seung Byun, Micha Elsner, Pichao Wang, Andrew Perrault
https://arxiv.org/abs/2508.03123
SafeCOMM: What about Safety Alignment in Fine-Tuned Telecom Large Language Models?
Aladin Djuhera, Swanand Ravindra Kadhe, Farhan Ahmed, Syed Zawad, Holger Boche, Walid Saad
https://arxiv.org/abs/2506.00062
Text-to-LoRA: Instant Transformer Adaption
Rujikorn Charakorn, Edoardo Cetin, Yujin Tang, Robert Tjarko Lange
https://arxiv.org/abs/2506.06105
WGLE: Backdoor-free and Multi-bit Black-box Watermarking for Graph Neural Networks
Tingzhi Li, Xuefeng Liu
https://arxiv.org/abs/2506.08602
Whispers of Many Shores: Cultural Alignment through Collaborative Cultural Expertise
Shuai Feng, Wei-Chuang Chan, Srishti Chouhan, Junior Francisco Garcia Ayala, Srujananjali Medicherla, Kyle Clark, Mingwei Shi
https://arxiv.org/abs/2506.00242
EXPO: Stable Reinforcement Learning with Expressive Policies
Perry Dong, Qiyang Li, Dorsa Sadigh, Chelsea Finn
https://arxiv.org/abs/2507.07986 https://arxiv.org/pdf/2507.07986 https://arxiv.org/html/2507.07986
arXiv:2507.07986v1 Announce Type: new
Abstract: We study the problem of training and fine-tuning expressive policies with online reinforcement learning (RL) given an offline dataset. Training expressive policy classes with online RL presents a unique challenge of stable value maximization. Unlike simpler Gaussian policies commonly used in online RL, expressive policies like diffusion and flow-matching policies are parameterized by a long denoising chain, which hinders stable gradient propagation from actions to policy parameters when optimizing against some value function. Our key insight is that we can address stable value maximization by avoiding direct optimization over value with the expressive policy and instead construct an on-the-fly RL policy to maximize Q-value. We propose Expressive Policy Optimization (EXPO), a sample-efficient online RL algorithm that utilizes an on-the-fly policy to maximize value with two parameterized policies -- a larger expressive base policy trained with a stable imitation learning objective and a lightweight Gaussian edit policy that edits the actions sampled from the base policy toward a higher value distribution. The on-the-fly policy optimizes the actions from the base policy with the learned edit policy and chooses the value-maximizing action from the base and edited actions for both sampling and temporal-difference (TD) backup. Our approach yields up to 2-3x improvement in sample efficiency on average over prior methods both in the setting of fine-tuning a pretrained policy given offline data and in leveraging offline data to train online.
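The on-the-fly selection rule described in the abstract can be sketched as follows. This is a toy illustration, not the paper's implementation: the scalar action space and the `q_value`, `base_policy`, and `edit_policy` stand-ins are all assumptions replacing the learned critic, diffusion base policy, and Gaussian edit network.

```python
# Toy sketch of EXPO's on-the-fly action selection (hypothetical stand-ins
# for the learned networks; scalar actions for simplicity).
import random

def q_value(state, action):
    # Stand-in for the learned critic: a toy quadratic preferring a = -state.
    return -(action + state) ** 2

def base_policy(state, n=8):
    # Stand-in for the expressive base policy (e.g. a diffusion policy)
    # trained by imitation: draws n candidate actions.
    return [random.gauss(0.0, 1.0) for _ in range(n)]

def edit_policy(state, action):
    # Stand-in for the lightweight Gaussian edit policy: a small
    # perturbation of the base action.
    return action + random.gauss(0.0, 0.1)

def expo_act(state):
    # On-the-fly policy: sample from the base policy, edit each action,
    # then return the Q-maximizing action among base and edited candidates
    # (used for both sampling and the TD backup in the paper).
    candidates = []
    for a in base_policy(state):
        candidates.append(a)
        candidates.append(edit_policy(state, a))
    return max(candidates, key=lambda a: q_value(state, a))

action = expo_act(state=0.5)
```

The point of the construction is that gradients never flow through the long denoising chain at value-maximization time; only the small edit policy is optimized against the critic.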
RecRankerEval: A Flexible and Extensible Framework for Top-k LLM-based Recommendation
Zeyuan Meng, Zixuan Yi, Iadh Ounis
https://arxiv.org/abs/2507.05880
EcoLoRA: Communication-Efficient Federated Fine-Tuning of Large Language Models
Han Liu, Ruoyao Wen, Srijith Nair, Jia Liu, Wenjing Lou, Chongjie Zhang, William Yeoh, Yevgeniy Vorobeychik, Ning Zhang
https://arxiv.org/abs/2506.02001
EXPOTION: Facial Expression and Motion Control for Multimodal Music Generation
Fathinah Izzati, Xinyue Li, Gus Xia
https://arxiv.org/abs/2507.04955
Training-free Generation of Temporally Consistent Rewards from VLMs
Yinuo Zhao, Jiale Yuan, Zhiyuan Xu, Xiaoshuai Hao, Xinyi Zhang, Kun Wu, Zhengping Che, Chi Harold Liu, Jian Tang
https://arxiv.org/abs/2507.04789
AdS: Adapter-state Sharing Framework for Multimodal Sarcasm Detection
Soumyadeep Jana, Sahil Danayak, Sanasam Ranbir Singh
https://arxiv.org/abs/2507.04508
GORACS: Group-level Optimal Transport-guided Coreset Selection for LLM-based Recommender Systems
Tiehua Mei, Hengrui Chen, Peng Yu, Jiaqing Liang, Deqing Yang
https://arxiv.org/abs/2506.04015
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
Yana Wei, Liang Zhao, Jianjian Sun, Kangheng Lin, Jisheng Yin, Jingcheng Hu, Yinmin Zhang, En Yu, Haoran Lv, Zejia Weng, Jia Wang, Chunrui Han, Yuang Peng, Qi Han, Zheng Ge, Xiangyu Zhang, Daxin Jiang, Vishal M. Patel
https://arxiv.org/abs/2507…
Impact of Fine-Tuning Methods on Memorization in Large Language Models
Jie Hou, Chuxiong Wu, Lannan Luo, Qiang Zeng
https://arxiv.org/abs/2507.00258
This https://arxiv.org/abs/2505.24200 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csSD_…
Enhancing Safety of Foundation Models for Visual Navigation through Collision Avoidance via Repulsive Estimation
Joonkyung Kim, Joonyeol Sim, Woojun Kim, Katia Sycara, Changjoo Nam
https://arxiv.org/abs/2506.03834
SU-ESRGAN: Semantic and Uncertainty-Aware ESRGAN for Super-Resolution of Satellite and Drone Imagery with Fine-Tuning for Cross Domain Evaluation
Prerana Ramkumar
https://arxiv.org/abs/2508.00750
Fine-Tuning ASR for Stuttered Speech: Personalized vs. Generalized Approaches
Dena Mujtaba, Nihar Mahapatra
https://arxiv.org/abs/2506.00853
MLLM-Fabric: Multimodal Large Language Model-Driven Robotic Framework for Fabric Sorting and Selection
Liman Wang, Hanyang Zhong, Tianyuan Wang, Shan Luo, Jihong Zhu
https://arxiv.org/abs/2507.04351
Sentinel: SOTA model to protect against prompt injections
Dror Ivry, Oran Nahum
https://arxiv.org/abs/2506.05446
Navigating Sparse Molecular Data with Stein Diffusion Guidance
Van Khoa Nguyen, Lionel Blondé, Alexandros Kalousis
https://arxiv.org/abs/2507.05482
TR-PTS: Task-Relevant Parameter and Token Selection for Efficient Tuning
Siqi Luo, Haoran Yang, Yi Xin, Mingyang Yi, Guangyang Wu, Guangtao Zhai, Xiaohong Liu
https://arxiv.org/abs/2507.22872
Dynamic Mixture of Progressive Parameter-Efficient Expert Library for Lifelong Robot Learning
Yuheng Lei, Sitong Mao, Shunbo Zhou, Hongyuan Zhang, Xuelong Li, Ping Luo
https://arxiv.org/abs/2506.05985
Learning to Diagnose Privately: DP-Powered LLMs for Radiology Report Classification
Payel Bhattacharjee, Fengwei Tian, Ravi Tandon, Joseph Lo, Heidi Hanson, Geoffrey Rubin, Nirav Merchant, John Gounley
https://arxiv.org/abs/2506.04450
Corrector Sampling in Language Models
Itai Gat, Neta Shaul, Uriel Singer, Yaron Lipman
https://arxiv.org/abs/2506.06215
Fine-Tuning Lowers Safety and Disrupts Evaluation Consistency
Kathleen C. Fraser, Hillary Dawkins, Isar Nejadgholi, Svetlana Kiritchenko
https://arxiv.org/abs/2506.17209
SDD: Self-Degraded Defense against Malicious Fine-tuning
Zixuan Chen, Weikai Lu, Xin Lin, Ziqian Zeng
https://arxiv.org/abs/2507.21182
InfoSteer: Steering Information Utility in Language Model Post-Training
Chunyuan Deng, Ruidi Chang, Hanjie Chen
https://arxiv.org/abs/2507.05158
This https://arxiv.org/abs/2505.03793 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
CoSteer: Collaborative Decoding-Time Personalization via Local Delta Steering
Hang Lv, Sheng Liang, Hao Wang, Hongchao Gu, Yaxiong Wu, Wei Guo, Defu Lian, Yong Liu, Enhong Chen
https://arxiv.org/abs/2507.04756
SecureT2I: No More Unauthorized Manipulation on AI Generated Images from Prompts
Xiaodong Wu, Xiangman Li, Qi Li, Jianbing Ni, Rongxing Lu
https://arxiv.org/abs/2507.03636
MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs
Purbesh Mitra, Sennur Ulukus
https://arxiv.org/abs/2507.02851
Knowledge-Aware Self-Correction in Language Models via Structured Memory Graphs
Swayamjit Saha
https://arxiv.org/abs/2507.04625