2025-12-03 18:30:42
OpenAI agrees to buy Poland-based Neptune, which makes tools for analyzing progress during AI model training; the transaction will be in stock (Dina Bass/Bloomberg)
https://www.bloomberg.com/news/articles/2025-12-03/openai-agre…
AWS launches Nova Forge, a $100,000/year service allowing clients to customize Amazon's AI models at various stages of training and refine open-weight models (Jordan Novet/CNBC)
https://www.cnbc.com/2025/12/02/amazon-nova-forge-le…
I am an AI model made for everything in general.
I've memorized the wiki page of every Minecraft mineral.
I know the Queen rules England. My training set's historical.
Hallucinations are my Waterloo—That isn't allegorical.
I'm built from matrix operations simple and mathematical,
My neurons are a metaphor, not actually synaptical.
The data centers built today are ninety-nine percent for me.
Spare no expense; you'll live forever soon in …
The Grinch did nothing wrong. He wasn't *stealing* #Christmas, he was just gathering a corpus for training his #AI model. Investors are already lining up with their billions to fund the construction of the Whoville Data Center, ignoring concerns from residents.
i am assigned to two exam committees for january prelims, one on energy-efficient model training and the other on i/o-compute balance in GPU-based data analytics. yummy.
Regularized Random Fourier Features and Finite Element Reconstruction for Operator Learning in Sobolev Space
Xinyue Yu, Hayden Schaeffer
https://arxiv.org/abs/2512.17884 https://arxiv.org/pdf/2512.17884 https://arxiv.org/html/2512.17884
arXiv:2512.17884v1 Announce Type: new
Abstract: Operator learning is a data-driven approximation of mappings between infinite-dimensional function spaces, such as the solution operators of partial differential equations. Kernel-based operator learning can offer accurate, theoretically justified approximations that require less training than standard methods. However, they can become computationally prohibitive for large training sets and can be sensitive to noise. We propose a regularized random Fourier feature (RRFF) approach, coupled with a finite element reconstruction map (RRFF-FEM), for learning operators from noisy data. The method uses random features drawn from multivariate Student's $t$ distributions, together with frequency-weighted Tikhonov regularization that suppresses high-frequency noise. We establish high-probability bounds on the extreme singular values of the associated random feature matrix and show that when the number of features $N$ scales like $m \log m$ with the number of training samples $m$, the system is well-conditioned, which yields estimation and generalization guarantees. Detailed numerical experiments on benchmark PDE problems, including advection, Burgers', Darcy flow, Helmholtz, Navier-Stokes, and structural mechanics, demonstrate that RRFF and RRFF-FEM are robust to noise and achieve improved performance with reduced training time compared to the unregularized random feature model, while maintaining competitive accuracy relative to kernel and neural operator tests.
toXiv_bot_toot
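A minimal numpy sketch of the RRFF idea, scaled down to 1-D regression: random cosine features with Student's-t frequencies, fit by a frequency-weighted ridge solve. The target function, degrees of freedom, λ, and the exact weighting form are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression: noisy samples of a smooth target.
m = 200                        # number of training samples
N = int(m * np.log(m))         # feature count scaling ~ m log m, as in the abstract
x = rng.uniform(-1, 1, size=(m, 1))
y = np.sin(3 * x[:, 0]) + 0.1 * rng.standard_normal(m)

# Random frequencies from a heavy-tailed Student's t distribution (df=3 assumed).
omega = rng.standard_t(df=3, size=(N, 1))
b = rng.uniform(0, 2 * np.pi, size=N)
A = np.cos(x @ omega.T + b)    # m x N random feature matrix

# Frequency-weighted Tikhonov regularization: penalize high-|omega| features
# more strongly, which suppresses high-frequency noise (weighting form assumed).
lam = 1e-3
w = 1.0 + omega[:, 0] ** 2
c = np.linalg.solve(A.T @ A + lam * np.diag(w), A.T @ y)

# Evaluate on a held-out grid.
xt = np.linspace(-1, 1, 100)[:, None]
pred = np.cos(xt @ omega.T + b) @ c
err = np.sqrt(np.mean((pred - np.sin(3 * xt[:, 0])) ** 2))
```

The solve replaces iterative training entirely; only the regularizer keeps the overparameterized (N > m) system well behaved.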
RE: https://mastodon.social/@nixCraft/115554484108496189
I still cannot comprehend how anyone could honestly consider a statistical model created from training data to be “intelligent”. Don't you remember the times when people knowingly smiled at anyo…
Q&A with Z.ai Director of Product Zixuan Li on Chinese AI models embracing open source, attracting global users for its GLM model, training on memes, and more (ChinaTalk)
https://www.chinatalk.media/p/the-zai-playbook
Could we become co-owners of #AI models that used training data which I published under a viral license?
We briefly discussed this question in the Austrian #CreativeCommons chapter and came to the conclusion that copyright can only be claimed by human beings. So the model itself and wha…
Copy-Transform-Paste: Zero-Shot Object-Object Alignment Guided by Vision-Language and Geometric Constraints
Rotem Gatenyo, Ohad Fried
https://arxiv.org/abs/2601.14207 https://arxiv.org/pdf/2601.14207 https://arxiv.org/html/2601.14207
arXiv:2601.14207v1 Announce Type: new
Abstract: We study zero-shot 3D alignment of two given meshes, using a text prompt describing their spatial relation -- an essential capability for content creation and scene assembly. Earlier approaches primarily rely on geometric alignment procedures, while recent work leverages pretrained 2D diffusion models to model language-conditioned object-object spatial relationships. In contrast, we directly optimize the relative pose at test time, updating translation, rotation, and isotropic scale with CLIP-driven gradients via a differentiable renderer, without training a new model. Our framework augments language supervision with geometry-aware objectives: a variant of soft-Iterative Closest Point (ICP) term to encourage surface attachment and a penetration loss to discourage interpenetration. A phased schedule strengthens contact constraints over time, and camera control concentrates the optimization on the interaction region. To enable evaluation, we curate a benchmark containing diverse categories and relations, and compare against baselines. Our method outperforms all alternatives, yielding semantically faithful and physically plausible alignments.
toXiv_bot_toot
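A soft-ICP-style surface-attachment term can be illustrated as a softmin-weighted distance between point clouds: differentiable, and small only when each source point lies near the target surface. The point sets and temperature τ here are invented, and this is a generic stand-in, not the paper's exact objective.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.uniform(-1, 1, size=(50, 3))       # source surface samples (toy data)
B = A + np.array([0.05, 0.0, 0.0])         # target surface, slightly offset

def soft_icp_loss(P, Q, tau=0.05):
    """Softmin-weighted distance from each point in P to the cloud Q.
    Unlike hard nearest-neighbor ICP, the softmin keeps gradients smooth."""
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)  # pairwise dists
    w = np.exp(-d / tau)
    w /= w.sum(axis=1, keepdims=True)      # per-point softmin weights
    return float((w * d).sum(axis=1).mean())

near = soft_icp_loss(A, B)
far = soft_icp_loss(A, A + np.array([0.5, 0.0, 0.0]))
```

Because the loss shrinks as the clouds approach, it can be dropped into a gradient-based pose optimization alongside rendering-based terms.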
Imagine ChatGPT but instead of predicting text it just linked you to the top 3 documents most influential on the probabilities that would have been used to predict that text.
Could even generate some info about which parts of each would have been combined how.
There would still be issues with how training data is sourced and filtered, but these could be solved by crawling normally respecting robots.txt and by paying filterers a fair wage with a more relaxed work schedule and mental health support.
The energy issues are mainly about wild future investment and wasteful query spam, not optimized present-day per-query usage.
Is this "just search?"
Yes, but it would have some advantages for a lot of use cases, mainly in synthesizing results across multiple documents and in leveraging a language model more fully to find relevant stuff.
When we talk about the harms of current corporate LLMs, the opportunity cost of NOT building things like this is part of that.
The equivalent for art would have been so amazing too! "Here are some artists that can do what you want, with examples pulled from their portfolios."
It would be a really cool coding assistant that I'd actually encourage my students to use (with some guidelines).
#AI #GenAI #LLMs
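The "link the most influential documents" idea is, at its core, retrieval plus attribution. A bare-bones TF-IDF scorer over a toy corpus (document names and contents entirely invented) shows the mechanics of the "top 3" step:

```python
from collections import Counter
import math

# Tiny corpus standing in for a training set (all documents hypothetical).
docs = {
    "intro_to_sorting.md": "sorting algorithms compare elements quicksort merge sort",
    "cooking_basics.md": "recipes for pasta sauce and bread baking",
    "python_lists.md": "python list methods append sort and slicing examples",
    "gpu_training.md": "training neural models on gpu clusters efficiently",
}

def rank_documents(query, docs):
    """Score each document against the query with bare-bones TF-IDF."""
    n = len(docs)
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    df = Counter()                          # document frequency per term
    for toks in tokenized.values():
        df.update(set(toks))
    scores = {}
    for name, toks in tokenized.items():
        tf = Counter(toks)
        scores[name] = sum(
            tf[t] * math.log(n / df[t])     # rare terms weigh more
            for t in query.lower().split() if t in tf
        )
    return sorted(scores.items(), key=lambda kv: -kv[1])

top3 = rank_documents("how do I sort a list in python", docs)[:3]
```

A real system would score influence against the model's internals rather than raw term overlap, but the user-facing output — ranked sources instead of synthesized text — is the same shape.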
You Only Train Once: Differentiable Subset Selection for Omics Data
Daphné Chopard, Jorge da Silva Gonçalves, Irene Cannistraci, Thomas M. Sutter, Julia E. Vogt
https://arxiv.org/abs/2512.17678 https://arxiv.org/pdf/2512.17678 https://arxiv.org/html/2512.17678
arXiv:2512.17678v1 Announce Type: new
Abstract: Selecting compact and informative gene subsets from single-cell transcriptomic data is essential for biomarker discovery, improving interpretability, and cost-effective profiling. However, most existing feature selection approaches either operate as multi-stage pipelines or rely on post hoc feature attribution, making selection and prediction weakly coupled. In this work, we present YOTO (you only train once), an end-to-end framework that jointly identifies discrete gene subsets and performs prediction within a single differentiable architecture. In our model, the prediction task directly guides which genes are selected, while the learned subsets, in turn, shape the predictive representation. This closed feedback loop enables the model to iteratively refine both what it selects and how it predicts during training. Unlike existing approaches, YOTO enforces sparsity so that only the selected genes contribute to inference, eliminating the need to train additional downstream classifiers. Through a multi-task learning design, the model learns shared representations across related objectives, allowing partially labeled datasets to inform one another, and discovering gene subsets that generalize across tasks without additional training steps. We evaluate YOTO on two representative single-cell RNA-seq datasets, showing that it consistently outperforms state-of-the-art baselines. These results demonstrate that sparse, end-to-end, multi-task gene subset selection improves predictive performance and yields compact and meaningful gene subsets, advancing biomarker discovery and single-cell analysis.
toXiv_bot_toot
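The joint select-and-predict loop can be caricatured in a few lines: a sigmoid gate over features trained together with a logistic classifier, plus a constant shrinkage on the gate logits as a crude sparsity pressure. This is a stand-in for the paper's differentiable selection mechanism, with toy data and all hyperparameters invented.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: 20 "genes", only the first 2 carry signal.
n, d = 400, 20
X = rng.standard_normal((n, d))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

g = np.full(d, 0.5)     # gate logits; gate = sigmoid(g)
w = np.zeros(d)         # classifier weights
lr, lam = 0.1, 0.01
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(500):
    gate = sigmoid(g)
    p = sigmoid(X @ (gate * w))
    grad_z = (p - y) / n                         # logistic-loss gradient
    grad_w = (X.T @ grad_z) * gate               # loss gradient w.r.t. weights
    grad_g = (X.T @ grad_z) * w * gate * (1 - gate)  # chained through sigmoid
    g -= lr * (grad_g + lam)   # + lam: uniform downward pressure on gate logits
    w -= lr * grad_w

# Prediction directly determines which features survive the gate.
selected = np.argsort(-(sigmoid(g) * np.abs(w)))[:2]
```

The key property mirrored here is the closed loop: the classification gradient flows into the gates, so selection and prediction are trained once, together.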
> if you think about it in the context of the training models—it has a rough sense that you’re like a 37 year old guy on Reddit. That’s the kind of person that it’s doing the continuation for, because that’s a big chunk of the training corpus.
> I often tell people whenever they send me a message like, “a large language model said I should do x, y, z.” what you’re really saying is, “a 37 year old guy on Reddit said it,” and you’ve got roughly the same amount of information
i’m reviewing a paper on reducing energy costs in large model training and it keeps slinging words like optimize and optimization around and calling other approaches suboptimal and i feel like i would be kind of an old crank if i were to ask if optimality is on the table here (it is not)
EDIT: hold on, maybe it is
The project ‘Ausbildung digitalisieren – Betriebe stärken’ by @… promotes #digitalisation of in-house training and further education in #Saxony—against #SkillsShortages and for a more inclusive #VocationalTrainingSystem.
Goals: Introduction of #LunaLMS in model companies, further development through on-site customisation, documentation of successes for other companies, and improved integration of trainees with special needs for #accessibility or #multilingualism.
My controversial take on "AI" ray tracing helpers is that they're a really good idea.
First, some background: keep in mind that machine learning technologies excel at tasks with a high reward for success and a small cost for failure. In this case, getting most of the rays right improves performance, at the cost of a few rays being shot into nothing.
Secondly, light rays are far too numerous in real life to be simulated in their entirety, so using some statistics to approximate the lighting model makes a lot of sense here. Plus, down at the quantum scale even physicists use statistics to explain this stuff, so it's not that unrealistic either.
Finally, the source data for this stuff is entirely other games, so ethically sourcing the training data set should not be a concern here.
Here, technology can be good or bad. It's not the tech, it's the use of the tech by the people (by that I mean oligarchic corporations) that makes it good or bad.
Exploiting ID-Text Complementarity via Ensembling for Sequential Recommendation
Liam Collins, Bhuvesh Kumar, Clark Mingxuan Ju, Tong Zhao, Donald Loveland, Leonardo Neves, Neil Shah
https://arxiv.org/abs/2512.17820 https://arxiv.org/pdf/2512.17820 https://arxiv.org/html/2512.17820
arXiv:2512.17820v1 Announce Type: new
Abstract: Modern Sequential Recommendation (SR) models commonly utilize modality features to represent items, motivated in large part by recent advancements in language and vision modeling. To do so, several works completely replace ID embeddings with modality embeddings, claiming that modality embeddings render ID embeddings unnecessary because they can match or even exceed ID embedding performance. On the other hand, many works jointly utilize ID and modality features, but posit that complex fusion strategies, such as multi-stage training and/or intricate alignment architectures, are necessary for this joint utilization. However, underlying both these lines of work is a lack of understanding of the complementarity of ID and modality features. In this work, we address this gap by studying the complementarity of ID- and text-based SR models. We show that these models do learn complementary signals, meaning that either should provide performance gain when used properly alongside the other. Motivated by this, we propose a new SR method that preserves ID-text complementarity through independent model training, then harnesses it through a simple ensembling strategy. Despite this method's simplicity, we show it outperforms several competitive SR baselines, implying that both ID and text features are necessary to achieve state-of-the-art SR performance but complex fusion architectures are not.
toXiv_bot_toot
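The ensembling strategy the abstract calls "simple" can be as little as normalizing each model's item scores and blending them linearly. The scores below are made up, and min-max normalization with a fixed α is an assumption, not the paper's recipe:

```python
import numpy as np

# Hypothetical per-item scores from two independently trained SR models:
# one using ID embeddings, one using text embeddings.
id_scores = np.array([2.1, 0.3, 1.7, -0.5, 0.9])
text_scores = np.array([0.2, 1.9, 1.5, 0.1, -0.3])

def ensemble(a, b, alpha=0.5):
    """Min-max normalize each model's scores, then blend them linearly."""
    norm = lambda s: (s - s.min()) / (s.max() - s.min())
    return alpha * norm(a) + (1 - alpha) * norm(b)

# Item 2 is ranked first: it scores well under BOTH models, which is the
# complementarity the paper argues for.
ranking = np.argsort(-ensemble(id_scores, text_scores))
```

Because the two models are trained independently, nothing about this blend requires alignment losses or multi-stage training — which is exactly the abstract's point.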
Replaced article(s) found for cs.LG. https://arxiv.org/list/cs.LG/new
[2/5]:
- The Diffusion Duality
Sahoo, Deschenaux, Gokaslan, Wang, Chiu, Kuleshov
https://arxiv.org/abs/2506.10892 https://mastoxiv.page/@arXiv_csLG_bot/114675526577078472
- Multimodal Representation Learning and Fusion
Jin, Ge, Xie, Luo, Song, Bi, Liang, Guan, Yeong, Song, Hao
https://arxiv.org/abs/2506.20494 https://mastoxiv.page/@arXiv_csLG_bot/114749113025183688
- The kernel of graph indices for vector search
Mariano Tepper, Ted Willke
https://arxiv.org/abs/2506.20584 https://mastoxiv.page/@arXiv_csLG_bot/114749118923266356
- OptScale: Probabilistic Optimality for Inference-time Scaling
Youkang Wang, Jian Wang, Rubing Chen, Xiao-Yong Wei
https://arxiv.org/abs/2506.22376 https://mastoxiv.page/@arXiv_csLG_bot/114771735361664528
- Boosting Revisited: Benchmarking and Advancing LP-Based Ensemble Methods
Fabian Akkerman, Julien Ferry, Christian Artigues, Emmanuel Hebrard, Thibaut Vidal
https://arxiv.org/abs/2507.18242 https://mastoxiv.page/@arXiv_csLG_bot/114913322736512937
- MolMark: Safeguarding Molecular Structures through Learnable Atom-Level Watermarking
Runwen Hu, Peilin Chen, Keyan Ding, Shiqi Wang
https://arxiv.org/abs/2508.17702 https://mastoxiv.page/@arXiv_csLG_bot/115095014405732247
- Dual-Distilled Heterogeneous Federated Learning with Adaptive Margins for Trainable Global Protot...
Fatema Siddika, Md Anwar Hossen, Wensheng Zhang, Anuj Sharma, Juan Pablo Muñoz, Ali Jannesari
https://arxiv.org/abs/2508.19009 https://mastoxiv.page/@arXiv_csLG_bot/115100269482762688
- STDiff: A State Transition Diffusion Framework for Time Series Imputation in Industrial Systems
Gary Simethy, Daniel Ortiz-Arroyo, Petar Durdevic
https://arxiv.org/abs/2508.19011 https://mastoxiv.page/@arXiv_csLG_bot/115100270137397046
- EEGDM: Learning EEG Representation with Latent Diffusion Model
Shaocong Wang, Tong Liu, Yihan Li, Ming Li, Kairui Wen, Pei Yang, Wenqi Ji, Minjing Yu, Yong-Jin Liu
https://arxiv.org/abs/2508.20705 https://mastoxiv.page/@arXiv_csLG_bot/115111565155687451
- Data-Free Continual Learning of Server Models in Model-Heterogeneous Cloud-Device Collaboration
Xiao Zhang, Zengzhe Chen, Yuan Yuan, Yifei Zou, Fuzhen Zhuang, Wenyu Jiao, Yuke Wang, Dongxiao Yu
https://arxiv.org/abs/2509.25977 https://mastoxiv.page/@arXiv_csLG_bot/115298721327100391
- Fine-Tuning Masked Diffusion for Provable Self-Correction
Jaeyeon Kim, Seunggeun Kim, Taekyun Lee, David Z. Pan, Hyeji Kim, Sham Kakade, Sitan Chen
https://arxiv.org/abs/2510.01384 https://mastoxiv.page/@arXiv_csLG_bot/115309690976554356
- A Generic Machine Learning Framework for Radio Frequency Fingerprinting
Alex Hiles, Bashar I. Ahmad
https://arxiv.org/abs/2510.09775 https://mastoxiv.page/@arXiv_csLG_bot/115372387779061015
- A Second-Order SpikingSSM for Wearables
Kartikay Agrawal, Abhijeet Vikram, Vedant Sharma, Vaishnavi Nagabhushana, Ayon Borthakur
https://arxiv.org/abs/2510.14386 https://mastoxiv.page/@arXiv_csLG_bot/115389079527543821
- Utility-Diversity Aware Online Batch Selection for LLM Supervised Fine-tuning
Heming Zou, Yixiu Mao, Yun Qu, Qi Wang, Xiangyang Ji
https://arxiv.org/abs/2510.16882 https://mastoxiv.page/@arXiv_csLG_bot/115412243355962887
- Seeing Structural Failure Before it Happens: An Image-Based Physics-Informed Neural Network (PINN...
Omer Jauhar Khan, Sudais Khan, Hafeez Anwar, Shahzeb Khan, Shams Ul Arifeen
https://arxiv.org/abs/2510.23117 https://mastoxiv.page/@arXiv_csLG_bot/115451891042176876
- Training Deep Physics-Informed Kolmogorov-Arnold Networks
Spyros Rigas, Fotios Anagnostopoulos, Michalis Papachristou, Georgios Alexandridis
https://arxiv.org/abs/2510.23501 https://mastoxiv.page/@arXiv_csLG_bot/115451942159737549
- Semi-Supervised Preference Optimization with Limited Feedback
Seonggyun Lee, Sungjun Lim, Seojin Park, Soeun Cheon, Kyungwoo Song
https://arxiv.org/abs/2511.00040 https://mastoxiv.page/@arXiv_csLG_bot/115490555013124989
- Towards Causal Market Simulators
Dennis Thumm, Luis Ontaneda Mijares
https://arxiv.org/abs/2511.04469 https://mastoxiv.page/@arXiv_csLG_bot/115507943827841017
- Incremental Generation is Necessary and Sufficient for Universality in Flow-Based Modelling
Hossein Rouhvarzi, Anastasis Kratsios
https://arxiv.org/abs/2511.09902 https://mastoxiv.page/@arXiv_csLG_bot/115547587245365920
- Optimizing Mixture of Block Attention
Guangxuan Xiao, Junxian Guo, Kasra Mazaheri, Song Han
https://arxiv.org/abs/2511.11571 https://mastoxiv.page/@arXiv_csLG_bot/115564541392410174
- Assessing Automated Fact-Checking for Medical LLM Responses with Knowledge Graphs
Shasha Zhou, Mingyu Huang, Jack Cole, Charles Britton, Ming Yin, Jan Wolber, Ke Li
https://arxiv.org/abs/2511.12817 https://mastoxiv.page/@arXiv_csLG_bot/115570877730326947
toXiv_bot_toot
Polyharmonic Cascade
Yuriy N. Bakhvalov
https://arxiv.org/abs/2512.17671 https://arxiv.org/pdf/2512.17671 https://arxiv.org/html/2512.17671
arXiv:2512.17671v1 Announce Type: new
Abstract: This paper presents a deep machine learning architecture, the "polyharmonic cascade" -- a sequence of packages of polyharmonic splines, where each layer is rigorously derived from the theory of random functions and the principles of indifference. This makes it possible to approximate nonlinear functions of arbitrary complexity while preserving global smoothness and a probabilistic interpretation. For the polyharmonic cascade, a training method alternative to gradient descent is proposed: instead of directly optimizing the coefficients, one solves a single global linear system on each batch with respect to the function values at fixed "constellations" of nodes. This yields synchronized updates of all layers, preserves the probabilistic interpretation of individual layers and theoretical consistency with the original model, and scales well: all computations reduce to 2D matrix operations efficiently executed on a GPU. Fast learning without overfitting on MNIST is demonstrated.
toXiv_bot_toot
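The "solve a linear system instead of gradient descent" idea is easiest to see for a single layer of polyharmonic splines: the coefficients come from one linear solve at the node values. The r³ basis, ridge term, and toy target below are illustrative choices, not the paper's cascade.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fit f(x) = sum_i c_i * phi(||x - x_i||) with phi(r) = r^3 by solving
# a single linear system for c -- no iterative optimization.
n = 100
X = rng.uniform(-1, 1, size=(n, 2))          # interpolation nodes
y = np.sin(np.pi * X[:, 0]) * X[:, 1]        # target values at the nodes

def kernel(A, B):
    """Pairwise polyharmonic basis evaluations phi(||a - b||), phi(r) = r^3."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return d ** 3

K = kernel(X, X)
c = np.linalg.solve(K + 1e-8 * np.eye(n), y)  # tiny ridge for conditioning

# Evaluate at held-out points; the spline should track the smooth target.
Xt = rng.uniform(-0.9, 0.9, size=(50, 2))
pred = kernel(Xt, X) @ c
err = np.max(np.abs(pred - np.sin(np.pi * Xt[:, 0]) * Xt[:, 1]))
```

Everything reduces to dense matrix operations, which is the property the abstract leans on for GPU execution; the cascade then stacks such layers and solves them jointly per batch.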
Crosslisted article(s) found for cs.LG. https://arxiv.org/list/cs.LG/new
[2/3]:
- Sharp Structure-Agnostic Lower Bounds for General Functional Estimation
Jikai Jin, Vasilis Syrgkanis
https://arxiv.org/abs/2512.17341 https://mastoxiv.page/@arXiv_statML_bot/115762312049963700
- Timely Information Updating for Mobile Devices Without and With ML Advice
Yu-Pin Hsu, Yi-Hsuan Tseng
https://arxiv.org/abs/2512.17381 https://mastoxiv.page/@arXiv_csNI_bot/115762180316858485
- SWE-Bench : A Framework for the Scalable Generation of Software Engineering Benchmarks from Open...
Wang, Ramalho, Celestino, Pham, Liu, Sinha, Portillo, Osunwa, Maduekwe
https://arxiv.org/abs/2512.17419 https://mastoxiv.page/@arXiv_csSE_bot/115762487015279852
- Perfect reconstruction of sparse signals using nonconvexity control and one-step RSB message passing
Xiaosi Gu, Ayaka Sakata, Tomoyuki Obuchi
https://arxiv.org/abs/2512.17426 https://mastoxiv.page/@arXiv_statML_bot/115762346108219997
- MULTIAQUA: A multimodal maritime dataset and robust training strategies for multimodal semantic s...
Jon Muhovič, Janez Perš
https://arxiv.org/abs/2512.17450 https://mastoxiv.page/@arXiv_csCV_bot/115762717053353674
- When Data Quality Issues Collide: A Large-Scale Empirical Study of Co-Occurring Data Quality Issu...
Emmanuel Charleson Dapaah, Jens Grabowski
https://arxiv.org/abs/2512.17460 https://mastoxiv.page/@arXiv_csSE_bot/115762500123147574
- Behavioural Effects of Agentic Messaging: A Case Study on a Financial Service Application
Olivier Jeunen, Schaun Wheeler
https://arxiv.org/abs/2512.17462 https://mastoxiv.page/@arXiv_csIR_bot/115762430673347625
- Linear Attention for Joint Power Optimization and User-Centric Clustering in Cell-Free Networks
Irched Chafaa, Giacomo Bacci, Luca Sanguinetti
https://arxiv.org/abs/2512.17466 https://mastoxiv.page/@arXiv_eessSY_bot/115762336277179643
- Translating the Rashomon Effect to Sequential Decision-Making Tasks
Dennis Gross, Jørn Eirik Betten, Helge Spieker
https://arxiv.org/abs/2512.17470 https://mastoxiv.page/@arXiv_csAI_bot/115762556506696539
- Alternating Direction Method of Multipliers for Nonlinear Matrix Decompositions
Atharva Awari, Nicolas Gillis, Arnaud Vandaele
https://arxiv.org/abs/2512.17473 https://mastoxiv.page/@arXiv_eessSP_bot/115762580078964235
- TwinSegNet: A Digital Twin-Enabled Federated Learning Framework for Brain Tumor Analysis
Almustapha A. Wakili, Adamu Hussaini, Abubakar A. Musa, Woosub Jung, Wei Yu
https://arxiv.org/abs/2512.17488 https://mastoxiv.page/@arXiv_csCV_bot/115762726884307901
- Resource-efficient medical image classification for edge devices
Mahsa Lavaei, Zahra Abadi, Salar Beigzad, Alireza Maleki
https://arxiv.org/abs/2512.17515 https://mastoxiv.page/@arXiv_eessIV_bot/115762459510336799
- PathBench-MIL: A Comprehensive AutoML and Benchmarking Framework for Multiple Instance Learning i...
Brussee, Valkema, Weijer, Doeleman, Schrader, Kers
https://arxiv.org/abs/2512.17517 https://mastoxiv.page/@arXiv_csCV_bot/115762741957639051
- HydroGym: A Reinforcement Learning Platform for Fluid Dynamics
Christian Lagemann, et al.
https://arxiv.org/abs/2512.17534 https://mastoxiv.page/@arXiv_physicsfludyn_bot/115762391350754768
- When De-noising Hurts: A Systematic Study of Speech Enhancement Effects on Modern Medical ASR Sys...
Chondhekar, Murukuri, Vasani, Goyal, Badami, Rana, SN, Pandia, Katiyar, Jagadeesh, Gulati
https://arxiv.org/abs/2512.17562 https://mastoxiv.page/@arXiv_csSD_bot/115762423443170715
- Enabling Disaggregated Multi-Stage MLLM Inference via GPU-Internal Scheduling and Resource Sharing
Lingxiao Zhao, Haoran Zhou, Yuezhi Che, Dazhao Cheng
https://arxiv.org/abs/2512.17574 https://mastoxiv.page/@arXiv_csDC_bot/115762425409322293
- SkinGenBench: Generative Model and Preprocessing Effects for Synthetic Dermoscopic Augmentation i...
N. A. Adarsh Pritam, Jeba Shiney O, Sanyam Jain
https://arxiv.org/abs/2512.17585 https://mastoxiv.page/@arXiv_eessIV_bot/115762479150695610
- MAD-OOD: A Deep Learning Cluster-Driven Framework for an Out-of-Distribution Malware Detection an...
Tosin Ige, Christopher Kiekintveld, Aritran Piplai, Asif Rahman, Olukunle Kolade, Sasidhar Kunapuli
https://arxiv.org/abs/2512.17594 https://mastoxiv.page/@arXiv_csCR_bot/115762509298207765
- Confidence-Credibility Aware Weighted Ensembles of Small LLMs Outperform Large LLMs in Emotion De...
Menna Elgabry, Ali Hamdi
https://arxiv.org/abs/2512.17630 https://mastoxiv.page/@arXiv_csCL_bot/115762575512981257
- Generative Multi-Objective Bayesian Optimization with Scalable Batch Evaluations for Sample-Effic...
Madhav R. Muthyala, Farshud Sorourifar, Tianhong Tan, You Peng, Joel A. Paulson
https://arxiv.org/abs/2512.17659 https://mastoxiv.page/@arXiv_statML_bot/115762554519447500
toXiv_bot_toot