Tootfinder

@arXiv_csAI_bot@mastoxiv.page
2025-09-16 11:04:06

Learning Representations in Video Game Agents with Supervised Contrastive Imitation Learning
Carlos Celemin, Joseph Brennan, Pierluigi Vito Amadori, Tim Bradley
https://arxiv.org/abs/2509.11880

This paper introduces a novel application of Supervised Contrastive Learning (SupCon) to Imitation Learning (IL), with a focus on learning more effective state representations for agents in video game environments. The goal is to obtain latent representations of the observations that better capture the action-relevant factors, thereby better modeling the cause-effect relationship from the observations that are mapped to the actions performed by the demonstrator, for example, the player jumps wh…

@arXiv_csCL_bot@mastoxiv.page
2025-09-16 12:23:37

GTA: Supervised-Guided Reinforcement Learning for Text Classification with Large Language Models
Min Zeng, Jinfei Sun, Xueyou Luo, Caiquan Liu, Shiqi Zhang, Li Xie, Xiaoxin Chen
https://arxiv.org/abs/2509.12108

In natural language processing tasks, pure reinforcement learning (RL) fine-tuning methods often suffer from inefficient exploration and slow convergence, while supervised fine-tuning (SFT) methods, although efficient in training, have a limited performance ceiling and a less solid theoretical foundation compared to RL. To address this efficiency-capability trade-off, we propose the Guess-Think-Answer (GTA) framework that combines the efficiency of SFT with the capability gains of RL in a unified traini…

@arXiv_csCV_bot@mastoxiv.page
2025-09-17 10:54:10

Enhancing Dual Network Based Semi-Supervised Medical Image Segmentation with Uncertainty-Guided Pseudo-Labeling
Yunyao Lu, Yihang Wu, Ahmad Chaddad, Tareef Daqqaq, Reem Kateb
https://arxiv.org/abs/2509.13084

Despite the remarkable performance of supervised medical image segmentation models, relying on a large amount of labeled data is impractical in real-world situations. Semi-supervised learning approaches aim to alleviate this challenge using unlabeled data through pseudo-label generation. Yet, existing semi-supervised segmentation methods still suffer from noisy pseudo-labels and insufficient supervision within the feature space. To solve these challenges, this paper proposes a novel semi-superv…

@arXiv_csLG_bot@mastoxiv.page
2025-09-16 12:40:17

Learning from Uncertain Similarity and Unlabeled Data
Meng Wei, Zhongnian Li, Peng Ying, Xinzheng Xu
https://arxiv.org/abs/2509.11984 https://arxiv.org/pdf…

Existing similarity-based weakly supervised learning approaches often rely on precise similarity annotations between data pairs, which may inadvertently expose sensitive label information and raise privacy risks. To mitigate this issue, we propose Uncertain Similarity and Unlabeled Learning (USimUL), a novel framework where each similarity pair is embedded with an uncertainty component to reduce label leakage. In this paper, we propose an unbiased risk estimator that learns from uncertain simil…

@arXiv_eessIV_bot@mastoxiv.page
2025-09-16 09:23:37

Data-Efficient Psychiatric Disorder Detection via Self-supervised Learning on Frequency-enhanced Brain Networks
Mujie Liu, Mengchu Zhu, Qichao Dong, Ting Dang, Jiangang Ma, Jing Ren, Feng Xia
https://arxiv.org/abs/2509.10524

Psychiatric disorders involve complex neural activity changes, with functional magnetic resonance imaging (fMRI) data serving as key diagnostic evidence. However, data scarcity and the diverse nature of fMRI information pose significant challenges. While graph-based self-supervised learning (SSL) methods have shown promise in brain network analysis, they primarily focus on time-domain representations, often overlooking the rich information embedded in the frequency domain. To overcome these lim…

@arXiv_quantph_bot@mastoxiv.page
2025-09-16 12:11:07

Learning kernels with quantum optical circuits
A. Mandilara, A. D. Papadopoulos, D. Syvridis
https://arxiv.org/abs/2509.12072 https://arxiv.org/pdf/2509.12…

Support Vector Machines (SVMs) are a cornerstone of supervised learning, widely used for data classification. A central component of their success lies in kernel functions, which enable efficient computation of inner products in high-dimensional feature spaces. Recent years have seen growing interest in leveraging quantum circuits (both qubit-based and quantum optical) for computing kernel matrices, with ongoing research exploring potential quantum advantages. In this work, we investigate two…

@arXiv_csSE_bot@mastoxiv.page
2025-09-16 10:07:06

Weakly Supervised Vulnerability Localization via Multiple Instance Learning
Wenchao Gu, Yupan Chen, Yanlin Wang, Hongyu Zhang, Cuiyun Gao, Michael R. Lyu
https://arxiv.org/abs/2509.11312

Software vulnerability detection has emerged as a significant concern in the field of software security recently, capturing the attention of numerous researchers and developers. Most previous approaches focus on coarse-grained vulnerability detection, such as at the function or file level. However, the developers would still encounter the challenge of manually inspecting a large volume of code inside the vulnerable function to identify the specific vulnerable statements for modification, indica…

@arXiv_csSI_bot@mastoxiv.page
2025-09-17 08:09:10

Accurate Trust Evaluation for Effective Operation of Social IoT Systems via Hypergraph-Enabled Self-Supervised Contrastive Learning
Botao Zhu, Xianbin Wang
https://arxiv.org/abs/2509.12240

Social Internet-of-Things (IoT) enhances collaboration between devices by endowing IoT systems with social attributes. However, calculating trust between devices based on complex and dynamic social attributes, similar to trust formation mechanisms in human society, poses a significant challenge. To address this issue, this paper presents a new hypergraph-enabled self-supervised contrastive learning (HSCL) method to accurately determine trust values between devices. To implement the proposed HSCL,…

@arXiv_statML_bot@mastoxiv.page
2025-09-16 09:57:16

Learning Majority-to-Minority Transformations with MMD and Triplet Loss for Imbalanced Classification
Suman Cha, Hyunjoong Kim
https://arxiv.org/abs/2509.11511 https://

Class imbalance in supervised classification often degrades model performance by biasing predictions toward the majority class, particularly in critical applications such as medical diagnosis and fraud detection. Traditional oversampling techniques, including SMOTE and its variants, generate synthetic minority samples via local interpolation but fail to capture global data distributions in high-dimensional spaces. Deep generative models based on GANs offer richer distribution modeling yet suffe…

@arXiv_csLG_bot@mastoxiv.page
2025-07-17 10:13:20

Online Training and Pruning of Deep Reinforcement Learning Networks
Valentin Frank Ingmar Guenter, Athanasios Sideris
https://arxiv.org/abs/2507.11975 http…

Scaling deep neural networks (NN) of reinforcement learning (RL) algorithms has been shown to enhance performance when feature extraction networks are used but the gained performance comes at the significant expense of increased computational and memory complexity. Neural network pruning methods have successfully addressed this challenge in supervised learning. However, their application to RL is underexplored. We propose an approach to integrate simultaneous training and pruning within advance…

@arXiv_csRO_bot@mastoxiv.page
2025-08-14 09:30:32

PPL: Point Cloud Supervised Proprioceptive Locomotion Reinforcement Learning for Legged Robots in Crawl Spaces
Bida Ma, Nuo Xu, Chenkun Qi, Xin Liu, Yule Mo, Jinkai Wang, Chunpeng Lu
https://arxiv.org/abs/2508.09950

Legged locomotion in spatially constrained structures (called crawl spaces) is challenging. In crawl spaces, current exteroceptive locomotion learning methods are limited by large sensor noise and errors under possible low-visibility conditions, and current proprioceptive locomotion learning methods have difficulty traversing crawl spaces because only ground features are inferred. In this study, a point cloud supervised proprioceptive locomotion reinforcement learning method for legg…

@arXiv_eessIV_bot@mastoxiv.page
2025-09-16 10:04:07

Adapting Medical Vision Foundation Models for Volumetric Medical Image Segmentation via Active Learning and Selective Semi-supervised Fine-tuning
Jin Yang, Daniel S. Marcus, Aristeidis Sotiras
https://arxiv.org/abs/2509.10784

Medical Vision Foundation Models (Med-VFMs) have superior capabilities of interpreting medical images due to the knowledge learned from self-supervised pre-training with extensive unannotated images. To improve their performance on adaptive downstream evaluations, especially segmentation, a few samples from target domains are selected randomly for fine-tuning them. However, few works explore how to adapt Med-VFMs to achieve the optimal performance on target domains efficiently…

@arXiv_csCV_bot@mastoxiv.page
2025-08-15 10:24:52

Self-Supervised Stereo Matching with Multi-Baseline Contrastive Learning
Peng Xu, Zhiyu Xiang, Jingyun Fu, Tianyu Pu, Kai Wang, Chaojie Ji, Tingming Bai, Eryun Liu
https://arxiv.org/abs/2508.10838

Current self-supervised stereo matching relies on the photometric consistency assumption, which breaks down in occluded regions due to ill-posed correspondences. To address this issue, we propose BaCon-Stereo, a simple yet effective contrastive learning framework for self-supervised stereo network training in both non-occluded and occluded regions. We adopt a teacher-student paradigm with multi-baseline inputs, in which the stereo pairs fed into the teacher and student share the same reference …

@arXiv_quantph_bot@mastoxiv.page
2025-09-15 09:44:51

Loss Behavior in Supervised Learning with Entangled States
Alexander Mandl, Johanna Barzen, Marvin Bechtold, Frank Leymann, Lavinia Stiliadou
https://arxiv.org/abs/2509.10141 ht…

Quantum Machine Learning (QML) aims to leverage the principles of quantum mechanics to speed up the process of solving machine learning problems or improve the quality of solutions. Among these principles, entanglement with an auxiliary system was shown to increase the quality of QML models in applications such as supervised learning. Recent works focus on the information that can be extracted from entangled training samples and their effect on the approximation error of the trained model. Howe…

@arXiv_eessSP_bot@mastoxiv.page
2025-08-15 09:28:12

Unsupervised Deep Equilibrium Model Learning for Large-Scale Channel Estimation with Performance Guarantees
Haotian Tian, Lixiang Lian
https://arxiv.org/abs/2508.10546 https://

Supervised deep learning methods have shown promise for large-scale channel estimation (LCE), but their reliance on ground-truth channel labels greatly limits their practicality in real-world systems. In this paper, we propose an unsupervised learning framework for LCE that does not require ground-truth channels. The proposed approach leverages Generalized Stein's Unbiased Risk Estimate (GSURE) as a principled unsupervised loss function, which provides an unbiased estimate of the projected mean…

@arXiv_csSD_bot@mastoxiv.page
2025-09-15 08:43:31

Prototypical Contrastive Learning For Improved Few-Shot Audio Classification
Christos Sgouropoulos, Christos Nikou, Stefanos Vlachos, Vasileios Theiou, Christos Foukanelis, Theodoros Giannakopoulos
https://arxiv.org/abs/2509.10074

Few-shot learning has emerged as a powerful paradigm for training models with limited labeled data, addressing challenges in scenarios where large-scale annotation is impractical. While extensive research has been conducted in the image domain, few-shot learning in audio classification remains relatively underexplored. In this work, we investigate the effect of integrating supervised contrastive loss into prototypical few shot training for audio classification. In detail, we demonstrate that an…

@arXiv_csCR_bot@mastoxiv.page
2025-09-16 12:09:07

NeuroStrike: Neuron-Level Attacks on Aligned LLMs
Lichao Wu, Sasha Behrouzi, Mohamadreza Rostami, Maximilian Thang, Stjepan Picek, Ahmad-Reza Sadeghi
https://arxiv.org/abs/2509.11864

Safety alignment is critical for the ethical deployment of large language models (LLMs), guiding them to avoid generating harmful or unethical content. Current alignment techniques, such as supervised fine-tuning and reinforcement learning from human feedback, remain fragile and can be bypassed by carefully crafted adversarial prompts. Unfortunately, such attacks rely on trial and error, lack generalizability across models, and are constrained by scalability and reliability. This paper presen…

@arXiv_eessAS_bot@mastoxiv.page
2025-10-15 09:09:02

DeePAQ: A Perceptual Audio Quality Metric Based On Foundational Models and Weakly Supervised Learning
Guanxin Jiang, Andreas Brendel, Pablo M. Delgado, Jürgen Herre
https://arxiv.org/abs/2510.12326

This paper presents the Deep learning-based Perceptual Audio Quality metric (DeePAQ) for evaluating general audio quality. Our approach leverages metric learning together with the music foundation model MERT, guided by surrogate labels, to construct an embedding space that captures distortion intensity in general audio. To the best of our knowledge, DeePAQ is the first in the general audio quality domain to leverage weakly supervised labels and metric learning for fine-tuning a music foundation…

@arXiv_physicscompph_bot@mastoxiv.page
2025-09-15 08:55:11

Supervised and unsupervised learning with numerical computation for the Wolfram cellular automata
Kui Tuo, Shengfeng Deng, Yuxiang Yang, Yanyang Wang, Qiuping A. Wang, Wei Li, Wenjun Zhang
https://arxiv.org/abs/2509.10209

The local rules of Wolfram cellular automata with one-dimensional three-cell neighborhoods are represented by eight-bit binary numbers that encode deterministic update rules. These automata are widely utilized to investigate self-organization phenomena and the dynamics of complex systems. In this work, we employ numerical simulations and computational methods to investigate the asymptotic density and dynamical evolution mechanisms in Wolfram automata. We apply both supervised and unsupervised learning …

@arXiv_csIR_bot@mastoxiv.page
2025-10-14 10:00:58

Self-Supervised Representation Learning with ID-Content Modality Alignment for Sequential Recommendation
Donglin Zhou, Weike Pan, Zhong Ming
https://arxiv.org/abs/2510.10556 htt…

Sequential recommendation (SR) models often capture user preferences based on the historically interacted item IDs, which usually obtain sub-optimal performance when the interaction history is limited. Content-based sequential recommendation has recently emerged as a promising direction that exploits items' textual and visual features to enhance preference learning. However, there are still three key challenges: (i) how to reduce the semantic gap between different content modality representatio…

@arXiv_condmatmtrlsci_bot@mastoxiv.page
2025-09-15 08:47:41

Elemental Frequency-Based Supervised Classification Approach for the Search of Novel Topological Materials
Zodinpuia Ralte, Ramesh Kumar, Mukhtiyar Singh
https://arxiv.org/abs/2509.09978

Machine learning-based approaches efficiently address the goal of searching for the best candidate materials for targeted properties. The search for topological materials using traditional first-principles and symmetry-based methods often requires lots of computing power or is limited by the crystalline symmetries. In this study, we present frequency-based statistical descriptors for machine learning-driven classification of topological materials that is independent of crystallographic symmetr…

@arXiv_csCV_bot@mastoxiv.page
2025-09-15 10:04:11

SSL-AD: Spatiotemporal Self-Supervised Learning for Generalizability and Adaptability Across Alzheimer's Prediction Tasks and Datasets
Emily Kaczmarek, Justin Szeto, Brennan Nichyporuk, Tal Arbel
https://arxiv.org/abs/2509.10453

Alzheimer's disease is a progressive, neurodegenerative disorder that causes memory loss and cognitive decline. While there has been extensive research in applying deep learning models to Alzheimer's prediction tasks, these models remain limited by lack of available labeled data, poor generalization across datasets, and inflexibility to varying numbers of input scans and time intervals between scans. In this study, we adapt three state-of-the-art temporal self-supervised learning (SSL) approach…

@arXiv_csAI_bot@mastoxiv.page
2025-10-15 10:24:31

Towards Robust Artificial Intelligence: Self-Supervised Learning Approach for Out-of-Distribution Detection
Wissam Salhab, Darine Ameyed, Hamid Mcheick, Fehmi Jaafar
https://arxiv.org/abs/2510.12713

Robustness in AI systems refers to their ability to maintain reliable and accurate performance under various conditions, including out-of-distribution (OOD) samples, adversarial attacks, and environmental changes. This is crucial in safety-critical systems, such as autonomous vehicles, transportation, or healthcare, where malfunctions could have severe consequences. This paper proposes an approach to improve OOD detection without the need for labeled data, thereby increasing the AI systems' robu…

@arXiv_csLG_bot@mastoxiv.page
2025-10-15 10:51:31

Few Shot Semi-Supervised Learning for Abnormal Stop Detection from Sparse GPS Trajectories
Muhammad Ayub Sabir, Junbiao Pang, Jiaqi Wu, Fatima Ashraf
https://arxiv.org/abs/2510.12686

Abnormal stop detection (ASD) in intercity coach transportation is critical for ensuring passenger safety, operational reliability, and regulatory compliance. However, two key challenges hinder ASD effectiveness: sparse GPS trajectories, which obscure short or unauthorized stops, and limited labeled data, which restricts supervised learning. Existing methods often assume dense sampling or regular movement patterns, limiting their applicability. To address data sparsity, we propose a Sparsity-Aw…

@arXiv_csCL_bot@mastoxiv.page
2025-10-15 10:44:11

Reasoning Pattern Matters: Learning to Reason without Human Rationales
Chaoxu Pang, Yixuan Cao, Ping Luo
https://arxiv.org/abs/2510.12643 https://arxiv.org…

Large Language Models (LLMs) have demonstrated remarkable reasoning capabilities under the widely adopted SFT+RLVR paradigm, which first performs Supervised Fine-Tuning (SFT) on human-annotated reasoning trajectories (rationales) to establish initial reasoning behaviors, then applies Reinforcement Learning with Verifiable Rewards (RLVR) to optimize the model using verifiable signals without golden rationales. However, annotating high-quality rationales for the SFT stage remains prohibitively ex…

@arXiv_csRO_bot@mastoxiv.page
2025-09-15 09:37:51

Self-supervised Learning Of Visual Pose Estimation Without Pose Labels By Classifying LED States
Nicholas Carlotti, Mirko Nava, Alessandro Giusti
https://arxiv.org/abs/2509.10405

We introduce a model for monocular RGB relative pose estimation of a ground robot that trains from scratch without pose labels or prior knowledge about the robot's shape or appearance. At training time, we assume: (i) a robot fitted with multiple LEDs, whose states are independent and known at each frame; (ii) knowledge of the approximate viewing direction of each LED; and (iii) availability of a calibration image with a known target distance, to address the ambiguity of monocular depth estima…

@arXiv_quantph_bot@mastoxiv.page
2025-10-15 10:24:21

Detection of quantum information masking via machine learning
Sheng-Ao Mao, Lin Zhang, Bo Li
https://arxiv.org/abs/2510.12507 https://arxiv.org/pdf/2510.12…

Recently, machine learning has been widely applied in the field of quantum information, notably in tasks such as entanglement detection, steering characterization, and nonlocality verification. However, few studies have focused on utilizing machine learning to detect quantum information masking. In this work, we investigate supervised machine learning for detecting quantum information masking in both pure and mixed qubit states. For pure qubit states, we randomly generate the corresponding dens…

@arXiv_csHC_bot@mastoxiv.page
2025-10-10 08:41:59

A Systematic Evaluation of Self-Supervised Learning for Label-Efficient Sleep Staging with Wearable EEG
Emilio Estevan, María Sierra-Torralba, Eduardo López-Larraz, Luis Montesano
https://arxiv.org/abs/2510.07960

Wearable EEG devices have emerged as a promising alternative to polysomnography (PSG). As affordable and scalable solutions, their widespread adoption results in the collection of massive volumes of unlabeled data that cannot be analyzed by clinicians at scale. Meanwhile, the recent success of deep learning for sleep scoring has relied on large annotated datasets. Self-supervised learning (SSL) offers an opportunity to bridge this gap, leveraging unlabeled signals to address label scarcity and …

@arXiv_astrophEP_bot@mastoxiv.page
2025-08-14 08:46:32

Machine Learning for Exoplanet Detection: A Comparative Analysis Using Kepler Data
Reihaneh Karimi, Mahdiyar Mousavi-Sadr, Mohammad H. Zhoolideh Haghighi, Fatemeh S. Tabatabaei
https://arxiv.org/abs/2508.09689

The discovery of exoplanets has expanded our understanding of planetary systems and opened new avenues for astronomical research. In this study, we present a machine learning (ML) framework for exoplanet identification using a time-series photometric dataset from the Kepler Space Telescope, comprising 3,198 flux measurements across 5,074 stars. We investigate the performance of four supervised classification algorithms, namely Random Forest, k-Nearest Neighbors (KNN), Decision Tree, and Logisti…

@arXiv_mathNA_bot@mastoxiv.page
2025-08-14 09:20:02

Fast and Simple Multiclass Data Segmentation: An Eigendecomposition and Projection-Free Approach
Chiara Faccio, Margherita Porcelli, Francesco Rinaldi, Martin Stoll
https://arxiv.org/abs/2508.09738

Graph-based machine learning has seen an increased interest over the last decade with many connections to other fields of applied mathematics. Learning based on partial differential equations, such as the phase-field Allen-Cahn equation, allows efficient handling of semi-supervised learning approaches on graphs. The numerical solution of the graph Allen-Cahn equation via a convexity splitting or the Merriman-Bence-Osher (MBO) scheme, albeit being a widely used approach, requires the calculation…

@arXiv_csSD_bot@mastoxiv.page
2025-10-14 11:06:29

SS-DPPN: A self-supervised dual-path foundation model for the generalizable cardiac audio representation
Ummy Maria Muna, Md Mehedi Hasan Shawon, Md Jobayer, Sumaiya Akter, Md Rakibul Hasan, Md. Golam Rabiul Alam
https://arxiv.org/abs/2510.10719

The automated analysis of phonocardiograms is vital for the early diagnosis of cardiovascular disease, yet supervised deep learning is often constrained by the scarcity of expert-annotated data. In this paper, we propose the Self-Supervised Dual-Path Prototypical Network (SS-DPPN), a foundation model for cardiac audio representation and classification from unlabeled data. The framework introduces a dual-path contrastive learning based architecture that simultaneously processes 1D waveforms and …

@arXiv_astrophHE_bot@mastoxiv.page
2025-10-13 08:14:30

Identification of Gamma Ray Pulsar Candidates in the Fermi-LAT 4FGL-DR4 Unassociated Sources Using Supervised Machine Learning
A. Pathania, K. K. Singh, S. K. Singh, A. Tolamatti, B. B. Singh, K. K. Yadav
https://arxiv.org/abs/2510.08654

The Large Area Telescope (LAT) on board the Fermi Gamma-ray Space Telescope has been continuously providing good quality survey data of the entire sky in the high energy range from 30 MeV to 500 GeV and above since August 2008. A succession of gamma-ray source catalogs is published after a comprehensive analysis of the Fermi-LAT data. The most recent release of data in the fourth Fermi-LAT catalog of gamma-ray sources (4FGL-DR4), based on the first 14 years of obser…

@arXiv_csCV_bot@mastoxiv.page
2025-10-15 10:51:21

PET Head Motion Estimation Using Supervised Deep Learning with Attention
Zhuotong Cai, Tianyi Zeng, Jiazhen Zhang, Eléonore V. Lieffrig, Kathryn Fontaine, Chenyu You, Enette Mae Revilla, James S. Duncan, Jingmin Xin, Yihuan Lu, John A. Onofrey
https://arxiv.org/abs/2510.12758

Head movement poses a significant challenge in brain positron emission tomography (PET) imaging, resulting in image artifacts and tracer uptake quantification inaccuracies. Effective head motion estimation and correction are crucial for precise quantitative image analysis and accurate diagnosis of neurological disorders. Hardware-based motion tracking (HMT) has limited applicability in real-world clinical practice. To overcome this limitation, we propose a deep-learning head motion correction a…

@arXiv_csCR_bot@mastoxiv.page
2025-08-15 09:33:32

BERTector: Intrusion Detection Based on Joint-Dataset Learning
Haoyang Hu, Xun Huang, Chenyu Wu, Shiwen Liu, Zhichao Lian, Shuangquan Zhang
https://arxiv.org/abs/2508.10327 http…

Intrusion detection systems (IDS) are facing challenges in generalization and robustness due to the heterogeneity of network traffic and the diversity of attack patterns. To address this issue, we propose a new joint-dataset training paradigm for IDS and propose a scalable BERTector framework based on BERT. BERTector integrates three key components: NSS-Tokenizer for traffic-aware semantic tokenization, supervised fine-tuning with a hybrid dataset, and low-rank adaptation (LoRA) for efficient t…

@arXiv_csCL_bot@mastoxiv.page
2025-08-15 10:10:02

Making Qwen3 Think in Korean with Reinforcement Learning
Jungyup Lee, Jemin Kim, Sang Park, SeungJae Lee
https://arxiv.org/abs/2508.10355 https://arxiv.org…

We present a two-stage fine-tuning approach to make the large language model Qwen3 14B "think" natively in Korean. In the first stage, supervised fine-tuning (SFT) on a high-quality Korean reasoning dataset establishes a strong foundation in Korean logical reasoning, yielding notable improvements in Korean-language tasks and even some gains in general reasoning ability. In the second stage, we employ reinforcement learning with a customized Group Relative Policy Optimization (GRPO) algorithm to…

@arXiv_csLG_bot@mastoxiv.page
2025-07-17 13:51:38

Replaced article(s) found for cs.LG. https://arxiv.org/list/cs.LG/new
[1/5]:
- A Computational Theory and Semi-Supervised Algorithm for Clustering
Nassir Mohammad

@arXiv_qbioTO_bot@mastoxiv.page
2025-09-16 08:07:36

MAE-SAM2: Mask Autoencoder-Enhanced SAM2 for Clinical Retinal Vascular Leakage Segmentation
Xin Xing, Irmak Karaca, Samira Badrloo, Quan Dong Nguyen, Mahadevan Subramaniam
https://arxiv.org/abs/2509.10554

We propose MAE-SAM2, a novel foundation model for retinal vascular leakage segmentation on fluorescein angiography images. Due to the small size and dense distribution of the leakage areas, along with the limited availability of labeled clinical data, this presents a significant challenge for segmentation tasks. Our approach integrates a self-supervised learning (SSL) strategy, Masked Autoencoder (MAE), with SAM2. In our implementation, we explore different loss functions and conclude a task-sp…

@arXiv_statML_bot@mastoxiv.page
2025-09-10 08:28:51

Kernel VICReg for Self-Supervised Learning in Reproducing Kernel Hilbert Space
M. Hadi Sepanj, Benyamin Ghojogh, Paul Fieguth
https://arxiv.org/abs/2509.07289 https://

Self-supervised learning (SSL) has emerged as a powerful paradigm for representation learning by optimizing geometric objectives (such as invariance to augmentations, variance preservation, and feature decorrelation) without requiring labels. However, most existing methods operate in Euclidean space, limiting their ability to capture nonlinear dependencies and geometric structures. In this work, we propose Kernel VICReg, a novel self-supervised learning framework that lifts the VICReg objective…

@arXiv_eessSP_bot@mastoxiv.page
2025-10-13 09:35:30

IF-D: A High-Frequency, General-Purpose Inertial Foundation Dataset for Self-Supervised Learning
Patrick Ferreira, Paula Costa
https://arxiv.org/abs/2510.09539 https://

We present IF-D, a large-scale inertial dataset designed to enable self-supervised and foundational learning for IMU time series. IF-D comprises continuous, long-duration multichannel recordings (accelerometer, gyroscope, magnetometer) sampled at 200 Hz using a UM7 IMU mounted inside a 3D-printed spherical enclosure that promotes diverse, free rotations during vehicle traversal. The collection spans approximately 135 minutes of recording, yielding around 1.6 million samples across nine sensor ch…

@arXiv_mathOC_bot@mastoxiv.page
2025-08-11 08:36:40

Value Function Approximation for Nonlinear MPC: Learning a Terminal Cost Function with a Descent Property
T. M. J. T. Baltussen, C. A. Orrico, A. Katriniok, W. P. M. H. Heemels, D. Krishnamoorthy
https://arxiv.org/abs/2508.05804

Value Function Approximation for Nonlinear MPC: Learning a Terminal Cost Function with a Descent Property
We present a novel method to synthesize a terminal cost function for a nonlinear model predictive controller (MPC) through value function approximation using supervised learning. Existing methods enforce a descent property on the terminal cost function by construction, thereby restricting the class of terminal cost functions, which in turn can limit the performance and applicability of the MPC. We present a method to approximate the true cost-to-go with a general function approximator that is c…

@arXiv_csAI_bot@mastoxiv.page
2025-09-12 08:14:09

Anti-Money Laundering Machine Learning Pipelines; A Technical Analysis on Identifying High-risk Bank Clients with Supervised Learning
Khashayar Namdar, Pin-Chien Wang, Tushar Raju, Steven Zheng, Fiona Li, Safwat Tahmin Khan
https://arxiv.org/abs/2509.09127

Anti-Money Laundering Machine Learning Pipelines; A Technical Analysis on Identifying High-risk Bank Clients with Supervised Learning
Anti-money laundering (AML) actions and measurements are among the priorities of financial institutions, for which machine learning (ML) has been shown to have high potential. In this paper, we propose a comprehensive and systematic approach for developing ML pipelines to identify high-risk bank clients in a dataset curated for Task 1 of the University of Toronto 2023-2024 Institute for Management and Innovation (IMI) Big Data and Artificial Intelligence Competition. The dataset included 195,789 c…

@arXiv_condmatdisnn_bot@mastoxiv.page
2025-08-14 07:58:52

Learning complexity of many-body quantum sign structures through the lens of Boolean Fourier analysis
Ilya Schurov, Anna Kravchenko, Mikhail I. Katsnelson, Andrey A. Bagrov, Tom Westerhout
https://arxiv.org/abs/2508.09870

Learning complexity of many-body quantum sign structures through the lens of Boolean Fourier analysis
We study sign structures of the ground states of spin-$1/2$ magnetic systems using the methods of Boolean Fourier analysis. Previously it was shown that the sign structures of frustrated systems are of complex nature: specifically, neural networks of popular architectures lack the generalization ability necessary to effectively reconstruct sign structures in supervised learning settings. This is believed to be an obstacle for applications of neural quantum states to frustrated systems. In the p…

@arXiv_physicsaoph_bot@mastoxiv.page
2025-09-10 08:14:21

Understanding Ice Crystal Habit Diversity with Self-Supervised Learning
Joseph Ko, Hariprasath Govindarajan, Fredrik Lindsten, Vanessa Przybylo, Kara Sulia, Marcus van Lier-Walqui, Kara Lamb
https://arxiv.org/abs/2509.07688

Understanding Ice Crystal Habit Diversity with Self-Supervised Learning
Ice-containing clouds strongly impact climate, but they are hard to model due to ice crystal habit (i.e., shape) diversity. We use self-supervised learning (SSL) to learn latent representations of crystals from ice crystal imagery. By pre-training a vision transformer with many cloud particle images, we learn robust representations of crystal morphology, which can be used for various science-driven tasks. Our key contributions include (1) validating that our SSL approach can be used to learn me…

@arXiv_csCE_bot@mastoxiv.page
2025-10-13 07:31:50

Few-shot Molecular Property Prediction: A Survey
Zeyu Wang, Tianyi Jiang, Huanchang Ma, Yao Lu, Xiaoze Bao, Shanqing Yu, Qi Xuan, Shirui Pan, Xin Zheng
https://arxiv.org/abs/2510.08900

Few-shot Molecular Property Prediction: A Survey
AI-assisted molecular property prediction has become a promising technique in early-stage drug discovery and materials design in recent years. However, due to high-cost and complex wet-lab experiments, real-world molecules usually experience the issue of scarce annotations, leading to limited labeled data for effective supervised AI model learning. In light of this, few-shot molecular property prediction (FSMPP) has emerged as an expressive paradigm that enables learning from only a few labeled…

@arXiv_csSD_bot@mastoxiv.page
2025-09-16 09:35:47

Improving Out-of-Domain Audio Deepfake Detection via Layer Selection and Fusion of SSL-Based Countermeasures
Pierre Serrano, Raphaël Duroselle, Florian Angulo, Jean-François Bonastre, Olivier Boeffard
https://arxiv.org/abs/2509.12003

Improving Out-of-Domain Audio Deepfake Detection via Layer Selection and Fusion of SSL-Based Countermeasures
Audio deepfake detection systems based on frozen pre-trained self-supervised learning (SSL) encoders show a high level of performance when combined with layer-weighted pooling methods, such as multi-head factorized attentive pooling (MHFA). However, they still struggle to generalize to out-of-domain (OOD) conditions. We tackle this problem by studying the behavior of six different pre-trained SSLs, on four different test corpora. We perform a layer-by-layer analysis to determine which layers co…

@arXiv_csCV_bot@mastoxiv.page
2025-09-15 09:59:41

LayerLock: Non-collapsing Representation Learning with Progressive Freezing
Goker Erdogan, Nikhil Parthasarathy, Catalin Ionescu, Drew Hudson, Alexander Lerchner, Andrew Zisserman, Mehdi Sajjadi, Joao Carreira
https://arxiv.org/abs/2509.10156

LayerLock: Non-collapsing Representation Learning with Progressive Freezing
We introduce LayerLock, a simple yet effective approach for self-supervised visual representation learning, that gradually transitions from pixel to latent prediction through progressive layer freezing. First, we make the observation that during training of video masked-autoencoding (MAE) models, ViT layers converge in the order of their depth: shallower layers converge early, deeper layers converge late. We then show that this observation can be exploited to accelerate standard MAE by progress…

@arXiv_csLG_bot@mastoxiv.page
2025-09-15 09:53:01

Data distribution impacts the performance and generalisability of contrastive learning-based foundation models of electrocardiograms
Gul Rukh Khattak, Konstantinos Patlatzoglou, Joseph Barker, Libor Pastika, Boroumand Zeidaabadi, Ahmed El-Medany, Hesham Aggour, Yixiu Liang, Antonio H. Ribeiro, Jeffrey Annis, Antonio Luiz Pinho Ribeiro, Junbo Ge, Daniel B. Kramer, Jonathan W. Waks, Evan Brittain, Nicholas Peters, Fu Siong Ng, Arunashis Sau

Data distribution impacts the performance and generalisability of contrastive learning-based foundation models of electrocardiograms
Contrastive learning is a widely adopted self-supervised pretraining strategy, yet its dependence on cohort composition remains underexplored. We present Contrasting by Patient Augmented Electrocardiograms (CAPE) foundation model and pretrain on four cohorts (n = 5,203,352), from diverse populations across three continents (North America, South America, Asia). We systematically assess how cohort demographics, health status, and population diversity influence the downstream performance for predi…

@arXiv_eessIV_bot@mastoxiv.page
2025-10-15 08:11:01

Normalization-equivariant Diffusion Models: Learning Posterior Samplers From Noisy And Partial Measurements
Brett Levac, Jon Tamir, Marcelo Pereyra, Julian Tachella
https://arxiv.org/abs/2510.11964

Normalization-equivariant Diffusion Models: Learning Posterior Samplers From Noisy And Partial Measurements
Diffusion models (DMs) have rapidly emerged as a powerful framework for image generation and restoration. However, existing DMs are primarily trained in a supervised manner by using a large corpus of clean images. This reliance on clean data poses fundamental challenges in many real-world scenarios, where acquiring noise-free data is hard or infeasible, and only noisy and potentially incomplete measurements are available. While some methods can train DMs using noisy data, they are generally eff…

@arXiv_eessAS_bot@mastoxiv.page
2025-08-15 08:05:12

Layer-Wise Analysis of Self-Supervised Representations for Age and Gender Classification in Children's Speech
Abhijit Sinha, Harishankar Kumar, Mohit Joshi, Hemant Kumar Kathania, Shrikanth Narayanan, Sudarsana Reddy Kadiri
https://arxiv.org/abs/2508.10332

Layer-Wise Analysis of Self-Supervised Representations for Age and Gender Classification in Children's Speech
Children's speech presents challenges for age and gender classification due to high variability in pitch, articulation, and developmental traits. While self-supervised learning (SSL) models perform well on adult speech tasks, their ability to encode speaker traits in children remains underexplored. This paper presents a detailed layer-wise analysis of four Wav2Vec2 variants using the PFSTAR and CMU Kids datasets. Results show that early layers (1-7) capture speaker-specific cues more effectivel…

@arXiv_csLG_bot@mastoxiv.page
2025-10-15 10:49:31

SG-XDEAT: Sparsity-Guided Cross-Dimensional and Cross-Encoding Attention with Target-Aware Conditioning in Tabular Learning
Chih-Chuan Cheng, Yi-Ju Tseng
https://arxiv.org/abs/2510.12659

SG-XDEAT: Sparsity-Guided Cross-Dimensional and Cross-Encoding Attention with Target-Aware Conditioning in Tabular Learning
We propose SG-XDEAT (Sparsity-Guided Cross Dimensional and Cross-Encoding Attention with Target Aware Conditioning), a novel framework designed for supervised learning on tabular data. At its core, SG-XDEAT employs a dual-stream encoder that decomposes each input feature into two parallel representations: a raw value stream and a target-conditioned (label-aware) stream. These dual representations are then propagated through a hierarchical stack of attention-based modules. SG-XDEAT integrates th…

@arXiv_csSD_bot@mastoxiv.page
2025-09-16 09:10:06

Acoustic Overspecification in Electronic Dance Music Taxonomy
Weilun Xu, Tianhao Dai, Oscar Goudet, Xiaoxuan Wang
https://arxiv.org/abs/2509.11474

Acoustic Overspecification in Electronic Dance Music Taxonomy
Electronic Dance Music (EDM) classification typically relies on industry-defined taxonomies with numerous subgenres, yet the acoustic basis for these distinctions remains unclear. Current approaches use supervised learning with prescribed genre labels, assuming their validity without systematic evaluation. In this paper, we propose an unsupervised approach to discover the natural acoustic structure of EDM independent of commercial labels. Our method combines novel tempogram-based features captu…

@arXiv_csSI_bot@mastoxiv.page
2025-08-12 09:23:53

Towards Real-World Rumor Detection: Anomaly Detection Framework with Graph Supervised Contrastive Learning
Chaoqun Cui, Caiyan Jia
https://arxiv.org/abs/2508.07205

Towards Real-World Rumor Detection: Anomaly Detection Framework with Graph Supervised Contrastive Learning
Current rumor detection methods based on propagation structure learning predominately treat rumor detection as a class-balanced classification task on limited labeled data. However, real-world social media data exhibits an imbalanced distribution with a minority of rumors among massive regular posts. To address the data scarcity and imbalance issues, we construct two large-scale conversation datasets from Weibo and Twitter and analyze the domain distributions. We find obvious differences betwee…

@arXiv_csCV_bot@mastoxiv.page
2025-09-16 12:44:27

Multi Anatomy X-Ray Foundation Model
Nishank Singla, Krisztian Koos, Farzin Haddadpour, Amin Honarmandi Shandiz, Lovish Chum, Xiaojian Xu, Qing Jin, Erhan Bas
https://arxiv.org/abs/2509.12146

Multi Anatomy X-Ray Foundation Model
X-ray imaging is ubiquitous in radiology, yet most existing AI foundation models are limited to chest anatomy and fail to generalize across broader clinical tasks. In this work, we introduce XR-0, the multi-anatomy X-ray foundation model using self-supervised learning on a large, private dataset of 1.15 million images spanning diverse anatomical regions and evaluated across 12 datasets and 20 downstream tasks, including classification, retrieval, segmentation, localization, visual grounding, …

@arXiv_csCR_bot@mastoxiv.page
2025-10-08 10:05:39

PhishSSL: Self-Supervised Contrastive Learning for Phishing Website Detection
Wenhao Li, Selvakumar Manickam, Yung-Wey Chong, Shankar Karuppayah, Priyadarsi Nanda, Binyong Li
https://arxiv.org/abs/2510.05900

PhishSSL: Self-Supervised Contrastive Learning for Phishing Website Detection
Phishing websites remain a persistent cybersecurity threat by mimicking legitimate sites to steal sensitive user information. Existing machine learning-based detection methods often rely on supervised learning with labeled data, which not only incurs substantial annotation costs but also limits adaptability to novel attack patterns. To address these challenges, we propose PhishSSL, a self-supervised contrastive learning framework that eliminates the need for labeled phishing data during trainin…

@arXiv_statML_bot@mastoxiv.page
2025-10-13 09:11:10

Interpretable Generative and Discriminative Learning for Multimodal and Incomplete Clinical Data
Albert Belenguer-Llorens, Carlos Sevilla-Salcedo, Janaina Mourao-Miranda, Vanessa Gómez-Verdejo
https://arxiv.org/abs/2510.09513

Interpretable Generative and Discriminative Learning for Multimodal and Incomplete Clinical Data
Real-world clinical problems are often characterized by multimodal data, usually associated with incomplete views and limited sample sizes in their cohorts, posing significant limitations for machine learning algorithms. In this work, we propose a Bayesian approach designed to efficiently handle these challenges while providing interpretable solutions. Our approach integrates (1) a generative formulation to capture cross-view relationships with a semi-supervised strategy, and (2) a discriminati…

@arXiv_csCV_bot@mastoxiv.page
2025-09-17 10:50:40

ICDAR 2025 Competition on FEw-Shot Text line segmentation of ancient handwritten documents (FEST)
Silvia Zottin, Axel De Nardin, Giuseppe Branca, Claudio Piciarelli, Gian Luca Foresti
https://arxiv.org/abs/2509.12965

ICDAR 2025 Competition on FEw-Shot Text line segmentation of ancient handwritten documents (FEST)
Text line segmentation is a critical step in handwritten document image analysis. Segmenting text lines in historical handwritten documents, however, presents unique challenges due to irregular handwriting, faded ink, and complex layouts with overlapping lines and non-linear text flow. Furthermore, the scarcity of large annotated datasets renders fully supervised learning approaches impractical for such materials. To address these challenges, we introduce the Few-Shot Text Line Segmentation of …

@arXiv_csCL_bot@mastoxiv.page
2025-10-15 10:32:11

DPO-Tuned Large Language Models for Segmentation in Simultaneous Speech Translation
Zeyu Yang, Satoshi Nakamura
https://arxiv.org/abs/2510.12195

DPO-Tuned Large Language Models for Segmentation in Simultaneous Speech Translation
Simultaneous speech translation requires accurate segmentation to balance translation quality and latency. Recent studies such as SHAS have introduced pretrained segmentation models, achieving stronger performance than heuristic rules. However, segmentation models such as SHAS, though pretrained and more robust than heuristic methods, are still constrained by supervised learning objectives and do not incorporate human preference alignment, which is crucial for natural real-time interpretation. …

@arXiv_eessAS_bot@mastoxiv.page
2025-08-13 09:05:02

Selection of Layers from Self-supervised Learning Models for Predicting Mean-Opinion-Score of Speech
Xinyu Liang, Fredrik Cumlin, Victor Ungureanu, Chandan K. A. Reddy, Christian Schuldt, Saikat Chatterjee
https://arxiv.org/abs/2508.08962

Selection of Layers from Self-supervised Learning Models for Predicting Mean-Opinion-Score of Speech
Self-supervised learning (SSL) models like Wav2Vec2, HuBERT, and WavLM have been widely used in speech processing. These transformer-based models consist of multiple layers, each capturing different levels of representation. While prior studies explored their layer-wise representations for efficiency and performance, speech quality assessment (SQA) models predominantly rely on last-layer features, leaving intermediate layers underexamined. In this work, we systematically evaluate different laye…

@arXiv_csCV_bot@mastoxiv.page
2025-08-15 10:26:12

MAESTRO: Masked AutoEncoders for Multimodal, Multitemporal, and Multispectral Earth Observation Data
Antoine Labatie, Michael Vaccaro, Nina Lardiere, Anatol Garioud, Nicolas Gonthier
https://arxiv.org/abs/2508.10894

MAESTRO: Masked AutoEncoders for Multimodal, Multitemporal, and Multispectral Earth Observation Data
Self-supervised learning holds great promise for remote sensing, but standard self-supervised methods must be adapted to the unique characteristics of Earth observation data. We take a step in this direction by conducting a comprehensive benchmark of fusion strategies and reconstruction target normalization schemes for multimodal, multitemporal, and multispectral Earth observation data. Based on our findings, we propose MAESTRO, a novel adaptation of the Masked Autoencoder, featuring optimized …

@arXiv_quantph_bot@mastoxiv.page
2025-08-05 11:43:51

Enhancement of Quantum Semi-Supervised Learning via Improved Laplacian and Poisson Methods
Hamed Gholipour, Farid Bozorgnia, Hamzeh Mohammadigheymasi, Kailash Hambarde, Javier Mancilla, Hugo Proenca, Joao Neves, Moharram Challenger
https://arxiv.org/abs/2508.02054

Enhancement of Quantum Semi-Supervised Learning via Improved Laplacian and Poisson Methods
This paper develops a hybrid quantum approach for graph-based semi-supervised learning to enhance performance in scenarios where labeled data is scarce. We introduce two enhanced quantum models, the Improved Laplacian Quantum Semi-Supervised Learning (ILQSSL) and the Improved Poisson Quantum Semi-Supervised Learning (IPQSSL), that incorporate advanced label propagation strategies within variational quantum circuits. These models utilize QR decomposition to embed graph structure directly into qu…

@arXiv_csSD_bot@mastoxiv.page
2025-10-14 11:37:28

Automatic Music Sample Identification with Multi-Track Contrastive Learning
Alain Riou, Joan Serrà, Yuki Mitsufuji
https://arxiv.org/abs/2510.11507

Automatic Music Sample Identification with Multi-Track Contrastive Learning
Sampling, the technique of reusing pieces of existing audio tracks to create new music content, is a very common practice in modern music production. In this paper, we tackle the challenging task of automatic sample identification, that is, detecting such sampled content and retrieving the material from which it originates. To do so, we adopt a self-supervised learning approach that leverages a multi-track dataset to create positive pairs of artificial mixes, and design a novel contrastive lear…

@arXiv_csRO_bot@mastoxiv.page
2025-09-12 09:37:39

SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
Haozhan Li, Yuxin Zuo, Jiale Yu, Yuhao Zhang, Zhaohui Yang, Kaiyan Zhang, Xuekai Zhu, Yuchen Zhang, Tianxing Chen, Ganqu Cui, Dehui Wang, Dingxiang Luo, Yuchen Fan, Youbang Sun, Jia Zeng, Jiangmiao Pang, Shanghang Zhang, Yu Wang, Yao Mu, Bowen Zhou, Ning Ding
https://arxiv.org/a…

SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
Vision-Language-Action (VLA) models have recently emerged as a powerful paradigm for robotic manipulation. Despite substantial progress enabled by large-scale pretraining and supervised fine-tuning (SFT), these models face two fundamental challenges: (i) the scarcity and high cost of large-scale human-operated robotic trajectories required for SFT scaling, and (ii) limited generalization to tasks involving distribution shift. Recent breakthroughs in Large Reasoning Models (LRMs) demonstrate tha…

@arXiv_csCR_bot@mastoxiv.page
2025-10-06 08:09:49

An Investigation into the Performance of Non-Contrastive Self-Supervised Learning Methods for Network Intrusion Detection
Hamed Fard, Tobias Schalau, Gerhard Wunder
https://arxiv.org/abs/2510.02349

An Investigation into the Performance of Non-Contrastive Self-Supervised Learning Methods for Network Intrusion Detection
Network intrusion detection, a well-explored cybersecurity field, has predominantly relied on supervised learning algorithms in the past two decades. However, their limitations in detecting only known anomalies prompt the exploration of alternative approaches. Motivated by the success of self-supervised learning in computer vision, there is a rising interest in adapting this paradigm for network intrusion detection. While prior research mainly delved into contrastive self-supervised methods, th…

@arXiv_csCV_bot@mastoxiv.page
2025-10-15 07:56:01

MammoDINO: Anatomically Aware Self-Supervision for Mammographic Images
Sicheng Zhou, Lei Wu, Cao Xiao, Parminder Bhatia, Taha Kass-Hout
https://arxiv.org/abs/2510.11883

MammoDINO: Anatomically Aware Self-Supervision for Mammographic Images
Self-supervised learning (SSL) has transformed vision encoder training in general domains but remains underutilized in medical imaging due to limited data and domain specific biases. We present MammoDINO, a novel SSL framework for mammography, pretrained on 1.4 million mammographic images. To capture clinically meaningful features, we introduce a breast tissue aware data augmentation sampler for both image-level and patch-level supervision and a cross-slice contrastive learning objective that l…

@arXiv_csAI_bot@mastoxiv.page
2025-10-14 12:23:38

From <Answer> to <Think>: Multidimensional Supervision of Reasoning Process for LLM Optimization
Beining Wang, Weihang Su, Hongtao Tian, Tao Yang, Yujia Zhou, Ting Yao, Qingyao Ai, Yiqun Liu
https://arxiv.org/abs/2510.11457

From <Answer> to <Think>: Multidimensional Supervision of Reasoning Process for LLM Optimization
Improving the multi-step reasoning ability of Large Language Models (LLMs) is a critical yet challenging task. The dominant paradigm, outcome-supervised reinforcement learning (RLVR), rewards only correct final answers, often propagating flawed reasoning and suffering from sparse reward signals. While process-level reward models (PRMs) provide denser, step-by-step feedback, they lack generalizability and interpretability, requiring task-specific segmentation of the reasoning process. To this en…

@arXiv_eessSP_bot@mastoxiv.page
2025-09-04 09:32:41

Self-supervised Radio Representation Learning: Can we Learn Multiple Tasks?
Ogechukwu Kanu, Ashkan Eshaghbeigi, Hatem Abou-Zeid
https://arxiv.org/abs/2509.03077

Self-supervised Radio Representation Learning: Can we Learn Multiple Tasks?
Artificial intelligence (AI) is anticipated to play a pivotal role in 6G. However, a key challenge in developing AI-powered solutions is the extensive data collection and labeling efforts required to train supervised deep learning models. To overcome this, self-supervised learning (SSL) approaches have recently demonstrated remarkable success across various domains by leveraging large volumes of unlabeled data to achieve near-supervised performance. In this paper, we propose an effective SSL sc…

@arXiv_csLG_bot@mastoxiv.page
2025-10-10 11:09:19

Contrastive Self-Supervised Learning at the Edge: An Energy Perspective
Fernanda Famá, Roberto Pereira, Charalampos Kalalas, Paolo Dini, Lorena Qendro, Fahim Kawsar, Mohammad Malekzadeh
https://arxiv.org/abs/2510.08374

Contrastive Self-Supervised Learning at the Edge: An Energy Perspective
While contrastive learning (CL) shows considerable promise in self-supervised representation learning, its deployment on resource-constrained devices remains largely underexplored. The substantial computational demands required for training conventional CL frameworks pose a set of challenges, particularly in terms of energy consumption, data availability, and memory usage. We conduct an evaluation of four widely used CL frameworks: SimCLR, MoCo, SimSiam, and Barlow Twins. We focus on the practi…

@arXiv_csCV_bot@mastoxiv.page
2025-09-12 10:11:59

Semantic Concentration for Self-Supervised Dense Representations Learning
Peisong Wen, Qianqian Xu, Siran Dai, Runmin Cong, Qingming Huang
https://arxiv.org/abs/2509.09429

Semantic Concentration for Self-Supervised Dense Representations Learning
Recent advances in image-level self-supervised learning (SSL) have made significant progress, yet learning dense representations for patches remains challenging. Mainstream methods encounter an over-dispersion phenomenon that patches from the same instance/category scatter, harming downstream performance on dense tasks. This work reveals that image-level SSL avoids over-dispersion by involving implicit semantic concentration. Specifically, the non-strict spatial alignment ensures intra-instance…

@arXiv_csLG_bot@mastoxiv.page
2025-10-13 10:40:50

Residual-Informed Learning of Solutions to Algebraic Loops
Felix Brandt, Andreas Heuermann, Philip Hannebohm, Bernhard Bachmann
https://arxiv.org/abs/2510.09317

Residual-Informed Learning of Solutions to Algebraic Loops
This paper presents a residual-informed machine learning approach for replacing algebraic loops in equation-based Modelica models with neural network surrogates. A feedforward neural network is trained using the residual (error) of the algebraic loop directly in its loss function, eliminating the need for a supervised dataset. This training strategy also resolves the issue of ambiguous solutions, allowing the surrogate to converge to a consistent solution rather than averaging multiple valid on…

@arXiv_eessIV_bot@mastoxiv.page
2025-10-08 09:24:29

Adapting HFMCA to Graph Data: Self-Supervised Learning for Generalizable fMRI Representations
Jakub Frac, Alexander Schmatz, Qiang Li, Guido Van Wingen, Shujian Yu
https://arxiv.org/abs/2510.05177

Adapting HFMCA to Graph Data: Self-Supervised Learning for Generalizable fMRI Representations
Functional magnetic resonance imaging (fMRI) analysis faces significant challenges due to limited dataset sizes and domain variability between studies. Traditional self-supervised learning methods inspired by computer vision often rely on positive and negative sample pairs, which can be problematic for neuroimaging data where defining appropriate contrasts is non-trivial. We propose adapting a recently developed Hierarchical Functional Maximal Correlation Algorithm (HFMCA) to graph-structured f…

@arXiv_csCV_bot@mastoxiv.page
2025-08-15 10:24:12

VasoMIM: Vascular Anatomy-Aware Masked Image Modeling for Vessel Segmentation
De-Xing Huang, Xiao-Hu Zhou, Mei-Jiang Gui, Xiao-Liang Xie, Shi-Qi Liu, Shuang-Yi Wang, Tian-Yu Xiang, Rui-Ze Ma, Nu-Fang Xiao, Zeng-Guang Hou
https://arxiv.org/abs/2508.10794

VasoMIM: Vascular Anatomy-Aware Masked Image Modeling for Vessel Segmentation
Accurate vessel segmentation in X-ray angiograms is crucial for numerous clinical applications. However, the scarcity of annotated data presents a significant challenge, which has driven the adoption of self-supervised learning (SSL) methods such as masked image modeling (MIM) to leverage large-scale unlabeled data for learning transferable representations. Unfortunately, conventional MIM often fails to capture vascular anatomy because of the severe class imbalance between vessel and background…

@arXiv_csAI_bot@mastoxiv.page
2025-10-10 10:44:59

Agent Learning via Early Experience
Kai Zhang, Xiangchao Chen, Bo Liu, Tianci Xue, Zeyi Liao, Zhihan Liu, Xiyao Wang, Yuting Ning, Zhaorun Chen, Xiaohan Fu, Jian Xie, Yuxuan Sun, Boyu Gou, Qi Qi, Zihang Meng, Jianwei Yang, Ning Zhang, Xian Li, Ashish Shah, Dat Huynh, Hengduo Li, Zi Yang, Sara Cao, Lawrence Jang, Shuyan Zhou, Jiacheng Zhu, Huan Sun, Jason Weston, Yu Su, Yifan Wu

Agent Learning via Early Experience
A long-term goal of language agents is to learn and improve through their own experience, ultimately outperforming humans in complex, real-world tasks. However, training agents from experience data with reinforcement learning remains difficult in many environments, which either lack verifiable rewards (e.g., websites) or require inefficient long-horizon rollouts (e.g., multi-turn tool use). As a result, most current agents rely on supervised fine-tuning on expert data, which is challenging to s…

@arXiv_csLG_bot@mastoxiv.page
2025-08-12 12:06:53

Unsupervised operator learning approach for dissipative equations via Onsager principle
Zhipeng Chang, Zhenye Wen, Xiaofei Zhao
https://arxiv.org/abs/2508.07440

Unsupervised operator learning approach for dissipative equations via Onsager principle
Existing operator learning methods rely on supervised training with high-fidelity simulation data, introducing significant computational cost. In this work, we propose the deep Onsager operator learning (DOOL) method, a novel unsupervised framework for solving dissipative equations. Rooted in the Onsager variational principle (OVP), DOOL trains a deep operator network by directly minimizing the OVP-defined Rayleighian functional, requiring no labeled data, and then proceeds in time explicitly t…

@arXiv_csCV_bot@mastoxiv.page
2025-08-13 10:18:12

Uncertainty-aware Cross-training for Semi-supervised Medical Image Segmentation
Kaiwen Huang, Tao Zhou, Huazhu Fu, Yizhe Zhang, Yi Zhou, Xiao-Jun Wu
https://arxiv.org/abs/2508.09014

Uncertainty-aware Cross-training for Semi-supervised Medical Image Segmentation
Semi-supervised learning has gained considerable popularity in medical image segmentation tasks due to its capability to reduce reliance on expert-examined annotations. Several mean-teacher (MT) based semi-supervised methods utilize consistency regularization to effectively leverage valuable information from unlabeled data. However, these methods often heavily rely on the student model and overlook the potential impact of cognitive biases within the model. Furthermore, some methods employ co-tr…

@arXiv_csCL_bot@mastoxiv.page
2025-09-12 09:44:09

From scratch to silver: Creating trustworthy training data for patent-SDG classification using Large Language Models
Grazia Sveva Ascione, Nicolò Tamagnone
https://arxiv.org/abs/2509.09303

From scratch to silver: Creating trustworthy training data for patent-SDG classification using Large Language Models
Classifying patents by their relevance to the UN Sustainable Development Goals (SDGs) is crucial for tracking how innovation addresses global challenges. However, the absence of a large, labeled dataset limits the use of supervised learning. Existing methods, such as keyword searches, transfer learning, and citation-based heuristics, lack scalability and generalizability. This paper frames patent-to-SDG classification as a weak supervision problem, using citations from patents to SDG-tagged sci…

@arXiv_csLG_bot@mastoxiv.page
2025-09-04 10:27:11

Autonomous Learning From Success and Failure: Goal-Conditioned Supervised Learning with Negative Feedback
Zeqiang Zhang, Fabian Wurzberger, Gerrit Schmid, Sebastian Gottwald, Daniel A. Braun
https://arxiv.org/abs/2509.03206

Autonomous Learning From Success and Failure: Goal-Conditioned Supervised Learning with Negative Feedback
Reinforcement learning faces significant challenges when applied to tasks characterized by sparse reward structures. Although imitation learning, within the domain of supervised learning, offers faster convergence, it relies heavily on human-generated demonstrations. Recently, Goal-Conditioned Supervised Learning (GCSL) has emerged as a potential solution by enabling self-imitation learning for autonomous systems. By strategically relabelling goals, agents can derive policy insights from their …

@arXiv_csSD_bot@mastoxiv.page
2025-09-12 07:42:59

MoLEx: Mixture of LoRA Experts in Speech Self-Supervised Models for Audio Deepfake Detection
Zihan Pan, Sailor Hardik Bhupendra, Jinyang Wu
https://arxiv.org/abs/2509.09175 http…

MoLEx: Mixture of LoRA Experts in Speech Self-Supervised Models for Audio Deepfake Detection
While self-supervised learning (SSL)-based models have boosted audio deepfake detection accuracy, fully finetuning them is computationally expensive. To address this, we propose a parameter-efficient framework that combines Low-Rank Adaptation with a Mixture-of-Experts router, called Mixture of LoRA Experts (MoLEx). It preserves pre-trained knowledge of SSL models while efficiently finetuning only selected experts, reducing training costs while maintaining robust performance. The observed utili…

@arXiv_csCV_bot@mastoxiv.page
2025-08-11 10:16:39

SPARSE Data, Rich Results: Few-Shot Semi-Supervised Learning via Class-Conditioned Image Translation
Guido Manni, Clemente Lauretti, Loredana Zollo, Paolo Soda
https://arxiv.org/abs/2508.06429

SPARSE Data, Rich Results: Few-Shot Semi-Supervised Learning via Class-Conditioned Image Translation
Deep learning has revolutionized medical imaging, but its effectiveness is severely limited by insufficient labeled training data. This paper introduces a novel GAN-based semi-supervised learning framework specifically designed for low labeled-data regimes, evaluated across settings with 5 to 50 labeled samples per class. Our approach integrates three specialized neural networks -- a generator for class-conditioned image translation, a discriminator for authenticity assessment and classificatio…

@arXiv_csLG_bot@mastoxiv.page
2025-09-15 09:42:11

Exploring Expert Specialization through Unsupervised Training in Sparse Mixture of Experts
Strahinja Nikolic, Ilker Oguz, Demetri Psaltis
https://arxiv.org/abs/2509.10025 https:…

Exploring Expert Specialization through Unsupervised Training in Sparse Mixture of Experts
Understanding the internal organization of neural networks remains a fundamental challenge in deep learning interpretability. We address this challenge by exploring a novel Sparse Mixture of Experts Variational Autoencoder (SMoE-VAE) architecture. We test our model on the QuickDraw dataset, comparing unsupervised expert routing against a supervised baseline guided by ground-truth labels. Surprisingly, we find that unsupervised routing consistently achieves superior reconstruction performance. T…

@arXiv_csCV_bot@mastoxiv.page
2025-09-11 09:52:13

Maximally Useful and Minimally Redundant: The Key to Self Supervised Learning for Imbalanced Data
Yash Kumar Sharma, Vineet Nair, Wilson Naik
https://arxiv.org/abs/2509.08469 ht…

Maximally Useful and Minimally Redundant: The Key to Self Supervised Learning for Imbalanced Data
The robustness of contrastive self-supervised learning (CSSL) for imbalanced datasets is largely unexplored. CSSL usually makes use of multi-view assumptions to learn discriminatory features via similar and dissimilar data samples. CSSL works well on balanced datasets, but does not generalize well for imbalanced datasets. In a very recent paper, as part of future work, Yann LeCun pointed out that the self-supervised multiview framework can be extended to cases involving more than t…

@arXiv_csLG_bot@mastoxiv.page
2025-10-15 10:52:11

Multitask finetuning and acceleration of chemical pretrained models for small molecule drug property prediction
Matthew Adrian, Yunsie Chung, Kevin Boyd, Saee Paliwal, Srimukh Prasad Veccham, Alan C. Cheng
https://arxiv.org/abs/2510.12719

Multitask finetuning and acceleration of chemical pretrained models for small molecule drug property prediction
Chemical pretrained models, sometimes referred to as foundation models, are receiving considerable interest for drug discovery applications. The general chemical knowledge extracted from self-supervised training has the potential to improve predictions for critical drug discovery endpoints, including on-target potency and ADMET properties. Multi-task learning has previously been successfully leveraged to improve predictive models. Here, we show that enabling multitasking in finetuning of chemic…

@arXiv_csCV_bot@mastoxiv.page
2025-08-13 10:19:42

ALFred: An Active Learning Framework for Real-world Semi-supervised Anomaly Detection with Adaptive Thresholds
Shanle Yao, Ghazal Alinezhad Noghre, Armin Danesh Pazho, Hamed Tabkhi
https://arxiv.org/abs/2508.09058

ALFred: An Active Learning Framework for Real-world Semi-supervised Anomaly Detection with Adaptive Thresholds
Video Anomaly Detection (VAD) can play a key role in spotting unusual activities in video footage. VAD is difficult to use in real-world settings due to the dynamic nature of human actions, environmental variations, and domain shifts. Traditional evaluation metrics often prove inadequate for such scenarios, as they rely on static assumptions and fall short of identifying a threshold that distinguishes normal from anomalous behavior in dynamic settings. To address this, we introduce an active le…

@arXiv_csCV_bot@mastoxiv.page
2025-10-13 10:27:40

Hybrid-grained Feature Aggregation with Coarse-to-fine Language Guidance for Self-supervised Monocular Depth Estimation
Wenyao Zhang, Hongsi Liu, Bohan Li, Jiawei He, Zekun Qi, Yunnan Wang, Shengyang Zhao, Xinqiang Yu, Wenjun Zeng, Xin Jin
https://arxiv.org/abs/2510.09320

Hybrid-grained Feature Aggregation with Coarse-to-fine Language Guidance for Self-supervised Monocular Depth Estimation
Current self-supervised monocular depth estimation (MDE) approaches encounter performance limitations due to insufficient semantic-spatial knowledge extraction. To address this challenge, we propose Hybrid-depth, a novel framework that systematically integrates foundation models (e.g., CLIP and DINO) to extract visual priors and acquire sufficient contextual information for MDE. Our approach introduces a coarse-to-fine progressive learning framework: 1) Firstly, we aggregate multi-grained featu…

@arXiv_csCV_bot@mastoxiv.page
2025-10-15 14:33:53

Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[1/5]:
- Enhancing Representations through Heterogeneous Self-Supervised Learning
Zhong-Yu Li, Bo-Wen Yin, Yongxiang Liu, Li Liu, Ming-Ming Cheng

@arXiv_csLG_bot@mastoxiv.page
2025-10-09 10:51:31

Bridged Clustering for Representation Learning: Semi-Supervised Sparse Bridging
Patrick Peixuan Ye, Chen Shani, Ellen Vitercik
https://arxiv.org/abs/2510.07182 https://

Bridged Clustering for Representation Learning: Semi-Supervised Sparse Bridging
We introduce Bridged Clustering, a semi-supervised framework to learn predictors from any unpaired input $X$ and output $Y$ dataset. Our method first clusters $X$ and $Y$ independently, then learns a sparse, interpretable bridge between clusters using only a few paired examples. At inference, a new input $x$ is assigned to its nearest input cluster, and the centroid of the linked output cluster is returned as the prediction $\hat{y}$. Unlike traditional SSL, Bridged Clustering explicitly levera…
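The pipeline described above (cluster inputs and outputs separately, link clusters with a few pairs, predict the linked centroid) can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' code: plain Lloyd's k-means stands in for whatever clustering the paper uses, the bridge is learned by majority vote over the paired examples, and all function names are hypothetical.

```python
import random
from collections import Counter


def nearest(p, centroids):
    """Index of the centroid closest to point p (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on lists of floats (stand-in clustering step)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster went empty
                centroids[j] = [sum(c) / len(g) for c in zip(*g)]
    return centroids


def fit_bridge(x_centroids, y_centroids, pairs):
    """Link each input cluster to the output cluster most often paired with it."""
    votes = {}
    for x, y in pairs:
        i, j = nearest(x, x_centroids), nearest(y, y_centroids)
        votes.setdefault(i, Counter())[j] += 1
    return {i: c.most_common(1)[0][0] for i, c in votes.items()}


def predict(x, x_centroids, y_centroids, bridge):
    """Return the centroid of the output cluster linked to x's input cluster."""
    return y_centroids[bridge[nearest(x, x_centroids)]]


# Unpaired toy data: two input groups and two output groups.
X = [[0.0], [0.2], [0.4], [10.0], [10.2], [10.4]]
Y = [[100.0], [101.0], [99.0], [-100.0], [-101.0], [-99.0]]
pairs = [([0.1], [100.0]), ([9.9], [-100.0])]  # the only paired examples

xc, yc = kmeans(X, 2), kmeans(Y, 2)
bridge = fit_bridge(xc, yc, pairs)
low = predict([0.3], xc, yc, bridge)
high = predict([9.0], xc, yc, bridge)
```

With one link per input cluster the bridge stays sparse and easy to inspect, which is the property the abstract emphasises. An input cluster that no pair falls into has no link in this sketch and would raise a `KeyError`.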

@arXiv_csCV_bot@mastoxiv.page
2025-09-11 09:19:03

EVDI++: Event-based Video Deblurring and Interpolation via Self-Supervised Learning
Chi Zhang, Xiang Zhang, Chenxu Jiang, Gui-Song Xia, Lei Yu
https://arxiv.org/abs/2509.08260 h…

EVDI++: Event-based Video Deblurring and Interpolation via Self-Supervised Learning
Frame-based cameras with extended exposure times often produce perceptible visual blurring and information loss between frames, significantly degrading video quality. To address this challenge, we introduce EVDI++, a unified self-supervised framework for Event-based Video Deblurring and Interpolation that leverages the high temporal resolution of event cameras to mitigate motion blur and enable intermediate frame prediction. Specifically, the Learnable Double Integral (LDI) network is designed …

@arXiv_csLG_bot@mastoxiv.page
2025-09-11 10:15:03

AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning
Zhiheng Xi, Jixuan Huang, Chenyang Liao, Baodai Huang, Honglin Guo, Jiaqi Liu, Rui Zheng, Junjie Ye, Jiazheng Zhang, Wenxiang Chen, Wei He, Yiwen Ding, Guanyu Li, Zehui Chen, Zhengyin Du, Xuesong Yao, Yufei Xu, Jiecao Chen, Tao Gui, Zuxuan Wu, Qi Zhang, Xuanjing Huang, Yu-Gang Jiang

AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning
Developing autonomous LLM agents capable of making a series of intelligent decisions to solve complex, real-world tasks is a fast-evolving frontier. Like human cognitive development, agents are expected to acquire knowledge and skills through exploration and interaction with the environment. Despite advances, the community still lacks a unified, interactive reinforcement learning (RL) framework that can effectively train such agents from scratch -- without relying on supervised fine-tuning (SFT…

@arXiv_csCV_bot@mastoxiv.page
2025-08-12 07:50:33

Frequency Prior Guided Matching: A Data Augmentation Approach for Generalizable Semi-Supervised Polyp Segmentation
Haoran Xi, Chen Liu, Xiaolin Li
https://arxiv.org/abs/2508.06517

Frequency Prior Guided Matching: A Data Augmentation Approach for Generalizable Semi-Supervised Polyp Segmentation
Automated polyp segmentation is essential for early diagnosis of colorectal cancer, yet developing robust models remains challenging due to limited annotated data and significant performance degradation under domain shift. Although semi-supervised learning (SSL) reduces annotation requirements, existing methods rely on generic augmentations that ignore polyp-specific structural properties, resulting in poor generalization to new imaging centers and devices. To address this, we introduce Frequen…

@arXiv_csLG_bot@mastoxiv.page
2025-10-09 10:54:31

Discriminative Feature Feedback with General Teacher Classes
Omri Bar Oz, Tosca Lechner, Sivan Sabato
https://arxiv.org/abs/2510.07245 https://arxiv.org/pd…

Discriminative Feature Feedback with General Teacher Classes
We study the theoretical properties of the interactive learning protocol Discriminative Feature Feedback (DFF) (Dasgupta et al., 2018). The DFF learning protocol uses feedback in the form of discriminative feature explanations. We provide the first systematic study of DFF in a general framework that is comparable to that of classical protocols such as supervised learning and online learning. We study the optimal mistake bound of DFF in the realizable and the non-realizable settings, and obtain …

@arXiv_csCV_bot@mastoxiv.page
2025-09-09 12:26:02

Investigating Location-Regularised Self-Supervised Feature Learning for Seafloor Visual Imagery
Cailei Liang, Adrian Bodenmann, Emma J Curtis, Samuel Simmons, Kazunori Nagano, Stan Brown, Adam Riese, Blair Thornton
https://arxiv.org/abs/2509.06660

Investigating Location-Regularised Self-Supervised Feature Learning for Seafloor Visual Imagery
High-throughput interpretation of robotically gathered seafloor visual imagery can increase the efficiency of marine monitoring and exploration. Although recent research has suggested that location metadata can enhance self-supervised feature learning (SSL), its benefits across different SSL strategies, models and seafloor image datasets are underexplored. This study evaluates the impact of location-based regularisation on six state-of-the-art SSL frameworks, which include Convolutional Neural …

@arXiv_csLG_bot@mastoxiv.page
2025-08-12 12:01:23

When Is Prior Knowledge Helpful? Exploring the Evaluation and Selection of Unsupervised Pretext Tasks from a Neuro-Symbolic Perspective
Lin-Han Jia, Si-Yu Han, Wen-Chao Hu, Jie-Jing Shao, Wen-Da Wei, Zhi Zhou, Lan-Zhe Guo, Yu-Feng Li
https://arxiv.org/abs/2508.07299

When Is Prior Knowledge Helpful? Exploring the Evaluation and Selection of Unsupervised Pretext Tasks from a Neuro-Symbolic Perspective
Neuro-symbolic (Nesy) learning improves the target task performance of models by enabling them to satisfy knowledge, while semi/self-supervised learning (SSL) improves the target task performance by designing unsupervised pretext tasks for unlabeled data to make models satisfy corresponding assumptions. We extend the Nesy theory based on reliable knowledge to the scenario of unreliable knowledge (i.e., assumptions), thereby unifying the theoretical frameworks of SSL and Nesy. Through rigorous t…

@arXiv_csCV_bot@mastoxiv.page
2025-09-08 09:55:00

SL-SLR: Self-Supervised Representation Learning for Sign Language Recognition
Ariel Basso Madjoukeng, Jérôme Fink, Pierre Poitier, Edith Belise Kenmogne, Benoit Frenay
https://arxiv.org/abs/2509.05188

SL-SLR: Self-Supervised Representation Learning for Sign Language Recognition
Sign language recognition (SLR) is a machine learning task aiming to identify signs in videos. Due to the scarcity of annotated data, unsupervised methods like contrastive learning have become promising in this field. They learn meaningful representations by pulling positive pairs (two augmented versions of the same instance) closer and pushing negative pairs (different from the positive pairs) apart. In SLR, in a sign video, only certain parts provide information that is truly useful for its r…
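The pull-together, push-apart objective described above is often written as an InfoNCE-style loss. The sketch below shows that generic loss for a single anchor. It is not the method proposed in the paper, and the function names, the cosine similarity, and the temperature value are assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Loss for one anchor: small when the positive is the most similar vector."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy 2-D embeddings: the positive points almost the same way as the anchor.
aligned = info_nce([1.0, 0.0], [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
# Same anchor, but the "positive" is orthogonal and a negative is close.
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
```

Minimising this loss raises the similarity of the positive pair relative to every negative, which is the pulling and pushing behaviour the abstract refers to.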

@arXiv_csLG_bot@mastoxiv.page
2025-10-13 10:43:00

HINT: Helping Ineffective Rollouts Navigate Towards Effectiveness
Xinyi Wang, Jinyi Han, Zishang Jiang, Tingyun Li, Jiaqing Liang, Sihang Jiang, Zhaoqian Dai, Shuguang Ma, Fei Yu, Yanghua Xiao
https://arxiv.org/abs/2510.09388

HINT: Helping Ineffective Rollouts Navigate Towards Effectiveness
Reinforcement Learning (RL) has become a key driver for enhancing the long chain-of-thought (CoT) reasoning capabilities of Large Language Models (LLMs). However, prevalent methods like GRPO often fail when task difficulty exceeds the model's capacity, leading to reward sparsity and inefficient training. While prior work attempts to mitigate this using off-policy data, such as mixing RL with Supervised Fine-Tuning (SFT) or using hints, they often misguide policy updates. In this work, we identif…

@arXiv_csCV_bot@mastoxiv.page
2025-08-12 12:46:03

MedReasoner: Reinforcement Learning Drives Reasoning Grounding from Clinical Thought to Pixel-Level Precision
Zhonghao Yan, Muxi Diao, Yuxuan Yang, Jiayuan Xu, Kaizhou Zhang, Ruoyan Jing, Lele Yang, Yanxi Liu, Kongming Liang, Zhanyu Ma
https://arxiv.org/abs/2508.08177

MedReasoner: Reinforcement Learning Drives Reasoning Grounding from Clinical Thought to Pixel-Level Precision
Accurately grounding regions of interest (ROIs) is critical for diagnosis and treatment planning in medical imaging. While multimodal large language models (MLLMs) combine visual perception with natural language, current medical-grounding pipelines still rely on supervised fine-tuning with explicit spatial hints, making them ill-equipped to handle the implicit queries common in clinical practice. This work makes three core contributions. We first define Unified Medical Reasoning Grounding (UMRG…

@arXiv_csCV_bot@mastoxiv.page
2025-10-09 10:05:11

Self-supervised Physics-guided Model with Implicit Representation Regularization for Fast MRI Reconstruction
Jingran Xu, Yuanyuan Liu, Yanjie Zhu
https://arxiv.org/abs/2510.06611

Self-supervised Physics-guided Model with Implicit Representation Regularization for Fast MRI Reconstruction
Magnetic Resonance Imaging (MRI) is a vital clinical diagnostic tool, yet its widespread application is limited by prolonged scan times. Fast MRI reconstruction techniques effectively reduce acquisition duration by reconstructing high-fidelity MR images from undersampled k-space data. In recent years, deep learning-based methods have demonstrated remarkable progress in this field, with self-supervised and unsupervised learning approaches proving particularly valuable in scenarios where fully sa…

@arXiv_csLG_bot@mastoxiv.page
2025-09-10 13:50:13

Replaced article(s) found for cs.LG. https://arxiv.org/list/cs.LG/new
[4/5]:
- SCIZOR: A Self-Supervised Approach to Data Curation for Large-Scale Imitation Learning
Yu Zhang, Yuqi Xie, Huihan Liu, Rutav Shah, Michael Wan, Linxi Fan, Yuke Zhu

@arXiv_csCV_bot@mastoxiv.page
2025-10-02 10:55:11

Equivariant Splitting: Self-supervised learning from incomplete data
Victor Sechaud, Jérémy Scanvic, Quentin Barthélemy, Patrice Abry, Julián Tachella
https://arxiv.org/abs/2510.00929

Equivariant Splitting: Self-supervised learning from incomplete data
Self-supervised learning for inverse problems allows training a reconstruction network from noisy and/or incomplete data alone. These methods have the potential of enabling learning-based solutions when obtaining ground-truth references for training is expensive or even impossible. In this paper, we propose a new self-supervised learning strategy devised for the challenging setting where measurements are observed via a single incomplete observation model. We introduce a new definition of equiva…

@arXiv_csCV_bot@mastoxiv.page
2025-08-12 12:46:33

RedDino: A foundation model for red blood cell analysis
Luca Zedda, Andrea Loddo, Cecilia Di Ruberto, Carsten Marr
https://arxiv.org/abs/2508.08180 https://

RedDino: A foundation model for red blood cell analysis
Red blood cells (RBCs) are essential to human health, and their precise morphological analysis is important for diagnosing hematological disorders. Despite the promise of foundation models in medical diagnostics, comprehensive AI solutions for RBC analysis remain scarce. We present RedDino, a self-supervised foundation model designed for RBC image analysis. RedDino uses an RBC-specific adaptation of the DINOv2 self-supervised learning framework and is trained on a curated dataset of 1.25 millio…

@arXiv_csCV_bot@mastoxiv.page
2025-10-09 10:44:31

Resolution scaling governs DINOv3 transfer performance in chest radiograph classification
Soroosh Tayebi Arasteh, Mina Shaigan, Christiane Kuhl, Jakob Nikolas Kather, Sven Nebelung, Daniel Truhn
https://arxiv.org/abs/2510.07191

Resolution scaling governs DINOv3 transfer performance in chest radiograph classification
Self-supervised learning (SSL) has advanced visual representation learning, but its value in chest radiography, a high-volume imaging modality with fine-grained findings, remains unclear. Meta's DINOv3 extends earlier SSL models through Gram-anchored self-distillation. Whether these design choices improve transfer learning for chest radiography has not been systematically tested. We benchmarked DINOv3 against DINOv2 and ImageNet initialization across seven datasets (n>814,000). Two representati…

@arXiv_csCV_bot@mastoxiv.page
2025-08-08 10:29:02

Test-Time Reinforcement Learning for GUI Grounding via Region Consistency
Yong Du, Yuchen Yan, Fei Tang, Zhengxi Lu, Chang Zong, Weiming Lu, Shengpei Jiang, Yongliang Shen
https://arxiv.org/abs/2508.05615

Test-Time Reinforcement Learning for GUI Grounding via Region Consistency
Graphical User Interface (GUI) grounding, the task of mapping natural language instructions to precise screen coordinates, is fundamental to autonomous GUI agents. While existing methods achieve strong performance through extensive supervised training or reinforcement learning with labeled rewards, they remain constrained by the cost and availability of pixel-level annotations. We observe that when models generate multiple predictions for the same GUI element, the spatial overlap patterns revea…
