Tootfinder

@arXiv_csLG_bot@mastoxiv.page
2025-09-12 09:19:09

Fast attention mechanisms: a tale of parallelism
Jingwen Liu, Hantao Yu, Clayton Sanford, Alexandr Andoni, Daniel Hsu
https://arxiv.org/abs/2509.09001 https://

Transformers have the representational capacity to simulate Massively Parallel Computation (MPC) algorithms, but they suffer from quadratic time complexity, which severely limits their scalability. We introduce an efficient attention mechanism called Approximate Nearest Neighbor Attention (ANNA) with sub-quadratic time complexity. We prove that ANNA-transformers (1) retain the expressive power previously established for standard attention in terms of matching the capabilities of MPC algorithms,…

@arXiv_csCL_bot@mastoxiv.page
2025-09-10 09:54:21

Mitigating Attention Localization in Small Scale: Self-Attention Refinement via One-step Belief Propagation
Nakyung Lee, Yeongoon Kim, Minhae Oh, Suhwan Kim, Jin Woo Koo, Hyewon Jo, Jungwoo Lee
https://arxiv.org/abs/2509.07324

The Transformer-based self-attention mechanism serves as the core of modern language models, yet it often suffers from localization, where attention collapses onto a limited subset of tokens and fails to capture long-range dependencies. To address this issue, we propose Self-Attention One-step Belief Propagation (SAOBP), a refinement framework that injects multi-hop relationships through a belief propagation process. To interpret and quantify these interactions, we introduce Global Token Dependency …

@arXiv_csSD_bot@mastoxiv.page
2025-09-12 08:30:29

Efficient Transformer-Based Piano Transcription With Sparse Attention Mechanisms
Weixing Wei, Kazuyoshi Yoshii
https://arxiv.org/abs/2509.09318 https://arx…

This paper investigates automatic piano transcription based on computationally-efficient yet high-performant variants of the Transformer that can capture longer-term dependency over the whole musical piece. Recently, transformer-based sequence-to-sequence models have demonstrated excellent performance in piano transcription. These models, however, fail to deal with the whole piece at once due to the quadratic complexity of the self-attention mechanism, and music signals are thus typically proce…

@arXiv_csCV_bot@mastoxiv.page
2025-10-10 11:11:09

Gaze on the Prize: Shaping Visual Attention with Return-Guided Contrastive Learning
Andrew Lee, Ian Chuang, Dechen Gao, Kai Fukazawa, Iman Soltani
https://arxiv.org/abs/2510.08442

Visual Reinforcement Learning (RL) agents must learn to act based on high-dimensional image data where only a small fraction of the pixels is task-relevant. This forces agents to waste exploration and computational resources on irrelevant features, leading to sample-inefficient and unstable learning. To address this, inspired by human visual foveation, we introduce Gaze on the Prize. This framework augments visual RL with a learnable foveal attention mechanism (Gaze), guided by a self-supervise…

@arXiv_csCL_bot@mastoxiv.page
2025-09-10 09:11:11

Causal Attention with Lookahead Keys
Zhuoqing Song, Peng Sun, Huizhuo Yuan, Quanquan Gu
https://arxiv.org/abs/2509.07301 https://arxiv.org/pdf/2509.07301…

In standard causal attention, each token's query, key, and value (QKV) are static and encode only preceding context. We introduce CAuSal aTtention with Lookahead kEys (CASTLE), an attention mechanism that continually updates each token's keys as the context unfolds. We term these updated keys lookahead keys because they belong to earlier positions yet integrate information from tokens that appear later relative to those positions, while strictly preserving the autoregressive property. Although …

@arXiv_csLG_bot@mastoxiv.page
2025-10-09 10:37:21

Grouped Differential Attention
Junghwan Lim, Sungmin Lee, Dongseok Kim, Wai Ting Cheung, Beomgyu Kim, Taehwan Kim, Haesol Lee, Junhyeok Lee, Dongpin Oh, Eunhwan Park
https://arxiv.org/abs/2510.06949

The self-attention mechanism, while foundational to modern Transformer architectures, suffers from a critical inefficiency: it frequently allocates substantial attention to redundant or noisy context. Differential Attention addressed this by using subtractive attention maps for signal and noise, but its required balanced head allocation imposes rigid constraints on representational flexibility and scalability. To overcome this, we propose Grouped Differential Attention (GDA), a novel approach…
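
As a rough illustration of the subtractive attention maps mentioned in this abstract, the sketch below assumes the commonly described Differential Attention form, in which a second softmax map scaled by a factor is subtracted from the first before weighting the values. The function names, the fixed `lam` value, and the list-of-lists inputs are illustrative assumptions, not taken from the paper, and the grouped head allocation of GDA itself is not shown.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(Q, K):
    """Row-wise softmax of scaled dot products between queries and keys."""
    d = len(Q[0])
    return [
        softmax([sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K])
        for q in Q
    ]


def differential_attention(Q1, K1, Q2, K2, V, lam=0.5):
    """Subtract a scaled second attention map from the first, then weight V.

    lam is a fixed scalar here (an assumption made for illustration).
    Returns the subtracted map and the resulting output vectors.
    """
    A1 = attention_map(Q1, K1)  # first map
    A2 = attention_map(Q2, K2)  # second map, subtracted after scaling
    diff = [[a1 - lam * a2 for a1, a2 in zip(r1, r2)] for r1, r2 in zip(A1, A2)]
    out = [
        [sum(w * v[j] for w, v in zip(row, V)) for j in range(len(V[0]))]
        for row in diff
    ]
    return diff, out
```

Because each softmax row sums to 1, every row of the subtracted map sums to `1 - lam`, and setting `lam=0` recovers ordinary softmax attention.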

@arXiv_condmatmtrlsci_bot@mastoxiv.page
2025-10-10 08:21:39

Attention to Order: Transformers Discover Phase Transitions via Learnability
Şener Özönder
https://arxiv.org/abs/2510.07401 https://arxiv…

Phase transitions mark qualitative reorganizations of collective behavior, yet identifying their boundaries remains challenging whenever analytic solutions are absent and conventional simulations fail. Here we introduce learnability as a universal criterion, defined as the ability of a transformer model containing an attention mechanism to extract structure from microscopic states. Using self-supervised learning and Monte Carlo generated configurations of the two-dimensional Ising model, we show t…

@arXiv_csCV_bot@mastoxiv.page
2025-10-09 10:38:51

DADO: A Depth-Attention framework for Object Discovery
Federico Gonzalez, Estefania Talavera, Petia Radeva
https://arxiv.org/abs/2510.07089 https://arxiv.o…

Unsupervised object discovery, the task of identifying and localizing objects in images without human-annotated labels, remains a significant challenge and a growing focus in computer vision. In this work, we introduce a novel model, DADO (Depth-Attention self-supervised technique for Discovering unseen Objects), which combines an attention mechanism and a depth model to identify potential objects in images. To address challenges such as noisy attention maps or complex scenes with varying depth…

@arXiv_eessAS_bot@mastoxiv.page
2025-09-11 09:22:43

Accelerating Diffusion Transformer-Based Text-to-Speech with Transformer Layer Caching
Siratish Sakpiboonchit
https://arxiv.org/abs/2509.08696 https://arxi…

This paper presents a method to accelerate the inference process of diffusion transformer (DiT)-based text-to-speech (TTS) models by applying a selective caching mechanism to transformer layers. Specifically, I integrate SmoothCache into the F5-TTS architecture, focusing on caching outputs of self-attention and feed-forward network layers to reduce redundant computations during the denoising process. A calibration phase is introduced to analyze L1 relative errors between timesteps, guiding the …

@arXiv_csLG_bot@mastoxiv.page
2025-10-10 11:15:29

In-Context Clustering with Large Language Models
Ying Wang, Mengye Ren, Andrew Gordon Wilson
https://arxiv.org/abs/2510.08466 https://arxiv.org/pdf/2510.08…

We propose In-Context Clustering (ICC), a flexible LLM-based procedure for clustering data from diverse distributions. Unlike traditional clustering algorithms constrained by predefined similarity measures, ICC flexibly captures complex relationships among inputs through an attention mechanism. We show that pretrained LLMs exhibit impressive zero-shot clustering capabilities on text-encoded numeric data, with attention matrices showing salient cluster patterns. Spectral clustering using attenti…

@arXiv_csAI_bot@mastoxiv.page
2025-10-01 11:45:57

HilbertA: Hilbert Attention for Image Generation with Diffusion Models
Shaoyi Zheng, Wenbo Lu, Yuxuan Xia, Haomin Liu, Shengjie Wang
https://arxiv.org/abs/2509.26538 https://

Designing sparse attention for diffusion transformers requires reconciling two-dimensional spatial locality with GPU efficiency, a trade-off that current methods struggle to achieve. Existing approaches enforce two-dimensional spatial locality but often incur uncoalesced memory access. We present HilbertA, a 2D-aware and GPU-efficient sparse attention mechanism. HilbertA reorders image tokens along Hilbert curves to achieve a contiguous memory layout while preserving spatial neighborhoods, and …
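
The token reordering described in this abstract can be sketched with the standard iterative coordinate-to-index conversion for a Hilbert curve on a power-of-two grid. This is only an illustration of ordering a 2D token grid along such a curve: the function names are hypothetical, and the sparse attention pattern and GPU memory layout of HilbertA are not reproduced here.

```python
def hilbert_index(n, x, y):
    """Map cell (x, y) of an n x n grid (n a power of two) to its
    position along a Hilbert curve."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented consistently.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_token_order(n):
    """Return the (x, y) cells of an n x n token grid sorted along the curve."""
    cells = [(x, y) for y in range(n) for x in range(n)]
    return sorted(cells, key=lambda c: hilbert_index(n, c[0], c[1]))
```

Consecutive cells in the returned order are always grid neighbors, which is the locality property that makes a contiguous layout along the curve attractive.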

@arXiv_csCV_bot@mastoxiv.page
2025-09-09 12:26:52

Cortex-Synth: Differentiable Topology-Aware 3D Skeleton Synthesis with Hierarchical Graph Attention
Mohamed Zayaan S
https://arxiv.org/abs/2509.06705 https://

We present Cortex Synth, a novel end-to-end differentiable framework for joint 3D skeleton geometry and topology synthesis from single 2D images. Our architecture introduces three key innovations: (1) A hierarchical graph attention mechanism with multi-scale skeletal refinement, (2) Differentiable spectral topology optimization via Laplacian eigen decomposition, and (3) Adversarial geometric consistency training for pose structure alignment. The framework integrates four synergistic modules: a …

@arXiv_condmatsuprcon_bot@mastoxiv.page
2025-09-10 08:47:51

Examining density wave correlations in high pressure $\rm{La_3Ni_2O_7}$ through variational Monte Carlo
Yanxin Chen, Haoxiang Chen, Tonghuan Jiang, Ji Chen
https://arxiv.org/abs/2509.07219

$\rm La_3Ni_2O_7$, a nickelate compound with a reported superconducting transition temperature of $\rm 80~K$, has attracted significant attention in recent years. Density-wave phenomena arising from strong electron correlations are widely regarded as key to unraveling the superconductivity mechanism, but the ordering and stability of these density waves remain a subject of contention in existing theoretical studies. In this work, we employ the variational Monte Carlo (VMC) method to thoroughly e…

@arXiv_csMA_bot@mastoxiv.page
2025-09-09 08:15:31

Orchestrator: Active Inference for Multi-Agent Systems in Long-Horizon Tasks
Lukas Beckenbauer, Johannes-Lucas Loewe, Ge Zheng, Alexandra Brintrup
https://arxiv.org/abs/2509.05651

Complex, non-linear tasks challenge LLM-enhanced multi-agent systems (MAS) due to partial observability and suboptimal coordination. We propose Orchestrator, a novel MAS framework that leverages attention-inspired self-emergent coordination and reflective benchmarking to optimize global task performance. Orchestrator introduces a monitoring mechanism to track agent-environment dynamics, using active inference benchmarks to optimize system behavior. By tracking agent-to-agent and agent-to-enviro…

@arXiv_csSD_bot@mastoxiv.page
2025-10-10 08:33:28

Personality-Enhanced Multimodal Depression Detection in the Elderly
Honghong Wang, Jing Deng, Rong Zheng
https://arxiv.org/abs/2510.08004 https://arxiv.org…

This paper presents our solution to the Multimodal Personality-aware Depression Detection (MPDD) challenge at ACM MM 2025. We propose a multimodal model for detecting depression in the elderly that incorporates personality characteristics. We introduce a multi-feature fusion approach based on a co-attention mechanism to effectively integrate LLDs, MFCCs, and Wav2Vec features in the audio modality. For the video modality, we combine representations extracted from OpenFace, ResNet, and DenseNet to co…

@arXiv_csIR_bot@mastoxiv.page
2025-09-30 10:46:51

Multi-Item-Query Attention for Stable Sequential Recommendation
Mingshi Xu, Haoren Zhu, Wilfred Siu Hung Ng
https://arxiv.org/abs/2509.24424 https://arxiv.…

The inherent instability and noise in user interaction data challenge sequential recommendation systems. Prevailing masked attention models, relying on a single query from the most recent item, are sensitive to this noise, reducing prediction reliability. We propose the Multi-Item-Query attention mechanism (MIQ-Attn) to enhance model stability and accuracy. MIQ-Attn constructs multiple diverse query vectors from user interactions, effectively mitigating noise and improving consistency. It is de…

@arXiv_astrophEP_bot@mastoxiv.page
2025-09-08 08:49:40

Identifying Exoplanets with Deep Learning: A CNN and RNN Classifier for Kepler DR25 and Candidate Vetting
Bibin Thomas, Vittal Bhat M, Salman Arafath Mohammed, Abdul Wase Mohammed, Adis Abebaw Dessalegn, Mohit Mittal
https://arxiv.org/abs/2509.04793

The rapid expansion of exoplanet survey missions such as Kepler, TESS, and the upcoming PLATO mission has generated massive light-curve datasets that challenge traditional vetting pipelines. We introduce a hybrid deep-learning framework that integrates convolutional networks, bidirectional LSTMs, and an attention mechanism to identify planetary transit signals with improved accuracy and interpretability. Trained on Kepler DR25 data, the model achieves F1 = $0.910 \pm 0.008$ (AUC--ROC = $0.984 \…

@arXiv_csMM_bot@mastoxiv.page
2025-09-08 07:46:20

An Emotion Recognition Framework via Cross-modal Alignment of EEG and Eye Movement Data
Jianlu Wang, Yanan Wang, Tong Liu
https://arxiv.org/abs/2509.04938 https://

Emotion recognition is essential for applications in affective computing and behavioral prediction, but conventional systems relying on single-modality data often fail to capture the complexity of affective states. To address this limitation, we propose an emotion recognition framework that achieves accurate multimodal alignment of Electroencephalogram (EEG) and eye movement data through a hybrid architecture based on a cross-modal attention mechanism. Experiments on the SEED-IV dataset demonstra…

@arXiv_csLG_bot@mastoxiv.page
2025-10-10 11:13:29

Synthetic Series-Symbol Data Generation for Time Series Foundation Models
Wenxuan Wang, Kai Wu, Yujian Betterest Li, Dan Wang, Xiaoyu Zhang
https://arxiv.org/abs/2510.08445 http…

Foundation models for time series analysis (TSA) have attracted significant attention. However, challenges such as training data scarcity and imbalance continue to hinder their development. Inspired by complex dynamic system theories, we design a series-symbol data generation mechanism, enabling the unrestricted creation of high-quality time series data paired with corresponding symbolic expressions. To leverage series-symbol data pairs with strong correlations, we develop SymTime, a p…

@arXiv_csAI_bot@mastoxiv.page
2025-10-01 11:28:27

LMILAtt: A Deep Learning Model for Depression Detection from Social Media Users Enhanced by Multi-Instance Learning Based on Attention Mechanism
Yukun Yang
https://arxiv.org/abs/2509.26145

Depression is a major global public health challenge and its early identification is crucial. Social media data provides a new perspective for depression detection, but existing methods face limitations such as insufficient accuracy, insufficient utilization of time series features, and high annotation costs. To this end, this study proposes the LMILAtt model, which innovatively integrates Long Short-Term Memory autoencoders and attention mechanisms: firstly, the temporal dynamic features of us…

@arXiv_csLG_bot@mastoxiv.page
2025-09-05 10:20:51

Attention as an Adaptive Filter
Peter Racioppo
https://arxiv.org/abs/2509.04154 https://arxiv.org/pdf/2509.04154 …

We introduce Adaptive Filter Attention (AFA), a novel attention mechanism that incorporates a learnable dynamics model directly into the computation of attention weights. Rather than comparing queries and keys directly, we model the input sequence as discrete observations of a linear stochastic differential equation (SDE). By imposing a linear dynamics model with simultaneously diagonalizable state matrices and noise covariances, we can make use of a closed-form solution to the differential Lya…

@arXiv_csCL_bot@mastoxiv.page
2025-10-03 10:40:01

Learning to Look at the Other Side: A Semantic Probing Study of Word Embeddings in LLMs with Enabled Bidirectional Attention
Zhaoxin Feng, Jianfei Ma, Emmanuele Chersoni, Xiaojing Zhao, Xiaoyi Bao
https://arxiv.org/abs/2510.01652

Autoregressive Large Language Models (LLMs) demonstrate exceptional performance in language understanding and generation. However, their application in text embedding tasks has been relatively slow, along with the analysis of their semantic representation in probing tasks, due to the constraints of the unidirectional attention mechanism. This paper aims to explore whether such constraints can be overcome by enabling bidirectional attention in LLMs. We tested different variants of the Llama ar…

@arXiv_csCV_bot@mastoxiv.page
2025-09-09 12:30:52

BIR-Adapter: A Low-Complexity Diffusion Model Adapter for Blind Image Restoration
Cem Eteke, Alexander Griessel, Wolfgang Kellerer, Eckehard Steinbach
https://arxiv.org/abs/2509.06904

This paper introduces BIR-Adapter, a low-complexity blind image restoration adapter for diffusion models. The BIR-Adapter enables the utilization of the prior of pre-trained large-scale diffusion models on blind image restoration without training any auxiliary feature extractor. We take advantage of the robustness of pretrained models. We extract features from degraded images via the model itself and extend the self-attention mechanism with these degraded features. We introduce a sampling guida…

@arXiv_csIT_bot@mastoxiv.page
2025-09-22 08:37:11

Interplay Between Belief Propagation and Transformer: Differential-Attention Message Passing Transformer
Chin Wa Lau, Xiang Shi, Ziyan Zheng, Haiwen Cao, Nian Guo
https://arxiv.org/abs/2509.15637

Transformer-based neural decoders have emerged as a promising approach to error correction coding, combining data-driven adaptability with efficient modeling of long-range dependencies. This paper presents a novel decoder architecture that integrates classical belief propagation principles with transformer designs. We introduce a differentiable syndrome loss function leveraging global codebook structure and a differential-attention mechanism optimizing bit and syndrome embedding interactions. E…

@arXiv_csSD_bot@mastoxiv.page
2025-10-01 09:43:38

The silence of the weights: an investigation of structural pruning strategies for attention-based audio signal architectures
Andrea Diecidue, Carlo Alberto Barbano, Piero Fraternali, Mathieu Fontaine, Enzo Tartaglione
https://arxiv.org/abs/2509.26207

Transformer-based models have become the state of the art across multiple domains, from natural language processing to machine listening, thanks to attention mechanisms. However, the attention layers require a large number of parameters and high-end hardware for both training and inference. We propose a novel pruning technique targeted explicitly at the attention mechanism, where we decouple the pruning of the four layers in the attention block, namely: query, keys, values and outputs' projecti…

@arXiv_csCE_bot@mastoxiv.page
2025-09-22 07:31:21

SPH-Net: A Co-Attention Hybrid Model for Accurate Stock Price Prediction
Yiyang Wu, Hanyu Ma, Muxin Ge, Xiaoli Ma, Yadi Liu, Ye Aung Moe, Zeyu Han, Weizheng Xie
https://arxiv.org/abs/2509.15414

Prediction of stock price movements presents a formidable challenge in financial analytics due to the inherent volatility, non-stationarity, and nonlinear characteristics of market data. This paper introduces SPH-Net (Stock Price Prediction Hybrid Neural Network), an innovative deep learning framework designed to enhance the accuracy of time series forecasting in financial markets. The proposed architecture employs a novel co-attention mechanism that initially processes temporal patterns throug…

@arXiv_csNI_bot@mastoxiv.page
2025-09-22 08:52:11

Smart Interrupted Routing Based on Multi-head Attention Mask Mechanism-Driven MARL in Software-defined UASNs
Zhenyu Wang, Chuan Lin, Guangjie Han, Shengchao Zhu, Ruoyuan Wu, Tongwei Zhang
https://arxiv.org/abs/2509.15856

Routing-driven timely data collection in Underwater Acoustic Sensor Networks (UASNs) is crucial for marine environmental monitoring, disaster warning, and underwater resource exploration. However, harsh underwater conditions, including high delays, limited bandwidth, and dynamic topologies, make efficient routing decisions challenging in UASNs. In this paper, we propose a smart interrupted routing scheme for UASNs to address dynamic underwater challenges. We first model underwater noise in…

@arXiv_csLG_bot@mastoxiv.page
2025-10-07 13:04:42

On Structured State-Space Duality
Jerry Yao-Chieh Hu, Xiwen Zhang, Weimin Wu, Han Liu
https://arxiv.org/abs/2510.04944 https://arxiv.org/pdf/2510.04944

Structured State-Space Duality (SSD) [Dao & Gu, ICML 2024] is an equivalence between a simple Structured State-Space Model (SSM) and a masked attention mechanism. In particular, a state-space model with a scalar-times-identity state matrix is equivalent to a masked self-attention with a $1$-semiseparable causal mask. Consequently, the same sequence transformation (model) has two algorithmic realizations: as a linear-time $O(T)$ recurrence or as a quadratic-time $O(T^2)$ attention. In this note,…
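
The two algorithmic realizations named in this abstract can be compared in a small sketch. It assumes a single input channel and the recurrence h_t = a_t h_{t-1} + B_t x_t, y_t = <C_t, h_t>, where the scalar a_t stands for the scalar-times-identity state matrix; the symbols `a`, `B`, `C`, `x` and the function names are notation chosen here for illustration, not taken from the note.

```python
def ssm_recurrence(a, B, C, x):
    """Linear-time form: carry a hidden state h and update it once per step."""
    h = [0.0] * len(B[0])
    ys = []
    for a_t, B_t, C_t, x_t in zip(a, B, C, x):
        # Scalar-times-identity transition: every state entry decays by a_t.
        h = [a_t * h_i + b_i * x_t for h_i, b_i in zip(h, B_t)]
        ys.append(sum(c_i * h_i for c_i, h_i in zip(C_t, h)))
    return ys


def masked_attention(a, B, C, x):
    """Quadratic-time form: y_t = sum over s <= t of L[t][s] * <C_t, B_s> * x_s,
    with causal mask entries L[t][s] = a_{s+1} * ... * a_t."""
    ys = []
    for t in range(len(x)):
        y = 0.0
        decay = 1.0  # empty product for s == t
        for s in range(t, -1, -1):
            score = sum(c * b for c, b in zip(C[t], B[s]))
            y += decay * score * x[s]
            decay *= a[s]  # extend the product one step further back
        ys.append(y)
    return ys
```

Under these assumptions both functions return the same sequence up to floating-point rounding; the first does O(T) state updates while the second visits O(T^2) pairs (t, s).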

@arXiv_csCV_bot@mastoxiv.page
2025-09-03 15:03:13

Enhancing Fitness Movement Recognition with Attention Mechanism and Pre-Trained Feature Extractors
Shanjid Hasan Nishat, Srabonti Deb, Mohiuddin Ahmed
https://arxiv.org/abs/2509.02511

Fitness movement recognition, a focused subdomain of human activity recognition (HAR), plays a vital role in health monitoring, rehabilitation, and personalized fitness training by enabling automated exercise classification from video data. However, many existing deep learning approaches rely on computationally intensive 3D models, limiting their feasibility in real-time or resource-constrained settings. In this paper, we present a lightweight and effective framework that integrates pre-trained…

@arXiv_csCV_bot@mastoxiv.page
2025-08-25 09:56:40

Attention Mechanism in Randomized Time Warping
Yutaro Hiraoka, Kazuya Okamura, Kota Suto, Kazuhiro Fukui
https://arxiv.org/abs/2508.16366 https://arxiv.org…

This paper reveals that we can interpret the fundamental function of Randomized Time Warping (RTW) as a type of self-attention mechanism, a core technology of Transformers in motion recognition. Self-attention is a mechanism that enables models to identify and weigh the importance of different parts of an input sequential pattern. On the other hand, RTW is a general extension of Dynamic Time Warping (DTW), a technique commonly used for matching and comparing sequential patterns. In essence,…

@arXiv_physicschemph_bot@mastoxiv.page
2025-09-22 08:36:31

DeepMech: A Machine Learning Framework for Chemical Reaction Mechanism Prediction
Manajit Das, Ajnabiul Hoque, Mayank Baranwal, Raghavan B. Sunoj
https://arxiv.org/abs/2509.15872

Prediction of complete step-by-step chemical reaction mechanisms (CRMs) remains a major challenge. Whereas the traditional approaches in CRM tasks rely on expert-driven experiments or costly quantum chemical computations, contemporary deep learning (DL) alternatives ignore key intermediates and mechanistic steps and often suffer from hallucinations. We present DeepMech, an interpretable graph-based DL framework employing atom- and bond-level attention, guided by generalized templates of mechani…

@arXiv_eessIV_bot@mastoxiv.page
2025-10-13 08:46:00

Progressive Uncertainty-Guided Evidential U-KAN for Trustworthy Medical Image Segmentation
Zhen Yang, Yansong Ma, Lei Chen
https://arxiv.org/abs/2510.08949 https://

Trustworthy medical image segmentation aims to deliver accurate and reliable results for clinical decision-making. Most existing methods adopt the evidence deep learning (EDL) paradigm due to its computational efficiency and theoretical robustness. However, the EDL-based methods often neglect leveraging uncertainty maps rich in attention cues to refine ambiguous boundary segmentation. To address this, we propose a progressive evidence uncertainty guided attention (PEUA) mechanism to guide the m…

@arXiv_csCL_bot@mastoxiv.page
2025-09-23 12:57:41

Cross-Attention is Half Explanation in Speech-to-Text Models
Sara Papi, Dennis Fucci, Marco Gaido, Matteo Negri, Luisa Bentivogli
https://arxiv.org/abs/2509.18010 https://

Cross-attention is a core mechanism in encoder-decoder architectures, widespread in many fields, including speech-to-text (S2T) processing. Its scores have been repurposed for various downstream applications--such as timestamp estimation and audio-text alignment--under the assumption that they reflect the dependencies between input speech representation and the generated text. While the explanatory nature of attention mechanisms has been widely debated in the broader NLP literature, this assump…

@arXiv_csIR_bot@mastoxiv.page
2025-10-14 09:15:38

Integrating Structure-Aware Attention and Knowledge Graphs in Explainable Recommendation Systems
Shuangquan Lyu, Ming Wang, Huajun Zhang, Jiasen Zheng, Junjiang Lin, Xiaoxuan Sun
https://arxiv.org/abs/2510.10109

This paper designs and implements an explainable recommendation model that integrates knowledge graphs with structure-aware attention mechanisms. The model is built on graph neural networks and incorporates a multi-hop neighbor aggregation strategy. By integrating the structural information of knowledge graphs and dynamically assigning importance to different neighbors through an attention mechanism, the model enhances its ability to capture implicit preference relationships. In the proposed me…

@arXiv_csLG_bot@mastoxiv.page
2025-09-30 14:44:01

High-Dimensional Analysis of Single-Layer Attention for Sparse-Token Classification
Nicholas Barnfield, Hugo Cui, Yue M. Lu
https://arxiv.org/abs/2509.25153 https://

When and how can an attention mechanism learn to selectively attend to informative tokens, thereby enabling detection of weak, rare, and sparsely located features? We address these questions theoretically in a sparse-token classification model in which positive samples embed a weak signal vector in a randomly chosen subset of tokens, whereas negative samples are pure noise. In the long-sequence limit, we show that a simple single-layer attention classifier can in principle achieve vanishing tes…

@arXiv_csAI_bot@mastoxiv.page
2025-10-15 10:10:21

Biased-Attention Guided Risk Prediction for Safe Decision-Making at Unsignalized Intersections
Chengyang Dong, Nan Guo
https://arxiv.org/abs/2510.12428 https://

Autonomous driving decision-making at unsignalized intersections is highly challenging due to complex dynamic interactions and high conflict risks. To achieve proactive safety control, this paper proposes a deep reinforcement learning (DRL) decision-making framework integrated with a biased attention mechanism. The framework is built upon the Soft Actor-Critic (SAC) algorithm. Its core innovation lies in the use of biased attention to construct a traffic risk predictor. This predictor assesses …

@arXiv_csCR_bot@mastoxiv.page
2025-08-14 07:48:32

Shadow in the Cache: Unveiling and Mitigating Privacy Risks of KV-cache in LLM Inference
Zhifan Luo, Shuo Shao, Su Zhang, Lijing Zhou, Yuke Hu, Chenxu Zhao, Zhihao Liu, Zhan Qin
https://arxiv.org/abs/2508.09442

The Key-Value (KV) cache, which stores intermediate attention computations (Key and Value pairs) to avoid redundant calculations, is a fundamental mechanism for accelerating Large Language Model (LLM) inference. However, this efficiency optimization introduces significant yet underexplored privacy risks. This paper provides the first comprehensive analysis of these vulnerabilities, demonstrating that an attacker can reconstruct sensitive user inputs directly from the KV-cache. We design and imp…

@arXiv_csLG_bot@mastoxiv.page
2025-09-25 10:38:22

Pi-Transformer: A Physics-informed Attention Mechanism for Time Series Anomaly Detection
Sepehr Maleki, Negar Pourmoazemi
https://arxiv.org/abs/2509.19985 https://

Anomalies in multivariate time series often arise from temporal context and cross-channel coordination rather than isolated outliers. We present Pi-Transformer, a physics-informed transformer with two attention pathways: a data-driven series attention and a smoothly evolving prior attention that encodes temporal invariants such as scale-related self-similarity and phase synchrony. The prior acts as a stable reference that calibrates reconstruction error. During training, we pair a reconstructio…

@arXiv_physicsfludyn_bot@mastoxiv.page
2025-09-19 08:54:31

On the algebraic stretching dynamics of variable-density mixing in shock-bubble interaction
Xu Han, Bin Yu, Hong Liu
https://arxiv.org/abs/2509.14607 https://

The mixing mechanism within a single vortex has been a theoretical focus for decades, yet it remains unclear, especially in variable-density (VD) scenarios. This study investigates canonical single-vortex VD mixing in shock-bubble interactions (SBI) through high-resolution numerical simulations. Special attention is paid to the stretching dynamics and its impact on VD mixing within a single vortex, and this problem is investigated by quantitatively characterizing the scalar dissipation …

@arXiv_csSD_bot@mastoxiv.page
2025-09-25 08:47:12

Eliminating stability hallucinations in llm-based tts models via attention guidance
ShiMing Wang, ZhiHao Du, Yang Xiang, TianYu Zhao, Han Zhao, Qian Chen, XianGang Li, HanJie Guo, ZhenHua Ling
https://arxiv.org/abs/2509.19852

Eliminating stability hallucinations in llm-based tts models via attention guidance
This paper focuses on resolving stability hallucinations (e.g., repetitive or omitted speech) in LLM-based Text-to-Speech (TTS) models by improving and leveraging the attention mechanism. First, we analyzed the alignment mechanism between text tokens and speech tokens in LLMs. We then proposed a metric termed the Optimal Alignment Score (OAS), which employs the Viterbi algorithm to evaluate text-speech alignment quality. Subsequently, OAS was integrated into the training of CosyVoice2 to assist…

@arXiv_eessSP_bot@mastoxiv.page
2025-10-15 08:27:42

A Deep Multi-Task Learning Approach to Impulsive Noise Parameter Estimation
Abdullahi Mohammad, Bdah Eya, Bassant Selim
https://arxiv.org/abs/2510.12179 https://

A Deep Multi-Task Learning Approach to Impulsive Noise Parameter Estimation
Impulsive noise poses a significant challenge to the reliability of wireless communication systems, necessitating accurate estimation of its statistical parameters for effective mitigation. This paper introduces a multitask learning (MTL) framework based on a CNN-LSTM architecture enhanced with an attention mechanism for the joint estimation of impulsive noise parameters. The proposed model leverages a unified weighted-loss function to enable simultaneous learning of multiple parameters within …

@arXiv_csLG_bot@mastoxiv.page
2025-10-02 11:09:21

Privacy Preserved Federated Learning with Attention-Based Aggregation for Biometric Recognition
Kassahun Azezew, Minyechil Alehegn, Tsega Asresa, Bitew Mekuria, Tizazu Bayh, Ayenew Kassie, Amsalu Tesema, Animut Embiyale
https://arxiv.org/abs/2510.01113

Privacy Preserved Federated Learning with Attention-Based Aggregation for Biometric Recognition
Because biometric data is sensitive, centralized training poses a privacy risk, even though biometric recognition is essential for contemporary applications. Federated learning (FL), which permits decentralized training, provides a privacy-preserving substitute. Conventional FL, however, has trouble with interpretability and heterogeneous data (non-IID). In order to handle non-IID biometric data, this framework adds an attention mechanism at the central server that weights local model updates a…

@arXiv_condmatmtrlsci_bot@mastoxiv.page
2025-10-15 10:08:21

Self-attention enabled quantum path analysis of high-harmonic generation in solids
Cong Zhao, Xiaozhou Zou
https://arxiv.org/abs/2510.12443 https://arxiv.o…

Self-attention enabled quantum path analysis of high-harmonic generation in solids
High-harmonic generation (HHG) in solids provides a powerful platform to probe ultrafast electron dynamics and interband-intraband coupling. However, disentangling the complex many-body contributions in the HHG spectrum remains challenging. Here we introduce a machine-learning approach based on a Transformer encoder to analyze and reconstruct HHG signals computed from a one-dimensional Kronig-Penney model. The self-attention mechanism inherently highlights correlations between temporal dipole…

@arXiv_csCL_bot@mastoxiv.page
2025-10-14 13:16:18

Deconstructing Attention: Investigating Design Principles for Effective Language Modeling
Huiyin Xue, Nafise Sadat Moosavi, Nikolaos Aletras
https://arxiv.org/abs/2510.11602 htt…

Deconstructing Attention: Investigating Design Principles for Effective Language Modeling
The success of Transformer language models is widely credited to their dot-product attention mechanism, which interweaves a set of key design principles: mixing information across positions (enabling multi-token interactions), sequence-dependent activations (where attention weights adapt to each input), a specific mathematical form (dot-product similarities plus softmax weighting), and coupling of queries and keys to evolving hidden states (grounding attention in the current layer). However, th…

@arXiv_csCV_bot@mastoxiv.page
2025-09-05 10:14:41

TEn-CATS: Text-Enriched Audio-Visual Video Parsing with Multi-Scale Category-Aware Temporal Graph
Yaru Chen, Faegheh Sardari, Peiliang Zhang, Ruohao Guo, Yang Xiang, Zhenbo Li, Wenwu Wang
https://arxiv.org/abs/2509.04086

TEn-CATS: Text-Enriched Audio-Visual Video Parsing with Multi-Scale Category-Aware Temporal Graph
The Audio-Visual Video Parsing (AVVP) task aims to identify event categories and their occurrence times in a given video with weakly supervised labels. Existing methods typically fall into two categories: (i) designing enhanced architectures based on attention mechanisms for better temporal modeling, and (ii) generating richer pseudo-labels to compensate for the absence of frame-level annotations. However, methods of the first type treat noisy segment-level pseudo labels as reliable supervision and the…

@arXiv_csLG_bot@mastoxiv.page
2025-08-29 10:08:31

Rethinking Transformer Connectivity: TLinFormer, A Path to Exact, Full Context-Aware Linear Attention
Zhongpan Tang
https://arxiv.org/abs/2508.20407 https://

Rethinking Transformer Connectivity: TLinFormer, A Path to Exact, Full Context-Aware Linear Attention
The Transformer architecture has become a cornerstone of modern artificial intelligence, but its core self-attention mechanism suffers from a complexity bottleneck that scales quadratically with sequence length, severely limiting its application in long-sequence tasks. To address this challenge, existing linear attention methods typically sacrifice model performance by relying on data-agnostic kernel approximations or restrictive context selection. This paper returns to the first principles of …

@arXiv_csLG_bot@mastoxiv.page
2025-09-15 09:56:11

Multipole Semantic Attention: A Fast Approximation of Softmax Attention for Pretraining
Rupert Mitchell, Kristian Kersting
https://arxiv.org/abs/2509.10406 https://

Multipole Semantic Attention: A Fast Approximation of Softmax Attention for Pretraining
We present Multipole Semantic Attention (MuSe), an efficient approximation of softmax attention that combines semantic clustering with multipole expansions from computational physics. Our method addresses the quadratic computational complexity of transformers in the context length by clustering queries and keys separately in their learned representation spaces, enabling a hierarchical two-stage attention mechanism. Unlike prior clustering approaches that group only keys or use unified clusterin…

@arXiv_csLG_bot@mastoxiv.page
2025-09-29 11:34:37

Physics-informed GNN for medium-high voltage AC power flow with edge-aware attention and line search correction operator
Changhun Kim, Timon Conrad, Redwanul Karim, Julian Oelhaf, David Riebesel, Tomás Arias-Vergara, Andreas Maier, Johann Jäger, Siming Bayer
https://arxiv.org/abs/2509.22458

Physics-informed GNN for medium-high voltage AC power flow with edge-aware attention and line search correction operator
Physics-informed graph neural networks (PIGNNs) have emerged as fast AC power-flow solvers that can replace classic Newton-Raphson (NR) solvers, especially when thousands of scenarios must be evaluated. However, current PIGNNs still need accuracy improvements at parity speed; in particular, the physics loss is inoperative at inference, which can deter operational adoption. We address this with PIGNN-Attn-LS, combining an edge-aware attention mechanism that explicitly encodes line physics via p…

@arXiv_csLG_bot@mastoxiv.page
2025-10-06 10:25:29

Signature-Informed Transformer for Asset Allocation
Yoontae Hwang, Stefan Zohren
https://arxiv.org/abs/2510.03129 https://arxiv.org/pdf/2510.03129

Signature-Informed Transformer for Asset Allocation
Robust asset allocation is a key challenge in quantitative finance, where deep-learning forecasters often fail due to objective mismatch and error amplification. We introduce the Signature-Informed Transformer (SIT), a novel framework that learns end-to-end allocation policies by directly optimizing a risk-aware financial objective. SIT's core innovations include path signatures for a rich geometric representation of asset dynamics and a signature-augmented attention mechanism embedding financi…

@arXiv_csCV_bot@mastoxiv.page
2025-08-21 10:13:40

EventSSEG: Event-driven Self-Supervised Segmentation with Probabilistic Attention
Lakshmi Annamalai, Chetan Singh Thakur
https://arxiv.org/abs/2508.14856 https://

EventSSEG: Event-driven Self-Supervised Segmentation with Probabilistic Attention
Road segmentation is pivotal for autonomous vehicles, yet achieving low latency and low compute solutions using frame based cameras remains a challenge. Event cameras offer a promising alternative. To leverage their low power sensing, we introduce EventSSEG, a method for road segmentation that uses event only computing and a probabilistic attention mechanism. Event only computing poses a challenge in transferring pretrained weights from the conventional camera domain, requiring abundant labeled…

@arXiv_csCE_bot@mastoxiv.page
2025-10-15 07:36:21

Agent-Based Simulation of a Financial Market with Large Language Models
Ryuji Hashimoto, Takehiro Takayanagi, Masahiro Suzuki, Kiyoshi Izumi
https://arxiv.org/abs/2510.12189 htt…

Agent-Based Simulation of a Financial Market with Large Language Models
In real-world stock markets, certain chart patterns, such as price declines near historical highs, cannot be fully explained by fundamentals alone. These phenomena suggest the presence of path dependence in price formation, where investor decisions are influenced not only by current market conditions but also by the trajectory of prices leading up to the present. Path dependence has drawn attention in behavioral finance as a key mechanism behind such anomalies. One plausible driver of path …

@arXiv_csCL_bot@mastoxiv.page
2025-10-15 10:27:41

Credal Transformer: A Principled Approach for Quantifying and Mitigating Hallucinations in Large Language Models
Shihao Ji, Zihui Song, Jiajie Huang
https://arxiv.org/abs/2510.12137

Credal Transformer: A Principled Approach for Quantifying and Mitigating Hallucinations in Large Language Models
Large Language Models (LLMs) hallucinate, generating factually incorrect yet confident assertions. We argue this stems from the Transformer's Softmax function, which creates "Artificial Certainty" by collapsing ambiguous attention scores into a single probability distribution, discarding uncertainty information at each layer. To fix this, we introduce the Credal Transformer, which replaces standard attention with a Credal Attention Mechanism (CAM) based on evidential theory. CAM produces a "cre…

@arXiv_csLG_bot@mastoxiv.page
2025-08-15 10:19:32

Natively Trainable Sparse Attention for Hierarchical Point Cloud Datasets
Nicolas Lapautre, Maria Marchenko, Carlos Miguel Patiño, Xin Zhou
https://arxiv.org/abs/2508.10758 ht…

Natively Trainable Sparse Attention for Hierarchical Point Cloud Datasets
Unlocking the potential of transformers on datasets of large physical systems depends on overcoming the quadratic scaling of the attention mechanism. This work explores combining the Erwin architecture with the Native Sparse Attention (NSA) mechanism to improve the efficiency and receptive field of transformer models for large-scale physical systems, addressing the challenge of quadratic attention complexity. We adapt the NSA mechanism for non-sequential data, implement the Erwin NSA model, and…

@arXiv_csCV_bot@mastoxiv.page
2025-10-02 10:54:11

Feature Identification for Hierarchical Contrastive Learning
Julius Ott, Nastassia Vysotskaya, Huawei Sun, Lorenzo Servadei, Robert Wille
https://arxiv.org/abs/2510.00837 https:…

Feature Identification for Hierarchical Contrastive Learning
Hierarchical classification is a crucial task in many applications, where objects are organized into multiple levels of categories. However, conventional classification approaches often neglect inherent inter-class relationships at different hierarchy levels, thus missing important supervisory signals. We therefore propose two novel hierarchical contrastive learning (HMLC) methods. The first leverages a Gaussian Mixture Model (G-HMLC), and the second uses an attention mechanism to capture hierarchy…

@arXiv_csLG_bot@mastoxiv.page
2025-10-01 11:57:07

TASP: Topology-aware Sequence Parallelism
Yida Wang (Capital Normal University, Infinigence-AI), Ke Hong (Tsinghua University, Infinigence-AI), Xiuhong Li (Infinigence-AI), Yuanchao Xu (Capital Normal University), Wenxun Wang (Tsinghua University), Guohao Dai (Infinigence-AI, Shanghai Jiao Tong University), Yu Wang (Tsinghua University)
https://

TASP: Topology-aware Sequence Parallelism
Long-context large language models (LLMs) face constraints due to the quadratic complexity of the self-attention mechanism. The mainstream sequence parallelism (SP) method, Ring Attention, attempts to solve this by distributing the query into multiple query chunks across accelerators and enabling each Q tensor to access all KV tensors from other accelerators via the Ring AllGather communication primitive. However, it exhibits low communication efficiency, restricting its practical applicability. …

@arXiv_csCL_bot@mastoxiv.page
2025-09-19 13:23:51

Replaced article(s) found for cs.CL. https://arxiv.org/list/cs.CL/new
[1/3]:
- Fast Multipole Attention: A Scalable Multilevel Attention Mechanism for Text and Images
Yanming Kang, Giang Tran, Hans De Sterck

@arXiv_csSD_bot@mastoxiv.page
2025-08-28 07:48:40

Infant Cry Detection In Noisy Environment Using Blueprint Separable Convolutions and Time-Frequency Recurrent Neural Network
Haolin Yu, Yanxiong Li
https://arxiv.org/abs/2508.19308

Infant Cry Detection In Noisy Environment Using Blueprint Separable Convolutions and Time-Frequency Recurrent Neural Network
Infant cry detection is a crucial component of baby care systems. In this paper, we propose a lightweight and robust method for infant cry detection. The method leverages blueprint separable convolutions to reduce computational complexity, and a time-frequency recurrent neural network for adaptive denoising. The overall framework of the method is structured as a multi-scale convolutional recurrent neural network, which is enhanced by an efficient spatial attention mechanism and contrast-aware chann…

@arXiv_csCV_bot@mastoxiv.page
2025-09-30 15:01:16

VideoAnchor: Reinforcing Subspace-Structured Visual Cues for Coherent Visual-Spatial Reasoning
Zhaozhi Wang, Tong Zhang, Mingyue Guo, Yaowei Wang, Qixiang Ye
https://arxiv.org/abs/2509.25151

VideoAnchor: Reinforcing Subspace-Structured Visual Cues for Coherent Visual-Spatial Reasoning
Multimodal Large Language Models (MLLMs) have achieved impressive progress in vision-language alignment, yet they remain limited in visual-spatial reasoning. We first identify that this limitation arises from the attention mechanism: visual tokens are overshadowed by language tokens, preventing the model from consistently recognizing the same visual cues across frames. To address this challenge, we draw a novel connection between the self-expressiveness property in sparse subspace clustering an…

@arXiv_eessAS_bot@mastoxiv.page
2025-09-18 09:21:31

Mixture of Low-Rank Adapter Experts in Generalizable Audio Deepfake Detection
Janne Laakkonen, Ivan Kukanov, Ville Hautamäki
https://arxiv.org/abs/2509.13878 https://

Mixture of Low-Rank Adapter Experts in Generalizable Audio Deepfake Detection
Foundation models such as Wav2Vec2 excel at representation learning in speech tasks, including audio deepfake detection. However, after being fine-tuned on a fixed set of bonafide and spoofed audio clips, they often fail to generalize to novel deepfake methods not represented in training. To address this, we propose a mixture-of-LoRA-experts approach that integrates multiple low-rank adapters (LoRA) into the model's attention layers. A routing mechanism selectively activates specialized experts…

@arXiv_csLG_bot@mastoxiv.page
2025-08-20 10:12:40

PENGUIN: Enhancing Transformer with Periodic-Nested Group Attention for Long-term Time Series Forecasting
Tian Sun, Yuqi Chen, Weiwei Sun
https://arxiv.org/abs/2508.13773 https:…

PENGUIN: Enhancing Transformer with Periodic-Nested Group Attention for Long-term Time Series Forecasting
Long-term time series forecasting (LTSF) is a fundamental task with wide-ranging applications. Although Transformer-based models have made significant breakthroughs in forecasting, their effectiveness for time series forecasting remains debatable. In this paper, we revisit the significance of self-attention and propose a simple yet effective mechanism, Periodic-Nested Group Attention, namely PENGUIN. Our approach highlights the importance of explicitly modeling periodic patterns and incorporati…

@arXiv_csLG_bot@mastoxiv.page
2025-10-02 11:07:11

Random Feature Spiking Neural Networks
Maximilian Gollwitzer, Felix Dietrich
https://arxiv.org/abs/2510.01012 https://arxiv.org/pdf/2510.01012

Random Feature Spiking Neural Networks
Spiking Neural Networks (SNNs) as Machine Learning (ML) models have recently received a lot of attention as a potentially more energy-efficient alternative to conventional Artificial Neural Networks. The non-differentiability and sparsity of the spiking mechanism can make these models very difficult to train with algorithms based on propagating gradients through the spiking non-linearity. We address this problem by adapting the paradigm of Random Feature Methods (RFMs) from Artificial Neural Ne…

@arXiv_csLG_bot@mastoxiv.page
2025-09-26 10:28:01

TyphoonMLA: A Mixed Naive-Absorb MLA Kernel For Shared Prefix
Ahmet Caner Yüzügüler, Ahmet Çelik, Jiawei Zhuang, Lukas Cavigelli
https://arxiv.org/abs/2509.21081

TyphoonMLA: A Mixed Naive-Absorb MLA Kernel For Shared Prefix
Multi-Head Latent Attention (MLA) is a recent attention mechanism adopted in state-of-the-art LLMs such as DeepSeek-v3 and Kimi K2. Thanks to its novel formulation, MLA allows two functionally equivalent but computationally distinct kernel implementations: naive and absorb. While the naive kernels (e.g., FlashAttention) are typically preferred in training and prefill for their computational efficiency, existing decoding kernels (e.g., FlashMLA) rely on the absorb method to minimize HBM bandwidt…

@arXiv_csCL_bot@mastoxiv.page
2025-08-21 09:57:50

Improving in-context learning with a better scoring function
Omar Naim, Swarnadeep Bhar, Jérôme Bolte, Nicholas Asher
https://arxiv.org/abs/2508.14685 https://

Improving in-context learning with a better scoring function
Large language models (LLMs) exhibit a remarkable capacity to learn by analogy, known as in-context learning (ICL). However, recent studies have revealed limitations in this ability. In this paper, we examine these limitations on tasks involving first-order quantifiers such as "all" and "some", as well as on ICL with linear functions. We identify Softmax, the scoring function in the attention mechanism, as a contributing factor to these constraints. To address this, we propose scale…

@arXiv_csSD_bot@mastoxiv.page
2025-08-21 08:58:20

EffiFusion-GAN: Efficient Fusion Generative Adversarial Network for Speech Enhancement
Bin Wen, Tien-Ping Tan
https://arxiv.org/abs/2508.14525 https://arxi…

EffiFusion-GAN: Efficient Fusion Generative Adversarial Network for Speech Enhancement
We introduce EffiFusion-GAN (Efficient Fusion Generative Adversarial Network), a lightweight yet powerful model for speech enhancement. The model integrates depthwise separable convolutions within a multi-scale block to capture diverse acoustic features efficiently. An enhanced attention mechanism with dual normalization and residual refinement further improves training stability and convergence. Additionally, dynamic pruning is applied to reduce model size while maintaining performance, making…

@arXiv_csLG_bot@mastoxiv.page
2025-10-13 10:44:30

Cross-attention Secretly Performs Orthogonal Alignment in Recommendation Models
Hyunin Lee, Yong Zhang, Hoang Vu Nguyen, Xiaoyi Liu, Namyong Park, Christopher Jung, Rong Jin, Yang Wang, Zhigang Wang, Somayeh Sojoudi, Xue Feng
https://arxiv.org/abs/2510.09435

Cross-attention Secretly Performs Orthogonal Alignment in Recommendation Models
Cross-domain sequential recommendation (CDSR) aims to align heterogeneous user behavior sequences collected from different domains. While cross-attention is widely used to enhance alignment and improve recommendation performance, its underlying mechanism is not fully understood. Most researchers interpret cross-attention as residual alignment, where the output is generated by removing redundant and preserving non-redundant information from the query input by referencing another domain data whic…

@arXiv_eessAS_bot@mastoxiv.page
2025-08-13 08:05:32

Joint decoding method for controllable contextual speech recognition based on Speech LLM
Yangui Fang, Jing Peng, Yu Xi, Xu Li, Haoyu Li, Chengwei Zhang, Guohui Zhong, Kai Yu
https://arxiv.org/abs/2508.08585

Joint decoding method for controllable contextual speech recognition based on Speech LLM
Contextual speech recognition refers to the ability to identify preferences for specific content based on contextual information. Recently, leveraging the contextual understanding capabilities of Speech LLM to achieve contextual biasing by injecting contextual information through prompts has emerged as a research hotspot. However, the direct information injection method via prompts relies on the internal attention mechanism of the model, making it impossible to explicitly control the extent of …

@arXiv_csCV_bot@mastoxiv.page
2025-08-20 10:17:30

Self-Aware Adaptive Alignment: Enabling Accurate Perception for Intelligent Transportation Systems
Tong Xiang, Hongxia Zhao, Fenghua Zhu, Yuanyuan Chen, Yisheng Lv
https://arxiv.org/abs/2508.13823

Self-Aware Adaptive Alignment: Enabling Accurate Perception for Intelligent Transportation Systems
Achieving top-notch performance in Intelligent Transportation detection is a critical research area. However, many challenges still need to be addressed when it comes to detecting in a cross-domain scenario. In this paper, we propose Self-Aware Adaptive Alignment (SA3), which leverages an efficient alignment mechanism and recognition strategy. Our proposed method employs a specified attention-based alignment module trained on source and target domain datasets to guide the image-level features a…

@arXiv_csLG_bot@mastoxiv.page
2025-08-21 10:08:30

Great GATsBi: Hybrid, Multimodal, Trajectory Forecasting for Bicycles using Anticipation Mechanism
Kevin Riehl, Shaimaa K. El-Baklish, Anastasios Kouvelas, Michail A. Makridis
https://arxiv.org/abs/2508.14523

Great GATsBi: Hybrid, Multimodal, Trajectory Forecasting for Bicycles using Anticipation Mechanism
Accurate prediction of road user movement is increasingly required by many applications ranging from advanced driver assistance systems to autonomous driving, and is especially crucial for road safety. Even though most traffic accident fatalities are accounted for by bicycles, they have received little attention, as previous work focused mainly on pedestrians and motorized vehicles. In this work, we present the Great GATsBi, a domain-knowledge-based, hybrid, multimodal trajectory prediction framework for bi…

@arXiv_csSD_bot@mastoxiv.page
2025-09-17 09:27:30

Timbre-Adaptive Transcription: A Lightweight Architecture with Associative Memory for Dynamic Instrument Separation
Ruigang Li, Yongxu Zhu
https://arxiv.org/abs/2509.12712 https…

Timbre-Adaptive Transcription: A Lightweight Architecture with Associative Memory for Dynamic Instrument Separation
Existing multi-timbre transcription models struggle with generalization beyond pre-trained instruments and rigid source-count constraints. We address these limitations with a lightweight deep clustering solution featuring: 1) a timbre-agnostic backbone achieving state-of-the-art performance with only half the parameters of comparable models, and 2) a novel associative memory mechanism that mimics human auditory cognition to dynamically encode unseen timbres via attention-based clustering. Our b…

@arXiv_csCL_bot@mastoxiv.page
2025-10-13 12:01:10

Crosslisted article(s) found for cs.CL. https://arxiv.org/list/cs.CL/new
[1/2]:
- Limitations of Normalization in Attention Mechanism
Timur Mudarisov, Mikhail Burtsev, Tatiana Petrova, Radu State

@arXiv_csCL_bot@mastoxiv.page
2025-10-13 10:29:50

Mask Tokens as Prophet: Fine-Grained Cache Eviction for Efficient dLLM Inference
Jianuo Huang, Yaojie Zhang, Yicun Yang, Benhao Huang, Biqing Qi, Dongrui Liu, Linfeng Zhang
https://arxiv.org/abs/2510.09309

Mask Tokens as Prophet: Fine-Grained Cache Eviction for Efficient dLLM Inference
Diffusion large language models (dLLMs) present a promising alternative to dominant autoregressive models (ARMs) through their ability to decode in parallel, at the expense of substantial computation and memory costs. Specifically, the cache mechanism for bidirectional attention in dLLMs demands a large memory footprint, restricting their ability to handle long contexts under resource-limited settings. Existing cache eviction strategies are designed for ARMs and ignore the unique characteristics of dLLMs,…

@arXiv_csCV_bot@mastoxiv.page
2025-09-17 10:58:10

Vi-SAFE: A Spatial-Temporal Framework for Efficient Violence Detection in Public Surveillance
Ligang Chang, Shengkai Xu, Liangchang Shen, Binhan Xu, Junqiao Wang, Tianyu Shi, Yanhui Du
https://arxiv.org/abs/2509.13210

Vi-SAFE: A Spatial-Temporal Framework for Efficient Violence Detection in Public Surveillance
Violence detection in public surveillance is critical for public safety. This study addresses challenges such as small-scale targets, complex environments, and real-time temporal analysis. We propose Vi-SAFE, a spatial-temporal framework that integrates an enhanced YOLOv8 with a Temporal Segment Network (TSN) for video surveillance. The YOLOv8 model is optimized with GhostNetV3 as a lightweight backbone, an exponential moving average (EMA) attention mechanism, and pruning to reduce computationa…

@arXiv_csLG_bot@mastoxiv.page
2025-08-21 10:08:10

Artificial Intelligence-Based Multiscale Temporal Modeling for Anomaly Detection in Cloud Services
Lian Lian, Yilin Li, Song Han, Renzi Meng, Sibo Wang, Ming Wang
https://arxiv.org/abs/2508.14503

Artificial Intelligence-Based Multiscale Temporal Modeling for Anomaly Detection in Cloud Services
This study proposes an anomaly detection method based on the Transformer architecture with integrated multiscale feature perception, aiming to address the limitations of temporal modeling and scale-aware feature representation in cloud service environments. The method first employs an improved Transformer module to perform temporal modeling on high-dimensional monitoring data, using a self-attention mechanism to capture long-range dependencies and contextual semantics. Then, a multiscale featur…
