2025-09-12 09:19:09
Fast attention mechanisms: a tale of parallelism
Jingwen Liu, Hantao Yu, Clayton Sanford, Alexandr Andoni, Daniel Hsu
https://arxiv.org/abs/2509.09001 https://
Mitigating Attention Localization in Small Scale: Self-Attention Refinement via One-step Belief Propagation
Nakyung Lee, Yeongoon Kim, Minhae Oh, Suhwan Kim, Jin Woo Koo, Hyewon Jo, Jungwoo Lee
https://arxiv.org/abs/2509.07324
Efficient Transformer-Based Piano Transcription With Sparse Attention Mechanisms
Weixing Wei, Kazuyoshi Yoshii
https://arxiv.org/abs/2509.09318 https://arx…
Gaze on the Prize: Shaping Visual Attention with Return-Guided Contrastive Learning
Andrew Lee, Ian Chuang, Dechen Gao, Kai Fukazawa, Iman Soltani
https://arxiv.org/abs/2510.08442
Causal Attention with Lookahead Keys
Zhuoqing Song, Peng Sun, Huizhuo Yuan, Quanquan Gu
https://arxiv.org/abs/2509.07301 https://arxiv.org/pdf/2509.07301…
Grouped Differential Attention
Junghwan Lim, Sungmin Lee, Dongseok Kim, Wai Ting Cheung, Beomgyu Kim, Taehwan Kim, Haesol Lee, Junhyeok Lee, Dongpin Oh, Eunhwan Park
https://arxiv.org/abs/2510.06949
Attention to Order: Transformers Discover Phase Transitions via Learnability
\c{S}ener \"Oz\"onder
https://arxiv.org/abs/2510.07401 https://arxiv…
DADO: A Depth-Attention framework for Object Discovery
Federico Gonzalez, Estefania Talavera, Petia Radeva
https://arxiv.org/abs/2510.07089 https://arxiv.o…
Accelerating Diffusion Transformer-Based Text-to-Speech with Transformer Layer Caching
Siratish Sakpiboonchit
https://arxiv.org/abs/2509.08696 https://arxi…
In-Context Clustering with Large Language Models
Ying Wang, Mengye Ren, Andrew Gordon Wilson
https://arxiv.org/abs/2510.08466 https://arxiv.org/pdf/2510.08…
HilbertA: Hilbert Attention for Image Generation with Diffusion Models
Shaoyi Zheng, Wenbo Lu, Yuxuan Xia, Haomin Liu, Shengjie Wang
https://arxiv.org/abs/2509.26538 https://
Cortex-Synth: Differentiable Topology-Aware 3D Skeleton Synthesis with Hierarchical Graph Attention
Mohamed Zayaan S
https://arxiv.org/abs/2509.06705 https://
Examining density wave correlations in high pressure $\rm{La_3Ni_2O_7}$ through variational Monte Carlo
Yanxin Chen, Haoxiang Chen, Tonghuan Jiang, Ji Chen
https://arxiv.org/abs/2509.07219
Orchestrator: Active Inference for Multi-Agent Systems in Long-Horizon Tasks
Lukas Beckenbauer, Johannes-Lucas Loewe, Ge Zheng, Alexandra Brintrup
https://arxiv.org/abs/2509.05651
Personality-Enhanced Multimodal Depression Detection in the Elderly
Honghong Wang, Jing Deng, Rong Zheng
https://arxiv.org/abs/2510.08004 https://arxiv.org…
Multi-Item-Query Attention for Stable Sequential Recommendation
Mingshi Xu, Haoren Zhu, Wilfred Siu Hung Ng
https://arxiv.org/abs/2509.24424 https://arxiv.…
Identifying Exoplanets with Deep Learning: A CNN and RNN Classifier for Kepler DR25 and Candidate Vetting
Bibin Thomas, Vittal Bhat M, Salman Arafath Mohammed, Abdul Wase Mohammed, Adis Abebaw Dessalegn, Mohit Mittal
https://arxiv.org/abs/2509.04793
An Emotion Recognition Framework via Cross-modal Alignment of EEG and Eye Movement Data
Jianlu Wang, Yanan Wang, Tong Liu
https://arxiv.org/abs/2509.04938 https://
Synthetic Series-Symbol Data Generation for Time Series Foundation Models
Wenxuan Wang, Kai Wu, Yujian Betterest Li, Dan Wang, Xiaoyu Zhang
https://arxiv.org/abs/2510.08445 http…
LMILAtt: A Deep Learning Model for Depression Detection from Social Media Users Enhanced by Multi-Instance Learning Based on Attention Mechanism
Yukun Yang
https://arxiv.org/abs/2509.26145
Attention as an Adaptive Filter
Peter Racioppo
https://arxiv.org/abs/2509.04154 https://arxiv.org/pdf/2509.04154 …
Learning to Look at the Other Side: A Semantic Probing Study of Word Embeddings in LLMs with Enabled Bidirectional Attention
Zhaoxin Feng, Jianfei Ma, Emmanuele Chersoni, Xiaojing Zhao, Xiaoyi Bao
https://arxiv.org/abs/2510.01652
BIR-Adapter: A Low-Complexity Diffusion Model Adapter for Blind Image Restoration
Cem Eteke, Alexander Griessel, Wolfgang Kellerer, Eckehard Steinbach
https://arxiv.org/abs/2509.06904
Interplay Between Belief Propagation and Transformer: Differential-Attention Message Passing Transformer
Chin Wa Lau, Xiang Shi, Ziyan Zheng, Haiwen Cao, Nian Guo
https://arxiv.org/abs/2509.15637
The silence of the weights: an investigation of structural pruning strategies for attention-based audio signal architectures
Andrea Diecidue, Carlo Alberto Barbano, Piero Fraternali, Mathieu Fontaine, Enzo Tartaglione
https://arxiv.org/abs/2509.26207
SPH-Net: A Co-Attention Hybrid Model for Accurate Stock Price Prediction
Yiyang Wu, Hanyu Ma, Muxin Ge, Xiaoli Ma, Yadi Liu, Ye Aung Moe, Zeyu Han, Weizheng Xie
https://arxiv.org/abs/2509.15414
Smart Interrupted Routing Based on Multi-head Attention Mask Mechanism-Driven MARL in Software-defined UASNs
Zhenyu Wang, Chuan Lin, Guangjie Han, Shengchao Zhu, Ruoyuan Wu, Tongwei Zhang
https://arxiv.org/abs/2509.15856
On Structured State-Space Duality
Jerry Yao-Chieh Hu, Xiwen Zhang, Weimin Wu, Han Liu
https://arxiv.org/abs/2510.04944 https://arxiv.org/pdf/2510.04944
Enhancing Fitness Movement Recognition with Attention Mechanism and Pre-Trained Feature Extractors
Shanjid Hasan Nishat, Srabonti Deb, Mohiuddin Ahmed
https://arxiv.org/abs/2509.02511
Attention Mechanism in Randomized Time Warping
Yutaro Hiraoka, Kazuya Okamura, Kota Suto, Kazuhiro Fukui
https://arxiv.org/abs/2508.16366 https://arxiv.org…
DeepMech: A Machine Learning Framework for Chemical Reaction Mechanism Prediction
Manajit Das, Ajnabiul Hoque, Mayank Baranwal, Raghavan B. Sunoj
https://arxiv.org/abs/2509.15872
Progressive Uncertainty-Guided Evidential U-KAN for Trustworthy Medical Image Segmentation
Zhen Yang, Yansong Ma, Lei Chen
https://arxiv.org/abs/2510.08949 https://
Cross-Attention is Half Explanation in Speech-to-Text Models
Sara Papi, Dennis Fucci, Marco Gaido, Matteo Negri, Luisa Bentivogli
https://arxiv.org/abs/2509.18010 https://
Integrating Structure-Aware Attention and Knowledge Graphs in Explainable Recommendation Systems
Shuangquan Lyu, Ming Wang, Huajun Zhang, Jiasen Zheng, Junjiang Lin, Xiaoxuan Sun
https://arxiv.org/abs/2510.10109
High-Dimensional Analysis of Single-Layer Attention for Sparse-Token Classification
Nicholas Barnfield, Hugo Cui, Yue M. Lu
https://arxiv.org/abs/2509.25153 https://
Biased-Attention Guided Risk Prediction for Safe Decision-Making at Unsignalized Intersections
Chengyang Dong, Nan Guo
https://arxiv.org/abs/2510.12428 https://
Shadow in the Cache: Unveiling and Mitigating Privacy Risks of KV-cache in LLM Inference
Zhifan Luo, Shuo Shao, Su Zhang, Lijing Zhou, Yuke Hu, Chenxu Zhao, Zhihao Liu, Zhan Qin
https://arxiv.org/abs/2508.09442
Pi-Transformer: A Physics-informed Attention Mechanism for Time Series Anomaly Detection
Sepehr Maleki, Negar Pourmoazemi
https://arxiv.org/abs/2509.19985 https://
On the algebraic stretching dynamics of variable-density mixing in shock-bubble interaction
Xu Han, Bin Yu, Hong Liu
https://arxiv.org/abs/2509.14607 https://
Eliminating stability hallucinations in LLM-based TTS models via attention guidance
ShiMing Wang, ZhiHao Du, Yang Xiang, TianYu Zhao, Han Zhao, Qian Chen, XianGang Li, HanJie Guo, ZhenHua Ling
https://arxiv.org/abs/2509.19852
A Deep Multi-Task Learning Approach to Impulsive Noise Parameter Estimation
Abdullahi Mohammad, Bdah Eya, Bassant Selim
https://arxiv.org/abs/2510.12179 https://
Privacy Preserved Federated Learning with Attention-Based Aggregation for Biometric Recognition
Kassahun Azezew, Minyechil Alehegn, Tsega Asresa, Bitew Mekuria, Tizazu Bayh, Ayenew Kassie, Amsalu Tesema, Animut Embiyale
https://arxiv.org/abs/2510.01113
Self-attention enabled quantum path analysis of high-harmonic generation in solids
Cong Zhao, Xiaozhou Zou
https://arxiv.org/abs/2510.12443 https://arxiv.o…
Deconstructing Attention: Investigating Design Principles for Effective Language Modeling
Huiyin Xue, Nafise Sadat Moosavi, Nikolaos Aletras
https://arxiv.org/abs/2510.11602 htt…
TEn-CATS: Text-Enriched Audio-Visual Video Parsing with Multi-Scale Category-Aware Temporal Graph
Yaru Chen, Faegheh Sardari, Peiliang Zhang, Ruohao Guo, Yang Xiang, Zhenbo Li, Wenwu Wang
https://arxiv.org/abs/2509.04086
Rethinking Transformer Connectivity: TLinFormer, A Path to Exact, Full Context-Aware Linear Attention
Zhongpan Tang
https://arxiv.org/abs/2508.20407 https://
Multipole Semantic Attention: A Fast Approximation of Softmax Attention for Pretraining
Rupert Mitchell, Kristian Kersting
https://arxiv.org/abs/2509.10406 https://
Physics-informed GNN for medium-high voltage AC power flow with edge-aware attention and line search correction operator
Changhun Kim, Timon Conrad, Redwanul Karim, Julian Oelhaf, David Riebesel, Tomás Arias-Vergara, Andreas Maier, Johann Jäger, Siming Bayer
https://arxiv.org/abs/2509.22458
Signature-Informed Transformer for Asset Allocation
Yoontae Hwang, Stefan Zohren
https://arxiv.org/abs/2510.03129 https://arxiv.org/pdf/2510.03129
EventSSEG: Event-driven Self-Supervised Segmentation with Probabilistic Attention
Lakshmi Annamalai, Chetan Singh Thakur
https://arxiv.org/abs/2508.14856 https://
Agent-Based Simulation of a Financial Market with Large Language Models
Ryuji Hashimoto, Takehiro Takayanagi, Masahiro Suzuki, Kiyoshi Izumi
https://arxiv.org/abs/2510.12189 htt…
Credal Transformer: A Principled Approach for Quantifying and Mitigating Hallucinations in Large Language Models
Shihao Ji, Zihui Song, Jiajie Huang
https://arxiv.org/abs/2510.12137
Natively Trainable Sparse Attention for Hierarchical Point Cloud Datasets
Nicolas Lapautre, Maria Marchenko, Carlos Miguel Patiño, Xin Zhou
https://arxiv.org/abs/2508.10758 ht…
Feature Identification for Hierarchical Contrastive Learning
Julius Ott, Nastassia Vysotskaya, Huawei Sun, Lorenzo Servadei, Robert Wille
https://arxiv.org/abs/2510.00837 https:…
TASP: Topology-aware Sequence Parallelism
Yida Wang (Capital Normal University, Infinigence-AI), Ke Hong (Tsinghua University, Infinigence-AI), Xiuhong Li (Infinigence-AI), Yuanchao Xu (Capital Normal University), Wenxun Wang (Tsinghua University), Guohao Dai (Infinigence-AI, Shanghai Jiao Tong University), Yu Wang (Tsinghua University)
https://
Replaced article(s) found for cs.CL. https://arxiv.org/list/cs.CL/new
[1/3]:
- Fast Multipole Attention: A Scalable Multilevel Attention Mechanism for Text and Images
Yanming Kang, Giang Tran, Hans De Sterck
Infant Cry Detection In Noisy Environment Using Blueprint Separable Convolutions and Time-Frequency Recurrent Neural Network
Haolin Yu, Yanxiong Li
https://arxiv.org/abs/2508.19308
VideoAnchor: Reinforcing Subspace-Structured Visual Cues for Coherent Visual-Spatial Reasoning
Zhaozhi Wang, Tong Zhang, Mingyue Guo, Yaowei Wang, Qixiang Ye
https://arxiv.org/abs/2509.25151
Mixture of Low-Rank Adapter Experts in Generalizable Audio Deepfake Detection
Janne Laakkonen, Ivan Kukanov, Ville Hautamäki
https://arxiv.org/abs/2509.13878 https://
PENGUIN: Enhancing Transformer with Periodic-Nested Group Attention for Long-term Time Series Forecasting
Tian Sun, Yuqi Chen, Weiwei Sun
https://arxiv.org/abs/2508.13773 https:…
Random Feature Spiking Neural Networks
Maximilian Gollwitzer, Felix Dietrich
https://arxiv.org/abs/2510.01012 https://arxiv.org/pdf/2510.01012
TyphoonMLA: A Mixed Naive-Absorb MLA Kernel For Shared Prefix
Ahmet Caner Yüzügüler, Ahmet Çelik, Jiawei Zhuang, Lukas Cavigelli
https://arxiv.org/abs/2509.21081
Improving in-context learning with a better scoring function
Omar Naim, Swarnadeep Bhar, Jérôme Bolte, Nicholas Asher
https://arxiv.org/abs/2508.14685 https://
EffiFusion-GAN: Efficient Fusion Generative Adversarial Network for Speech Enhancement
Bin Wen, Tien-Ping Tan
https://arxiv.org/abs/2508.14525 https://arxi…
Cross-attention Secretly Performs Orthogonal Alignment in Recommendation Models
Hyunin Lee, Yong Zhang, Hoang Vu Nguyen, Xiaoyi Liu, Namyong Park, Christopher Jung, Rong Jin, Yang Wang, Zhigang Wang, Somayeh Sojoudi, Xue Feng
https://arxiv.org/abs/2510.09435
Joint decoding method for controllable contextual speech recognition based on Speech LLM
Yangui Fang, Jing Peng, Yu Xi, Xu Li, Haoyu Li, Chengwei Zhang, Guohui Zhong, Kai Yu
https://arxiv.org/abs/2508.08585
Self-Aware Adaptive Alignment: Enabling Accurate Perception for Intelligent Transportation Systems
Tong Xiang, Hongxia Zhao, Fenghua Zhu, Yuanyuan Chen, Yisheng Lv
https://arxiv.org/abs/2508.13823
Great GATsBi: Hybrid, Multimodal, Trajectory Forecasting for Bicycles using Anticipation Mechanism
Kevin Riehl, Shaimaa K. El-Baklish, Anastasios Kouvelas, Michail A. Makridis
https://arxiv.org/abs/2508.14523
Timbre-Adaptive Transcription: A Lightweight Architecture with Associative Memory for Dynamic Instrument Separation
Ruigang Li, Yongxu Zhu
https://arxiv.org/abs/2509.12712 https…
Crosslisted article(s) found for cs.CL. https://arxiv.org/list/cs.CL/new
[1/2]:
- Limitations of Normalization in Attention Mechanism
Timur Mudarisov, Mikhail Burtsev, Tatiana Petrova, Radu State
Mask Tokens as Prophet: Fine-Grained Cache Eviction for Efficient dLLM Inference
Jianuo Huang, Yaojie Zhang, Yicun Yang, Benhao Huang, Biqing Qi, Dongrui Liu, Linfeng Zhang
https://arxiv.org/abs/2510.09309
Vi-SAFE: A Spatial-Temporal Framework for Efficient Violence Detection in Public Surveillance
Ligang Chang, Shengkai Xu, Liangchang Shen, Binhan Xu, Junqiao Wang, Tianyu Shi, Yanhui Du
https://arxiv.org/abs/2509.13210
Artificial Intelligence-Based Multiscale Temporal Modeling for Anomaly Detection in Cloud Services
Lian Lian, Yilin Li, Song Han, Renzi Meng, Sibo Wang, Ming Wang
https://arxiv.org/abs/2508.14503