2025-10-09 10:37:21
Grouped Differential Attention
Junghwan Lim, Sungmin Lee, Dongseok Kim, Wai Ting Cheung, Beomgyu Kim, Taehwan Kim, Haesol Lee, Junhyeok Lee, Dongpin Oh, Eunhwan Park
https://arxiv.org/abs/2510.06949
Deconstructing Attention: Investigating Design Principles for Effective Language Modeling
Huiyin Xue, Nafise Sadat Moosavi, Nikolaos Aletras
https://arxiv.org/abs/2510.11602 htt…
Biased-Attention Guided Risk Prediction for Safe Decision-Making at Unsignalized Intersections
Chengyang Dong, Nan Guo
https://arxiv.org/abs/2510.12428 https://
DADO: A Depth-Attention framework for Object Discovery
Federico Gonzalez, Estefania Talavera, Petia Radeva
https://arxiv.org/abs/2510.07089 https://arxiv.o…
Integrating Structure-Aware Attention and Knowledge Graphs in Explainable Recommendation Systems
Shuangquan Lyu, Ming Wang, Huajun Zhang, Jiasen Zheng, Junjiang Lin, Xiaoxuan Sun
https://arxiv.org/abs/2510.10109
Self-attention enabled quantum path analysis of high-harmonic generation in solids
Cong Zhao, Xiaozhou Zou
https://arxiv.org/abs/2510.12443 https://arxiv.o…
The silence of the weights: an investigation of structural pruning strategies for attention-based audio signal architectures
Andrea Diecidue, Carlo Alberto Barbano, Piero Fraternali, Mathieu Fontaine, Enzo Tartaglione
https://arxiv.org/abs/2509.26207
Cross-attention Secretly Performs Orthogonal Alignment in Recommendation Models
Hyunin Lee, Yong Zhang, Hoang Vu Nguyen, Xiaoyi Liu, Namyong Park, Christopher Jung, Rong Jin, Yang Wang, Zhigang Wang, Somayeh Sojoudi, Xue Feng
https://arxiv.org/abs/2510.09435
Progressive Uncertainty-Guided Evidential U-KAN for Trustworthy Medical Image Segmentation
Zhen Yang, Yansong Ma, Lei Chen
https://arxiv.org/abs/2510.08949 https://
Credal Transformer: A Principled Approach for Quantifying and Mitigating Hallucinations in Large Language Models
Shihao Ji, Zihui Song, Jiajie Huang
https://arxiv.org/abs/2510.12137
Gaze on the Prize: Shaping Visual Attention with Return-Guided Contrastive Learning
Andrew Lee, Ian Chuang, Dechen Gao, Kai Fukazawa, Iman Soltani
https://arxiv.org/abs/2510.08442
HilbertA: Hilbert Attention for Image Generation with Diffusion Models
Shaoyi Zheng, Wenbo Lu, Yuxuan Xia, Haomin Liu, Shengjie Wang
https://arxiv.org/abs/2509.26538 https://
A Deep Multi-Task Learning Approach to Impulsive Noise Parameter Estimation
Abdullahi Mohammad, Bdah Eya, Bassant Selim
https://arxiv.org/abs/2510.12179 https://
Agent-Based Simulation of a Financial Market with Large Language Models
Ryuji Hashimoto, Takehiro Takayanagi, Masahiro Suzuki, Kiyoshi Izumi
https://arxiv.org/abs/2510.12189 htt…
Spatially-informed transformers: Injecting geostatistical covariance biases into self-attention for spatio-temporal forecasting
Yuri Calleo
https://arxiv.org/abs/2512.17696 https://arxiv.org/pdf/2512.17696 https://arxiv.org/html/2512.17696
arXiv:2512.17696v1 Announce Type: new
Abstract: The modeling of high-dimensional spatio-temporal processes presents a fundamental dichotomy between the probabilistic rigor of classical geostatistics and the flexible, high-capacity representations of deep learning. While Gaussian processes offer theoretical consistency and exact uncertainty quantification, their prohibitive computational scaling renders them impractical for massive sensor networks. Conversely, modern transformer architectures excel at sequence modeling but inherently lack a geometric inductive bias, treating spatial sensors as permutation-invariant tokens without a native understanding of distance. In this work, we propose a spatially-informed transformer, a hybrid architecture that injects a geostatistical inductive bias directly into the self-attention mechanism via a learnable covariance kernel. By formally decomposing the attention structure into a stationary physical prior and a non-stationary data-driven residual, we impose a soft topological constraint that favors spatially proximal interactions while retaining the capacity to model complex dynamics. We demonstrate the phenomenon of "Deep Variography", where the network successfully recovers the true spatial decay parameters of the underlying process end-to-end via backpropagation. Extensive experiments on synthetic Gaussian random fields and real-world traffic benchmarks confirm that our method outperforms state-of-the-art graph neural networks. Furthermore, rigorous statistical validation confirms that the proposed method delivers not only superior predictive accuracy but also well-calibrated probabilistic forecasts, effectively bridging the gap between physics-aware modeling and data-driven learning.
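The decomposition described in the abstract (data-driven attention logits plus a stationary covariance prior over sensor distances) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the exponential kernel form, the learnable log_range and prior_scale parameters, and the module name SpatiallyBiasedSelfAttention are choices made for the example; the paper's exact kernel family and parameterization may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatiallyBiasedSelfAttention(nn.Module):
    """Self-attention whose logits are the sum of a data-driven term (QK^T)
    and a stationary spatial prior from a learnable exponential covariance
    kernel over pairwise sensor distances."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        # Learnable kernel range (log-parameterised) and prior weight -- hypothetical names.
        self.log_range = nn.Parameter(torch.zeros(1))
        self.prior_scale = nn.Parameter(torch.ones(1))

    def forward(self, x: torch.Tensor, dist: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_sensors, dim); dist: (n_sensors, n_sensors) pairwise distances.
        b, n, d = x.shape
        qkv = self.qkv(x).reshape(b, n, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)           # each (b, heads, n, head_dim)

        logits = q @ k.transpose(-2, -1) / self.head_dim ** 0.5
        # Stationary prior C(h) = exp(-h / range), broadcast over batch and heads.
        prior = torch.exp(-dist / self.log_range.exp())
        logits = logits + self.prior_scale * prior      # soft bias toward nearby sensors

        attn = F.softmax(logits, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, d)
        return self.proj(out)


# Usage sketch: 20 sensors with random 2D coordinates.
coords = torch.rand(20, 2)
dist = torch.cdist(coords, coords)
layer = SpatiallyBiasedSelfAttention(dim=32)
out = layer(torch.randn(8, 20, 32), dist)  # -> (8, 20, 32)
```

In this sketch the kernel range is optimized end-to-end alongside the transformer weights, which is the mechanism behind the "Deep Variography" behaviour the abstract describes.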
Attention to Order: Transformers Discover Phase Transitions via Learnability
Şener Özönder
https://arxiv.org/abs/2510.07401 https://arxiv…
LMILAtt: A Deep Learning Model for Depression Detection from Social Media Users Enhanced by Multi-Instance Learning Based on Attention Mechanism
Yukun Yang
https://arxiv.org/abs/2509.26145
Learning to Look at the Other Side: A Semantic Probing Study of Word Embeddings in LLMs with Enabled Bidirectional Attention
Zhaoxin Feng, Jianfei Ma, Emmanuele Chersoni, Xiaojing Zhao, Xiaoyi Bao
https://arxiv.org/abs/2510.01652
On Structured State-Space Duality
Jerry Yao-Chieh Hu, Xiwen Zhang, Weimin Wu, Han Liu
https://arxiv.org/abs/2510.04944 https://arxiv.org/pdf/2510.04944
Personality-Enhanced Multimodal Depression Detection in the Elderly
Honghong Wang, Jing Deng, Rong Zheng
https://arxiv.org/abs/2510.08004 https://arxiv.org…
Privacy Preserved Federated Learning with Attention-Based Aggregation for Biometric Recognition
Kassahun Azezew, Minyechil Alehegn, Tsega Asresa, Bitew Mekuria, Tizazu Bayh, Ayenew Kassie, Amsalu Tesema, Animut Embiyale
https://arxiv.org/abs/2510.01113
Crosslisted article(s) found for cs.CL. https://arxiv.org/list/cs.CL/new
[1/2]:
- Limitations of Normalization in Attention Mechanism
Timur Mudarisov, Mikhail Burtsev, Tatiana Petrova, Radu State
In-Context Clustering with Large Language Models
Ying Wang, Mengye Ren, Andrew Gordon Wilson
https://arxiv.org/abs/2510.08466 https://arxiv.org/pdf/2510.08…
Feature Identification for Hierarchical Contrastive Learning
Julius Ott, Nastassia Vysotskaya, Huawei Sun, Lorenzo Servadei, Robert Wille
https://arxiv.org/abs/2510.00837 https:…
Mask Tokens as Prophet: Fine-Grained Cache Eviction for Efficient dLLM Inference
Jianuo Huang, Yaojie Zhang, Yicun Yang, Benhao Huang, Biqing Qi, Dongrui Liu, Linfeng Zhang
https://arxiv.org/abs/2510.09309
TASP: Topology-aware Sequence Parallelism
Yida Wang (Capital Normal University, Infinigence-AI), Ke Hong (Tsinghua University, Infinigence-AI), Xiuhong Li (Infinigence-AI), Yuanchao Xu (Capital Normal University), Wenxun Wang (Tsinghua University), Guohao Dai (Infinigence-AI, Shanghai Jiao Tong University), Yu Wang (Tsinghua University)
https://
Synthetic Series-Symbol Data Generation for Time Series Foundation Models
Wenxuan Wang, Kai Wu, Yujian Betterest Li, Dan Wang, Xiaoyu Zhang
https://arxiv.org/abs/2510.08445 http…
Signature-Informed Transformer for Asset Allocation
Yoontae Hwang, Stefan Zohren
https://arxiv.org/abs/2510.03129 https://arxiv.org/pdf/2510.03129
Random Feature Spiking Neural Networks
Maximilian Gollwitzer, Felix Dietrich
https://arxiv.org/abs/2510.01012 https://arxiv.org/pdf/2510.01012