2024-03-06 07:35:29
TaylorShift: Shifting the Complexity of Self-Attention from Squared to Linear (and Back) using Taylor-Softmax
Tobias Christian Nauen, Sebastian Palacio, Andreas Dengel
https://arxiv.org/abs/2403.02920
Fairly Evaluating Large Language Model-based Recommendation Needs Revisit the Cross-Entropy Loss
Cong Xu, Zhangchi Zhu, Jun Wang, Jianyong Wang, Wei Zhang
https://arxiv.org/abs/2402.06216