Zen-Attention: A Compiler Framework for Dynamic Attention Folding on AMD NPUs
Aadesh Deshmukh, Venkata Yaswanth Raparti, Samuel Hsu
https://arxiv.org/abs/2508.17593
H2SGEMM: Emulating FP32 GEMM on Ascend NPUs using FP16 Units with Precision Recovery and Cache-Aware Optimization
Weicheng Xue, Baisong Xu, Kai Yang, Yongxiang Liu, Dengdeng Fan, Pengxiang Xu, Yonghong Tian
https://arxiv.org/abs/2507.23387
Forecasting LLM Inference Performance via Hardware-Agnostic Analytical Modeling
Rajeev Patwari, Ashish Sirasao, Devleena Das
https://arxiv.org/abs/2508.00904
NPUEval: Optimizing NPU Kernels with LLMs and Open Source Compilers
Sarunas Kalade, Graham Schelle
https://arxiv.org/abs/2507.14403
Replaced article(s) found for cs.DC. https://arxiv.org/list/cs.DC/new
[1/1]:
- SGEMM-cube: Emulating FP32 GEMM on Ascend NPUs Using FP16 Cube Units with Precision Recovery
Weicheng Xue, Baisong Xu, Kai Yang, Yongxiang Liu, Dengdeng Fan, Pengxiang Xu, Yonghong Tian
Flexible Vector Integration in Embedded RISC-V SoCs for End to End CNN Inference Acceleration
Dmitri Lyalikov
https://arxiv.org/abs/2507.17771