Zen-Attention: A Compiler Framework for Dynamic Attention Folding on AMD NPUs
Aadesh Deshmukh, Venkata Yaswanth Raparti, Samuel Hsu
https://arxiv.org/abs/2508.17593
Evaluating the Energy Efficiency of NPU-Accelerated Machine Learning Inference on Embedded Microcontrollers
Anastasios Fanariotis, Theofanis Orphanoudakis, Vasilis Fotopoulos
https://arxiv.org/abs/2509.17533
eIQ Neutron: Redefining Edge-AI Inference with Integrated NPU and Compiler Innovations
Lennart Bamberg, Filippo Minnella, Roberto Bosio, Fabrizio Ottati, Yuebin Wang, Jongmin Lee, Luciano Lavagno, Adam Fuks
https://arxiv.org/abs/2509.14388
From Principles to Practice: A Systematic Study of LLM Serving on Multi-core NPUs
Tianhao Zhu, Dahu Feng, Erhu Feng, Yubin Xia
https://arxiv.org/abs/2510.05632
Benchmarking Deep Learning Convolutions on Energy-constrained CPUs
Enrique Galvez (ALSOC), Adrien Cassagne (ALSOC), Alix Munier (ALSOC), Manuel Bouyer
https://arxiv.org/abs/2509.26217
H2SGEMM: Emulating FP32 GEMM on Ascend NPUs using FP16 Units with Precision Recovery and Cache-Aware Optimization
Weicheng Xue, Baisong Xu, Kai Yang, Yongxiang Liu, Dengdeng Fan, Pengxiang Xu, Yonghong Tian
https://arxiv.org/abs/2507.23387
Replaced article(s) found for cs.AI. https://arxiv.org/list/cs.AI/new
[8/9]:
- MindVL: Towards Efficient and Effective Training of Multimodal Large Language Models on Ascend NPUs
Feilong Chen, Yijiang Liu, Yi Huang, Hao Wang, Miren Tian, Ya-Qi Yu, Minghui Liao, Jihao Wu
Scaling LLM Test-Time Compute with Mobile NPU on Smartphones
Zixu Hao, Jianyu Wei, Tuowei Wang, Minxing Huang, Huiqiang Jiang, Shiqi Jiang, Ting Cao, Ju Ren
https://arxiv.org/abs/2509.23324
Tiny but Mighty: A Software-Hardware Co-Design Approach for Efficient Multimodal Inference on Battery-Powered Small Devices
Yilong Li, Shuai Zhang, Yijing Zeng, Hao Zhang, Xinmiao Xiong, Jingyu Liu, Pan Hu, Suman Banerjee
https://arxiv.org/abs/2510.05109
Replaced article(s) found for cs.DC. https://arxiv.org/list/cs.DC/new
[1/1]:
- SGEMM-cube: Emulating FP32 GEMM on Ascend NPUs Using FP16 Cube Units with Precision Recovery
Weicheng Xue, Baisong Xu, Kai Yang, Yongxiang Liu, Dengdeng Fan, Pengxiang Xu, Yonghong Tian