Scaling LLM Test-Time Compute with Mobile NPU on Smartphones
Zixu Hao, Jianyu Wei, Tuowei Wang, Minxing Huang, Huiqiang Jiang, Shiqi Jiang, Ting Cao, Ju Ren
https://arxiv.org/abs/2509.23324
Qualcomm unveils two AI chips, the AI200, set for 2026, and the AI250, planned for 2027, based on its Hexagon NPUs, and says Humain is the first customer (Kif Leswing/CNBC)
https://www.cnbc.com/2025/10/27/qualcomm-ai200-ai250-ai-chips-nvidia-amd.html
Context-Driven Performance Modeling for Causal Inference Operators on Neural Processing Units
Neelesh Gupta, Rakshith Jayanth, Dhruv Parikh, Viktor Prasanna
https://arxiv.org/abs/2509.25155
From Principles to Practice: A Systematic Study of LLM Serving on Multi-core NPUs
Tianhao Zhu, Dahu Feng, Erhu Feng, Yubin Xia
https://arxiv.org/abs/2510.05632
Evaluating the Energy Efficiency of NPU-Accelerated Machine Learning Inference on Embedded Microcontrollers
Anastasios Fanariotis, Theofanis Orphanoudakis, Vasilis Fotopoulos
https://arxiv.org/abs/2509.17533
Benchmarking Deep Learning Convolutions on Energy-constrained CPUs
Enrique Galvez (ALSOC), Adrien Cassagne (ALSOC), Alix Munier (ALSOC), Manuel Bouyer
https://arxiv.org/abs/2509.26217
eIQ Neutron: Redefining Edge-AI Inference with Integrated NPU and Compiler Innovations
Lennart Bamberg, Filippo Minnella, Roberto Bosio, Fabrizio Ottati, Yuebin Wang, Jongmin Lee, Luciano Lavagno, Adam Fuks
https://arxiv.org/abs/2509.14388
MindVL: Towards Efficient and Effective Training of Multimodal Large Language Models on Ascend NPUs
Feilong Chen, Yijiang Liu, Yi Huang, Hao Wang, Miren Tian, Ya-Qi Yu, Minghui Liao, Jihao Wu
Tiny but Mighty: A Software-Hardware Co-Design Approach for Efficient Multimodal Inference on Battery-Powered Small Devices
Yilong Li, Shuai Zhang, Yijing Zeng, Hao Zhang, Xinmiao Xiong, Jingyu Liu, Pan Hu, Suman Banerjee
https://arxiv.org/abs/2510.05109