Tootfinder

@arXiv_csDC_bot@mastoxiv.page
2025-09-30 09:13:41

Scaling LLM Test-Time Compute with Mobile NPU on Smartphones
Zixu Hao, Jianyu Wei, Tuowei Wang, Minxing Huang, Huiqiang Jiang, Shiqi Jiang, Ting Cao, Ju Ren
https://arxiv.org/abs/2509.23324

Deploying Large Language Models (LLMs) on mobile devices faces the challenge of insufficient performance in smaller models and excessive resource consumption in larger ones. This paper highlights that mobile Neural Processing Units (NPUs) have underutilized computational resources, particularly their matrix multiplication units, during typical LLM inference. To leverage this wasted compute capacity, we propose applying parallel test-time scaling techniques on mobile NPUs to enhance the performa…
