Unlock the Potential of Fine-grained LLM Serving via Dynamic Module Scaling
Jingfeng Wu, Yiyuan He, Minxian Xu, Xitong Gao, Kejiang Ye, Chengzhong Xu
https://arxiv.org/abs/2507.18006
The rise of large language models (LLMs) has created new opportunities across various fields but has also introduced significant challenges in resource management. Current LLM serving systems face a fundamental tension: balancing serving demands with limited resources while adapting to unpredictable traffic patterns. Static deployments lead to suboptimal resource utilization and performance degradation under dynamic workloads. Furthermore, the high cost of adjusting instances hinders dynamic sc…