
2025-07-03 10:03:40
MuRating: A High Quality Data Selecting Approach to Multilingual Large Language Model Pretraining
Zhixun Chen, Ping Guo, Wenhan Han, Yifan Zhang, Binbin Liu, Haobin Lin, Fengze Liu, Yan Zhao, Bingni Zhang, Taifeng Wang, Yin Zheng, Meng Fang
https://arxiv.org/abs/2507.01785