MuRating: A High Quality Data Selecting Approach to Multilingual Large Language Model Pretraining
Zhixun Chen, Ping Guo, Wenhan Han, Yifan Zhang, Binbin Liu, Haobin Lin, Fengze Liu, Yan Zhao, Bingni Zhang, Taifeng Wang, Yin Zheng, Meng Fang
https://arxiv.org/abs/2507.01785
JANUS: Resilient and Adaptive Data Transmission for Enabling Timely and Efficient Cross-Facility Scientific Workflows
Vladislav Esaulov, Jieyang Chen, Norbert Podhorszki, Fred Suter, Scott Klasky, Anu G Bourgeois, Lipeng Wan
https://arxiv.org/abs/2506.17084