Taming Data Challenges in ML-based Security Tasks: Lessons from Integrating Generative AI
Shravya Kanchi, Neal Mangaokar, Aravind Cheruvu, Sifat Muhammad Abdullah, Shirin Nilizadeh, Atul Prakash, Bimal Viswanath
https://arxiv.org/abs/2507.06092
Machine learning-based supervised classifiers are widely used for security tasks, and their improvement has been largely focused on algorithmic advancements. We argue that data challenges that negatively impact the performance of these classifiers have received limited attention. We address the following research question: Can developments in Generative AI (GenAI) address these data challenges and improve classifier performance? We propose augmenting training datasets with synthetic data genera…