Winter Soldier: Backdooring Language Models at Pre-Training with Indirect Data Poisoning
Wassim Bouaziz, Mathurin Videau, Nicolas Usunier, El-Mahdi El-Mhamdi
https://arxiv.org/abs/2506.14913
The pre-training of large language models (LLMs) relies on massive text datasets sourced from diverse and difficult-to-curate origins. Although membership inference attacks and hidden canaries have been explored to trace data usage, such methods rely on memorization of training data, which LM providers try to limit. In this work, we demonstrate that indirect data poisoning (where the targeted behavior is absent from training data) is not only feasible but also allows one to effectively protect a dat…