Data Augmentation for Spoken Grammatical Error Correction
Penny Karanasou, Mengjie Qian, Stefano Bannò, Mark J. F. Gales, Kate M. Knill
https://arxiv.org/abs/2507.19374 https:…
While there exist strong benchmark datasets for grammatical error correction (GEC), high-quality annotated spoken datasets for Spoken GEC (SGEC) are still under-resourced. In this paper, we propose a fully automated method to generate audio-text pairs with grammatical errors and disfluencies. Moreover, we propose a series of objective metrics that can be used to evaluate the generated data and choose the more suitable dataset for SGEC. The goal is to generate an augmented dataset that maintains…