Low-resource keyword spotting using contrastively trained transformer acoustic word embeddings
Julian Herreilers, Christiaan Jacobs, Thomas Niesler
https://arxiv.org/abs/2506.17690
We introduce a new approach, the ContrastiveTransformer, that produces acoustic word embeddings (AWEs) for the purpose of very low-resource keyword spotting. The ContrastiveTransformer, an encoder-only model, directly optimises the embedding space using normalised temperature-scaled cross entropy (NT-Xent) loss. We use this model to perform keyword spotting for radio broadcasts in Luganda and Bambara, the latter a severely under-resourced language. We compare our model to various existing AWE a…
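To make the NT-Xent objective concrete, here is a minimal pure-Python sketch of the loss in its common SimCLR-style form. For each anchor embedding, it computes the negative log-probability of its positive partner among all other embeddings in the batch, using cosine similarities divided by a temperature. The abstract does not specify how the ContrastiveTransformer builds its batches or which temperature it uses. The pairing convention (`embeddings[k]` and `embeddings[k + N]` are two spoken instances of the same word), the default temperature of 0.1, and the helper names `cosine_similarity` and `nt_xent_loss` are therefore illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent_loss(embeddings: List[Vector], temperature: float = 0.1) -> float:
    """NT-Xent loss averaged over all 2N anchors (illustrative sketch).

    Assumed convention: embeddings holds 2N vectors, and embeddings[k] and
    embeddings[k + N] form a positive pair (e.g. two utterances of one word).
    Every other embedding in the batch acts as a negative for the anchor.
    """
    n2 = len(embeddings)
    if n2 < 2 or n2 % 2 != 0:
        raise ValueError("expected an even number (>= 2) of embeddings")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    n = n2 // 2
    total = 0.0
    for i in range(n2):
        positive = (i + n) % n2
        # Temperature-scaled similarities from anchor i to every other embedding.
        scaled = {
            k: cosine_similarity(embeddings[i], embeddings[k]) / temperature
            for k in range(n2)
            if k != i
        }
        # Stable log-sum-exp over the denominator terms.
        m = max(scaled.values())
        log_denominator = m + math.log(sum(math.exp(s - m) for s in scaled.values()))
        # -log(exp(s_pos) / sum_k exp(s_k)) = logsumexp(s) - s_pos
        total += log_denominator - scaled[positive]
    return total / n2


if __name__ == "__main__":
    # Two hypothetical word pairs: (0, 2) and (1, 3) are positives.
    batch = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]]
    print(nt_xent_loss(batch))
```

The loss is small when each embedding is much closer to its partner than to the other words in the batch, and large when a negative is closer than the positive. This is the sense in which such an objective "directly optimises the embedding space". A real training loop would compute the same quantity on batched GPU tensors rather than with this O(N²) Python loop.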