Advancing Conversational AI with Shona Slang: A Dataset and Hybrid Model for Digital Inclusion
Happymore Masoka
https://arxiv.org/abs/2509.14249 https://ar…
AI, AGI, and learning efficiency
My 4-month-old kid is not DDoSing Wikipedia right now, nor will they ever do so before learning to speak, read, or write. Their entire "training corpus" will not top even 100 million "tokens" before they can speak & understand language, and do so with real intentionality.
Just to emphasize that point: 100 words-per-minute times 60 minutes-per-hour times 12 hours-per-day times 365 days-per-year times 4 years is a mere 105,120,000 words. That's a ludicrously *high* estimate of words-per-minute and hours-per-day, and 4 years old (the age of my other kid) is well after basic speech capabilities are developed in many children, etc. More likely the available "training data" is at least 1 or 2 orders of magnitude less than this.
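The back-of-the-envelope upper bound above can be reproduced in a few lines of Python. This is a sketch: the constant and function names are illustrative, the rates are the deliberately generous ones from the post, and leap years are ignored.

```python
# Deliberately generous assumptions taken from the post above.
WORDS_PER_MINUTE = 100
MINUTES_PER_HOUR = 60
HOURS_PER_DAY = 12
DAYS_PER_YEAR = 365  # leap years ignored
YEARS = 4

def words_heard(wpm=WORDS_PER_MINUTE, hours_per_day=HOURS_PER_DAY, years=YEARS):
    """Total words heard at a constant rate over the given period."""
    return wpm * MINUTES_PER_HOUR * hours_per_day * DAYS_PER_YEAR * years

upper_bound = words_heard()
print(f"upper bound: {upper_bound:,}")  # upper bound: 105,120,000

# The "more likely" range: 1 or 2 orders of magnitude below the upper bound.
for magnitude in (1, 2):
    print(f"{magnitude} order(s) less: {upper_bound // 10**magnitude:,}")
```

Lowering `hours_per_day` or `wpm` to more realistic values shows how quickly the estimate falls toward the 1 to 10 million range.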
The point here is that large language models, trained as they are on multiple *billions* of tokens, are not developing their behavioral capabilities in a way that's remotely similar to humans, even if you believe those capabilities are similar (by certain, very biased measures they are; by others they very much aren't). This idea that humans must be naturally good at acquiring language is an old one (see e.g. …).
#AI #LLM #AGI
Simulated Language Acquisition in a Biologically Realistic Model of the Brain
Daniel Mitropolsky, Christos Papadimitriou
https://arxiv.org/abs/2507.11788 h…
MultimodalHugs: Enabling Sign Language Processing in Hugging Face
Gerard Sant, Zifan Jiang, Carlos Escolano, Amit Moryossef, Mathias Müller, Rico Sennrich, Sarah Ebling
https://arxiv.org/abs/2509.09729
SpikingBrain Technical Report: Spiking Brain-inspired Large Models
Yuqi Pan, Yupeng Feng, Jinghao Zhuang, Siyu Ding, Zehao Liu, Bohan Sun, Yuhong Chou, Han Xu, Xuerui Qiu, Anlin Deng, Anjie Hu, Peng Zhou, Man Yao, Jibin Wu, Jian Yang, Guoliang Sun, Bo Xu, Guoqi Li
https://arxiv.org/abs/2509.05276
Continuous Bangla Sign Language Translation: Mitigating the Expense of Gloss Annotation with the Assistance of Graph
Safaeid Hossain Arib, Rabeya Akter, Sejuti Rahman
https://arxiv.org/abs/2508.10687
MultiVox: Benchmarking Voice Assistants for Multimodal Interactions
Ramaneswaran Selvakumar, Ashish Seth, Nishit Anand, Utkarsh Tyagi, Sonal Kumar, Sreyan Ghosh, Dinesh Manocha
https://arxiv.org/abs/2507.10859
AI Conversational Tutors in Foreign Language Learning: A Mixed-Methods Evaluation Study
Nikolaos Avouris
https://arxiv.org/abs/2508.05156 https://arxiv.org…
Bias after Prompting: Persistent Discrimination in Large Language Models
Nivedha Sivakumar, Natalie Mackraz, Samira Khorshidi, Krishna Patel, Barry-John Theobald, Luca Zappella, Nicholas Apostoloff
https://arxiv.org/abs/2509.08146
Natural language processing for African languages
David Ifeoluwa Adelani
https://arxiv.org/abs/2507.00297 https://arxiv.org/pdf/2507.…