
2025-06-12 07:57:21
Incorporating Linguistic Constraints from External Knowledge Source for Audio-Visual Target Speech Extraction
Wenxuan Wu, Shuai Wang, Xixin Wu, Helen Meng, Haizhou Li
https://arxiv.org/abs/2506.09792
OH in corpus-linguistic UX convo:
> He who go too far down long tail end up wagging dog
Intergenerational AI Literacy in Korean Immigrant Families: Interpretive Gatekeeping Meets Convenient Critical Deferment
Jeongone Seo, Ryan Womack, Tawfiq Ammari
https://arxiv.org/abs/2506.10197
You Are What You Say: Exploiting Linguistic Content for VoicePrivacy Attacks
Ünal Ege Gaznepoglu, Anna Leschanowsky, Ahmad Aloradi, Prachi Singh, Daniel Tenbrinck, Emanuël A. P. Habets, Nils Peters
https://arxiv.org/abs/2506.09521
Linguistic Ordered Weighted Averaging based deep learning pooling for fault diagnosis in a wastewater treatment plant
Alicia Beneyto-Rodriguez, Gregorio I. Sainz-Palmero, Marta Galende-Hernández, María J. Fuente
https://arxiv.org/abs/2506.08676
This https://arxiv.org/abs/2506.03589 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCV_…
Early linguistic fingerprints of online users who engage with conspiracy communities
Francesco Corso, Giuseppe Russo, Francesco Pierri, Gianmarco De Francisci Morales
https://arxiv.org/abs/2506.05086
StarVC: A Unified Auto-Regressive Framework for Joint Text and Speech Generation in Voice Conversion
Fengjin Li, Jie Wang, Yadong Niu, Yongqing Wang, Meng Meng, Jian Luan, Zhiyong Wu
https://arxiv.org/abs/2506.02414
Students, THIS is how generative AI (ChatGPT et alia) "reads" papers and "understands" content. It's a bullshit machine, a gaslighting machine. It shows the linguistic behavior of a psychopath (is this what we humans average to, if one trains on all our "content" and online behavior?). Yikes.
https://amandaguinzburg.substack.com/p/diabolus-ex-machina
Voice Impression Control in Zero-Shot TTS
Kenichi Fujita, Shota Horiguchi, Yusuke Ijima
https://arxiv.org/abs/2506.05688 https://arx…
We have in fact encountered similar things to "linguistic sequence processing". Bullshit artists, politicians with the "truthiness" of Ronald Reagan (coined by a comedian at the time). I have had personal experience being a rabbit in a spotlight where this kind of "thinking" kicks in. Got out of that game, thankfully.
This https://arxiv.org/abs/2502.00698 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csAI_…
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
Mustafa Shukor, Dana Aubakirova, Francesco Capuano, Pepijn Kooijmans, Steven Palma, Adil Zouitine, Michel Aractingi, Caroline Pascal, Martino Russi, Andres Marafioti, Simon Alibert, Matthieu Cord, Thomas Wolf, Remi Cadene
https://arxiv.org/abs/2506.018…
That feeling when you're a fan of a new podcast, and then get invited onto it yourself! 🤓🎉
https://podcasts.apple.com/dk/podcast/writing-wrongs/id1797795962?l=da
From Guidelines to Practice: A New Paradigm for Arabic Language Model Evaluation
Serry Sibaee, Omer Nacar, Adel Ammar, Yasser Al-Habashi, Abdulrahman Al-Batati, Wadii Boulila
https://arxiv.org/abs/2506.01920
Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model
Ailin Huang, Bingxin Li, Bruce Wang, Boyong Wu, Chao Yan, Chengli Feng, Heng Wang, Hongyu Zhou, Hongyuan Wang, Jingbei Li, Jianjian Sun, Joanna Wang, Mingrui Chen, Peng Liu, Ruihang Miao, Shilei Jiang, Tian Fei, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Ge, Zheng Gong, Zhewei Huang, Zixin Zhang, Bin Wang, Bo Li, Buyun Ma, Changxin Miao, Changyi Wan, Chen Xu, Dapeng Shi, Dingyuan Hu, Enle…
This https://arxiv.org/abs/2410.00527 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_ees…
This https://arxiv.org/abs/2506.02139 has been replaced.
An Exploratory Framework for Future SETI Applications: Detecting Generative Reactivity via Language Models
Po-Chieh Yu
#toXiv_bot_toot
Rhythm Features for Speaker Identification
Nick Mehlman, Thomas Thebaud, Dani Byrd, Shri Narayanan
https://arxiv.org/abs/2506.06834 https://
This https://arxiv.org/abs/2505.23018 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csMM_…
Breaking the Barriers of Text-Hungry and Audio-Deficient AI
Hamidou Tembine, Issa Bamia, Massa NDong, Bakary Coulibaly, Oumar Issiaka Traore, Moussa Traore, Moussa Sanogo, Mamadou Eric Sangare, Salif Kante, Daryl Noupa Yongueng, Hafiz Tiomoko Ali, Malik Tiomoko, Frejus Laleye, Boualem Djehiche, Wesmanegda Elisee Dipama, Idris Baba Saje, Hammid Mohammed Ibrahim, Moumini Sanogo, Marie Coursel Nininahazwe, Abdul-Latif Siita, Haine Mhlongo, Teddy Nelvy Dieu Merci Kouka, Mariam Serine Jerid…
Towards Better Disentanglement in Non-Autoregressive Zero-Shot Expressive Voice Conversion
Seymanur Akti, Tuan Nam Nguyen, Alexander Waibel
https://arxiv.org/abs/2506.04013
This https://arxiv.org/abs/2503.01879 has been replaced.
This https://arxiv.org/abs/2409.03636 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_ees…
EmotionTalk: An Interactive Chinese Multimodal Emotion Dataset With Rich Annotations
Haoqin Sun, Xuechen Wang, Jinghua Zhao, Shiwan Zhao, Jiaming Zhou, Hui Wang, Jiabei He, Aobo Kong, Xi Yang, Yequan Wang, Yonghua Lin, Yong Qin
https://arxiv.org/abs/2505.23018
This https://arxiv.org/abs/2505.15004 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_ees…