I appreciate Hidde’s ability to take preposterous claims by a major accessibility tooling maker and the rampant thievery of IP by LLM makers, consider their dehumanizing output, and propose this gentle, whisper-soft rebuke:
https://hidde.blog/teaching-llms-or-companies/
'Dem Boden ganz nah' #FotoVorschlag 'close to the ground'
I think this could fit the topic. I took this photo when we were on vacation at the #balticSea. Quite a while ago, but all fond memories. 🙂
HITSZ's End-To-End Speech Translation Systems Combining Sequence-to-Sequence Auto Speech Recognition Model and Indic Large Language Model for IWSLT 2025 in Indic Track
Xuchen Wei, Yangxin Wu, Yaoyin Zhang, Henglyu Liu, Kehai Chen, Xuefeng Bai, Min Zhang
https://arxiv.org/abs/2507.19616
Cool feature. I'm not a big AI fan, but at the same time I do see real potential in transcription. And being able to generate SRT files seamlessly and offline is great.
FFmpeg 8.0 integrates Whisper: local audio transcription without the cloud | heise online
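For readers unfamiliar with the SRT files mentioned above, here is a minimal sketch of how timed transcript segments map onto SRT text. It is not FFmpeg's implementation; the `Segment` class and the function names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed chunk of audio (hypothetical structure for this sketch)."""
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as SRT: a 1-based index, a time range, then the text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)
```

Calling `to_srt([Segment(0.0, 2.5, "Hello")])` yields a single block with index 1 and the range `00:00:00,000 --> 00:00:02,500`.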
“I sat beside a boy who tried to smile at me. I couldn’t return a real smile. Tears welled in my eyes as I realized words could never reach the horrors his soul had witnessed. All I could do was place my hand gently on his shoulder and whisper, ‘You are not alone.’”
– @…

I was walking down a street that had once been full of life—shops open,...
Then I saw them. Children. Some missing an arm, some missing a leg, some missing both, yet their small bodies moved through the debris with a courage that seemed almost impossible. Their eyes—wide, searching—asked questions no child should ever have to ask: Why me? Why here? Will anyone see me? A boy sat on the curb, rolling a broken toy car over the cracks. His laugh was fragile, sharp, yet it was still laughter. Nearby, a girl balanced on a fallen beam, her small feet gripping...
Structured Sparsity and Weight-adaptive Pruning for Memory and Compute efficient Whisper models
Prasenjit K Mudi, Anshi Sachan, Dahlia Devapriya, Sheetal Kalyani
https://arxiv.org/abs/2510.12666
@… Jordi, I hear MacWhisper is now blazingly fast with Nvidia tech. Did you ever finish the export that we can use for podcasts with multiple speakers plus chapter markers − .srt?
I'm always waiting to use MacWhisper, but I can't because the output file doesn't help me.
And I am one of thousands.
The three words you can whisper into the ear of any support AI chatbot to get their attention
credit card chargeback
VOX-KRIKRI: Unifying Speech and Language through Continuous Fusion
Dimitrios Damianos, Leon Voukoutis, Georgios Paraskevopoulos, Vassilis Katsouros
https://arxiv.org/abs/2509.15667
Behind the Scenes: Mechanistic Interpretability of LoRA-adapted Whisper for Speech Emotion Recognition
Yujian Ma, Jinqiu Sang, Ruizhe Li
https://arxiv.org/abs/2509.08454 https:/…
📹 Creates SRT subtitle files for videos and supports real-time live broadcast transcription
🔄 Seamless workflow integration, allowing automated processing and data transfer to other applications
🔗 Source URL
https://www.heise.de/en/news…
Why must we whisper so much? A quite sad season opener feels bad!
#CriticalRole
Programming via voice input with an LLM is great fun. Whisper just understood "grobes Scheiß-Skript" (roughly "crude crap script"), and content-wise that is absolutely correct :D I said "shell script" but was thinking the former...
Variational Low-Rank Adaptation for Personalized Impaired Speech Recognition
Niclas Pokel, Pehuén Moure, Roman Boehringer, Shih-Chii Liu, Yingqiang Gao
https://arxiv.org/abs/2509.20397
It's basically impossible to imagine a time when Careless Whisper was a song you'd never heard before. But here it is. #TOTP
#Python Friday #292: Extract Text From Audio Files With Whisper - #ai #nlp
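A minimal sketch of what extracting text from an audio file can look like, assuming the open-source `openai-whisper` package (the post does not say which library the linked article uses). The helper names, the `"base"` model choice, and the file name in the usage note are assumptions for illustration.

```python
def segments_to_text(segments) -> str:
    """Join the text of Whisper-style segments, one non-empty segment per line."""
    lines = (seg["text"].strip() for seg in segments)
    return "\n".join(line for line in lines if line)


def transcribe_file(path: str, model_name: str = "base") -> str:
    """Transcribe an audio file locally and return its text.

    Assumes `pip install openai-whisper` and an ffmpeg binary on PATH.
    """
    import whisper  # imported lazily so the helper above works without it

    model = whisper.load_model(model_name)  # downloads the weights on first use
    result = model.transcribe(path)         # dict with "text" and "segments"
    return segments_to_text(result["segments"])
```

Usage would be along the lines of `print(transcribe_file("interview.mp3"))`, where `interview.mp3` is a placeholder file name. Larger models are generally more accurate but slower.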
AISHELL6-whisper: A Chinese Mandarin Audio-visual Whisper Speech Dataset with Speech Recognition Baselines
Cancan Li, Fei Su, Juan Liu, Hui Bu, Yulong Wan, Hongbin Suo, Ming Li
https://arxiv.org/abs/2509.23833
Idiosyncratic Versus Normative Modeling of Atypical Speech Recognition: Dysarthric Case Studies
Vishnu Raja, Adithya V Ganesan, Anand Syamkumar, Ritwik Banerjee, H Andrew Schwartz
https://arxiv.org/abs/2509.16718
Crosslisted article(s) found for cs.LG. https://arxiv.org/list/cs.LG/new
[4/5]:
- Leveraging Whisper Embeddings for Audio-based Lyrics Matching
Eleonora Mancini, Joan Serrà, Paolo Torroni, Yuki Mitsufuji
Isoperimetric-type inequalities for Mather's $\beta$-function of convex billiards
Stefano Baranzini, Misha Bialy, Alfonso Sorrentino
https://arxiv.org/abs/2509.06915 https:/…
Probing Whisper for Dysarthric Speech in Detection and Assessment
Zhengjun Yue, Devendra Kayande, Zoran Cvetkovic, Erfan Loweimi
https://arxiv.org/abs/2510.04219 https://…
Evaluating the Effectiveness of Transformer Layers in Wav2Vec 2.0, XLS-R, and Whisper for Speaker Identification Tasks
Linus Stuhlmann, Michael Alexander Saxer
https://arxiv.org/abs/2509.00230

This study evaluates the performance of three advanced speech encoder models, Wav2Vec 2.0, XLS-R, and Whisper, in speaker identification tasks. By fine-tuning these models and analyzing their layer-wise representations using SVCCA, k-means clustering, and t-SNE visualizations, we found that Wav2Vec 2.0 and XLS-R capture speaker-specific features effectively in their early layers, with fine-tuning improving stability and performance. Whisper showed better performance in deeper layers. Additional…
Adapting Diarization-Conditioned Whisper for End-to-End Multi-Talker Speech Recognition
Martin Kocour, Martin Karafiat, Alexander Polok, Dominik Klement, Lukáš Burget, Jan Černocký
https://arxiv.org/abs/2510.03723
🎵 #FFmpeg 8.0 integrates #Whisper for local audio transcription 🎵 #ai
🔧 Direct integration of #OpenAI
Whilter: A Whisper-based Data Filter for "In-the-Wild" Speech Corpora Using Utterance-level Multi-Task Classification
William Ravenscroft, George Close, Kit Bower-Morris, Jamie Stacey, Dmitry Sityaev, Kris Y. Hong
https://arxiv.org/abs/2507.21642
Audio-Conditioned Diffusion LLMs for ASR and Deliberation Processing
Mengqi Wang, Zhan Liu, Zengrui Jin, Guangzhi Sun, Chao Zhang, Philip C. Woodland
https://arxiv.org/abs/2509.16622
WhisTLE: Deeply Supervised, Text-Only Domain Adaptation for Pretrained Speech Recognition Transformers
Akshat Pandey, Karun Kumar, Raphael Tang
https://arxiv.org/abs/2509.10452 …
Towards Building Speech Large Language Models for Multitask Understanding in Low-Resource Languages
Mingchen Shao, Bingshen Mu, Chengyou Wang, Hai Li, Ying Yan, Zhonghua Fu, Lei Xie
https://arxiv.org/abs/2509.14804
WhiSQA: Non-Intrusive Speech Quality Prediction Using Whisper Encoder Features
George Close, Kris Hong, Thomas Hain, Stefan Goetze
https://arxiv.org/abs/2508.02210 https://
Speech Intelligibility Assessment with Uncertainty-Aware Whisper Embeddings and sLSTM
Ryandhimas E. Zezario, Dyah A. M. G. Wisnu, Hsin-Min Wang, Yu Tsao
https://arxiv.org/abs/2509.03013
Large Language Model Data Generation for Enhanced Intent Recognition in German Speech
Theresa Pekarek Rosin, Burak Can Kaplan, Stefan Wermter
https://arxiv.org/abs/2508.06277 ht…
A Study on Zero-Shot Non-Intrusive Speech Intelligibility for Hearing Aids Using Large Language Models
Ryandhimas E. Zezario, Dyah A. M. G. Wisnu, Hsin-Min Wang, Yu Tsao
https://arxiv.org/abs/2509.03021
Bridging ASR and LLMs for Dysarthric Speech Recognition: Benchmarking Self-Supervised and Generative Approaches
Ahmed Aboeitta, Ahmed Sharshar, Youssef Nafea, Shady Shehata
https://arxiv.org/abs/2508.08027
Competitive Audio-Language Models with Data-Efficient Single-Stage Training on Public Data
Gokul Karthik Kumar, Rishabh Saraf, Ludovick Lepauloux, Abdul Muneer, Billel Mokeddem, Hakim Hacid
https://arxiv.org/abs/2509.07526
Crosslisted article(s) found for eess.AS. https://arxiv.org/list/eess.AS/new
[1/1]:
- Whisper Smarter, not Harder: Adversarial Attack on Partial Suppression
Zheng Jie Wong, Bingquan Shen
Keyword Spotting with Hyper-Matched Filters for Small Footprint Devices
Yael Segal-Feldman, Ann R. Bradlow, Matthew Goldrick, Joseph Keshet
https://arxiv.org/abs/2508.04857 http…
Feature Importance across Domains for Improving Non-Intrusive Speech Intelligibility Prediction in Hearing Aids
Ryandhimas E. Zezario, Sabato M. Siniscalchi, Fei Chen, Hsin-Min Wang, Yu Tsao
https://arxiv.org/abs/2507.23223