Replaced article(s) found for q-bio.QM. https://arxiv.org/list/q-bio.QM/new
[1/1]:
- Comparative analysis of computational approaches for predicting Transthyretin (TTR) transcription...
Mariya L. Ivanova, Nicola Russo, Gueorgui Mihaylov, Konstantin Nikolic
Towards Robust Speech Recognition for Jamaican Patois Music Transcription
Jordan Madden, Matthew Stone, Dimitri Johnson, Daniel Geddez
https://arxiv.org/abs/2507.16834 https://
Overcoming Latency Bottlenecks in On-Device Speech Translation: A Cascaded Approach with Alignment-Based Streaming MT
Zeeshan Ahmed, Frank Seide, Niko Moritz, Ju Lin, Ruiming Xie, Simone Merello, Zhe Liu, Christian Fuegen
https://arxiv.org/abs/2508.13358
A Comparative Study of Floating-Base Space Parameterizations for Agile Whole-Body Motion Planning
Evangelos Tsiatsianas, Chairi Kiourt, Konstantinos Chatzilygeroudis
https://arxiv.org/abs/2508.11520
Exploiting Music Source Separation for Automatic Lyrics Transcription with Whisper
Jaza Syed, Ivan Meresman Higgs, Ondřej Cífka, Mark Sandler
https://arxiv.org/abs/2506.15514
Dude, where's my utterance? Evaluating the effects of automatic segmentation and transcription on CPS detection
Videep Venkatesha, Mariah Bradford, Nathaniel Blanchard
https://arxiv.org/abs/2507.04454
Fretting-Transformer: Encoder-Decoder Model for MIDI to Tablature Transcription
Anna Hamberger, Sebastian Murgul, Jochen Schmidt, Michael Heizmann
https://arxiv.org/abs/2506.14223
The Impact of Automatic Speech Transcription on Speaker Attribution
Cristina Aggazzotti, Matthew Wiesner, Elizabeth Allyn Smith, Nicholas Andrews
https://arxiv.org/abs/2507.08660 …
Addressing Pitfalls in Auditing Practices of Automatic Speech Recognition Technologies: A Case Study of People with Aphasia
Katelyn Xiaoying Mei, Anna Seo Gyeong Choi, Hilke Schellmann, Mona Sloane, Allison Koenecke
https://arxiv.org/abs/2506.08846
Crosslisted article(s) found for eess.AS. https://arxiv.org/list/eess.AS/new
[1/1]:
- Is Transfer Learning Necessary for Violin Transcription?
Yueh-Po Peng, Ting-Kang Wang, Li Su, Vincent K. M. Cheung
Replaced article(s) found for physics.bio-ph. https://arxiv.org/list/physics.bio-ph/new/
[1/1]:
Design principles of transcription factors with intrinsically disordered regions
Replaced article(s) found for cs.AI. https://arxiv.org/list/cs.AI/new
[7/8]:
- Grid-LOGAT: Grid Based Local and Global Area Transcription for Video Question Answering
Chowdhury, Aukkapinyo, Fujimura, Woo, Wasusatein, Ghourabi
RUMAA: Repeat-Aware Unified Music Audio Analysis for Score-Performance Alignment, Transcription, and Mistake Detection
Sungkyun Chang, Simon Dixon, Emmanouil Benetos
https://arxiv.org/abs/2507.12175
Programmable Co-Transcriptional Splicing: Realizing Regular Languages via Hairpin Deletion
Da-Jung Cho, Szilárd Zsolt Fazekas, Shinnosuke Seki, Max Wiedenhöft
https://arxiv.org/abs/2506.23384
Fast decisions with biophysically constrained gene promoter architectures
Tarek Tohme, Massimo Vergassola, Thierry Mora, Aleksandra M. Walczak
https://arxiv.org/abs/2507.03720
Comparative analysis of computational approaches for predicting Transthyretin transcription activators and human dopamine D1 receptor antagonists
Mariya L. Ivanova, Nicola Russo, Konstantin Nikolic
https://arxiv.org/abs/2506.01137
Score-Informed BiLSTM Correction for Refining MIDI Velocity in Automatic Piano Transcription
Zhanhong He, Roberto Togneri, Defeng (David) Huang
https://arxiv.org/abs/2508.07757
Modelling transcriptional silencing and its coupling to 3D genome organisation
Massimiliano Semeraro, Giuseppe Negro, Davide Marenduzzo, Giada Forte
https://arxiv.org/abs/2507.02150
A Custom-Built Ambient Scribe Reduces Cognitive Load and Documentation Burden for Telehealth Clinicians
Justin Morse, Kurt Gilbert, Kyle Shin, Rick Cooke, Peyton Rose, Jack Sullivan, Angelo Sisante
https://arxiv.org/abs/2507.17754
Methods for pitch analysis in contemporary popular music: multiple pitches from harmonic tones in Vitalic's music
Emmanuel Deruty, David Meredith, Maarten Grachten, Pascal Arbez-Nicolas, Andreas Hasselholt Jørgensen, Oliver Søndermølle Hansen, Magnus Stensli, Christian Nørkær Petersen
https://arxiv.org/abs/25…
Assessing the Impact of Anisotropy in Neural Representations of Speech: A Case Study on Keyword Spotting
Guillaume Wisniewski (LLF - UMR7110), Séverine Guillaume (LACITO), Clara Rosina Fernández (LACITO)
https://arxiv.org/abs/2506.11096
Topological weight and structural diversity of polydisperse chromatin loop networks
Andrea Bonato, Enrico Carlon, Sergey Kitaev, Davide Marenduzzo, Enzo Orlandini
https://arxiv.org/abs/2507.00520
LCS-CTC: Leveraging Soft Alignments to Enhance Phonetic Transcription Robustness
Zongli Ye, Jiachen Lian, Akshaj Gupta, Xuanru Zhou, Krish Patel, Haodong Li, Hwi Joo Park, Chenxu Guo, Shuhe Li, Sam Wang, Cheol Jun Cho, Zoe Ezzes, Jet M. J. Vonk, Brittany T. Morin, Rian Bogley, Lisa Wauters, Zachary A. Miller, Maria Luisa Gorno-Tempini, Gopala Anumanchipalli
htt…
STARS: A Unified Framework for Singing Transcription, Alignment, and Refined Style Annotation
Wenxiang Guo, Yu Zhang, Changhao Pan, Zhiyuan Zhu, Ruiqi Li, Zhetao Chen, Wenhao Xu, Fei Wu, Zhou Zhao
https://arxiv.org/abs/2507.06670

Recent breakthroughs in singing voice synthesis (SVS) have heightened the demand for high-quality annotated datasets, yet manual annotation remains prohibitively labor-intensive and resource-intensive. Existing automatic singing annotation (ASA) methods, however, primarily tackle isolated aspects of the annotation pipeline. To address this fundamental challenge, we present STARS, which is, to our knowledge, the first unified framework that simultaneously addresses singing transcription, alignme…
A Survey on Non-Intrusive ASR Refinement: From Output-Level Correction to Full-Model Distillation
Mohammad Reza Peyghan, Fatemeh Rajabi, Saman Soleimani Roudi, Saeedreza Zouashkiani, Sajjad Amini, Shahrokh Ghaemmaghami
https://arxiv.org/abs/2508.07285

Automatic Speech Recognition (ASR) has become an integral component of modern technology, powering applications such as voice-activated assistants, transcription services, and accessibility tools. Yet ASR systems continue to struggle with the inherent variability of human speech, such as accents, dialects, and speaking styles, as well as environmental interference, including background noise. Moreover, domain-specific conversations often employ specialized terminology, which can exacerbate tran…
SPGISpeech 2.0: Transcribed multi-speaker financial audio for speaker-tagged transcription
Raymond Grossman, Taejin Park, Kunal Dhawan, Andrew Titus, Sophia Zhi, Yulia Shchadilova, Weiqing Wang, Jagadeesh Balam, Boris Ginsburg
https://arxiv.org/abs/2508.05554
Enhancing Lyrics Transcription on Music Mixtures with Consistency Loss
Jiawen Huang, Felipe Sousa, Emir Demirel, Emmanouil Benetos, Igor Gadelha
https://arxiv.org/abs/2506.02339
Whisfusion: Parallel ASR Decoding via a Diffusion Transformer
Taeyoun Kwon, Junhyuk Ahn, Taegeun Yun, Heeju Jwa, Yoonchae Choi, Siwon Park, Nam-Joon Kim, Jangchan Kim, Hyun Gon Ryu, Hyuk-Jae Lee
https://arxiv.org/abs/2508.07048
Improved Dysarthric Speech to Text Conversion via TTS Personalization
Péter Mihajlik, Éva Székely, Piroska Barta, Máté Soma Kádár, Gergely Dobsinszki, László Tóth
https://arxiv.org/abs/2508.06391
Diarization-Aware Multi-Speaker Automatic Speech Recognition via Large Language Models
Yuke Lin, Ming Cheng, Ze Li, Beilong Tang, Ming Li
https://arxiv.org/abs/2506.05796
SpeakerLM: End-to-End Versatile Speaker Diarization and Recognition with Multimodal Large Language Models
Han Yin, Yafeng Chen, Chong Deng, Luyao Cheng, Hui Wang, Chao-Hong Tan, Qian Chen, Wen Wang, Xiangang Li
https://arxiv.org/abs/2508.06372
Edge-ASR: Towards Low-Bit Quantization of Automatic Speech Recognition Models
Chen Feng, Yicheng Lin, Shaojie Zhuo, Chenzheng Su, Ramchalam Kinattinkara Ramakrishnan, Zhaocong Yuan, Xiaopeng Zhang
https://arxiv.org/abs/2507.07877
Lightweight Target-Speaker-Based Overlap Transcription for Practical Streaming ASR
Aleš Pražák, Marie Kunešová, Josef Psutka
https://arxiv.org/abs/2506.20288
You Sound a Little Tense: L2 Tailored Clear TTS Using Durational Vowel Properties
Paige Tuttösí, H. Henny Yeung, Yue Wang, Jean-Julien Aucouturier, Angelica Lim
https://arxiv.org/abs/2506.23367