Tootfinder

Opt-in global Mastodon full text search. Join the index!

@jamesthebard@social.linux.pizza
2025-08-20 19:05:28

So, been wondering why my mic audio has been absolutely wretched over the last few videos and such. Come to find out, if you accidentally wreck almost all of the settings on the compressor/gate then things are going to go very, very poorly. Did a quick test run in OBS and everything sounded so much better.
Current setup for the microphone: Earthworks Audio Ethos -> Universal Audio SOLO/610 -> dbx 286s -> [insert generic audio interface] -> Linux rig

A picture of the Earthworks Audio Ethos microphone on a microphone arm, and the top of an audio rack showing an Allen & Heath CQ-12T mixer beside a Universal Audio SOLO/610. Below those is a dbx 286s audio processor strip (1U).
@v_i_o_l_a@openbiblio.social
2025-08-19 20:25:07

#hamstercontent on WDR Zeitzeichen: "Lab Hero and Children's Favorite: The Triumph of the Golden Hamster" www1.wdr.de/mediathek/audio/ze

@Techmeme@techhub.social
2025-08-20 16:40:49

Google plans to update the Pixel Buds Pro 2 in September with adaptive audio, background noise reduction when using Gemini Live, and a new "moonstone" color (Victoria Song/The Verge)
theverge.com/news/762048/googl

@stiefkind@mastodon.social
2025-08-21 11:17:50

"Nuclear power produces no CO2 and is therefore the solution to our energy problem." Of course, it is not quite as simple as this kind of populist slogan suggests. The 2022 feature "Milliardengrab Atomkraft" (50 min) sheds light on a few of the problematic corners. Unfortunately, the people who ought to listen closely will consistently tune out here as well. A podcast recommendation nonetheless:

@NFL@darktundra.xyz
2025-10-19 19:46:34

Panthers-Jets broadcast temporarily affected by audio problems nytimes.com/athletic/6731135/2

@lightweight@mastodon.nzoss.nz
2025-10-20 18:28:46
@heiseonline@social.heise.de
2025-08-19 11:40:00

Zahlen, bitte! 74 minutes: the odd playing time of the audio CD
Stories abound about the CD's 74-minute playing time: officially it was chosen so that Beethoven's 9th Symphony could be played in full, but the real reason is probably more mundane.

@matthiasott@mastodon.social
2025-09-18 14:00:31

There are few investments more valuable than a set of good speakers and headphones. It makes such a vast difference when you’re listening to your favourite music every day. 😍🎶
My setup atm:
🎧 Austrian Audio Hi-X60
austrian.audio/product/hi-x60/
🔈 ADAM Audio T5V

@arXiv_csSD_bot@mastoxiv.page
2025-08-20 07:54:30

DegDiT: Controllable Audio Generation with Dynamic Event Graph Guided Diffusion Transformer
Yisu Liu, Chenxing Li, Wanqian Zhang, Wenfu Wang, Meng Yu, Ruibo Fu, Zheng Lin, Weiping Wang, Dong Yu
arxiv.org/abs/2508.13786

@arXiv_csCV_bot@mastoxiv.page
2025-08-20 10:22:00

InfiniteTalk: Audio-driven Video Generation for Sparse-Frame Video Dubbing
Shaoshu Yang, Zhe Kong, Feng Gao, Meng Cheng, Xiangyu Liu, Yong Zhang, Zhuoliang Kang, Wenhan Luo, Xunliang Cai, Ran He, Xiaoming Wei
arxiv.org/abs/2508.14033

@fluchtkapsel@nerdculture.de
2025-10-20 21:54:29
Content warning: tech, linux, bash

What am I doing wrong? Why is the variable `b` empty?
```
# mkvmerge -F json -i example.mkv | jq -r '. | .tracks[] | select(.type == "audio") | select( .properties.default_track ) | .id, .properties.language'
1
eng
# read a b < <(mkvmerge -F json -i example.mkv | jq -r '. | .tracks[] | select(.type == "audio") | select( .properties.default_track ) | .id, .properties.language' )
# printf "a='%s', b='%s…

@jredlund@social.linux.pizza
2025-08-21 17:08:43

AVL Drumkits and New Jams
#music #improvisation I have been using the Hydrogen drum sequencer to make drum grooves. I was saving them as audio files because when I saved them as MIDI files, the drum mapping was usually off for the drum kit plugins I had in Reaper. The disadvantages of this approach are a) I can't change the tempo, and b) I can't modify the groove by addin…

@Techmeme@techhub.social
2025-09-18 14:35:45

Inside Apple's Audio Labs, where the company develops features like spatial audio and uses tuning studios as reference points for AirPods and other products (Lance Ulanoff/TechRadar)

@michaels@mstdn.nursing.unibas.ch
2025-10-17 17:53:56

Can you transcribe from audio files? #AI #mistralai

Screenshot reads:

can you transcribe from audio files?

Yes, I can transcribe audio files! If you upload an audio file (such as a recording, voice memo, or any audio clip), I can transcribe the spoken content into text for you. Just upload the file, and I'll handle the rest.

What file format?

I can transcribe audio files in common formats, including:
• MP3
• WAV
• M4A
• OGG
• FLAC
• AAC
• WebM (audio only)
If your file is in one of these formats, you can upload it, and I'll transcribe the au…
@arXiv_eessAS_bot@mastoxiv.page
2025-08-20 08:13:00

MMAU-Pro: A Challenging and Comprehensive Benchmark for Holistic Evaluation of Audio General Intelligence
Sonal Kumar, Šimon Sedláček, Vaibhavi Lokegaonkar, Fernando López, Wenyi Yu, Nishit Anand, Hyeonggon Ryu, Lichang Chen, Maxim Plička, Miroslav Hlaváček, William Fineas Ellingwood, Sathvik Udupa, Siyuan Hou, Allison Ferner, Sara Barahona, Cecilia Bolaños, Satish Rahi, Laura Herrera-Alarcón, Satvik Dixit, Siddhi Patil, Soham Deshmukh, Lasha K…

@ErikJonker@mastodon.social
2025-10-21 05:36:31

Trying out huxe.com/, a tool for AI-generated personal podcasts about anything you want. It's not perfect, but it works great for certain categories, like a podcast about the great Hacker News post of today or the main AI industry headline. Very convenient when commuting.

@Mediagazer@mstdn.social
2025-07-21 00:05:36

A look at the video podcasting surge; study: ~75% of podcast consumers play video episodes and ~30% play video episodes minimized or in background mode (Joseph Bernstein/New York Times)
nytimes.com/2025/07/20/style/p

@arXiv_csCR_bot@mastoxiv.page
2025-09-19 09:46:41

Threat Modeling for Enhancing Security of IoT Audio Classification Devices under a Secure Protocols Framework
Sergio Benlloch-Lopez, Miquel Viel-Vazquez, Javier Naranjo-Alcazar, Jordi Grau-Haro, Pedro Zuccarello
arxiv.org/abs/2509.14657

@lightweight@mastodon.nzoss.nz
2025-08-19 01:23:37

This is a very good interview - rnz.co.nz/national/programmes/ - Prof Alexandra Andhov is spot on: a) bigtech is more powerful than our gov…

@thomasfuchs@hachyderm.io
2025-10-15 16:44:47

PSA about common connectors on audio and phone equipment:
The "audio jack" on headphones etc. is called a "phone connector" (it was first used for phone switchboards all the way back in the 1800s).
The "phone jack" on telephones (looks like a smaller Ethernet jack) is called a "registered jack".
The "audio connector" on analog audio and video cables is called an "RCA plug".
Now you know jack shit.

@publicvoit@graz.social
2025-10-11 08:36:40

In case you need some #silence to relax: github.com/anars/blank-audio

@mariyadelano@hachyderm.io
2025-10-20 20:41:14

I’ve worked over the past year to reduce the amount of noise in my consciousness on a daily basis.
By that I mean - information noise, not literal sounds “noise”. (That problem was solved long ago by some good earplugs and noise canceling earphones.)
I’ve gotten used to spending less time on social media, regularly blocking most apps on my devices (anything with a news feed, most work communication apps, etc.), putting my phone and other devices aside for extended periods of time. Often go to work places with my iPad explicitly having its WiFi turned off and selecting cafes that don’t offer WiFi at all.
Negotiated better boundaries at work and in personal life where I exchange messages with people less often but try to make those interactions more meaningful, and people rarely expect me to respond to requests in less than 24 hours. Spent a lot of time setting up custom notification settings on all apps that would allow it, so I get fewer pings. With software, choosing fewer cloud-based options and using tools that are simple and require as few interruptions as possible.
Accustomed myself to lower-tech versions of doing things I like to do: reading on paper, writing by hand, drawing in physical sketchbooks, got a typewriter for typing without a screen. Choosing to call people on audio more, trying to make more of an effort to see people in person. Going to museums to look at art instead of browsing Pinterest. Defaulting to the library when looking for information.
I’m commenting on this now for two reasons:
1. I am pretty proud of myself for how much I’ve actually managed to reduce the constant stream of modern life esp. as a remote worker in tech!
2. Now that I’ve reached a breaking point of reducing enough noise that it’s NOTICEABLE - I am struck by the silence. I don’t know what to do with it. I don’t know how to navigate it and fill it. I made this space to be able to read and write and think more deeply - for now I feel stuck in limbo where I’m just reacquainting myself with the concept of having any space in my mind at all.

@arXiv_csCL_bot@mastoxiv.page
2025-09-17 09:16:00

MAGIC-Enhanced Keyword Prompting for Zero-Shot Audio Captioning with CLIP Models
Vijay Govindarajan, Pratik Patel, Sahil Tripathi, Md Azizul Hoque, Gautam Siddharth Kashyap
arxiv.org/abs/2509.12591

@muz4now@mastodon.world
2025-10-15 23:12:01

There's Nothing Fluffy About This Audio
#inspiration #Sounds4MusicMaking
muz4no…

@arXiv_csSD_bot@mastoxiv.page
2025-08-19 09:19:20

Towards Automatic Evaluation and High-Quality Pseudo-Parallel Dataset Construction for Audio Editing: A Human-in-the-Loop Method
Yuhang Jia, Hui Wang, Xin Nie, Yujie Guo, Lianru Gao, Yong Qin
arxiv.org/abs/2508.11966

@arXiv_csMM_bot@mastoxiv.page
2025-08-21 07:37:49

FakeHunter: Multimodal Step-by-Step Reasoning for Explainable Video Forensics
Chen Chen, Runze Li, Zejun Zhang, Pukun Zhao, Fanqing Zhou, Longxiang Wang, Haojian Huang
arxiv.org/abs/2508.14581

@arXiv_csSI_bot@mastoxiv.page
2025-08-19 08:31:30

MAD: A Benchmark for Multi-Turn Audio Dialogue Fact-Checking
Chaewan Chun, Lysandre Terrisse, Delvin Ce Zhang, Dongwon Lee
arxiv.org/abs/2508.12186

@vform@openbiblio.social
2025-09-03 19:49:11

New audio sharing features for Android thanks to LE Audio
stadt-bremerhaven.de/neue-audi
"Following the introduction of LE Audio Auracast for hearing aids at the beginning of the year..."…

@arXiv_csAI_bot@mastoxiv.page
2025-08-20 11:23:47

Crosslisted article(s) found for cs.AI. arxiv.org/list/cs.AI/new
[4/6]:
- End-to-End Audio-Visual Learning for Cochlear Implant Sound Coding in Noisy Environments
Meng-Ping Lin, Enoch Hsin-Ho Huang, Shao-Yi Chien, Yu Tsao

@guerda@ruhr.social
2025-08-16 09:17:15

Cool feature. I'm not a big AI fan, but I do see real potential in transcription. And being able to generate SRT files seamlessly and offline is great.
FFmpeg 8.0 integrates Whisper: local audio transcription without the cloud | heise online

@jamesthebard@social.linux.pizza
2025-09-19 19:01:58

Okay, got the new version finished. The BNC connectors are for RGB Sync, and the other two BNC connectors were removed and replaced with a DB9 connector for the glory that is CGA. Kept the RCA audio port on the side of the board because audio will mostly go in directly to the PVM. Overall I'm much more satisfied with this layout and connector choice than the previous version.
#electronics

A CMPTR signal breakout board that exposes RGB via BNC connectors, CGA via a DB9 port, and a lone audio port on the side.
@arXiv_csSE_bot@mastoxiv.page
2025-08-21 09:31:00

Towards a DSL to Formalize Multimodal Requirements
Marcos Gomez-Vazquez, Jordi Cabot
arxiv.org/abs/2508.14631 arxiv.org/pdf/2508.14631

@malik@Mastodon.Social
2025-08-19 09:23:40

Do you need audio or stage equipment? We buy a lot at DJ-checkpoint.de (mainly lights and cables, but they also have microphones and everything else). With the code DARKK you get 5% off there (we get nothing out of it). The boss, Daniel, is simply a good guy and worth supporting.
Maybe you have a few musician friends who would be interested. 🚀

@arXiv_csSD_bot@mastoxiv.page
2025-09-19 09:44:21

Spatial Audio Motion Understanding and Reasoning
Arvind Krishna Sridhar, Yinyi Guo, Erik Visser
arxiv.org/abs/2509.14666 arxiv.org/pdf/2509…

@arXiv_csCV_bot@mastoxiv.page
2025-09-18 10:25:01

Teacher-Guided Pseudo Supervision and Cross-Modal Alignment for Audio-Visual Video Parsing
Yaru Chen, Ruohao Guo, Liting Gao, Yang Xiang, Qingyu Luo, Zhenbo Li, Wenwu Wang
arxiv.org/abs/2509.14097

@cdamian@rls.social
2025-08-19 18:08:07

LEHMANN Audio 007 - Raphael Dincsoy b2b Tamara Wirth by Lehmann Club / Labor Lehmann
on.soundcloud.com/PrAOEQSO0h2u
Raphael Dincsoy:
SoundCloud:

@seeingwithsound@mas.to
2025-10-17 08:18:25

Does spatialized audio enhance the creation of mental representations? Spoiler: No (for their SnapStick-based setup) frontiersin.org/journals/neuro "seven blind individuals and se…

@v_i_o_l_a@openbiblio.social
2025-08-20 06:41:37

"DigiPop: A Knowledge Platform as a Living Document"
doi.org/10.1515/abitech-2025-0
"DigiPop (Digitales Propädeutikum) is an innovative online knowledge platform that promotes digital skills in the humanities. It offers students and L…

@nobodyinperson@fosstodon.org
2025-08-19 17:27:21

What's the equivalent of #Immich but for audio and video files?
I'm looking for a self-hostable #FOSS thing that I can throw the rest of my media library in the face (movies, music, downloaded youtube videos, etc.) and it'll sort everything nicely, maybe even query databases to find metada…

@Techmeme@techhub.social
2025-07-21 00:01:14

A look at the video podcasting surge; study: ~75% of podcast consumers play video episodes and ~30% play video episodes minimized or in background mode (Joseph Bernstein/New York Times)
nytimes.com/2025/07/20/style/p

@danyork@mastodon.social
2025-08-18 09:28:18

New audio - What is happening with the Linux Professional Institute (LPI) certification program? Over the weekend I sat down with Matt Rice, LPI's Executive Director. Way back in 1998/99, I was one of the co-founders of LPI, but haven't been involved in 20 years until I was recently inducted into LPI's "Hall of Fellows". Matt visited me in Vermont to give me the physical award, and so I pulled out the microphones to learn more. 🙂

@ncoca@social.coop
2025-09-15 23:40:19

The last few days, #Slack, which I only use via a browser, keeps asking for video and audio permission, even though I'm not joining a call or using their audio feature.
Is slack trying to steal my #data and I'm only noticing because I monitor and track all such requests due to

@arXiv_eessAS_bot@mastoxiv.page
2025-09-19 09:28:31

Aligning Audio Captions with Human Preferences
Kartik Hegde, Rehana Mahfuz, Yinyi Guo, Erik Visser
arxiv.org/abs/2509.14659 arxiv.org/pdf/2…

@berlinbuzzwords@floss.social
2025-09-19 11:04:02

At Berlin Buzzwords 2025, Dhrubo Saha discussed how OpenSearch pipelines are integrating ML inference processors for powerful multi-modal search. Learn to search directly within images, audio, and text – locally on your own hardware!
Watch the full session: youtu.be/eg3NoKFxvzA?si=cP3o1p
Berlin Buzzwords returns on 7-9 June 2026! Get 36% off with our Trust Us Ticket: tickets.plainschwarz.com/bbuzz

@arXiv_csSD_bot@mastoxiv.page
2025-08-20 07:44:19

Leveraging Mamba with Full-Face Vision for Audio-Visual Speech Enhancement
Rong Chao, Wenze Ren, You-Jin Li, Kuo-Hsuan Hung, Sung-Feng Huang, Szu-Wei Fu, Wen-Huang Cheng, Yu Tsao
arxiv.org/abs/2508.13624

@heiseonline@social.heise.de
2025-09-11 09:15:00

YouTube: Multi-language audio now available to everyone
For two years, YouTube tested the provision of creators' own translations with selected channels. All creators are to get the feature shortly.

@arXiv_csCL_bot@mastoxiv.page
2025-08-20 08:09:29

Whispering Context: Distilling Syntax and Semantics for Long Speech Transcripts
Duygu Altinok
arxiv.org/abs/2508.13376 arxiv.org/pdf/2508.1…

@arXiv_csAI_bot@mastoxiv.page
2025-08-21 12:33:04

Replaced article(s) found for cs.AI. arxiv.org/list/cs.AI/new
[4/4]:
- When Good Sounds Go Adversarial: Jailbreaking Audio-Language Models with Benign Inputs
Kim, Dingeto, Kwon, Choi, Lee, Park, Lee, Shin

@Techmeme@techhub.social
2025-10-21 18:10:52

Google Fi plans an AI call quality feature to filter out background sounds like wind next month, and will bring full RCS support on the web in December (Ryan Whitwam/Ars Technica)
arstechnica.com/gadgets/2025/1

@arXiv_csMM_bot@mastoxiv.page
2025-08-19 07:49:40

Ges-QA: A Multidimensional Quality Assessment Dataset for Audio-to-3D Gesture Generation
Zhilin Gao, Yunhao Li, Sijing Wu, Yuqin Cao, Huiyu Duan, Guangtao Zhai
arxiv.org/abs/2508.12020

@arXiv_csSD_bot@mastoxiv.page
2025-07-21 08:49:00

OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder
Shikhar Bharadwaj, Samuele Cornell, Kwanghee Choi, Satoru Fukayama, Hye-jin Shim, Soham Deshmukh, Shinji Watanabe
arxiv.org/abs/2507.14129

@arXiv_csSD_bot@mastoxiv.page
2025-08-19 09:54:20

FoleySpace: Vision-Aligned Binaural Spatial Audio Generation
Lei Zhao, Rujin Chen, Chi Zhang, Xiao-Lei Zhang, Xuelong Li
arxiv.org/abs/2508.12918

@Mediagazer@mstdn.social
2025-09-04 19:55:55

The New York Times is shuttering its NYT Audio app, launched in May 2023, and integrating podcast archives and audio journalism into its News app (Mark Stenberg/Adweek)
adweek.com/media/new-york-time

@matthiasott@mastodon.social
2025-10-19 22:12:54

✍️ New post: Acoustic Room Treatment and Building Sound Panels, Part 1: Planning
#blogtober #audio #recording #sound

@arXiv_eessAS_bot@mastoxiv.page
2025-08-20 07:53:40

End-to-End Audio-Visual Learning for Cochlear Implant Sound Coding in Noisy Environments
Meng-Ping Lin, Enoch Hsin-Ho Huang, Shao-Yi Chien, Yu Tsao
arxiv.org/abs/2508.13576

@jamesthebard@social.linux.pizza
2025-08-18 17:49:25

Finally cleaned up and reorganized the consoles along with some of the audio stuff. I really need to dust more often...
#retroconsoles

A picture showing (from left to right): a 19" audio rack with a mixer and microphone amp on top, a compressor/expander (dbx 286s), a shelf holding MiniDiscs and a mini PC, a TASCAM MD-350 MiniDisc player, a 4U rack-mounted PC (my main rig), and a Sony MDS-E12 MiniDisc player. A shelf full of retro consoles including an NES, Sega Master System, PS1, GBA (top shelf), SNES, Dreamcast, PS2, and GameCube (middle shelf), Sega Saturn, Sega Genesis/Mega Drive (bottom shelf). Beside the shelves are tw…
@arXiv_csSD_bot@mastoxiv.page
2025-07-21 07:45:40

Temporal Adaptation of Pre-trained Foundation Models for Music Structure Analysis
Yixiao Zhang, Haonan Chen, Ju-Chiang Wang, Jitong Chen
arxiv.org/abs/2507.13572

@arXiv_csCL_bot@mastoxiv.page
2025-09-18 10:11:41

Audio-Based Crowd-Sourced Evaluation of Machine Translation Quality
Sami Ul Haq, Sheila Castilho, Yvette Graham
arxiv.org/abs/2509.14023 ar…

@arXiv_csSD_bot@mastoxiv.page
2025-09-19 10:13:11

Exploring How Audio Effects Alter Emotion with Foundation Models
Stelios Katsis, Vassilis Lyberatos, Spyridon Kantarelis, Edmund Dervakos, Giorgos Stamou
arxiv.org/abs/2509.15151

@arXiv_csMM_bot@mastoxiv.page
2025-09-19 08:28:41

MMED: A Multimodal Micro-Expression Dataset based on Audio-Visual Fusion
Junbo Wang, Yan Zhao, Shuo Li, Shibo Wang, Shigang Wang, Jian Wei
arxiv.org/abs/2509.14592

@arXiv_eessAS_bot@mastoxiv.page
2025-09-19 09:46:01

Mitigating data replication in text-to-audio generative diffusion models through anti-memorization guidance
Francisco Messina, Francesca Ronchini, Luca Comanducci, Paolo Bestagini, Fabio Antonacci
arxiv.org/abs/2509.14934

@arXiv_csCV_bot@mastoxiv.page
2025-08-12 12:48:03

StableAvatar: Infinite-Length Audio-Driven Avatar Video Generation
Shuyuan Tu, Yueming Pan, Yinming Huang, Xintong Han, Zhen Xing, Qi Dai, Chong Luo, Zuxuan Wu, Yu-Gang Jiang
arxiv.org/abs/2508.08248

@Techmeme@techhub.social
2025-09-10 22:51:06

YouTube rolls out multi-language audio after a two-year pilot, saying creators saw 25% of watch time coming from views in their video's non-primary language (Lauren Forristal/TechCrunch)
techcrunch.com/2025/09/10/yout

@arXiv_csSD_bot@mastoxiv.page
2025-08-18 07:39:10

LD-LAudio-V1: Video-to-Long-Form-Audio Generation Extension with Dual Lightweight Adapters
Haomin Zhang, Kristin Qi, Shuxin Yang, Zihao Chen, Chaofan Ding, Xinhan Di
arxiv.org/abs/2508.11074

@arXiv_csCL_bot@mastoxiv.page
2025-10-15 10:29:31

Not in Sync: Unveiling Temporal Bias in Audio Chat Models
Jiayu Yao, Shenghua Liu, Yiwei Wang, Rundong Cheng, Lingrui Mei, Baolong Bi, Zhen Xiong, Xueqi Cheng
arxiv.org/abs/2510.12185

@arXiv_eessAS_bot@mastoxiv.page
2025-09-18 09:27:51

DSpAST: Disentangled Representations for Spatial Audio Reasoning with Large Language Models
Kevin Wilkinghoff, Zheng-Hua Tan
arxiv.org/abs/2509.13927

@arXiv_csSD_bot@mastoxiv.page
2025-09-19 10:02:51

Spatial-CLAP: Learning Spatially-Aware audio–text Embeddings for Multi-Source Conditions
Kentaro Seki, Yuki Okamoto, Kouei Yamaoka, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari
arxiv.org/abs/2509.14785

@Techmeme@techhub.social
2025-08-07 21:46:03

Source: Meta has acquired WaveForms AI, which is working on AI that understands and mimics emotion in audio and debuted in December with a $40M seed led by a16z (Kalley Huang/The Information)
theinformation.com/articles/me

@arXiv_csSD_bot@mastoxiv.page
2025-09-17 10:02:09

Can Large Audio Language Models Understand Audio Well? Speech, Scene and Events Understanding Benchmark for LALMs
Han Yin, Jung-Woo Choi
arxiv.org/abs/2509.13148

@arXiv_csSD_bot@mastoxiv.page
2025-08-19 09:53:50

MATPAC : Enhanced Masked Latent Prediction for Self-Supervised Audio Representation Learning
Aurian Quelennec, Pierre Chouteau, Geoffroy Peeters, Slim Essid
arxiv.org/abs/2508.12709

@arXiv_eessAS_bot@mastoxiv.page
2025-08-18 08:34:00

ASAudio: A Survey of Advanced Spatial Audio Research
Zhiyuan Zhu, Yu Zhang, Wenxiang Guo, Changhao Pan, Zhou Zhao
arxiv.org/abs/2508.10924

@arXiv_csSD_bot@mastoxiv.page
2025-08-19 09:46:50

Cross-Modal Knowledge Distillation with Multi-Level Data Augmentation for Low-Resource Audio-Visual Sound Event Localization and Detection
Qing Wang, Ya Jiang, Hang Chen, Sabato Marco Siniscalchi, Jun Du, Jianqing Gao
arxiv.org/abs/2508.12334

@Techmeme@techhub.social
2025-09-10 11:11:16

Spotify plans to roll out lossless, offering 24-bit / 44.1 kHz FLAC audio, to all Premium users in the next two months, without needing a new higher-priced tier (Terrence O'Brien/The Verge)
theverge.com/spotify/775189/sp

@arXiv_eessAS_bot@mastoxiv.page
2025-09-19 09:33:21

DAIEN-TTS: Disentangled Audio Infilling for Environment-Aware Text-to-Speech Synthesis
Ye-Xin Lu, Yu Gu, Kun Wei, Hui-Peng Du, Yang Ai, Zhen-Hua Ling
arxiv.org/abs/2509.14684

@arXiv_csSD_bot@mastoxiv.page
2025-09-18 08:25:51

RFM-Editing: Rectified Flow Matching for Text-guided Audio Editing
Liting Gao, Yi Yuan, Yaru Chen, Yuelan Cheng, Zhenbo Li, Juan Wen, Shubin Zhang, Wenwu Wang
arxiv.org/abs/2509.14003

@Techmeme@techhub.social
2025-09-18 00:31:08

Meta announces the Oakley Meta Vanguard, with a 12MP wide-angle camera, Garmin and Strava integrations, and immersive audio, launching on October 21 for $499 (Aisha Malik/TechCrunch)
techcrunch.com/2025/09/17/meta

@arXiv_eessAS_bot@mastoxiv.page
2025-09-19 09:27:51

Enhancing Situational Awareness in Wearable Audio Devices Using a Lightweight Sound Event Localization and Detection System
Jun-Wei Yeow, Ee-Leng Tan, Santi Peksi, Zhen-Ting Ong, Woon-Seng Gan
arxiv.org/abs/2509.14650

@arXiv_csSD_bot@mastoxiv.page
2025-09-18 08:43:31

Comprehensive Evaluation of CNN-Based Audio Tagging Models on Resource-Constrained Devices
Jordi Grau-Haro, Ruben Ribes-Serrano, Javier Naranjo-Alcazar, Marta Garcia-Ballesteros, Pedro Zuccarello
arxiv.org/abs/2509.14049

@arXiv_csSD_bot@mastoxiv.page
2025-08-18 08:01:00

Pretrained Conformers for Audio Fingerprinting and Retrieval
Kemal Altwlkany, Elmedin Selmanovic, Sead Delalic
arxiv.org/abs/2508.11609 arx…

@Techmeme@techhub.social
2025-09-03 18:15:55

Google Pixel 8 and newer models can now stream audio to multiple headphones at once with expanded Bluetooth LE Audio and Auracast support (Terrence O'Brien/The Verge)
theverge.com/news/770233/googl

@arXiv_eessAS_bot@mastoxiv.page
2025-09-19 09:49:21

Discrete optimal transport is a strong audio adversarial attack
Anton Selitskiy, Akib Shahriyar, Jishnuraj Prakasan
arxiv.org/abs/2509.14959

@arXiv_csSD_bot@mastoxiv.page
2025-08-19 09:06:59

Audio Flamingo Sound-CoT Technical Report: Improving Chain-of-Thought Reasoning in Sound Understanding
Zhifeng Kong, Arushi Goel, Joao Felipe Santos, Sreyan Ghosh, Rafael Valle, Wei Ping, Bryan Catanzaro
arxiv.org/abs/2508.11818

@arXiv_eessAS_bot@mastoxiv.page
2025-09-19 08:47:41

Diffusion-Based Unsupervised Audio-Visual Speech Separation in Noisy Environments with Noise Prior
Yochai Yemini, Rami Ben-Ari, Sharon Gannot, Ethan Fetaya
arxiv.org/abs/2509.14379

@arXiv_csSD_bot@mastoxiv.page
2025-08-19 09:25:39

Exploring Self-Supervised Audio Models for Generalized Anomalous Sound Detection
Bing Han, Anbai Jiang, Xinhu Zheng, Wei-Qiang Zhang, Jia Liu, Pingyi Fan, Yanmin Qian
arxiv.org/abs/2508.12230

@arXiv_csSD_bot@mastoxiv.page
2025-10-15 08:43:22

UALM: Unified Audio Language Model for Understanding, Generation and Reasoning
Jinchuan Tian, Sang-gil Lee, Zhifeng Kong, Sreyan Ghosh, Arushi Goel, Chao-Han Huck Yang, Wenliang Dai, Zihan Liu, Hanrong Ye, Shinji Watanabe, Mohammad Shoeybi, Bryan Catanzaro, Rafael Valle, Wei Ping
arxiv.org/abs/2510.12000

@arXiv_csSD_bot@mastoxiv.page
2025-09-19 10:09:51

Temporally Heterogeneous Graph Contrastive Learning for Multimodal Acoustic event Classification
Yuanjian Chen, Yang Xiao, Jinjie Huang
arxiv.org/abs/2509.14893

@arXiv_csSD_bot@mastoxiv.page
2025-09-19 09:33:01

Cross-Lingual F5-TTS: Towards Language-Agnostic Voice Cloning and Speech Synthesis
Qingyu Liu, Yushen Chen, Zhikang Niu, Chunhui Wang, Yunting Yang, Bowen Zhang, Jian Zhao, Pengcheng Zhu, Kai Yu, Xie Chen
arxiv.org/abs/2509.14579

@arXiv_csSD_bot@mastoxiv.page
2025-08-18 07:36:20

Perturbed Public Voices (P²V): A Dataset for Robust Audio Deepfake Detection
Chongyang Gao, Marco Postiglione, Isabel Gortner, Sarit Kraus, V. S. Subrahmanian
arxiv.org/abs/2508.10949

@arXiv_csSD_bot@mastoxiv.page
2025-10-15 08:13:41

SeeingSounds: Learning Audio-to-Visual Alignment via Text
Simone Carnemolla, Matteo Pennisi, Chiara Russo, Simone Palazzo, Daniela Giordano, Concetto Spampinato
arxiv.org/abs/2510.11738

@arXiv_csSD_bot@mastoxiv.page
2025-09-16 07:37:56

Combining Audio and Non-Audio Inputs in Evolved Neural Networks for Ovenbird
Sergio Poo Hernandez, Vadim Bulitko, Erin Bayne
arxiv.org/abs/2509.10566

@arXiv_csSD_bot@mastoxiv.page
2025-10-14 11:35:48

Audio-Maestro: Enhancing Large Audio-Language Models with Tool-Augmented Reasoning
Kuan-Yi Lee, Tsung-En Lin, Hung-Yi Lee
arxiv.org/abs/2510.11454

@arXiv_csSD_bot@mastoxiv.page
2025-10-15 08:56:42

Audio Palette: A Diffusion Transformer with Multi-Signal Conditioning for Controllable Foley Synthesis
Junnuo Wang
arxiv.org/abs/2510.12175

@arXiv_csSD_bot@mastoxiv.page
2025-08-19 09:02:50

Prediction of Spotify Chart Success Using Audio and Streaming Features
Ian Jacob Cabansag, Paul Ntegeka
arxiv.org/abs/2508.11632 arxiv.org/…

@arXiv_csSD_bot@mastoxiv.page
2025-09-19 08:00:31

A long-form single-speaker real-time MRI speech dataset and benchmark
Sean Foley, Jihwan Lee, Kevin Huang, Xuan Shi, Yoonjeong Lee, Louis Goldstein, Shrikanth Narayanan
arxiv.org/abs/2509.14479

@arXiv_csSD_bot@mastoxiv.page
2025-08-20 10:30:18

Crosslisted article(s) found for cs.SD. arxiv.org/list/cs.SD/new
[1/1]:
- End-to-End Audio-Visual Learning for Cochlear Implant Sound Coding in Noisy Environments
Meng-Ping Lin, Enoch Hsin-Ho Huang, Shao-Yi Chien, Yu Tsao

@arXiv_csSD_bot@mastoxiv.page
2025-10-15 08:31:42

Audio-Guided Visual Perception for Audio-Visual Navigation
Yi Wang, Yinfeng Yu, Fuchun Sun, Liejun Wang, Wendong Zheng
arxiv.org/abs/2510.11760

@arXiv_csSD_bot@mastoxiv.page
2025-09-16 09:16:46

Neural Audio Codecs for Prompt-Driven Universal Source Separation
Adhiraj Banerjee, Vipul Arora
arxiv.org/abs/2509.11717 arxiv.org/pdf/2509…

@arXiv_csSD_bot@mastoxiv.page
2025-09-16 07:41:06

STASE: A spatialized text-to-audio synthesis engine for music generation
Tutti Chi, Letian Gao, Yixiao Zhang
arxiv.org/abs/2509.11124 arxiv…

@arXiv_csSD_bot@mastoxiv.page
2025-09-16 07:39:46

Emoanti: audio anti-deepfake with refined emotion-guided representations
Xiaokang Li, Yicheng Gong, Dinghao Zou, Xin Cao, Sunbowen Lee
arxiv.org/abs/2509.10781

@arXiv_csSD_bot@mastoxiv.page
2025-10-14 10:45:08

MRSAudio: A Large-Scale Multimodal Recorded Spatial Audio Dataset with Refined Annotations
Wenxiang Guo, Changhao Pan, Zhiyuan Zhu, Xintong Hu, Yu Zhang, Li Tang, Rui Yang, Han Wang, Zongbao Zhang, Yuhan Wang, Yixuan Chen, Hankun Xu, Ke Xu, Pengfei Fan, Zhetao Chen, Yanhao Yu, Qiange Huang, Fei Wu, Zhou Zhao
arxiv.org/abs/2510.1039…