Tootfinder

@inthehands@hachyderm.io
2025-06-16 01:35:43

❝Over four months, LLM users consistently underperformed at neural, linguistic, and behavioral levels. These results raise concerns about the long-term educational implications of LLM reliance and underscore the need for deeper inquiry into AI's role in learning.❞
Hell of a research abstract there, via @…: https://fediscience.org/@gwagner/114690366530883451

@dichotomiker@dresden.network
2025-06-16 07:45:56

"Brain-only participants exhibited the strongest, most distributed networks; Search Engine users showed moderate engagement; and LLM users displayed the weakest connectivity."
"LLM users also struggled to accurately quote their own work."
"Over four months, LLM users consistently underperformed at neural, linguistic, and behavioral levels."

Scene from Idiocracy (2006): An underperforming Oval Office advisor gazes thoughtfully into a glass ball, displaying a rather average level of brightness.

@arXiv_csSD_bot@mastoxiv.page
2025-06-12 07:57:21

Incorporating Linguistic Constraints from External Knowledge Source for Audio-Visual Target Speech Extraction
Wenxuan Wu, Shuai Wang, Xixin Wu, Helen Meng, Haizhou Li
https://arxiv.org/abs/2506.09792

Incorporating Linguistic Constraints from External Knowledge Source for Audio-Visual Target Speech Extraction
Audio-visual target speaker extraction (AV-TSE) models primarily rely on target visual cues to isolate the target speaker's voice from others. We know that humans leverage linguistic knowledge, such as syntax and semantics, to support speech perception. Inspired by this, we explore the potential of pre-trained speech-language models (PSLMs) and pre-trained language models (PLMs) as auxiliary knowledge sources for AV-TSE. In this study, we propose incorporating the linguistic constraints from PS…

@arXiv_eessAS_bot@mastoxiv.page
2025-06-12 08:05:21

You Are What You Say: Exploiting Linguistic Content for VoicePrivacy Attacks
Ünal Ege Gaznepoglu, Anna Leschanowsky, Ahmad Aloradi, Prachi Singh, Daniel Tenbrinck, Emanuël A. P. Habets, Nils Peters
https://arxiv.org/abs/2506.09521

You Are What You Say: Exploiting Linguistic Content for VoicePrivacy Attacks
Speaker anonymization systems hide the identity of speakers while preserving other information such as linguistic content and emotions. To evaluate their privacy benefits, attacks in the form of automatic speaker verification (ASV) systems are employed. In this study, we assess the impact of intra-speaker linguistic content similarity in the attacker training and evaluation datasets, by adapting BERT, a language model, as an ASV system. On the VoicePrivacy Attacker Challenge datasets, our metho…

@trochee@dair-community.social
2025-06-12 00:21:24

OH in corpus-linguistic UX convo:
> He who go too far down long tail end up wagging dog

@arXiv_eessSY_bot@mastoxiv.page
2025-06-11 09:07:15

Linguistic Ordered Weighted Averaging based deep learning pooling for fault diagnosis in a wastewater treatment plant
Alicia Beneyto-Rodriguez, Gregorio I. Sainz-Palmero, Marta Galende-Hernández, María J. Fuente
https://arxiv.org/abs/2506.08676

Linguistic Ordered Weighted Averaging based deep learning pooling for fault diagnosis in a wastewater treatment plant
Nowadays, water reuse is a serious challenge in addressing water shortages. Here, wastewater treatment plants (WWTPs) play a key role, and their proper operation is mandatory. So, fault diagnosis is a key activity for these plants. Their high complexity and large scale require smart methodologies for fault diagnosis and safe operation. All these large-scale and complex industrial processes are monitored, allowing data to be collected about plant operation, so data-driven approac…

@dcm@social.sunet.se
2025-06-10 13:17:04

A new, updated, streamlined, and generally improved version of The Vector Grounding Problem paper, joint work by @… and me on the meaningfulness, or lack thereof, of LLM outputs and internal representations, is now available on arXiv.

The Vector Grounding Problem
The remarkable performance of large language models (LLMs) on complex linguistic tasks has sparked debate about their capabilities. Unlike humans, these models learn language solely from textual data without directly interacting with the world. Yet they generate seemingly meaningful text on diverse topics. This achievement has renewed interest in the classical 'Symbol Grounding Problem': the question of whether the internal representations and outputs of symbolic AI systems can possess intrin…

@arXiv_csHC_bot@mastoxiv.page
2025-06-13 07:39:20

Intergenerational AI Literacy in Korean Immigrant Families: Interpretive Gatekeeping Meets Convenient Critical Deferment
Jeongone Seo, Ryan Womack, Tawfiq Ammari
https://arxiv.org/abs/2506.10197

Intergenerational AI Literacy in Korean Immigrant Families: Interpretive Gatekeeping Meets Convenient Critical Deferment
As artificial intelligence (AI) becomes deeply integrated into family life, immigrant families must navigate unique intergenerational, linguistic, and cultural challenges. This study examines how Korean immigrant families in the United States negotiate the use of AI tools such as ChatGPT and smart assistants in their homes. Through 20 semi-structured interviews with parents and teens, we identify two key practices that shape their engagement: interpretive gatekeeping, where parents mediate thei…

@arXiv_csCV_bot@mastoxiv.page
2025-06-10 19:09:11

This https://arxiv.org/abs/2506.03589 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCV_…

BiMa: Towards Biases Mitigation for Text-Video Retrieval via Scene Element Guidance
Text-video retrieval (TVR) systems often suffer from visual-linguistic biases present in datasets, which cause pre-trained vision-language models to overlook key details. To address this, we propose BiMa, a novel framework designed to mitigate biases in both visual and textual representations. Our approach begins by generating scene elements that characterize each video by identifying relevant entities/objects and activities. For visual debiasing, we integrate these scene elements into the vide…

@arXiv_csCY_bot@mastoxiv.page
2025-06-06 07:17:42

Early linguistic fingerprints of online users who engage with conspiracy communities
Francesco Corso, Giuseppe Russo, Francesco Pierri, Gianmarco De Francisci Morales
https://arxiv.org/abs/2506.05086

Early linguistic fingerprints of online users who engage with conspiracy communities
Online social media platforms are often seen as catalysts for radicalization, as they provide spaces where extreme beliefs can take root and spread, sometimes leading to real-world consequences. Conspiracy theories represent a specific form of radicalization that is notoriously resistant to online moderation strategies. One explanation for this resilience is the presence of a "conspiratorial mindset", a cognitive framework that fundamentally shapes how conspiracy believers perceive reality. How…

@arXiv_csMM_bot@mastoxiv.page
2025-06-04 07:23:10

StarVC: A Unified Auto-Regressive Framework for Joint Text and Speech Generation in Voice Conversion
Fengjin Li, Jie Wang, Yadong Niu, Yongqing Wang, Meng Meng, Jian Luan, Zhiyong Wu
https://arxiv.org/abs/2506.02414

StarVC: A Unified Auto-Regressive Framework for Joint Text and Speech Generation in Voice Conversion
Voice Conversion (VC) modifies speech to match a target speaker while preserving linguistic content. Traditional methods usually extract speaker information directly from speech while neglecting the explicit utilization of linguistic content. Since VC fundamentally involves disentangling speaker identity from linguistic content, leveraging structured semantic features could enhance conversion performance. However, previous attempts to incorporate semantic features into VC have shown limited eff…

@tschfflr@fediscience.org
2025-06-04 07:19:14

Students, THIS is how generative AI (ChatGPT et alia) "reads" papers and "understands" content. It's a bullshit machine, a gaslighting machine. It shows the linguistic behavior of a psychopath (is this what we humans average out to, if one trains on all our "content" and online behavior?). Yikes.
https://amandaguinzburg.substack.com/p/diabolus-ex-machina

@hakona@im.alstadheim.no
2025-06-03 20:32:38

We have in fact encountered things similar to "linguistic sequence processing": bullshit artists, and politicians with the "truthiness" of Ronald Reagan (a term later coined by a comedian). I have had personal experience of being a rabbit in a spotlight where this kind of "thinking" kicks in. Got out of that game, thankfully.

@arXiv_csAI_bot@mastoxiv.page
2025-06-05 09:36:37

This https://arxiv.org/abs/2502.00698 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csAI_…

MM-IQ: Benchmarking Human-Like Abstraction and Reasoning in Multimodal Models
IQ testing has served as a foundational methodology for evaluating human cognitive capabilities, deliberately decoupling assessment from linguistic background, language proficiency, or domain-specific knowledge to isolate core competencies in abstraction and reasoning. Yet, artificial intelligence research currently lacks systematic benchmarks to quantify these critical cognitive capabilities in multimodal systems. To address this crucial gap, we propose MM-IQ, a comprehensive evaluation framew…

@arXiv_csSD_bot@mastoxiv.page
2025-06-09 07:54:52

Voice Impression Control in Zero-Shot TTS
Keinichi Fujita, Shota Horiguchi, Yusuke Ijima
https://arxiv.org/abs/2506.05688 https://arx…

Voice Impression Control in Zero-Shot TTS
Para-/non-linguistic information in speech is pivotal in shaping the listeners' impression. Although zero-shot text-to-speech (TTS) has achieved high speaker fidelity, modulating subtle para-/non-linguistic information to control perceived voice characteristics, i.e., impressions, remains challenging. We have therefore developed a voice impression control method in zero-shot TTS that utilizes a low-dimensional vector to represent the intensities of various voice impression pairs (e.g., dark-bri…

@arXiv_csLG_bot@mastoxiv.page
2025-06-03 08:21:31

SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
Mustafa Shukor, Dana Aubakirova, Francesco Capuano, Pepijn Kooijmans, Steven Palma, Adil Zouitine, Michel Aractingi, Caroline Pascal, Martino Russi, Andres Marafioti, Simon Alibert, Matthieu Cord, Thomas Wolf, Remi Cadene
https://arxiv.org/abs/2506.018…

SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
Vision-language models (VLMs) pretrained on large-scale multimodal datasets encode rich visual and linguistic knowledge, making them a strong foundation for robotics. Rather than training robotic policies from scratch, recent approaches adapt VLMs into vision-language-action (VLA) models that enable natural language-driven perception and control. However, existing VLAs are typically massive, often with billions of parameters, leading to high training costs and limited real-world deployability. …

@tanyakaroli@expressional.social
2025-05-29 11:33:07

That feeling when you're a fan of a new podcast - and then get invited on it yourself! 🤓🎉
https://podcasts.apple.com/dk/podcast/writing-wrongs/id1797795962?l=da

Writing Wrongs
True crime · Monthly · Every sentence tells a story, every word leaves a trace. Writing Wrongs, from the Aston Institute for Forensic Linguistics, explores historic and contemporary forensic linguistic cases. Hosts Profes...

@arXiv_csCL_bot@mastoxiv.page
2025-06-03 08:20:50

From Guidelines to Practice: A New Paradigm for Arabic Language Model Evaluation
Serry Sibaee, Omer Nacar, Adel Ammar, Yasser Al-Habashi, Abdulrahman Al-Batati, Wadii Boulila
https://arxiv.org/abs/2506.01920

From Guidelines to Practice: A New Paradigm for Arabic Language Model Evaluation
This paper addresses critical gaps in Arabic language model evaluation by establishing comprehensive theoretical guidelines and introducing a novel evaluation framework. We first analyze existing Arabic evaluation datasets, identifying significant issues in linguistic accuracy, cultural alignment, and methodological rigor. To address these limitations in LLMs, we present the Arabic Depth Mini Dataset (ADMD), a carefully curated collection of 490 challenging questions spanning ten major domains …

@arXiv_csSD_bot@mastoxiv.page
2025-06-11 08:08:45

Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model
Ailin Huang, Bingxin Li, Bruce Wang, Boyong Wu, Chao Yan, Chengli Feng, Heng Wang, Hongyu Zhou, Hongyuan Wang, Jingbei Li, Jianjian Sun, Joanna Wang, Mingrui Chen, Peng Liu, Ruihang Miao, Shilei Jiang, Tian Fei, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Ge, Zheng Gong, Zhewei Huang, Zixin Zhang, Bin Wang, Bo Li, Buyun Ma, Changxin Miao, Changyi Wan, Chen Xu, Dapeng Shi, Dingyuan Hu, Enle…

Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model
Large Audio-Language Models (LALMs) have significantly advanced intelligent human-computer interaction, yet their reliance on text-based outputs limits their ability to generate natural speech responses directly, hindering seamless audio interactions. To address this, we introduce Step-Audio-AQAA, a fully end-to-end LALM designed for Audio Query-Audio Answer (AQAA) tasks. The model integrates a dual-codebook audio tokenizer for linguistic and semantic feature extraction, a 130-billion-parameter…

@arXiv_csAI_bot@mastoxiv.page
2025-06-05 09:45:09

This https://arxiv.org/abs/2506.02139 has been replaced.

The Unified Cognitive Consciousness Theory for Language Models: Anchoring Semantics, Thresholds of Activation, and Emergent Reasoning
Few-shot learning in large language models (LLMs) reveals a core paradox: certain tasks generalize from just a few examples, while others demand extensive supervision. To explain this, we introduce the Unified Cognitive Consciousness Theory (UCCT), which reconceptualizes LLMs not as deficient agents, but as unconscious substrates: dense, distributed repositories of linguistic and conceptual patterns that operate without explicit semantics, intention, or goal-directed reasoning. Under this view,…

@arXiv_astrophIM_bot@mastoxiv.page
2025-06-04 07:45:38

An Exploratory Framework for Future SETI Applications: Detecting Generative Reactivity via Language Models
Po-Chieh Yu
#toXiv_bot_toot

@arXiv_eessAS_bot@mastoxiv.page
2025-06-10 08:39:12

Rhythm Features for Speaker Identification
Nick Mehlman, Thomas Thebaud, Dani Byrd, Shri Narayanan
https://arxiv.org/abs/2506.06834 https://

Rhythm Features for Speaker Identification
While deep learning models have demonstrated robust performance in speaker recognition tasks, they primarily rely on low-level audio features learned empirically from spectrograms or raw waveforms. However, prior work has indicated that idiosyncratic speaking styles heavily influence the temporal structure of linguistic units in speech signals (rhythm). This makes rhythm a strong yet largely overlooked candidate for a speech identity feature. In this paper, we test this hypothesis by applying d…

@arXiv_eessAS_bot@mastoxiv.page
2025-06-10 16:49:59

This https://arxiv.org/abs/2410.00527 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_ees…

Wanna hear your voice? A sample is all we need!
Research on audio clue-based target speaker extraction (TSE) has focused on modeling mixtures and reference speech, achieving strong results in English due to abundant datasets. However, cross-lingual properties remain underexplored, as low-resource languages face challenges from limited annotated data and linguistic resources. To bridge this gap, we propose WHYV (Wanna Hear Your Voice), a cross-lingual TSE framework enabling zero-shot adaptation without fine-tuning. WHYV employs a frequency-mo…

@arXiv_csMM_bot@mastoxiv.page
2025-06-02 09:58:31

This https://arxiv.org/abs/2505.23018 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csMM_…

EmotionTalk: An Interactive Chinese Multimodal Emotion Dataset With Rich Annotations
In recent years, emotion recognition has played a critical role in applications such as human-computer interaction, mental health monitoring, and sentiment analysis. While datasets for emotion analysis in languages such as English have proliferated, there remains a pressing need for high-quality, comprehensive datasets tailored to the unique linguistic, cultural, and multimodal characteristics of Chinese. In this work, we propose EmotionTalk, an interactive Chinese multimodal emotion datase…

@arXiv_csSD_bot@mastoxiv.page
2025-06-04 07:33:42

Breaking the Barriers of Text-Hungry and Audio-Deficient AI
Hamidou Tembine, Issa Bamia, Massa NDong, Bakary Coulibaly, Oumar Issiaka Traore, Moussa Traore, Moussa Sanogo, Mamadou Eric Sangare, Salif Kante, Daryl Noupa Yongueng, Hafiz Tiomoko Ali, Malik Tiomoko, Frejus Laleye, Boualem Djehiche, Wesmanegda Elisee Dipama, Idris Baba Saje, Hammid Mohammed Ibrahim, Moumini Sanogo, Marie Coursel Nininahazwe, Abdul-Latif Siita, Haine Mhlongo, Teddy Nelvy Dieu Merci Kouka, Mariam Serine Jerid…

Breaking the Barriers of Text-Hungry and Audio-Deficient AI
While global linguistic diversity spans more than 7164 recognized languages, the current dominant architecture of machine intelligence remains fundamentally biased toward written text. This bias excludes over 700 million people, particularly in rural and remote regions, who are audio-literate. In this work, we introduce a fully textless, audio-to-audio machine intelligence framework designed to serve this underserved population, and all the people who prefer audio-efficiency. Our contributions in…

@arXiv_csMM_bot@mastoxiv.page
2025-05-30 09:54:17

This https://arxiv.org/abs/2503.01879 has been replaced.

Nexus: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision
This work proposes an industry-level omni-modal large language model (LLM) pipeline that integrates auditory, visual, and linguistic modalities to overcome challenges such as limited tri-modal datasets, high computational costs, and complex feature alignments. Our pipeline consists of three main components: First, a modular framework enabling flexible configuration of various encoder-LLM-decoder architectures. Second, a lightweight training strategy that pre-trains audio-language alignment on t…

@arXiv_csSD_bot@mastoxiv.page
2025-06-05 07:21:48

Towards Better Disentanglement in Non-Autoregressive Zero-Shot Expressive Voice Conversion
Seymanur Akti, Tuan Nam Nguyen, Alexander Waibel
https://arxiv.org/abs/2506.04013

Towards Better Disentanglement in Non-Autoregressive Zero-Shot Expressive Voice Conversion
Expressive voice conversion aims to transfer both speaker identity and expressive attributes from a target speech to a given source speech. In this work, we improve over a self-supervised, non-autoregressive framework with a conditional variational autoencoder, focusing on reducing source timbre leakage and improving linguistic-acoustic disentanglement for better style transfer. To minimize style leakage, we use multilingual discrete speech units for content representation and reinforce embeddi…

@arXiv_eessAS_bot@mastoxiv.page
2025-06-03 16:15:00

This https://arxiv.org/abs/2409.03636 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_ees…

ZSDEVC: Zero-Shot Diffusion-based Emotional Voice Conversion with Disentangled Mechanism
The human voice conveys not just words but also emotional states and individuality. Emotional voice conversion (EVC) modifies emotional expressions while preserving linguistic content and speaker identity, improving applications like human-machine interaction. While deep learning has advanced EVC models for specific target speakers on well-crafted emotional datasets, existing methods often face issues with emotion accuracy and speech distortion. In addition, the zero-shot scenario, in which emo…

@arXiv_csMM_bot@mastoxiv.page
2025-05-30 07:19:35

EmotionTalk: An Interactive Chinese Multimodal Emotion Dataset With Rich Annotations
Haoqin Sun, Xuechen Wang, Jinghua Zhao, Shiwan Zhao, Jiaming Zhou, Hui Wang, Jiabei He, Aobo Kong, Xi Yang, Yequan Wang, Yonghua Lin, Yong Qin
https://arxiv.org/abs/2505.23018

EmotionTalk: An Interactive Chinese Multimodal Emotion Dataset With Rich Annotations
In recent years, emotion recognition has played a critical role in applications such as human-computer interaction, mental health monitoring, and sentiment analysis. While datasets for emotion analysis in languages such as English have proliferated, there remains a pressing need for high-quality, comprehensive datasets tailored to the unique linguistic, cultural, and multimodal characteristics of Chinese. In this work, we propose EmotionTalk, an interactive Chinese multimodal emotion datase…

@arXiv_eessAS_bot@mastoxiv.page
2025-06-02 10:04:35

This https://arxiv.org/abs/2505.15004 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_ees…

EASY: Emotion-aware Speaker Anonymization via Factorized Distillation
Emotion plays a significant role in speech interaction, conveyed through tone, pitch, and rhythm, enabling the expression of feelings and intentions beyond words to create a more personalized experience. However, most existing speaker anonymization systems employ parallel disentanglement methods, which only separate speech into linguistic content and speaker identity, often neglecting the preservation of the original emotional state. In this study, we introduce EASY, an emotion-aware speaker an…

Opt-in global Mastodon full text search.