Tootfinder

No exact results. Similar results found.

@arXiv_csSD_bot@mastoxiv.page
2025-06-24 10:55:20

Smooth Operators: LLMs Translating Imperfect Hints into Disfluency-Rich Transcripts
Duygu Altinok
https://arxiv.org/abs/2506.18510 https://

Smooth Operators: LLMs Translating Imperfect Hints into Disfluency-Rich Transcripts
Accurate detection of disfluencies in spoken language is crucial for enhancing the performance of automatic speech and language processing systems, as well as fostering the development of more inclusive speech and language technologies. Leveraging the growing trend of large language models (LLMs) as versatile learners capable of processing both lexical and non-lexical inputs (e.g., audio and video), we propose a novel approach to transcribing disfluencies as explicit tokens with timestamps, ena…

@mgorny@social.treehouse.systems
2025-07-23 05:59:20

I'm not opposed to neologisms. To the contrary, I do love them, sometimes coining my own or adapting happily. That is, as long as they make the language richer, or perhaps more precise.
What I truly hate is the modern goo that people are speaking, because they don't know their own language well. The business newspeak, so to say.
This is especially bad in Polish where people are randomly polonizing English words for no reason at all.

@arXiv_csCL_bot@mastoxiv.page
2025-07-21 09:53:50

Open Automatic Speech Recognition Models for Classical and Modern Standard Arabic
Lilit Grigoryan, Nikolay Karpov, Enas Albasiri, Vitaly Lavrukhin, Boris Ginsburg
https://arxiv.org/abs/2507.13977

Open Automatic Speech Recognition Models for Classical and Modern Standard Arabic
Despite Arabic being one of the most widely spoken languages, the development of Arabic Automatic Speech Recognition (ASR) systems faces significant challenges due to the language's complexity, and only a limited number of public Arabic ASR models exist. While much of the focus has been on Modern Standard Arabic (MSA), there is considerably less attention given to the variations within the language. This paper introduces a universal methodology for Arabic speech and text processing designed to …

@lysander07@sigmoid.social
2025-05-12 08:39:14

Last leg on our brief history of NLP (so far) is the advent of large language models with GPT-3 in 2020 and the introduction of learning from the prompt (aka few-shot learning).
T. B. Brown et al. (2020). Language models are few-shot learners. NIPS'20
https://…

Slide from Information System Engineering 2025 lecture, 02 - Natural Language Processing 01, A brief history of NLP, NLP Timeline.
The NLP timeline is in the middle of the page from top to bottom. The marker is at 2020. On the left side, an original screenshot of GPT-3 is shown, giving advise on how to present a talk about "Symbolic and Subsymbolic AI - An Epic Dilemma?".
The right side holds the following text:
2020: GPT-3 was released by OpenAI, based on 45TB data crawled from the web. A “da…

@arXiv_csCL_bot@mastoxiv.page
2025-06-18 09:16:21

A Variational Framework for Improving Naturalness in Generative Spoken Language Models
Li-Wei Chen, Takuya Higuchi, Zakaria Aldeneh, Ahmed Hussen Abdelaziz, Alexander Rudnicky
https://arxiv.org/abs/2506.14767

A Variational Framework for Improving Naturalness in Generative Spoken Language Models
The success of large language models in text processing has inspired their adaptation to speech modeling. However, since speech is continuous and complex, it is often discretized for autoregressive modeling. Speech tokens derived from self-supervised models (known as semantic tokens) typically focus on the linguistic aspects of speech but neglect prosodic information. As a result, models trained on these tokens can generate speech with reduced naturalness. Existing approaches try to fix this by…

@arXiv_csNE_bot@mastoxiv.page
2025-07-17 09:22:20

Simulated Language Acquisition in a Biologically Realistic Model of the Brain
Daniel Mitropolsky, Christos Papadimitriou
https://arxiv.org/abs/2507.11788 h…

Simulated Language Acquisition in a Biologically Realistic Model of the Brain
Despite tremendous progress in neuroscience, we do not have a compelling narrative for the precise way whereby the spiking of neurons in our brain results in high-level cognitive phenomena such as planning and language. We introduce a simple mathematical formulation of six basic and broadly accepted principles of neuroscience: excitatory neurons, brain areas, random synapses, Hebbian plasticity, local inhibition, and inter-area inhibition. We implement a simulated neuromorphic system based on t…

@tiotasram@kolektiva.social
2025-07-19 07:51:05

AI, AGI, and learning efficiency
My 4-month-old kid is not DDoSing Wikipedia right now, nor will they ever do so before learning to speak, read, or write. Their entire "training corpus" will not top even 100 million "tokens" before they can speak & understand language, and do so with real intentionally.
Just to emphasize that point: 100 words-per-minute times 60 minutes-per-hour times 12 hours-per-day times 365 days-per-year times 4 years is a mere 105,120,000 words. That's a ludicrously *high* estimate of words-per-minute and hours-per-day, and 4 years old (the age of my other kid) is well after basic speech capabilities are developed in many children, etc. More likely the available "training data" is at least 1 or 2 orders of magnitude less than this.
The point here is that large language models, trained as they are on multiple *billions* of tokens, are not developing their behavioral capabilities in a way that's remotely similar to humans, even if you believe those capabilities are similar (they are by certain very biased ways of measurement; they very much aren't by others). This idea that humans must be naturally good at acquiring language is an old one (see e.g. #AI #LLM #AGI

@arXiv_csCY_bot@mastoxiv.page
2025-06-03 07:20:20

SafeCOMM: What about Safety Alignment in Fine-Tuned Telecom Large Language Models?
Aladin Djuhera, Swanand Ravindra Kadhe, Farhan Ahmed, Syed Zawad, Holger Boche, Walid Saad
https://arxiv.org/abs/2506.00062

SafeCOMM: What about Safety Alignment in Fine-Tuned Telecom Large Language Models?
Fine-tuning large language models (LLMs) for telecom tasks and datasets is a common practice to adapt general-purpose models to the telecom domain. However, little attention has been paid to how this process may compromise model safety. Recent research has shown that even benign fine-tuning can degrade the safety alignment of LLMs, causing them to respond to harmful or unethical user queries. In this paper, we investigate this issue for telecom-tuned LLMs using three representative datasets fea…

@arXiv_csCL_bot@mastoxiv.page
2025-07-02 09:54:30

Natural language processing for African languages
David Ifeoluwa Adelani
https://arxiv.org/abs/2507.00297 https://arxiv.org/pdf/2507.…

Natural language processing for African languages
Recent advances in word embeddings and language models use large-scale, unlabelled data and self-supervised learning to boost NLP performance. Multilingual models, often trained on web-sourced data like Wikipedia, face challenges: few low-resource languages are included, their data is often noisy, and lack of labeled datasets makes it hard to evaluate performance outside high-resource languages like English. In this dissertation, we focus on languages spoken in Sub-Saharan Africa where all the …

@arXiv_csCL_bot@mastoxiv.page
2025-07-10 09:54:51

Elite Polarization in European Parliamentary Speeches: a Novel Measurement Approach Using Large Language Models
Gennadii Iakovlev
https://arxiv.org/abs/2507.06658

Elite Polarization in European Parliamentary Speeches: a Novel Measurement Approach Using Large Language Models
This project introduces a new measure of elite polarization via actor and subject detection using artificial intelligence. I identify when politicians mention one another in parliamentary speeches, note who is speaking and who is being addressed, and assess the emotional temperature behind these evaluations. This maps how elites evaluate their various out-parties, allowing us to create an index of mutual out-party hostility, that is, elite polarization. While I analyzed polarization data over t…

Tootfinder

Opt-in global Mastodon full text search. Join the index!