Tootfinder

Opt-in global Mastodon full-text search. Join the index!

@mgorny@social.treehouse.systems
2025-07-23 05:59:20

I'm not opposed to neologisms. On the contrary, I love them, sometimes coining my own or happily adapting existing ones. That is, as long as they make the language richer, or perhaps more precise.
What I truly hate is the modern goo that people speak because they don't know their own language well: business newspeak, so to speak.
This is especially bad in Polish, where people randomly polonize English words for no reason at all.

@arXiv_csCL_bot@mastoxiv.page
2025-07-21 09:53:50

Open Automatic Speech Recognition Models for Classical and Modern Standard Arabic
Lilit Grigoryan, Nikolay Karpov, Enas Albasiri, Vitaly Lavrukhin, Boris Ginsburg
arxiv.org/abs/2507.13977

@arXiv_csCL_bot@mastoxiv.page
2025-06-18 09:16:21

A Variational Framework for Improving Naturalness in Generative Spoken Language Models
Li-Wei Chen, Takuya Higuchi, Zakaria Aldeneh, Ahmed Hussen Abdelaziz, Alexander Rudnicky
arxiv.org/abs/2506.14767

@lysander07@sigmoid.social
2025-05-12 08:39:14

The last leg of our brief history of NLP (so far) is the advent of large language models, with GPT-3 in 2020 and the introduction of learning from the prompt (aka few-shot learning).
T. B. Brown et al. (2020). Language models are few-shot learners. NIPS'20

Slide from Information System Engineering 2025 lecture, 02 - Natural Language Processing 01, A brief history of NLP, NLP Timeline.
The NLP timeline is in the middle of the page from top to bottom. The marker is at 2020. On the left side, an original screenshot of GPT-3 is shown, giving advice on how to present a talk about "Symbolic and Subsymbolic AI - An Epic Dilemma?".
The right side holds the following text: 
2020: GPT-3 was released by OpenAI, based on 45 TB of data crawled from the web. A “da…
@tiotasram@kolektiva.social
2025-07-19 07:51:05

AI, AGI, and learning efficiency
My 4-month-old kid is not DDoSing Wikipedia right now, nor will they ever do so before learning to speak, read, or write. Their entire "training corpus" will not top even 100 million "tokens" before they can speak & understand language, and do so with real intentionality.
Just to emphasize that point: 100 words-per-minute times 60 minutes-per-hour times 12 hours-per-day times 365 days-per-year times 4 years is a mere 105,120,000 words. That's a ludicrously *high* estimate of words-per-minute and hours-per-day, and 4 years old (the age of my other kid) is well after basic speech capabilities are developed in many children, etc. More likely the available "training data" is at least 1 or 2 orders of magnitude less than this.
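A quick sanity check of that back-of-the-envelope figure (a minimal Python sketch; the rates are the post's deliberately generous assumptions, not measurements):

```python
# Back-of-the-envelope check of the post's estimate of a child's language
# "training data". All rates below are the post's intentionally high
# assumptions, not measurements.
words_per_minute = 100
minutes_per_hour = 60
hours_per_day = 12
days_per_year = 365
years = 4

total_words = (words_per_minute * minutes_per_hour
               * hours_per_day * days_per_year * years)
print(f"{total_words:,} words")  # 105,120,000
```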
The point here is that large language models, trained as they are on multiple *billions* of tokens, are not developing their behavioral capabilities in a way that's remotely similar to humans, even if you believe those capabilities are similar (they are by certain very biased ways of measurement; they very much aren't by others). This idea that humans must be naturally good at acquiring language is an old one (see e.g. #AI #LLM #AGI

@arXiv_csNE_bot@mastoxiv.page
2025-07-17 09:22:20

Simulated Language Acquisition in a Biologically Realistic Model of the Brain
Daniel Mitropolsky, Christos Papadimitriou
arxiv.org/abs/2507.11788

@arXiv_csCY_bot@mastoxiv.page
2025-06-03 07:20:20

SafeCOMM: What about Safety Alignment in Fine-Tuned Telecom Large Language Models?
Aladin Djuhera, Swanand Ravindra Kadhe, Farhan Ahmed, Syed Zawad, Holger Boche, Walid Saad
arxiv.org/abs/2506.00062

@arXiv_csMM_bot@mastoxiv.page
2025-07-16 07:36:31

MultiVox: Benchmarking Voice Assistants for Multimodal Interactions
Ramaneswaran Selvakumar, Ashish Seth, Nishit Anand, Utkarsh Tyagi, Sonal Kumar, Sreyan Ghosh, Dinesh Manocha
arxiv.org/abs/2507.10859

@arXiv_csCL_bot@mastoxiv.page
2025-07-02 09:54:30

Natural language processing for African languages
David Ifeoluwa Adelani
arxiv.org/abs/2507.00297

@arXiv_csCL_bot@mastoxiv.page
2025-07-10 09:54:51

Elite Polarization in European Parliamentary Speeches: a Novel Measurement Approach Using Large Language Models
Gennadii Iakovlev
arxiv.org/abs/2507.06658