Tootfinder

@EgorKotov@datasci.social
2025-09-10 08:59:43

From a Q2 journal (according to Elsevier Scopus) asking for a review. They are not even trying. According to them, my "expertise in areas related to linguistics (if any)" [I love this "if any"!] makes me "an ideal candidate to review the manuscript".

Dear E.A. Koto,
Hope you are well.
We are contacting you because your expertise in areas related to linguistics (if any)
makes you an ideal candidate to review the manuscript entitled "…",
submitted to the Forum for Linguistic Studies (FLS). The manuscript
summary is provided below for your reference:
[This study examines the morphological patterns of three languages (…). Focusing
on bound morphemes, both derivational and inflectional, the research adopts a
Forum for Linguistic Studies
COUNTRY SUB…

@sascha_wolfer@fediscience.org
2025-07-10 14:05:27

New #preprint by Alex Koplenig and me:
"Statistical errors undermine claims about the evolution of polysynthetic languages" #PNAS #Linguistics

@arXiv_csPL_bot@mastoxiv.page
2025-07-10 08:36:51

Sound Interval-Based Synthesis for Probabilistic Programs
Guilherme Espada, Alcides Fonseca
https://arxiv.org/abs/2507.06939 https://…

Probabilistic programming has become a standard practice to model stochastic events and learn about the behavior of nature in different scientific contexts, ranging from Genetics and Ecology to Linguistics and Psychology. However, domain practitioners (such as biologists) also need to be experts in statistics in order to select which probabilistic model is suitable for a given particular problem, relying then on probabilistic inference engines such as Stan, Pyro or Edward to fine-tune the param…

@CerstinMahlow@mastodon.acm.org
2025-06-26 11:44:09

Half an hour to go: closing event of our #swissuniversities
project “Digital Literacy in University Contexts.”
https://www.zhaw.ch/en/linguistics/researc

Photo of a ZHAW “Leave a Mark” bottle in front of the welcome slide for the event.

@arXiv_csCL_bot@mastoxiv.page
2025-07-08 13:58:11

A Survey of Pun Generation: Datasets, Evaluations and Methodologies
Yuchen Su, Yonghua Zhu, Ruofan Wang, Zijian Huang, Diana Benavides-Prado, Michael Witbrock
https://arxiv.org/abs/2507.04793

Pun generation seeks to creatively modify linguistic elements in text to produce humour or evoke double meanings. It also aims to preserve coherence and contextual appropriateness, making it useful in creative writing and entertainment across various media and contexts. Although pun generation has received considerable attention in computational linguistics, there is currently no dedicated survey that systematically reviews this specific area. To bridge this gap, this paper provides a comprehen…

@tschfflr@fediscience.org
2025-08-27 12:33:52

I arrived in Oslo! The Emoji Workshop is on Friday. If you're working on the linguistics of emojis and need the Zoom link, contact me via email.
https://www.hf.uio.no/iln/english/research/groups/super-linguistics/events/the-emoji-workshop.html

@qurlyjoe@mstdn.social
2025-06-24 04:09:42

#linguistics
I vaguely recall, from studies long ago, the concept of #Register in describing and analyzing language usage: a given language has many registers, and context prescribes which one is used.
There is, I think, a register that is used exclusively when narrating a History Ch…

@arXiv_csFL_bot@mastoxiv.page
2025-08-07 07:40:03

Identity Testing for Stochastic Languages
Smayan Agarwal, Shobhit Singh, Aalok Thakkar
https://arxiv.org/abs/2508.03826 https://arxiv.org/pdf/2508.03826

Determining whether an unknown distribution matches a known reference is a cornerstone problem in distributional analysis. While classical results establish a rigorous framework in the case of distributions over finite domains, real-world applications in computational linguistics, bioinformatics, and program analysis demand testing over infinite combinatorial structures, particularly strings. In this paper, we initiate the theoretical study of identity testing for stochastic languages, bridging…

@fanf@mendeddrum.org
2025-08-16 17:42:03

from my link log —
You know more Finnish than you think.
https://dannybate.com/2025/08/03/you-know-more-finnish-than-you-think/
saved 2025-08-07

Linguistics illuminates the linguistically obscure – or so I’ve always thought. It’s a common theme of my online output that a little bit of historical linguistics goes a long way, maki…

@arXiv_csCY_bot@mastoxiv.page
2025-09-04 07:38:21

Chatbot Deployment Considerations for Application-Agnostic Human-Machine Dialogues
Pablo Rivas, Chelsi Chelsi, Nishit Nishit, Laharika Ravula
https://arxiv.org/abs/2509.02611 ht…

Automatic conversation systems based on natural language responses are becoming ubiquitous, in part, due to major advances in computational linguistics and machine learning. The easy access to robust and affordable platforms is causing companies to have an unprecedented rush to adopt chatbot technologies for customer service and support. However, this rush has caused judgment lapses when releasing chatbot technologies into production systems. This paper aims to shed light on basic, elemental, …

@fgraver@hcommons.social
2025-07-30 08:27:01

writer identity and voice https://patthomson.net/2025/07/30/writer-identity-and-voice/

Still reading. This month it’s Schmit, John S. (2022) The sociolinguistics of written identity: Constructing a self. Cham, Switzerland: Palgrave Macmillan. Schmit is a writing and linguistics professo…

@arXiv_csSD_bot@mastoxiv.page
2025-08-05 10:02:31

Non-Verbal Vocalisations and their Challenges: Emotion, Privacy, Sparseness, and Real Life
Anton Batliner, Shahin Amiriparian, Björn W. Schuller
https://arxiv.org/abs/2508.01960

Non-Verbal Vocalisations (NVVs) are short 'non-word' utterances without proper linguistic (semantic) meaning but conveying connotations, be this emotions/affects or other paralinguistic information. We start this contribution with a historic sketch: how they were addressed in psychology and linguistics in the last two centuries, how they were neglected later on, and how they came to the fore with the advent of emotion research. We then give an overview of types of NVVs (formal aspects) and fu…

@tschfflr@fediscience.org
2025-08-28 16:50:56

Very excited for The Emoji Workshop tomorrow! Warm-up starting now ☺️
https://www.hf.uio.no/iln/english/research/groups/super-linguistics/events/the-emoji-workshop.html

@relcfp@mastodon.social
2025-09-05 16:10:26

The Second Coming of Humanities: 2nd International Conference on Literature, Linguistics, and Language
https://ift.tt/ysZRf8M
updated: Friday, September 5, 2025 - 9:25am
full name / name of organization: University of Central…
via Input 4 RELCFP

@spamless@mastodon.social
2025-07-30 12:11:34

German-English connections: "anecken" and "to egg on." These words don't mean exactly the same thing, but etymologically they seem closely related. English is a Germanic language, and phrasal verbs offer up interesting connections.
#language #linguistics

@arXiv_csCL_bot@mastoxiv.page
2025-09-05 10:19:51

Joint Modeling of Entities and Discourse Relations for Coherence Assessment
Wei Liu, Michael Strube
https://arxiv.org/abs/2509.04182 https://arxiv.org/pdf/…

In linguistics, coherence can be achieved by different means, such as by maintaining reference to the same set of entities across sentences and by establishing discourse relations between them. However, most existing work on coherence modeling focuses exclusively on either entity features or discourse relation features, with little attention given to combining the two. In this study, we explore two methods for jointly modeling entities and discourse relations for coherence assessment. Experimen…

@arXiv_csAI_bot@mastoxiv.page
2025-08-29 09:43:31

Bridging Minds and Machines: Toward an Integration of AI and Cognitive Science
Rui Mao, Qian Liu, Xiao Li, Erik Cambria, Amir Hussain
https://arxiv.org/abs/2508.20674 https://…

Cognitive Science has profoundly shaped disciplines such as Artificial Intelligence (AI), Philosophy, Psychology, Neuroscience, Linguistics, and Culture. Many breakthroughs in AI trace their roots to cognitive theories, while AI itself has become an indispensable tool for advancing cognitive research. This reciprocal relationship motivates a comprehensive review of the intersections between AI and Cognitive Science. By synthesizing key contributions from both perspectives, we observe that AI pr…

@qurlyjoe@mstdn.social
2025-08-06 03:24:45

#linguistics #archeology #Paleogenetics #bookstodon

@felwert@fedihum.org
2025-07-21 11:49:09

https://wisskomm.social/@ids_mannheim/114890809294634742
With a contribution by my colleagues @… and Sebastian Reimann from u…

Institut für Deutsche Sprache (@ids_mannheim@wisskomm.social)
🆕 The special issue of the Journal for Language Technology and Computational Linguistics "LLM fails – Failed experiments with generative AI and what we can learn from them" has recently been published, edited by Ngoc Duyen Tanja Tu, Annelen Brunner and Christian Lang. 🔗 https://jlcl.org/issue/view/69

@arXiv_csCL_bot@mastoxiv.page
2025-08-18 09:34:20

UNVEILING: What Makes Linguistics Olympiad Puzzles Tricky for LLMs?
Mukund Choudhary, KV Aditya Srivatsa, Gaurja Aeron, Antara Raaghavi Bhattacharya, Dang Khoa Dang Dinh, Ikhlasul Akmal Hanif, Daria Kotova, Ekaterina Kochmar, Monojit Choudhury
https://arxiv.org/abs/2508.11260

Large language models (LLMs) have demonstrated potential in reasoning tasks, but their performance on linguistics puzzles remains consistently poor. These puzzles, often derived from Linguistics Olympiad (LO) contests, provide a minimal contamination environment to assess LLMs' linguistic reasoning abilities across low-resource languages. This work analyses LLMs' performance on 629 problems across 41 low-resource languages by labelling each with linguistically informed features to unveil weakne…

@tschfflr@fediscience.org
2025-08-27 05:13:05

#omw to The #Emoji Workshop*! Step 1 ✅
#DB true to form with trains just not running without notice, and this helpful schedule
* https://www.hf.uio.no/iln/english/research/groups/super-linguistics/events/the-emoji-workshop.html

@trochee@dair-community.social
2025-08-14 00:45:48

https://snarxiv.org/vs-arxiv/
This needs a version for the computational linguistics/machine-learning quartier of arXiv
I'm guessing at about 60% accuracy on the string theory / supercollider / early-time physics papers there, "better than a monkey," as the robot tells me
Via/blame

@fanf@mendeddrum.org
2025-08-27 08:42:03

from my link log —
Sapir-Whorf does not apply to programming languages.
https://buttondown.com/hillelwayne/archive/sapir-whorf-does-not-apply-to-programming/
saved 2025-08-21

Linguistics is about human languages, not programming ones

@sascha_wolfer@fediscience.org
2025-06-17 06:12:21

Does anyone have access to this article?
Bromham et al. (2025): Macroevolutionary analysis of polysynthesis shows that language complexity is more likely to evolve in small, isolated populations.
#papersplease #paper #Linguistics

@sascha_wolfer@fediscience.org
2025-07-18 14:21:42

Here is a personal selection of nice modifications of "langsam" ('slow') that actually occur in the #Korpus (corpus), sorted from top to bottom by frequency. Among them: lots of animals!
- zeitlupenlangsam ('slow-motion slow')
- schneckenlangsam ('snail-slow') 🐌
- schnelllangsam ('fast-slow') (?!?)
- schildkrötenlangsam ('tortoise-slow') 🐢
- schweinelangsam ('pig-slow') 🐷
- hyperlangsam ('hyper-slow')
- hundslangsam ('dog-slow') 🐕‍🦺
- rasendlangsam ('racing-slow')
- ameisenlangsam ('ant-slow') 🐜
- krötenlangsam ('toad-slow') 🐸
- lavalangsam ('lava-slow')
- tuckerlangsam ('chugging-slow')
- zentimeterlangsam ('centimetre-slow')
(Source: DeReKoGram) #linguistics

@arXiv_csCL_bot@mastoxiv.page
2025-07-03 10:03:00

Data interference: emojis, homoglyphs, and issues of data fidelity in corpora and their results
Matteo Di Cristofaro
https://arxiv.org/abs/2507.01764 https…

Tokenisation - "the process of splitting text into atomic parts" (Brezina & Timperley, 2017: 1) - is a crucial step for corpus linguistics, as it provides the basis for any applicable quantitative method (e.g. collocations) while ensuring the reliability of qualitative approaches. This paper examines how discrepancies in tokenisation affect the representation of language data and the validity of analytical findings: investigating the challenges posed by emojis and homoglyphs, the study highligh…

@arXiv_qbioPE_bot@mastoxiv.page
2025-06-25 08:45:40

phylo2vec: a library for vector-based phylogenetic tree manipulation
Neil Scheidwasser, Ayush Nag, Matthew J Penn, Anthony MV Jakob, Frederik Mølkjær Andersen, Mark P Khurana, Landung Setiawan, Madeline Gordon, David A Duchêne, Samir Bhatt
https://arxiv.org/abs/2506.19490

Phylogenetics is a fundamental component of many analysis frameworks in biology as well as linguistics to study the evolutionary relationships of different entities. Recently, the advent of large-scale genomics and the SARS-CoV-2 pandemic has underscored the necessity for phylogenetic software to handle large datasets of genomes or phylogenetic trees. While significant efforts have focused on scaling optimisation algorithms, visualization, and lineage identification, an emerging body of researc…

@sascha_wolfer@fediscience.org
2025-07-17 10:05:03

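To whom it may concern...
Short excerpt from a current #Korpus (corpus) study:
"hin- und/oder herX" ('back and/or forth X')
X = finite verb, infinitive, or past participle
1st place: hin- und hergerissen ('torn back and forth', i.e. torn, undecided) (duh!)
2nd place: hin- und hergeschoben ('pushed back and forth')
3rd place: hin- und herschieben ('to push back and forth'; see 2nd place)
4th place: hin- und herfahren ('to drive back and forth')
5th place: hin- und herpendeln ('to commute back and forth')
6th place: hin- und hergeschickt ('sent back and forth')
7th place: hin- und hergefahren ('driven back and forth'; see 4th place)
8th place: hin- und hergeworfen ('thrown back and forth')
9th place: hin- und herwechseln ('to switch back and forth')
10th place: hin- und herpendelt ('commutes back and forth'; see 5th place)
"hin- und hergerissen" is still more frequent than places 2 to 10 combined.
As mentioned: "und" ('and') in the list above can always be "oder" ('or') as well.
Source: DeReKoGram #linguistics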

@arXiv_csCL_bot@mastoxiv.page
2025-07-02 10:19:30

Generative AI and the future of scientometrics: current topics and future questions
Benedetto Lepori, Jens Peter Andersen, Karsten Donnay
https://arxiv.org/abs/2507.00783

The aim of this paper is to review the use of GenAI in scientometrics, and to begin a debate on the broader implications for the field. First, we provide an introduction on GenAI's generative and probabilistic nature as rooted in distributional linguistics. And we relate this to the debate on the extent to which GenAI might be able to mimic human 'reasoning'. Second, we leverage this distinction for a critical engagement with recent experiments using GenAI in scientometrics, including topic lab…

@arXiv_csSD_bot@mastoxiv.page
2025-06-16 07:53:59

Assessing the Impact of Anisotropy in Neural Representations of Speech: A Case Study on Keyword Spotting
Guillaume Wisniewski (LLF - UMR7110), Séverine Guillaume (LACITO), Clara Rosina Fernández (LACITO)
https://arxiv.org/abs/2506.11096

Pretrained speech representations like wav2vec2 and HuBERT exhibit strong anisotropy, leading to high similarity between random embeddings. While widely observed, the impact of this property on downstream tasks remains unclear. This work evaluates anisotropy in keyword spotting for computational documentary linguistics. Using Dynamic Time Warping, we show that despite anisotropy, wav2vec2 similarity measures effectively identify words without transcription. Our results highlight the robustness …

@arXiv_csCL_bot@mastoxiv.page
2025-08-26 12:01:26

Feature-Refined Unsupervised Model for Loanword Detection
Promise Dodzi Kpoglu
https://arxiv.org/abs/2508.17923 https://arxiv.org/pdf/2508.17923

We propose an unsupervised method for detecting loanwords i.e., words borrowed from one language into another. While prior work has primarily relied on language-external information to identify loanwords, such approaches can introduce circularity and constraints into the historical linguistics workflow. In contrast, our model relies solely on language-internal information to process both native and borrowed words in monolingual and multilingual wordlists. By extracting pertinent linguistic feat…

@tschfflr@fediscience.org
2025-06-20 14:13:22

🤩 We spent today recording video for a new, professional short introduction to our ViCom project on #emojis 📹 It was fun trying to find the most photogenic public spots on campus (emoji research consists mostly of us staring at computer screens all day long, which is not very... visually interesting)
Very excited to show you the results in a few weeks or months! #visualCommunication #linguistics #sciComm @… https://vicom.info/projects/semantics-and-pragmatics-of-emojis-in-digital-communication/

@arXiv_csCL_bot@mastoxiv.page
2025-07-14 09:59:02

A comprehensive study of LLM-based argument classification: from LLAMA through GPT-4o to Deepseek-R1
Marcin Pietroń, Rafał Olszowski, Jakub Gomułka, Filip Gampel, Andrzej Tomski
https://arxiv.org/abs/2507.08621

Argument mining (AM) is an interdisciplinary research field that integrates insights from logic, philosophy, linguistics, rhetoric, law, psychology, and computer science. It involves the automatic identification and extraction of argumentative components, such as premises and claims, and the detection of relationships between them, such as support, attack, or neutrality. Recently, the field has advanced significantly, especially with the advent of large language models (LLMs), which have enhanc…

@sascha_wolfer@fediscience.org
2025-07-23 06:21:53

Just published:
Supplementing CEFR-graded vocabulary lists for language learners by leveraging information on dictionary views, corpus frequency, part-of-speech, and polysemy
A machine-learning method to suggest word candidates for CEFR-graded vocabulary lists.
#CEFR level of previously unlabeled words
#linguistics #CEFR #frequency #dictionary #LanguageLearning

Opt-in global Mastodon full text search.