
2025-06-06 09:41:05
This https://arxiv.org/abs/2505.24195 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csHC_…
wiki_article_words: Wikipedia article-words (en) (2010)
A bipartite network of English Wikipedia articles and the words they contain. The edge weight gives the number of times a word appeared in the connected article.
This network has 276739 nodes and 2941902 edges.
Tags: Informational, Language, Weighted
https://
I'm a #language fiend too. Lasswell taps into a big sore point for me as well. If you read this, stay away from the comment section. You'll go nuts!
Opinion | The phrase ‘begs the question’ is begging for oblivion - The Washington Post
SWELL (old-timey use) Vs SWILL. What a difference a letter makes.
#English #language #difference
«Sure, people from Latin-language-majority countries also list their “pronouns” in e-mail signatures and on social-media profiles, but this is, I think, largely the subconscious acceptance of the English-speakers’ hegemony over social norms (and queerness?).»
🔥 by @….
I'd even go one further and say "USian English's hegemony, both for online social norms generally and queerness in particular" 🙈
https://blog.achintyarao.in/post/the-english-language/
MuRating: A High Quality Data Selecting Approach to Multilingual Large Language Model Pretraining
Zhixun Chen, Ping Guo, Wenhan Han, Yifan Zhang, Binbin Liu, Haobin Lin, Fengze Liu, Yan Zhao, Bingni Zhang, Taifeng Wang, Yin Zheng, Meng Fang
https://arxiv.org/abs/2507.01785
The Belfast News Letter, among the longest-running English dailies, digitizes and puts all its editions online, including the earliest surviving one from 1738 (Michael Savage/The Guardian)
https://www.theguardian.com/media/2025/jun/…
Teaching a Language Model to Speak the Language of Tools
Simeon Emanuilov
https://arxiv.org/abs/2506.23394 https://arxiv.org/pdf/2506…
WikiGap: Promoting Epistemic Equity by Surfacing Knowledge Gaps Between English Wikipedia and other Language Editions
Zining Wang, Yuxuan Zhang, Dongwook Yoon, Nicholas Vincent, Farhan Samir, Vered Shwartz
https://arxiv.org/abs/2505.24195
At the Semantic Digital Humanities 2025 Workshop, Jose Maldonado-Rodríguez is presenting "Natural Language Querying for Humanities #KnowledgeGraphs A case study on the GOLEM KG". Main contribution is a bilingual dataset (English-Spanish) specifically designed to evaluate automatic text-to-SPARQL translation systems for GOLEM, a specialized humanities KG.
paper:
As I'm learning Dutch, I'm reminded that there are people who believe that the bible is to be taken literally. The idea that a several-hundred-year-old translation of a collection of texts in multiple languages, that were themselves translated multiple times between languages, before the whole thing was translated to Latin, then being translated to English, could somehow perfectly reflect the original text... Yeah, it's only possible to believe that if you have no idea how languages work and have never learned another language.
Like, just from linguistic drift alone if the bible were written in King James English you're losing *so* much context. But Hebrew, Aramaic, and Greek translated to Latin, then to English, then to English again?
There are so many things that erg can't be translated, even as a beginner. Dutch and English are two of the closest languages that exist, they're both Germanic languages and they're the closest to each other (other than Frisian). You can't really be much closer, and yet, there are so many things you can't mutually represent. Hebrew and Latin, Aramaic and Latin, Latin and English, Greek and English, these aren't even the same families at all... They're extremely distant. There's absolutely no way to represent concepts from one to another without another book's worth of explanation.
And that ignores all the cultural context, which is mostly lost, and it takes a library and a decade of education to get the stuff that we *do* know.
Only monolingual Americans could come up with an idea so incredibly asinine.
An Exploratory Framework for Future SETI Applications: Detecting Generative Reactivity via Language Models
Po-Chieh Yu
#toXiv_bot_toot
Title: The End of Anarchism?
Author: Luigi Galleani
Topics: #AnarchoCommunism, #insurrectionary
Date: 1925
Link:
Verifiable Natural Language to Linear Temporal Logic Translation: A Benchmark Dataset and Evaluation Suite
William H English, Chase Walker, Dominic Simon, Sumit Kumar Jha, Rickard Ewetz
https://arxiv.org/abs/2507.00877
Eka-Eval : A Comprehensive Evaluation Framework for Large Language Models in Indian Languages
Samridhi Raj Sinha, Rajvee Sheth, Abhishek Upperwal, Mayank Singh
https://arxiv.org/abs/2507.01853
Mitigating Language Mismatch in SSL-Based Speaker Anonymization
Zhe Zhang, Wen-Chin Huang, Xin Wang, Xiaoxiao Miao, Junichi Yamagishi
https://arxiv.org/abs/2507.00458
wiki_talk: Wikipedia talk networks
Interactions among users of 10 language-specific Wikipedias: Arabic, Chinese, Dutch, English, French, German, Italian, Portuguese, Russian, and Spanish. Nodes are registered wiki editors, and an edge represents a user i having written a message on user j's talk page. Edges are timestamped. The precise dates of the snapshots are uncertain.
This network has 8097 nodes and 63809 edges.
Tags: Social, Communication, Unweighted, Multigraph, …
You'll never become a NATIVE English speaker
No matter how hard you try, the years in the UK or Ireland, the effort in your accent, or the AI applications you might use to fake it
There is a language wall, made of accents, cultural references and seemingly illogical phrasal verbs and idioms, that we cannot jump
But IT DOESN'T MATTER.
90% of your interactions are probably with other non-native speakers. As long as you understand each other, you're good.
Stop the decline!
Marked decline in semicolons in English books, study suggests | Language | The Guardian
https://www.theguardian.com/science/2025/may/18/marked-decline-semicolon-use-english-books-study-suggests
Contrasting Cognitive Styles in Vision-Language Models: Holistic Attention in Japanese Versus Analytical Focus in English
Ahmed Sabir, Azinović Gasper, Mengsay Loem, Rajesh Sharma
https://arxiv.org/abs/2507.00700
You Sound a Little Tense: L2 Tailored Clear TTS Using Durational Vowel Properties
Paige Tuttösí, H. Henny Yeung, Yue Wang, Jean-Julien Aucouturier, Angelica Lim
https://arxiv.org/abs/2506.23367
Language learning has been part of me since high school. I'm solid in 2 non-English languages, crappy but survivable in 2 others. I've played with & started learning others many times.
I'm real busy rn, but language learning could be a fun thing to do for myself & make me feel like I'm still me.
But I'm stumped about my language picks. I learnt the obvious European languages in school; later tried key Asian languages. What do I want to do now?
African languages? I won't be getting a chance to use them much in Aus, & I'm unlikely to get to a stage where I can read literature.
I tried Slovenian/Slovene on a whim & really love it, but I'll never go there. Is the practical but unfun answer grind out more kanji/hanzi? Or is whimsically learning a language spoken by only 2.5 million people reasonable? I will continue struggling through with Ukrainian, 'cause I think it's important.
#LanguageLearning
Linguine: A Natural-Language Programming Language with Formal Semantics and a Clean Compiler Pipeline
Lifan Hu
https://arxiv.org/abs/2506.08396 https://
Metadata Enrichment of Long Text Documents using Large Language Models
Manika Lamba, You Peng, Sophie Nikolov, Glen Layne-Worthey, J. Stephen Downie
https://arxiv.org/abs/2506.20918
EfficientXLang: Towards Improving Token Efficiency Through Cross-Lingual Reasoning
Sanchit Ahuja, Praneetha Vaddamanu, Barun Patra
https://arxiv.org/abs/2507.00246
Brains and language models converge on a shared conceptual space across different languages
Zaid Zada, Samuel A Nastase, Jixing Li, Uri Hasson
https://arxiv.org/abs/2506.20489
English-language trailer for "Quartet" just dropped!
https://chaos.social/@theateru34/114742812225916947
wiki_talk: Wikipedia talk networks
Interactions among users of 10 language-specific Wikipedias: Arabic, Chinese, Dutch, English, French, German, Italian, Portuguese, Russian, and Spanish. Nodes are registered wiki editors, and an edge represents a user i having written a message on user j's talk page. Edges are timestamped. The precise dates of the snapshots are uncertain.
This network has 41424 nodes and 73900 edges.
Tags: Social, Communication, Unweighted, Multigraph,…
Automatic Large Language Models Creation of Interactive Learning Lessons
Jionghao Lin, Jiarui Rao, Yiyang Zhao, Yuting Wang, Ashish Gurung, Amanda Barany, Jaclyn Ocumpaugh, Ryan S. Baker, Kenneth R. Koedinger
https://arxiv.org/abs/2506.17356
It bothers me big time that many Reddit threads are auto-translated to my native language and indexed in search engines.
As someone who frequently searches for highly local knowledge, I find this annoying.
If I search for anything in my native language, it usually means "I want to know what my fellows think", and not "I want to read what the English-speaking community says, just translated to Polish".
Outside of a few specific exceptions, Reddit in general is such a waste of time and data.
Natural language processing for African languages
David Ifeoluwa Adelani
https://arxiv.org/abs/2507.00297 https://arxiv.org/pdf/2507.…
I made a new friend today in Albion Europe.
ZairGT... I think his native language is Spanish, but he also speaks English.
I found this Tree T4.3 in a blue zone and I went back and forth to it. At one point I found ZairGT there (both of us unflagged, no FW)... and the gathering duel was "short and bloody", and I realized I had no chance to take that.
I admitted defeat and often went there and did the 1 sign and WP...
part 2 soon...
Of all the many crimes against the English language committed by people from the USA, I think the only one which I'd make a capital offence is that of refusing to pronounce the "L" in "soldering".
Dhvani: A Weakly-supervised Phonemic Error Detection and Personalized Feedback System for Hindi
Arnav Rustagi, Satvik Bajpai, Nimrat Kaur, Siddharth Siddharth
https://arxiv.org/abs/2506.02166
wordnet: WordNet relationships
A network of English words from the WordNet. Node is a word, and edge denotes relationships between words (synonymy, hyperonymy, meronymy, etc.). The date at which this network was extracted from WordNet is unknown.
This network has 146005 nodes and 656999 edges.
Tags: Informational, Language, Unweighted
htt…
I don't know why I'm learning to say "right" and "left" in another language when I can't even keep them straight in English.
Day 13 (oh, really? ;))
Registration Form Implementation
I've just finished implementing a registration form with validation and language switching using Next.js and React Hook Form. Now users can register with dynamic language support (English/Polish) and data validation (email, password, phone).
Unfortunately, my account on Write.as has been temporarily blocked, so details about the implementation will be available once the account is unlocked. Stay tuned! 😊
I got to use `progenitor` in an uncontrived context at work today, and last week it was `prevaricate`, so if you have any fancy English vocabulary needs I'm right over here, under the sign `Fancy Language Person`.
Text2Cypher Across Languages: Evaluating Foundational Models Beyond English
Makbule Gulcin Ozsoy, William Tai
https://arxiv.org/abs/2506.21445 https://
skLEP: A Slovak General Language Understanding Benchmark
Marek Šuppa, Andrej Ridzik, Daniel Hládek, Tomáš Javůrek, Viktória Ondrejová, Kristína Sásiková, Martin Tamajka, Marián Šimko
https://arxiv.org/abs/2506.21508
GLAP: General contrastive audio-text pretraining across domains and languages
Heinrich Dinkel, Zhiyong Yan, Tianzi Wang, Yongqing Wang, Xingwei Sun, Yadong Niu, Jizhong Liu, Gang Li, Junbo Zhang, Jian Luan
https://arxiv.org/abs/2506.11350
The Emergence of Abstract Thought in Large Language Models Beyond Any Language
Yuxin Chen, Yiran Zhao, Yang Zhang, An Zhang, Kenji Kawaguchi, Shafiq Joty, Junnan Li, Tat-Seng Chua, Michael Qizhe Shieh, Wenxuan Zhang
https://arxiv.org/abs/2506.09890
HyperCLOVA X THINK Technical Report
NAVER Cloud HyperCLOVA X Team
https://arxiv.org/abs/2506.22403 https://arxiv.org/pdf/2506.22403…
This https://arxiv.org/abs/2505.05660 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csHC_…
Replaced article(s) found for q-bio.OT. https://arxiv.org/list/q-bio.OT/new
[1/1]:
English dictionaries, gold and silver standard corpora for biomedical natural language processing...
wiki_talk: Wikipedia talk networks
Interactions among users of 10 language-specific Wikipedias: Arabic, Chinese, Dutch, English, French, German, Italian, Portuguese, Russian, and Spanish. Nodes are registered wiki editors, and an edge represents a user i having written a message on user j's talk page. Edges are timestamped. The precise dates of the snapshots are uncertain.
This network has 155820 nodes and 1358426 edges.
Tags: Social, Communication, Unweighted, Multigra…
AIn't Nothing But a Survey? Using Large Language Models for Coding German Open-Ended Survey Responses on Survey Motivation
Leah von der Heyde, Anna-Carolina Haensch, Bernd Weiß, Jessika Daikeler
https://arxiv.org/abs/2506.14634
The Teacher's Dilemma: Balancing Trade-Offs in Programming Education for Emergent Bilingual Students
Emma R. Dodoo, Tamara Nelson-Fromm, Mark Guzdial
https://arxiv.org/abs/2506.14147
wiki_users: Wikipedia user interaction (2011)
A network derived from interactions between editors of the English language Wikipedia, as derived from the edit histories of 563 wiki pages related to politics. A positive sign indicates positive links such as trust or similarities, and a negative sign indicates distrust or disagreement.
This network has 138592 nodes and 740397 edges.
Tags: Social, Online, Signed
Replaced article(s) found for cs.CL. https://arxiv.org/list/cs.CL/new
[1/3]:
- Impact of Visual Context on Noisy Multimodal NMT: An Empirical Study for English to Indian Languages
Baban Gain, Dibyanayan Bandyopadhyay, Samrat Mukherjee, Chandranath Adak, Asif Ekbal
word_adjacency: Word Adjacency Networks
Directed Networks of word adjacency in texts of several languages including English, French, Spanish and Japanese.
This network has 7381 nodes and 46281 edges.
Tags: Informational, Language, Unweighted
https://networks.skewed.de/net/word_ad
When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs
Ammar Khairi, Daniel D'souza, Ye Shen, Julia Kreutzer, Sara Hooker
https://arxiv.org/abs/2506.20544
Intelligibility of Text-to-Speech Systems for Mathematical Expressions
Sujoy Roychowdhury, H. G. Ranjani, Sumit Soman, Nishtha Paul, Subhadip Bandyopadhyay, Siddhanth Iyengar
https://arxiv.org/abs/2506.11086
This https://arxiv.org/abs/2506.00759 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCL_…