
2025-05-23 08:05:04
In the #ISE2025 lecture today we introduced our students to the concept of distributional semantics as the foundation of modern large language models. Historically, Wittgenstein was one of the important figures in the philosophy of language, stating that "The meaning of a word is its use in the language."
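A minimal sketch of the distributional idea, assuming a toy corpus and a symmetric context window of one word. Each word is represented by counts of its neighbours, and words used in similar contexts end up with similar vectors, measured here by cosine similarity. The corpus, the window size and the function names are illustrative assumptions. This is not how modern LLMs are built, only the intuition behind them.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// contextVectors counts, for every word, how often each other word appears
// within `window` positions of it (a simple co-occurrence matrix).
func contextVectors(sentences []string, window int) map[string]map[string]float64 {
	vecs := make(map[string]map[string]float64)
	for _, s := range sentences {
		words := strings.Fields(strings.ToLower(s))
		for i, w := range words {
			if vecs[w] == nil {
				vecs[w] = make(map[string]float64)
			}
			for j := i - window; j <= i+window; j++ {
				if j < 0 || j >= len(words) || j == i {
					continue
				}
				vecs[w][words[j]]++
			}
		}
	}
	return vecs
}

// cosine returns the cosine similarity of two sparse count vectors.
func cosine(a, b map[string]float64) float64 {
	var dot, na, nb float64
	for k, v := range a {
		dot += v * b[k]
		na += v * v
	}
	for _, v := range b {
		nb += v * v
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	corpus := []string{
		"the cat drinks milk",
		"the dog drinks water",
		"the cat chases the dog",
		"stocks fell sharply today",
	}
	v := contextVectors(corpus, 1)
	fmt.Printf("cat~dog:    %.2f\n", cosine(v["cat"], v["dog"]))
	fmt.Printf("cat~stocks: %.2f\n", cosine(v["cat"], v["stocks"]))
}
```

On this toy corpus, "cat" and "dog" share the contexts "the" and "drinks" and score about 0.91. "cat" and "stocks" share no context and score 0.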
#ReleaseWednesday: Extracted & extended the LISP-like DSL from an existing #ThingUmbrella example[1] as a new small package for better/direct re-use in other projects:
1/ It's really hard to miss a particular demographic in Italy. Almost every cycle-based food delivery driver, outdoor market fruit/veg/cheap trinket seller, etc., is clearly from South Asia, down to the language and music. It's really stark. #Italy25 ↵
Home again, after a week and some. Today I made biscuits! Egg and bacon, tea and coffee. Work, transit board meeting, and play at our local Spanish-language theater tonight: “La Dama Boba” #TogetherBreakfast https://photos.app.goo.gl/7S7RtWAatP5Ps1ZN8
wordnet: WordNet relationships
A network of English words from WordNet. Each node is a word, and an edge denotes a relationship between words (synonymy, hyperonymy, meronymy, etc.). The date at which this network was extracted from WordNet is unknown.
This network has 146005 nodes and 656999 edges.
Tags: Informational, Language, Unweighted
htt…
A post from the archive 📫:
Making production diagnostics easier with Source Link
https://www.poppastring.com/blog/making-production-diagnostics-easier-with-source-link
Language learning has been part of me since high school. I'm solid in 2 non-English languages, crappy but survivable in 2 others. I've played with & started learning others many times.
I'm real busy rn, but language learning could be a fun thing to do for myself & make me feel like I'm still me.
But I'm stumped about my language picks. I learnt the obvious European languages in school; later tried key Asian languages. What do I want to do now?
African languages? I won't be getting a chance to use them much in Aus, & I'm unlikely to get to a stage where I can read literature.
I tried Slovenian/Slovene on a whim & really love it, but I'll never go there. Is the practical but unfun answer to grind out more kanji/hanzi? Or is whimsically learning a language spoken by only 2.5 million people reasonable? I will continue struggling through with Ukrainian, 'cause I think it's important.
#LanguageLearning
🇺🇦 Now playing on #radioeins...
Nation of Language:
🎵 Inept Apollo
#NowPlaying #NationofLanguage
Tracks played on #radioeins as a #Spotify playlist: https://open.spotify.com/playlist/3hdH98B6uyXilhcWxCA6nv
Our Father in Heaven - Pray As You Go
#dw4jc
Generating Shakespeare-like text with an n-gram language model is straightforward and quite simple. But don't expect too much of it. It will not be able to recreate a lost Shakespeare play for you ;-) It's merely a parrot, making up well-sounding sentences out of fragments of original Shakespeare texts...
#ise2025
trec: TREC collection (2010)
A bipartite network of documents and the words they contain, extracted from NIST's Text Retrieval Conference (TREC) disks 4 and 5, from 2010. These archives contain material drawn from the Financial Times Ltd., the Congressional Record of the 103rd Congress, the Federal Register, the Foreign Broadcast Information Service, and the Los Angeles Times newspaper.
This network has 1729302 nodes and 83629405 edges.
Tags: Informational, Language, Un…
Does anyone have access to this article?
Bromham et al. (2025): Macroevolutionary analysis of polysynthesis shows that language complexity is more likely to evolve in small, isolated populations.
#papersplease #paper #Linguistics
German, the language with the most compelling argument for camelCase and PascalCase.
#languages #German #Deutsch #readability #comprehension
Finally you have the chance to work with me. 🥳
Join Oliver's team in #hamburg as a Software Developer for #DSpace or #DSpaceCRIS
German language skills are helpful
on my way to the optional friday meeting of early arrivers to RetroNetConf, our small German-language meeting of various #retronetworking enthusiasts https://osmocom.org/projects/retronetw…
Saw this on my kitchen counter today and thought „hmm, ‚Solid Porcelain‘ might be a cool design language, actually.“
Can do the same 3d effects, nice highlights / reflections, and easier to read on than the transparency of #LiquidGlass. Plus, stuff sticking out of the icon is always fun. #iOS
How to Keep Up With New CSS Features | CSS-Tricks
https://css-tricks.com/how-to-keep-up-with-new-css-features/
Oh, the State of CSS survey is open... And more good sources.
Guardians Of Language And Country
Great Australian Pods Podcast Directory: #GreatAusPods
This is everything I could never get out of #sonicpi:
🇺🇦 #NowPlaying on KEXP's #VarietyMix
Cryogeyser:
🎵 Love Language
#Cryogeyser
https://open.spotify.com/track/2lGHIv8Q5Ug896XTMz5oEr
I don't understand Mitch's body language here
#DavidHasselhoff #CarolGrow
Season 9 Episode 19 "Double Jeopardy"
#RandomBaywatch
I'm a #language fiend too. Lasswell taps into a big sore point for me as well. If you read this, stay away from the comment section. You'll go nuts!
Opinion | The phrase ‘begs the question’ is begging for oblivion - The Washington Post
I have time to experiment with different programming languages and while I'm a big fan of functional or functional style programming, my recent obsession is with #Go
It is a tremendously simple language, without surprises or elaborate mechanisms, procedural and totally boring... and I love it.
The most satisfying thing is live reload with Air, and it's usually already compiled and…
I was just sent this photo, supposedly taken in 1994: Björn Beutel, developer of the Malaga language, and myself are working in the CLUE (Computational Linguistics University of Erlangen) lab.
I’m a bit hesitant to tag this #retrocomputing ;-)
#CompLing
NPR: The White House is sued over lack of sign language interpreters at press briefings
https://www.npr.org/2025/05/29/nx-s1-5415687/deaf-sign-language-trump-white-house-lawsuit
I just solved a problem at work thanks to #git bisect that no one else was able to figure out for two days.
And I don’t even really understand the language.
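The reason `git bisect` works without understanding the code is that it is a binary search over history. You mark one good and one bad commit, and git keeps checking out the midpoint until only the first bad commit remains. A minimal sketch of that search follows. It assumes a single good-to-bad transition and models "run the test on commit i" as a hypothetical `isBad` function.

```go
package main

import "fmt"

// firstBad returns the index of the first commit for which isBad is true,
// plus the number of tests run. Assumptions: n >= 2, commit 0 is good,
// commit n-1 is bad, and history flips from good to bad exactly once.
func firstBad(n int, isBad func(int) bool) (int, int) {
	good, bad := 0, n-1
	steps := 0
	for bad-good > 1 {
		mid := good + (bad-good)/2 // midpoint commit to test next
		steps++
		if isBad(mid) {
			bad = mid
		} else {
			good = mid
		}
	}
	return bad, steps
}

func main() {
	// Hypothetical history of 100 commits where commit 37 introduced the bug.
	idx, steps := firstBad(100, func(i int) bool { return i >= 37 })
	fmt.Println("first bad commit:", idx, "tests run:", steps)
}
```

The number of tests grows only with log2 of the number of commits, which is why a two-day mystery can collapse into a handful of test runs.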
A Cult AI Computer’s Boom and Bust:
I am aware that CUDA isn’t a language. But 🤷♂️
📺 #video
Creationism in everyday language bugs me. "Entity A made N dollars" or "… generated N kilowatts" when 'extracted' states what actually happened, and maybe promotes a clearer mindset?
#wordsoundpower
🇺🇦 Now playing on radioeins...
Nation Of Language:
🎵 Stumbling Still (Edit)
#NowPlaying #NationOfLanguage
https://nationoflanguage.bandcamp.com/track/stumbling-still
https://open.spotify.com/track/2kGKxXOSLegpsXQOAhMqCH
At @oapenbooks.bsky.social, we have updated our #Metadata feeds, to better integrate our #OpenAccess #books into #libraries
Title: The End of Anarchism?
Author: Luigi Galleani
Topics: #AnarchoCommunism, #insurrectionary
Date: 1925
Link:
Go Wiki: SliceTricks - The Go Programming Language
Cut:
a = append(a[:i], a[j:]...)
#golang
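A small runnable sketch of the Cut trick, wrapped in a hypothetical helper `cut` that removes the elements `a[i:j]`. The result shares the original backing array, so the input slice is modified in place. Since Go 1.21 the standard library also offers `slices.Delete(a, i, j)` for the same removal.

```go
package main

import "fmt"

// cut removes a[i:j] using the SliceTricks "Cut" idiom.
// The returned slice reuses a's backing array, so a itself is modified.
func cut(a []int, i, j int) []int {
	return append(a[:i], a[j:]...)
}

func main() {
	a := []int{0, 1, 2, 3, 4, 5}
	a = cut(a, 1, 3)
	fmt.Println(a) // [0 3 4 5]
}
```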
This week, we discussed the central question "Can we predict a word?" as the basis for statistical language models in our #ISE2025 lecture. Of course, I was using Shakespeare quotes to motivate the (international) students to complete the quotes with "predicted" missing words ;-)
"All the world's a stage, and all the men and women merely...."
#Anthropic #opensources circuit tracing method to reveal how large language models make decisions internally
🔍 Generate attribution graphs showing step-by-step model reasoning processes
🧵👇#research
An analysis of the top 100 trending TikTok videos under #mentalhealthtips finds 52 contain misinformation, including misused language and quick-fix methods (The Guardian)
SWELL (old-timey use) Vs SWILL. What a difference a letter makes.
#English #language #difference
#OpenStreetMap people, how would you tag a Greek restaurant that only has a Greek name in Greek letters? In this case, the restaurant is called ΑΝΕΜΟΣ, so I tagged `name=ΑΝΕΜΟΣ` and `name:de=Anemos` as that would be the name in my locale, but Osmosis whines about "Name with uppercase" and "Default and local language name not the same". Should the name be the local one…
🇺🇦 #NowPlaying on KEXP's #MiddayShow
Nation of Language:
🎵 Inept Apollo
#NationofLanguage
https://nationoflanguage.bandcamp.com/track/inept-apollo
https://open.spotify.com/track/0YikZXNNkvNlPInLAPMV06
Want to check the google trends for a topic? Use {gtrendsR} directly from within your favorite language: #googletrends
✨European Summer School for Logic, Language and Information 2025✨ in Bochum, Germany:
- 47 courses & workshops
- 4 exciting evening lectures
- social events
- explore the Ruhr area
Early bird registration deadline: May 31!
#esslli2025 #esslli #logic #linguistics #compSci #nlproc #summerSchool #rub https://2025.esslli.eu/
Edit: Boosts appreciated 🤗
One way to spend time instead of #doomscrolling has recently been #codewars
Bitesized problems of different difficulties for almost any language.
Not as big as #adventofcode
Want super-thorough AI responses by applying deep thought & reasoning AI technology?
Any Microsoft 365 Copilot (Commercial) licensed user can now use the Researcher & Analyst agents at no additional cost! These new agents provide very thorough responses to help you collect FACTS on informational topics & review DATA to derive insights... with human language.
✅ #Researcher
Wow, I didn't even know the `<data>` HTML tag existed #HTML
“#SFUSD sent a sternly worded letter to the #SFParksAlliance. The language was cool and technical, but the meaning was not. It mirrored the poem that won Flyguy the Pimp of the Year contest: ‘Better have my money! Through rain, sleet or snow! Better have my money! Not half! Not some! But alllllllll my…
Even for small projects, while it might work, I wouldn't want the context switching personally. Especially if the backend uses some weird templating language for generating the html snippets. I'd much rather use a fullstack framework like #SvelteKit where your endpoint is just a load function that returns some data (automatically serialized and made available to the page script), or create a proper API in another language that returns some JSON.
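For the "proper API in another language that returns some JSON" option, a minimal sketch in Go using only `net/http` and `encoding/json` follows. The route `/api/items`, the `Item` type, the sample data and port 8080 are all hypothetical. The payload is marshalled before any bytes are written, so an encoding error can still produce a clean 500 response.

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// Item is a hypothetical payload type; the field names are illustrative only.
type Item struct {
	ID   int    `json:"id"`
	Name string `json:"name"`
}

// encodeItems serializes the payload; kept separate so it can be tested
// without starting a server.
func encodeItems(items []Item) ([]byte, error) {
	return json.Marshal(items)
}

// itemsHandler returns JSON instead of server-rendered HTML snippets.
func itemsHandler(w http.ResponseWriter, r *http.Request) {
	body, err := encodeItems([]Item{{ID: 1, Name: "first"}, {ID: 2, Name: "second"}})
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	_, _ = w.Write(body)
}

func main() {
	http.HandleFunc("/api/items", itemsHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

The frontend then fetches `/api/items` and renders the data itself, so no server-side templating language is involved.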
Very happy to have the printed version of the German edition of Dream Askew Dream Apart in my hands. Thank you, @… , for giving the German language community this important game. This game is special to me as I somehow had it again and again in my mind while developing my own future I want to live in, the sustainable community housing project @…
#ttrpg #indierpg #pnpde #bob #DADA #averyAlder
An Exploratory Framework for Future SETI Applications: Detecting Generative Reactivity via Language Models
Po-Chieh Yu
#toXiv_bot_toot
pokec: Pokec online social network (2012)
The online social network of Pokec, a popular OSN in Slovakia, from 2012. The data covers about 10 years and more than 1.6 million people. Profile data contains gender, age, hobbies, interests, education, etc. Profile metadata are in Slovak. Friendships in Pokec are directed.
This network has 1632804 nodes and 30622564 edges.
Tags: Social, Online, Metadata
Let's say you find a really cool forum online that has lots of good advice on it. It's even got a very active community that's happy to answer questions very quickly, and the community seems to have a wealth of knowledge about all sorts of subjects.
You end up visiting this community often, and trusting the advice you get to answer all sorts of everyday questions you might have, which before you might have found answers to using a web search (of course web search is now full of SEO spam and other crap, so it's become nearly useless).
Then one day, you ask an innocuous question about medicine, and from this community you get the full homeopathy treatment as your answer. Like, somewhat believable on the face of it, includes lots of citations to reasonable-seeming articles, except that if you know even a tiny bit about chemistry and biology (which thankfully you do), you know that the homeopathy answers are completely bogus and horribly dangerous (since they offer non-treatments for real diseases). Your opinion of this entire forum suddenly changes. "Oh my God, if they've been homeopathy believers all this time, what other myths have they fed me as facts?"
You stop using the forum for anything, and go back to slogging through SEO crap to answer your everyday questions, because once you realize that this forum is a community that's fundamentally untrustworthy, you realize that the value of getting advice from it on any subject is negative: you knew enough to spot the dangerous homeopathy answer, but you know there might be other such myths that you don't know enough to avoid, and any community willing to go all-in on one myth has shown itself to be capable of going all-in on any number of other myths.
...
This has been a parable about large language models.
#AI #LLM
🇺🇦 Now playing on radioeins...
Nation of Language:
🎵 Inept Apollo
#NowPlaying #NationofLanguage
https://nationoflanguage.bandcamp.com/track/inept-apollo
https://open.spotify.com/track/0YikZXNNkvNlPInLAPMV06
Next stop on our NLP timeline (as part of the #ISE2025 lecture) was Terry Winograd's SHRDLU, an early natural language understanding system developed in 1968-70 that could manipulate blocks in a virtual world.
Winograd, T. Procedures as a Representation for Data in a Computer Program for Understanding Natural Language. MIT AI Technical Report 235.
🇺🇦 #NowPlaying on KEXP's #VarietyMix
Nation of Language:
🎵 I'm Not Ready for the Change
#NationofLanguage
#newRelease 🆕 single
https://nationoflanguage.bandcamp.com/track/im-not-ready-for-the-change
https://open.spotify.com/track/5ORQX1wOPqCP031V1CxjUq
From the 1990s onward, statistical n-gram language models, trained on vast text collections, became the backbone of NLP research. They fueled advancements in nearly all NLP techniques of the era, laying the groundwork for today's AI.
F. Jelinek (1997), Statistical Methods for Speech Recognition, MIT Press, Cambridge, MA
#NLP
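For reference, the core of such an n-gram model in standard notation: the chain rule factorizes a sentence's probability exactly, the Markov assumption truncates the history to the previous n-1 words, and the conditional probabilities are estimated from counts C(·) in the training text. This is the usual maximum-likelihood form. Practical systems add smoothing and pad the start of each sentence with boundary symbols for i < n.

```latex
P(w_1,\dots,w_m) = \prod_{i=1}^{m} P(w_i \mid w_1,\dots,w_{i-1})
  \approx \prod_{i=1}^{m} P(w_i \mid w_{i-n+1},\dots,w_{i-1}),
\qquad
\hat{P}(w_i \mid w_{i-n+1},\dots,w_{i-1})
  = \frac{C(w_{i-n+1},\dots,w_{i-1},w_i)}{C(w_{i-n+1},\dots,w_{i-1})}
```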
word_adjacency: Word Adjacency Networks
Directed networks of word adjacency in texts of several languages, including English, French, Spanish and Japanese.
This network has 7381 nodes and 46281 edges.
Tags: Informational, Language, Unweighted
https://networks.skewed.de/net/word_ad
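How such a word adjacency network might be built, as a minimal sketch. It assumes an edge u → v whenever v immediately follows u, with text split on whitespace and lowercased. The actual preprocessing behind the dataset is not described here, and `adjacencyNetwork` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// adjacencyNetwork builds a directed, unweighted word-adjacency network:
// one node per distinct word and an edge u -> v whenever v directly follows u.
func adjacencyNetwork(text string) (nodes map[string]bool, edges map[[2]string]bool) {
	nodes = make(map[string]bool)
	edges = make(map[[2]string]bool)
	words := strings.Fields(strings.ToLower(text))
	for i, w := range words {
		nodes[w] = true
		if i > 0 {
			edges[[2]string{words[i-1], w}] = true // directed: previous -> current
		}
	}
	return nodes, edges
}

func main() {
	nodes, edges := adjacencyNetwork("the cat saw the dog")
	fmt.Println(len(nodes), "nodes,", len(edges), "edges")
}
```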
bible_nouns: Bible noun phrases
A network of noun phrases (places and names) in the King James Version of the Bible. Each node is a noun phrase, and an edge exists if the noun phrases co-occur in a Bible verse. Edge weight denotes how often the two words co-occur.
This network has 1773 nodes and 9131 edges.
Tags: Informational, Language, Weighted
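A sketch of how a weighted co-occurrence network like this could be computed. It assumes the noun phrases have already been extracted per verse, and the input below is hypothetical, not real KJV data. Each verse contributes at most 1 to the weight of each unordered pair of distinct noun phrases. Whether the original dataset deduplicates within a verse is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// pair is an unordered pair of noun phrases, always stored with a <= b.
type pair struct{ a, b string }

// cooccurrence returns edge weights: how many verses contain both phrases.
func cooccurrence(verses [][]string) map[pair]int {
	w := make(map[pair]int)
	for _, v := range verses {
		// de-duplicate within a verse so each verse counts at most once per pair
		seen := make(map[string]bool)
		var uniq []string
		for _, n := range v {
			if !seen[n] {
				seen[n] = true
				uniq = append(uniq, n)
			}
		}
		sort.Strings(uniq) // sorted order makes pair keys canonical
		for i := 0; i < len(uniq); i++ {
			for j := i + 1; j < len(uniq); j++ {
				w[pair{uniq[i], uniq[j]}]++
			}
		}
	}
	return w
}

func main() {
	// Hypothetical, pre-extracted noun phrases per verse.
	verses := [][]string{
		{"Abraham", "Isaac"},
		{"Isaac", "Abraham", "Sarah"},
		{"Sarah", "Sarah"},
	}
	w := cooccurrence(verses)
	fmt.Println(w[pair{"Abraham", "Isaac"}], len(w))
}
```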
wiki_talk: Wikipedia talk networks
Interactions among users of 10 language-specific Wikipedias: Arabic, Chinese, Dutch, English, French, German, Italian, Portuguese, Russian, and Spanish. Nodes are registered wiki editors, and an edge represents a user i having written a message on user j's talk page. Edges are timestamped. The precise dates of the snapshots are uncertain.
This network has 155820 nodes and 1358426 edges.
Tags: Social, Communication, Unweighted, Multigra…
A post from the archive 📫:
Making production diagnostics easier with Source Link
https://www.poppastring.com/blog/making-production-diagnostics-easier-with-source-link
🇺🇦 Now playing on radioeins...
Nation Of Language:
🎵 Across That Fine Line
#NowPlaying #NationOfLanguage
https://nationoflanguage.bandcamp.com/track/across-that-fine-line
https://open.spotify.com/track/0naG5PyrfwJQ0xtuQ1BGCM
Last week, we continued our #ISE2025 lecture on distributional semantics with the introduction of neural language models (NLMs) and compared them to traditional statistical n-gram models.
Benefits of NLMs:
- Capturing Long-Range Dependencies
- Computational and Statistical Tractability
- Improved Generalisation
- Higher Accuracy
@…
LLMs are starving for knowledge graphs. Raphael Troncy was pointing out that many LLM company crawlers are constantly visiting their KGs. Some crawlers even perform explicit SPARQL queries on the KGs.
#knowledgegraphs #eswc2025
wiki_users: Wikipedia user interaction (2011)
A network derived from interactions between editors of the English-language Wikipedia, based on the edit histories of 563 wiki pages related to politics. A positive sign indicates positive links such as trust or similarity, and a negative sign indicates distrust or disagreement.
This network has 138592 nodes and 740397 edges.
Tags: Social, Online, Signed
unicodelang: Languages spoken by country (2015)
A bipartite network of languages and the countries in which they are spoken, as estimated by Unicode. Edges are weighted by the proportion of the given country's population that is literate in a particular language.
This network has 868 nodes and 1255 edges.
Tags: Informational, Relatedness, Weighted
At the Semantic Digital Humanities 2025 Workshop, Jose Maldonado-Rodríguez is presenting "Natural Language Querying for Humanities #KnowledgeGraphs: A case study on the GOLEM KG". The main contribution is a bilingual (English-Spanish) dataset specifically designed to evaluate automatic text-to-SPARQL translation systems for GOLEM, a specialized humanities KG.
paper:
word_assoc: Edinburgh word associations
A network of word associations showing the count of such associations as collected from subjects, from the Edinburgh Associative Thesaurus (EAT). Each node represents a word, and a directed edge (i, j) denotes that word i was used as a stimulus to which word j was given as a response. Multiple edges are allowed.
This network has 23132 nodes and 312342 edges.
Tags: Informational, Language, Unweighted, Multigraph
🇺🇦 Now playing on radioeins...
Nation of Language:
🎵 Spare Me The Decision
#NowPlaying #NationofLanguage
https://nationoflanguage.bandcamp.com/track/spare-me-the-decision
https://open.spotify.com/track/055hvmkl2KJAg0o3wMJXrF
reuters: Reuters news stories (1987, 2000)
A bipartite network of Reuters news stories and words. Edges connect each story to all the words it contains.
This network has 1065176 nodes and 60569726 edges.
Tags: Informational, Language, Unweighted
https://networks.skewed.de/net/reuters…
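How the node and edge counts of a document-word bipartite network like this one (or the TREC network) come about, as a minimal sketch. It assumes one node per document, one node per distinct word, and one unweighted edge per (document, word) pair, consistent with the "Unweighted" tag. How the original data was tokenized is not stated, and the documents below are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// bipartiteCounts returns the node and edge counts of a document-word
// bipartite network: one node per document, one per distinct word, and one
// edge per (document, word) pair where the word occurs in the document.
func bipartiteCounts(docs []string) (nodes, edges int) {
	vocab := make(map[string]bool)
	for _, d := range docs {
		inDoc := make(map[string]bool) // repeated words in a document give one edge
		for _, w := range strings.Fields(strings.ToLower(d)) {
			inDoc[w] = true
			vocab[w] = true
		}
		edges += len(inDoc)
	}
	return len(docs) + len(vocab), edges
}

func main() {
	docs := []string{"oil prices rise", "oil output falls", "prices rise prices"}
	n, e := bipartiteCounts(docs)
	fmt.Printf("This network has %d nodes and %d edges.\n", n, e)
}
```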
wiki_talk: Wikipedia talk networks
Interactions among users of 10 language-specific Wikipedias: Arabic, Chinese, Dutch, English, French, German, Italian, Portuguese, Russian, and Spanish. Nodes are registered wiki editors, and an edge represents a user i having written a message on user j's talk page. Edges are timestamped. The precise dates of the snapshots are uncertain.
This network has 41424 nodes and 73900 edges.
Tags: Social, Communication, Unweighted, Multigraph,…