Tootfinder

@adulau@infosec.exchange
2026-06-21 22:01:33

Training an LLM on a heavily cleaned, de-identified corpus can be like correcting every grammatical mistake in a large collection of texts: the result may look cleaner, but it can also lose the context, variation, and imperfections that reflect real-world language and behaviour.
A corpus scrubbed of every sensitive detail and irregularity can become a polished imitation of reality. Privacy protection could be necessary, but a model trained mostly on synthetic or over-sanitised data ris…

@davidaugust@mastodon.online
2026-06-15 17:09:41

“Some figures in the White House privately called suspending habeas corpus ‘insane.’”
Somehow, amidst some of the most destructive and monstrous moves of the US government under this administration, even some of their most extreme adherents might have limits.
Oddly heartening. Monsters may have limits too.

Frustrated by Courts, Trump Weighed Suspending a Constitutional Right
Secret memos show that the White House debated last year, to a greater degree than previously known, whether to limit habeas corpus rights for undocumented immigrants.

@memeorandum@universeodon.com
2026-06-17 18:25:46

Congress to Launch Probe into Trump White House Idea for Suspending Habeus Corpus (Scott MacFarlane/MeidasTouch)
https://meidasnews.com/news/congress-to-launch-probe-into-trump-white-house-idea-for-suspending-habeus-corpus
http://www.memeorandum.com/260617/p82#a260617p82

@tschfflr@fediscience.org
2026-07-17 16:19:08

"which emojis are used the least often?" - Great question, and we rarely find data on this! But luckily, we studied this at least for two corpora with German users (and admittedly only for face emojis). So: Many object or symbol emojis (from the "tail end" of the emoji list) are very, very rarely used. Overall, faces and hearts are the most frequent. Among the face emojis, we compiled the full list of usages from a huge Twitter corpus (semi-public communication, 2014-2022) and a much smaller WhatsApp chat corpus (private or group communication). You can find the frequencies in the table linked below (and sort by frequency by clicking on the column). The least frequent face emoji in the large corpus was the "frowning face with open mouth" 😦, which most people have never seen before. Personally, I think it's pretty ambiguous and not very useful, as there are better emojis for anything it might express. #emojis #linguistics #WorldEmojiDay
https://tscheffler.github.io/2024-Face-Emoji-Norming/ratings.html
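
As a rough illustration of the kind of count described in the post above, here is a minimal sketch that tallies face emojis across a list of messages and picks out the rarest one. The emoji set, the sample messages, and the function name `face_emoji_counts` are all hypothetical; this is not the authors' pipeline, and real emoji handling would also need to deal with multi-code-point sequences.

```python
from collections import Counter

# Hypothetical subset of face emojis (each is a single code point).
FACE_EMOJIS = {"😀", "😂", "😦", "🙂"}


def face_emoji_counts(texts):
    """Count occurrences of each tracked face emoji across all texts."""
    # Start every tracked emoji at zero so unused ones still appear.
    counts = Counter({emoji: 0 for emoji in FACE_EMOJIS})
    for text in texts:
        for char in text:
            if char in FACE_EMOJIS:
                counts[char] += 1
    return counts


# Made-up sample messages standing in for a corpus.
corpus = ["lol 😂😂", "ok 🙂", "hmm 😦", "😂 😀 😀 🙂"]

counts = face_emoji_counts(corpus)
least_frequent = min(counts, key=counts.get)
print(least_frequent, counts[least_frequent])
```

Iterating character by character only works here because every emoji in the assumed set is a single code point; emojis built from several code points would need a proper segmentation step.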

@cdarwin@c.im
2026-06-15 16:17:28

Last spring, Will Scharf,
an arch-conservative lawyer serving as the White House staff secretary,
wrote a secret memo to the chief of staff that reflected growing unease in the West Wing
about one of the extreme measures being weighed by Stephen Miller,
the powerful adviser driving President Trump’s deportation campaign.
Dated April 29, 2025, and stamped “confidential,” the memo was careful and lawyerly
but amounted to a warning against end-running the rul…

Frustrated by Courts, Trump Weighed Suspending a Constitutional Right
Secret memos show that the White House debated last year, to a greater degree than previously known, whether to limit habeas corpus rights for undocumented immigrants.

@servelan@newsie.social
2026-05-13 03:07:58

Corpus Christi eyes 25% cut in water use if emergency called
https://www.texastribune.org/2026/05/12/texas-corpus-christi-water-emergency-restrictions-vote/

@netzschleuder@social.skewed.de
2026-06-13 16:00:06

email_enron: Email network (Enron corpus)
The Enron email corpus, containing all the email communication from the Enron corporation, which was made public as a result of legal action. Nodes are email addresses and node i links to node j if i sent at least one email to address j. Non-Enron email addresses are also present, but only their links to/from Enron addresses are observed.
This network has 36692 nodes and 367662 edges.
Tags: Social, Communication, Unweighted, Multigr…

email_enron: Email network (Enron corpus). 36692 nodes, 367662 edges. https://networks.skewed.de/net/email_enron
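
The construction described above (addresses as nodes, a link from i to j if i sent at least one email to j, and only links touching Enron addresses observed) can be sketched as follows. This is a minimal illustration, not the dataset's actual build script: the sample messages, the `@enron.com` suffix test, and the function names are assumptions, and repeated messages are collapsed into one link based on the "at least one email" wording.

```python
from collections import defaultdict


def is_enron(address):
    """Assumed test for an Enron address: a plain domain-suffix check."""
    return address.endswith("@enron.com")


def build_email_network(messages):
    """Map each sender to the set of addresses it emailed at least once."""
    edges = defaultdict(set)
    for sender, recipient in messages:
        # Links between two non-Enron addresses are not observed.
        if not (is_enron(sender) or is_enron(recipient)):
            continue
        edges[sender].add(recipient)  # a set collapses repeated emails
    return edges


# Made-up (sender, recipient) pairs for illustration.
messages = [
    ("alice@enron.com", "bob@enron.com"),
    ("alice@enron.com", "bob@enron.com"),      # repeat, same link
    ("bob@enron.com", "carol@example.org"),
    ("carol@example.org", "alice@enron.com"),
    ("dave@example.org", "carol@example.org"),  # unobserved, skipped
]

network = build_email_network(messages)
nodes = set(network) | {r for targets in network.values() for r in targets}
edge_count = sum(len(targets) for targets in network.values())
print(len(nodes), edge_count)
```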

@memeorandum@universeodon.com
2026-06-15 20:50:41

White House lawyer raised alarms after Stephen Miller floated ending habeas corpus for illegal immigrants (Kaelan Deese/Washington Examiner)
https://www.washingtonexaminer.com/news/justice/4608769/white-house-lawyer-raised-alarm-miller-habeas-corpus/
http://www.memeorandum.com/260615/p89#a260615p89

@cwensel@fosstodon.org
2026-06-10 17:20:57

Arcaneum 0.7.0 is out
https://github.com/cwensel/arcaneum/releases/tag/v0.7.0

Release v0.7.0 · cwensel/arcaneum
Features corpus: improve corpus list display (40b8c24) cli: expose model catalog policy (c1b4ed1) cli: show corpus sync metadata in list (e596cb9) cli: add global json output mode (30f2636) cli: a...

@servelan@newsie.social
2026-06-18 03:57:28

"Democrats on the House Oversight Committee are now demanding answers from the White House. They want every document, every memo tied to two things: suspending habeas corpus, and invoking the Insurrection Act."
BREAKING: Investigation Launched Into White House Plot to Suspend Constitutional Rights, Vance Spends His Book Tour Doing Damage Control, Kash Patel Messes Up Again, and more..
https://www.adamkinzinger.com/p/breaking-investigation-launched-into

@BBC3MusicBot@mastodonapp.uk
2026-06-20 11:59:15

🇺🇦 #NowPlaying on BBCRadio3's #EarlierWithJoolsHolland
Wolfgang Amadeus Mozart, Choir of King’s College, Cambridge & Stephen Cleobury:
🎵 Ave verum corpus
#WolfgangAmadeusMozart #ChoirofKingsCollege #Cambridge #StephenCleobury

@avstockhausen@fedihum.org
2026-05-29 11:00:03

Bookmarked: CLLG — Corpus Liberatum Linguae Graecae #Digital_Editions #Digital_Humanities

CLLG — Corpus Liberatum Linguae Graecae
An open-science initiative for freely accessible ancient Greek texts via a complete pipeline from scanned editions to structured TEI XML.

@netzschleuder@social.skewed.de
2026-06-07 19:00:06

email_enron: Email network (Enron corpus)
The Enron email corpus, containing all the email communication from the Enron corporation, which was made public as a result of legal action. Nodes are email addresses and node i links to node j if i sent at least one email to address j. Non-Enron email addresses are also present, but only their links to/from Enron addresses are observed.
This network has 36692 nodes and 367662 edges.
Tags: Social, Communication, Unweighted, Multigr…

@servelan@newsie.social
2026-06-15 19:28:14

Trump considered suspending habeas corpus
Secret memos show Trump admin seriously weighed gutting key constitutional right: report - Raw Story
https://www.rawstory.com/stephen-miller-2677042197/

@arXiv_eessAS_bot@mastoxiv.page
2026-05-13 07:42:29

Towards Fine-Grained Multi-Dimensional Speech Understanding: Data Pipeline, Benchmark, and Model
Guojian Li, Zhixian Zhao, Zhennan Lin, Jingbin Hu, Qirui Zhan, Yuang Cao, Pengyuan Xie, Chuan Xie, Jie Liu, Qiang Zhang, Zhonghua Fu, Lei Xie
https://arxiv.org/abs/2605.12036 https://arxiv.org/pdf/2605.12036 https://arxiv.org/html/2605.12036
arXiv:2605.12036v1 Announce Type: new
Abstract: While speech Large Language Models (LLMs) excel at conventional tasks like basic speech recognition, they lack fine-grained, multi-dimensional perception. This deficiency is evident in their struggle to disentangle complex features like micro-acoustic cues, acoustic scenes, and paralinguistic signals. This resulting incomplete comprehension of real-world speech fundamentally bottlenecks the development of perceptive and empathetic next-generation speech systems. At its core, this persistent perceptual limitation primarily stems from three interacting factors: scarce high-quality expressive data, absent fine-grained modeling for multi-dimensional attributes, and reliance on restricted coverage, coarse-grained benchmarks. We address these challenges through three pillars: First, our robust data curation pipeline resolves complex acoustic environments and long-audio timestamp alignment challenges to extract a high-quality spontaneous speech corpus from audiovisual sources. Second, we construct FMSU-Bench, a pioneering benchmark covering 14 speech attribute dimensions to rigorously assess the fine-grained, multi-dimensional speech understanding capabilities of current models. Third, empowered by our curated corpus, we introduce FM-Speech. Driven by a decoupled attribute modeling and progressive curriculum fine-tuning framework, it substantially elevates fine-grained, multi-dimensional acoustic perception. Extensive evaluations on FMSU-Bench reveal that current speech LLMs still require significant improvement in multi-dimensional, fine-grained understanding. In contrast, FM-Speech substantially outperforms current open-source models, establishing a robust paradigm for real-world speech understanding.

@davej@dice.camp
2026-05-01 22:45:21

I’m not even American, and I knew that Galveston and Corpus Christi had major flooding disasters in the early 20th century. By all means, sound the alarms on climate change, but for fuck's sake, crack a book once in a while. Posting easily refuted rhetoric does nothing to help the cause. https://mastodon.cc/@dtm/1165016885218…

David Todd McCarty (@dtm@mastodon.cc)
To better grasp climate change all you have to know is that you have never once before thought of Texas as a place that floods.

@servelan@newsie.social
2026-06-15 21:23:23

Trump considered suspending habeas corpus
Secret memos show Trump admin seriously weighed gutting key constitutional right: report - Raw Story
https://www.rawstory.com/stephen-miller-2677042197/

@netzschleuder@social.skewed.de
2026-05-05 23:00:05

email_enron: Email network (Enron corpus)
The Enron email corpus, containing all the email communication from the Enron corporation, which was made public as a result of legal action. Nodes are email addresses and node i links to node j if i sent at least one email to address j. Non-Enron email addresses are also present, but only their links to/from Enron addresses are observed.
This network has 36692 nodes and 367662 edges.
Tags: Social, Communication, Unweighted, Multigr…

@gwire@mastodon.social
2026-05-05 14:40:32

If AI isn't sentient then why does it sound exactly like the sentient AIs in decades of sci-fi stories that we scanned and added to the training corpus?

@cdarwin@c.im
2026-07-07 05:27:55

Ospreys in the Chesapeake Bay Are Starving to Death at Disastrous Rates.
What Will It Take to Save Them?
After a spectacular comeback from DDT, the Osprey population has plummeted within the watershed and is showing signs of trouble elsewhere.
The birds’ fate may once more rest on collective action

@aredridel@kolektiva.social
2026-05-23 15:14:17

Personalization systems are actually the AI technology I have the most beef with: I don't think we've fully reckoned with the algorithmic manipulation of information they do. Suddenly the info sphere is not an object you can interrogate, but you become the object, and the infosphere around us is made into a mirror.
Weirdly, LLMs used well are actually _better_ about this because they're somewhat more able to be interrogated (though naive questioning is not probing so much why as one would hope). But the breadth of the corpus and the freedom we have to query within it are currently new capabilities. Not to say that some companies aren't well on their way to researching how to personalize that into a reflection again. Google must be stopped, and I trust OpenAI just as little. People's chats still end up adding quite a reflective lens on things, but the raw access is still there if we want to make use of it.

@netzschleuder@social.skewed.de
2026-05-03 18:00:05

email_enron: Email network (Enron corpus)
The Enron email corpus, containing all the email communication from the Enron corporation, which was made public as a result of legal action. Nodes are email addresses and node i links to node j if i sent at least one email to address j. Non-Enron email addresses are also present, but only their links to/from Enron addresses are observed.
This network has 36692 nodes and 367662 edges.
Tags: Social, Communication, Unweighted, Multigr…

@BBC3MusicBot@mastodonapp.uk
2026-07-02 18:30:40

🔊 #NowPlaying on #BBCRadio3:
#Radio3InConcert
- BBC Singers in Paris
The BBC Singers and the Maîtrise Notre-Dame de Paris present a special concert of Bach Motets, Roderick Williams' Ave Verum Corpus Re-Imagined and Martin's Mass for Double Choir.
Relisten now 👇
https://www.bbc.co.uk/programmes/m002y2t7

@netzschleuder@social.skewed.de
2026-04-27 21:00:06

email_enron: Email network (Enron corpus)
The Enron email corpus, containing all the email communication from the Enron corporation, which was made public as a result of legal action. Nodes are email addresses and node i links to node j if i sent at least one email to address j. Non-Enron email addresses are also present, but only their links to/from Enron addresses are observed.
This network has 36692 nodes and 367662 edges.
Tags: Social, Communication, Unweighted, Multigr…

@arXiv_physicscompph_bot@mastoxiv.page
2026-07-03 08:51:47

Crosslisted article(s) found for physics.comp-ph. https://arxiv.org/list/physics.comp-ph/new
[1/1]:
- Hybrid Two-Level Transport Method with Solution Decomposition in Macro and Micro Components
Caleb A. Shaw, Dmitriy Y. Anistratov
https://arxiv.org/abs/2607.01346 https://mastoxiv.page/@arXiv_mathNA_bot/116854910030264363
- Predicting Novel Stable Materials for Experimental Synthesis
Yuqi An, Sihong Zhu, Joseph Montoya, Xingyu Guo, Zhenbin Wang
https://arxiv.org/abs/2607.01713 https://mastoxiv.page/@arXiv_condmatmtrlsci_bot/116854978846171607
- An Optimisation Framework for the Well-Conditioned Training of Physics-Informed Neural Networks
Joseph Webb, Sadok Jerad, Coralia Cartis
https://arxiv.org/abs/2607.02194 https://mastoxiv.page/@arXiv_csLG_bot/116855143607520453
- Efficient Large-Scale STEM-EELS Simulations With Torched-TACAW
Martin Osmera, João Vaz, Paul M. Zeiger, Ján Rusz
https://arxiv.org/abs/2607.02236 https://mastoxiv.page/@arXiv_condmatmtrlsci_bot/116855081674592173
- Grounded autonomous research: a fault-tolerant LLM pipeline from corpus to manuscript in frontier...
Haonan Huang
https://arxiv.org/abs/2607.02329 https://mastoxiv.page/@arXiv_csAI_bot/116855126306616827
- Beyond Adam: SOAP and Muon for Faster, Label-Efficient Training of Machine Learning Interatomic P...
Harari, Zimmermann, Kulseng, Zichi, Tan, Descoteaux, Kozinsky
https://arxiv.org/abs/2607.02499 https://mastoxiv.page/@arXiv_csLG_bot/116855152847861336

@netzschleuder@social.skewed.de
2026-05-25 05:00:06

email_enron: Email network (Enron corpus)
The Enron email corpus, containing all the email communication from the Enron corporation, which was made public as a result of legal action. Nodes are email addresses and node i links to node j if i sent at least one email to address j. Non-Enron email addresses are also present, but only their links to/from Enron addresses are observed.
This network has 36692 nodes and 367662 edges.
Tags: Social, Communication, Unweighted, Multigr…

@BBC3MusicBot@mastodonapp.uk
2026-06-17 22:20:19

🇺🇦 #NowPlaying on BBCRadio3's #NightTracks
Wolfgang Amadeus Mozart, Víkingur Ólafsson & Franz Liszt:
🎵 Ave Verum Corpus K.618
#WolfgangAmadeusMozart #VíkingurÓlafsson #FranzLiszt
https://open.spotify.com/track/6HPnobfvxQk0J56gx1OgB3

@BBC3MusicBot@mastodonapp.uk
2026-04-22 20:45:58

🔊 #NowPlaying on #BBCRadio3:
#TheEssay
- The Death and Life of Christopher Marlowe
Jerry Brotton visits Corpus Christi College in Cambridge, where Kit Marlowe studied and was transformed from scholarship boy to gentleman – and spy.
Relisten now 👇
https://www.bbc.co.uk/programmes/m002v6z6

@BBC3MusicBot@mastodonapp.uk
2026-07-21 09:18:24

🇺🇦 #NowPlaying on BBCRadio3's #EssentialClassics
Tenebrae, Wolfgang Amadeus Mozart, The Chamber Orchestra of Europe & Nigel Short:
🎵 Ave verum corpus, K 618
#Tenebrae #WolfgangAmadeusMozart #NigelShort
https://open.spotify.com/track/6nYkXGzsBMInJ5WLrOYT7f

@BBC3MusicBot@mastodonapp.uk
2026-06-13 16:42:07

🇺🇦 #NowPlaying on BBCRadio3's #ThisClassicalLife
William Byrd, Roderick Williams, ORA Singers & Suzi Digby:
🎵 Ave Verum Corpus Re-imagined
#WilliamByrd #RoderickWilliams #ORASingers #SuziDigby

@BBC3MusicBot@mastodonapp.uk
2026-07-19 14:45:16

🇺🇦 #NowPlaying on BBCRadio3's #SofiJeanninSingingTogether
Peter Maxwell Davies & BBC Singers:
🎵 Corpus Christi, with Cat and Mouse
#PeterMaxwellDavies #BBCSingers

@BBC3MusicBot@mastodonapp.uk
2026-06-10 18:03:58

🇺🇦 #NowPlaying on BBCRadio3's #ClassicalMixtape
Wolfgang Amadeus Mozart, Albrecht Mayer, The Deutsche Kammerphilharmonie Bremen & Vital Julian Frey:
🎵 Ave verum corpus, K 618 (arr. cor anglais, strings & orchestra)
#WolfgangAmadeusMozart #AlbrechtMayer #VitalJulianFrey

@BBC3MusicBot@mastodonapp.uk
2026-07-10 14:29:55

🇺🇦 #NowPlaying on BBCRadio3's #ClassicalLive
William Byrd, Christian Forshaw, BBC Singers & Graham Ross:
🎵 Ave verum corpus
#WilliamByrd #ChristianForshaw #BBCSingers #GrahamRoss

@BBC3MusicBot@mastodonapp.uk
2026-05-05 14:45:59

🇺🇦 #NowPlaying on BBCRadio3's #ClassicalLive
William Byrd & Stile Antico:
🎵 Ave Verum Corpus
#WilliamByrd #StileAntico
https://open.spotify.com/track/7BrXygaE3fz0fyLUqvcTEX

@BBC3MusicBot@mastodonapp.uk
2026-07-02 18:48:43

🇺🇦 #NowPlaying on BBCRadio3's #Radio3InConcert
Roderick Williams, BBC Singers & Sofi Jeannin:
🎵 Ave verum corpus Re-imagined
#RoderickWilliams #BBCSingers #SofiJeannin
https://open.spotify.com/track/7jbVbEmKjCzloSINVICHLw

@BBC3MusicBot@mastodonapp.uk
2026-06-02 11:38:21

🇺🇦 #NowPlaying on BBCRadio3's #EssentialClassics
Tenebrae, Wolfgang Amadeus Mozart, The Chamber Orchestra of Europe & Nigel Short:
🎵 Ave verum corpus, K 618
#Tenebrae #WolfgangAmadeusMozart #NigelShort
https://open.spotify.com/track/6nYkXGzsBMInJ5WLrOYT7f

@BBC3MusicBot@mastodonapp.uk
2026-04-26 19:42:26

🇺🇦 #NowPlaying on BBCRadio3's #WordsAndMusic
Roderick Williams, ORA Singers & Suzi Digby:
🎵 Ave verum corpus Re-imagined
#RoderickWilliams #ORASingers #SuziDigby
https://open.spotify.com/track/7jbVbEmKjCzloSINVICHLw

@BBC3MusicBot@mastodonapp.uk
2026-06-23 03:16:30

🇺🇦 #NowPlaying on BBCRadio3's #ThroughTheNight
Imant Raminsh, Vancouver Chamber Choir & Jon Washburn:
🎵 Ave Verum Corpus
#ImantRaminsh #VancouverChamberChoir #JonWashburn

@BBC3MusicBot@mastodonapp.uk
2026-05-23 00:59:30

🇺🇦 #NowPlaying on BBCRadio3's #ThroughTheNight
William Byrd, Ars Nova Copenhagen & Sofi Jeannin:
🎵 Ave verum corpus
#WilliamByrd #ArsNovaCopenhagen #SofiJeannin
https://open.spotify.com/track/7J6b58JOnf4RGGqJmxq0bD

Opt-in global Mastodon full text search.