
2025-05-26 11:57:45
{dtrack} makes documentation of data wrangling part of the analysis and creates pretty flow charts: #rstats
pokec: Pokec online social network (2012)
The online social network of Pokec, a popular OSN in Slovakia, from 2012. Data covers about 10 years and more than 1.6 million people. Profile data contains gender, age, hobbies, interests, education etc. Profile metadata are in Slovak language. Friendships in Pokec are oriented.
This network has 1632804 nodes and 30622564 edges.
Tags: Social, Online, Metadata
RIVM update: sewage values and percentage positive.
Today's data shows a decline that already seems to have come to a halt again.
This is due to 4 new days in the data: 19 through 22 June, with 45%-10% of the measuring stations.
#qp2t
"Microplastics are everywhere—but our methods to track them are all over the place"
#Microplastics #Plastic #Plastics
GitHub MCP Exploited: Accessing private repositories via MCP.
#ai
Data management software vendor Rubrik agrees to acquire Predibase, which helps with deploying AI models; source: Rubrik plans to pay between $100M and $500M (Jordan Novet/CNBC)
https://www.cnbc.com/2025/06/25/rubrik-agrees-to-bu…
Antenna: US SVOD subscriptions grew 11% from March 2024 to March 2025; the average churn rate for premium SVOD services has hovered around 5% since January 2023 (Katie Campione/Deadline)
https://deadline.com/2025/06/antenna-s…
Unfortunately, many people will not have filed an objection with Meta against the AI use of their Facebook/Instagram data, because they thought "Ugh, yet another useless I-object post that doesn't work."
I'm probably not entirely innocent in this: 11 years ago I posted a joke that many people didn't recognize as one (see from line 10 onward)
#Meta
«Real-world map data is helping make better games about farms and transportation»
Interesting piece on how games use #OpenStreetMap data. No mention of how those games might accidentally incentivize vandalism to achieve game objectives though!
https://www.theverge.com/games/672035/openstreetmap-data-games
Russian developer Yegor from yegor256.com uses a simple example of two similar approaches to modeling an action, and their implications from an object-oriented design and programming patterns perspective. One of the two approaches provides superior extensibility, data encapsulation, and more flexible error handling.
"remove(42) vs. find(42).remove()"
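The two call shapes in that title can be sketched in a few lines. This is only an illustration, not the article's code: the `Items`, `Item`, and `NotFound` names and the in-memory dictionary are assumptions made for the example.

```python
class NotFound(Exception):
    """Raised when no item with the requested id exists (hypothetical)."""


class Item:
    """A handle to one stored record; it knows how to remove itself."""

    def __init__(self, store, item_id):
        self._store = store
        self._id = item_id

    def remove(self):
        del self._store[self._id]


class Items:
    """A hypothetical in-memory collection keyed by integer id."""

    def __init__(self, data):
        self._data = dict(data)

    # Approach 1: a single call that both looks up and deletes.
    def remove(self, item_id):
        if item_id not in self._data:
            raise NotFound(item_id)
        del self._data[item_id]

    # Approach 2: return an object first; the caller then acts on it.
    def find(self, item_id):
        if item_id not in self._data:
            raise NotFound(item_id)
        return Item(self._data, item_id)

    def __len__(self):
        return len(self._data)


items = Items({42: "answer", 7: "seven"})
items.remove(42)        # approach 1
items.find(7).remove()  # approach 2
```

In the second shape the lookup failure surfaces in `find`, and the returned object is a place where further behaviour could be added without touching the collection class.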
Scheduling doctor's appointments in Germany doesn't work, as an investigation by the #SWR data colleagues shows:
https://www.swr.de/swraktuell/baden-wuertte…
DoorDash isn't sharing data with ICE to deport drivers | Snopes.com
https://www.snopes.com/fact-check/doordash-shares-data-with-ice-to-deport-drivers/
Stop Uploading Your Data to #Google
http://ignorethecode.net/blog/2025/06/11/stop_uploading_your_data_to_google/
The US is finally building its National Database of Ruin.
https://theintercept.com/2025/05/22/intel-agencies-buying-data-portal-privacy/
Today 58 years ago: On 26 June 1967 the #USA tested the atomic bomb "Midi Mist". Operation #Latchkey was a series of 38 US nuclear weapons tests (#Kernwaffentests) carried out in 1966/67 at the Neva…
Apparently this is a prototype of the new Mastodon plush toy, and it's so cute, isn't it...
QT: https://mastodon.social/@Gargron/114746110797394999
For the 20th birthday of data retention (#Vorratsdatenspeicherung) in 2025, the EU Commission is looking forward to your opinion on the idea of introducing #VDS for internet providers for law enforcement purposes.
Until 30 June with SMS for authentication, after that only with eID, via Facebook or…
Annual DC Privacy Forum: Convening Top Voices in Governance in the Digital Age
https://fpf.org/blog/annual-dc-privacy-forum-convening-top-voices-in-governance-in-the-digital-age/
Centific, an AI "data foundry" that works with businesses to develop, train, and deploy their AI models, raised a $60M Series A led by Granite Asia (Nicholas Gordon/Fortune)
https://fortune.com/asia/2025/06/25/centi…
Jazzaria – Incalculable Gumbo
#byncnd
Ah fuck, people are using LLMs for kernel code. They really are going to fuck over everything, aren't they?
https://lwn.net/SubscriberLink/1026558/9f3079fb392c4a9a/
terrorists_911: 9-11 terrorist network
Network of individuals and their known social associations, centered around the hijackers that carried out the September 11th, 2001 terrorist attacks. Associations extracted after-the-fact from public data. Metadata labels say which plane a person was on, if any, on 9/11.
This network has 62 nodes and 152 edges.
Tags: Social, Offline, Unweighted, Metadata
Noted while reading: 'a data structure or a block of code are things that make implicit and subjective arguments about how to see the world. This is possibly the single most important basic insight that Digital Humanities as a field needs to impart, because it affects so much of the world around us' - excellent post by @…
Ever wondered how Kubernetes runs on a tractor? At Berlin Buzzwords 2025, join Wieneke Keller & Sebastian Lenartowicz from Aurea Imaging to explore their innovative AgTech journey! Discover how they built an edge device with Python microservices and K3s for precision farming, tackling unique rural challenges.
Learn more:
Every company is undergoing an invisible reorg. You report to your boss but your boss reports to an #AI, offloading the job of management entirely onto a bot and then merely communicating its wishes back to the team.
This is the Nothing Manager, surrounded by #LLM tools to avoid having to interact with…
Blaine Cook, an original Twitter architect, is one of the backers of the new Canadian social platform #Gander
https://www.theglobeandmail.com/busi…
After an arson attack on a substation(?) in southern France, the power grid there partially collapsed. At times more than 160k households were affected, including in Cannes, where the film festival of the same name is currently taking place.
Speculation: it was probably in the area of Poste électrique du Biançon, an important node with 400/225kV and a 20MW hydroelectric plant (Barrage de Saint-Cassien) attached.
@… New PNG revision dropped
https://social.linux.pizza/@knurd42/114743763415059021
Paris-based Zama, which is developing fully homomorphic encryption tech for blockchain and AI apps, raised a $57M Series B at a $1B valuation (Cate Lawrence/Tech.eu)
https://tech.eu/2025/06/25/zama-becomes-1st-i-fhe-unicorn…
Today 40 years ago: On 26 June 1985 the #USA detonated the 12th atomic bomb, "Maribo", as part of Operation Grenadier. Grenadier was a series of nuclear weapons tests (#Kernwaffentests) with a total of 16 bombs in 1984/85 in the test area in
Fascinating read on reverse engineering the #Bluesky app view https://whtwnd.com/futur.blue/3ls7sbvpsqc2w
During Trump 1.0, esp. in the aftermath of Cambridge Analytica, people demonstrated significant concerns about our data privacy. Consider how Apple & Google deployed Covid Exposure Notifs to be data protecting. During Trump 2.0, we’re witnessing an all-out destruction of our data privacy under Musk & RFK Jr as intentionally decentralized databases are illegally consolidated/appended/blended and feasted upon by crony silicon vulture backed govt contractors.
dom: Animal dominance archive (2022)
Animal dominance interaction data published over a century of research. The archive contains 434 agonistic interaction datasets, totaling over 241,000 interactions. A directed edge (i,j) corresponds to an antagonist interaction between i (winner) and j (loser). If a 'weight' edge property map exists, it counts the number of such interactions.
This network has 14 nodes and 54 edges.
Tags: Social, Animal, Weighted
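The edge convention described for this archive (edge from winner to loser, weight equal to the number of such interactions) can be illustrated with a small sketch. The interaction list below is made up for the example and is not taken from the archive.

```python
from collections import Counter

# Hypothetical observations, each a (winner, loser) pair.
interactions = [("a", "b"), ("a", "b"), ("b", "c"), ("a", "c"), ("c", "b")]

# A directed edge (i, j) means i won against j;
# its weight counts how often that happened.
weights = Counter(interactions)

for (winner, loser), count in sorted(weights.items()):
    print(f"{winner} -> {loser}: weight {count}")
```

Note that `("b", "c")` and `("c", "b")` stay separate edges, since direction encodes who won.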
Another winning podcast by Nick Norwitz:
Cardiologist reacts to breaking cholesterol research: "Data Challenges Dogma"
https://www.youtube.com/watch?v=I9TOMH332eA&ab_channel=NickNorwitz
And the world suddenly became a better place: https://mastodon.social/@verge/114563010203972684
The Federal Court of Justice ruled that German police may forcibly lay the suspect's finger onto their smartphone's fingerprint sensor to unlock it, if there's a search warrant intended to seize the phone and if the data access is proportionate.
Shut down your phone if police tell you to hand it over, I guess 🤷
US prosecutors charge Kai West, who is known as IntelBroker and was arrested in February in France, with conspiring to steal data from dozens of companies (Chris Dolmetsch/Bloomberg)
https://www.bloomberg.com/news/articles/20
Today 58 years ago: On 26 May 1967 the #USA tested the atomic bomb "Absinthe". Operation #Latchkey was a series of 38 US nuclear weapons tests (#Kernwaffentests) carried out in 1966/67 at the Nevada…
Big-Data based AI humanities will be just as boring as big-data based "old-style" Digital Humanities (I'm looking at you, distant reading advocates).
Idly wondered if there are any Carolean post boxes, and a search of OSM suggests there are at least eight.
https://overpass-turbo.eu/s/24R8
I think we will soon see an AlphaGo moment somewhere in embodiment. Maybe in robot football?
pi_0 is the Atari moment: https://www.physicalintelligence.company/blog/pi0 We now know that training at scale works and generalizes remarkably well.
This is the trigge…
law_firm: Lazega law firm network
Multiplex network with 3 edge types representing relationships (coworkers, friendship, advice) between partners and associates of a corporate law firm. Data hosted by Manlio De Domenico.
This network has 71 nodes and 2571 edges.
Tags: Social, Offline, Multilayer, Unweighted
https://net…
SocioXplorer: An Interactive Tool for Topic and Network Analysis in Social Data
Sandrine Chausson, Youssef Al Hariri, Walid Magdy, Björn Ross
https://arxiv.org/abs/2506.18845
Friends Don't Let Friends Make Bad Graphs! Do you agree with the examples of bad graphs and the alternatives Chenxin Li (@chenxinli2.bsky.social) lists at https://github.com/cxli233/FriendsDontLetFriends
Analysis: of 750 data brokers registered in at least one US state, many failed to register in other states with transparency laws, undermining consumer privacy (Electronic Frontier Foundation)
https://www.eff.org/deeplinks/2025/06/why-are…
Today 58 years ago: On 26 May 1967 the #USA tested the atomic bomb "Knickerbocker". Operation #Latchkey was a series of 38 US nuclear weapons tests (#Kernwaffentests) carried out in 1966/67 at the N…
florentine_families: Padgett Florentine families
Multiplex network with 2 edge types representing marriage alliances and business relationships between Florentine families during the Italian Renaissance. Data hosted by Manlio De Domenico.
This network has 16 nodes and 35 edges.
Tags: Social, Relationships, Multilayer, Unweighted
…
No tracking across the web. No surveillance. No selling your data. That's it—that's the privacy policy.
"I fly with Gander. Because ragebait isn’t very Canadian."
#Gander Social Inc
Eventual, which develops Daft, a Python-native open-source data processing engine, raised a $7.5M seed led by CRV and a $20M Series A led by Felicis (Rebecca Szkutak/TechCrunch)
https://techcrunch.com/2025/06/24/how-a-data-pro…
'A Black Hole of Energy Use': Meta's Massive AI Data Center Is Stressing Out a Louisiana Community
https://www.404media.co/a-black-hole-of-energy-use-metas-massive-ai-data-center-is-stressing-out-a-louisiana-community/?ref=daily-stories-newsletter
route_views: Route Views AS graphs (1997-1998)
733 daily network snapshots denoting BGP traffic among autonomous systems (ASs) on the Internet, from the Oregon Route Views Project, spanning 8 November 1997 to 2 January 2000. Data collected by NLANR/MOAT.
This network has 3414 nodes and 6574 edges.
Tags: Technological, Communication, Unweighted, Temporal
RIVM sewage values update.
With yesterday's data, of course.
It shows a (so far slight) decline. Two new days in the data have fairly low values: 16 and 17 June, with 30% and 20% of the measuring stations respectively, are both below 200. These could still be revised upward considerably on Monday, but 14 and 15 June, with 50% and 40% respectively, also give clearly lower values than the 260-270 where we have been in recent days.
"xAI is facing a lawsuit for operating over 400 MW of gas turbines without permits"
#xAI #AI #ArtificialIntelligence
Inside Amazon's Indiana data center complex: built for Anthropic with plans for ~30 centers, consuming 2.2GW of power and millions of gallons of water per year (New York Times)
https://www.nytimes.com/2025/06/24/technology/amazon-ai-data-centers.html
Medium writer Paolo Perrone curates a short list of interesting algorithms, the rationale behind them, along with graphs and diagrams to boot.
Algorithms that made this short list:
Wave Function Collapse
The Diffusion Model
Simulated Annealing
Sleep Sort
Bogosort
Boids
Shor's algorithm
Marching Cubes
Practical Byzantine Fault Tolerance, and
Boyer-Moore
"The 10 Weirdest, Most Brilliant Algorithms Ever Devised and What They Actually Do&…
yahoo_song: Yahoo song ratings (2011)
A bipartite network of users and songs they rated, as used in the 2011 KDD Cup and extracted from Yahoo! Music. Edge weights denote a rating scaled from 0 to 100. More information about this data set is available at http://konect.cc/networks/yahoo-song.
Th…
Today 56 years ago: On 27 May 1969 the #USA detonated the atomic bomb "Torrido" as part of Operation Bowline. Bowline was a series of nuclear weapons tests (#Kernwaffentests) with a total of 58 bombs in 68/69 in the test area in #Nevada
Scale AI used Google Docs to track work for customers like Google, Meta, and xAI, and left confidential AI training documents accessible to anyone with the link (Business Insider)
https://africa.businessinsider.com/new
Today 42 years ago: On 26 May 1983 the #USA detonated the 9th atomic bomb, "Fahada", as part of Operation Phalanx. Phalanx was a series of nuclear weapons tests (#Kernwaffentests) with a total of 19 bombs in 1982/83, mostly in the test area in
This is such a good summary of "the problem with 'data science'", as it tries to replace any domain expertise with the misguided notion that some form of statistics.
And so you end up with (maybe even well-meaning) researchers that are so far in over their head that they don't even know what they don't know.
Case in point: the authors of the preprint come from mathematics, computer science, physics, and 'future studies' (lol)…
Today 48 years ago: On 25 May 1977 the #USA detonated the 14th atomic bomb, "Crewline", as part of Operation Fulcrum. Fulcrum was a series of nuclear weapons tests (#Kernwaffentests) with a total of 24 bombs in 1976/77 in the test area in
Let's be honest, we spend too much time cleaning data. {janitor} can help with that: #rstats
Today 42 years ago: On 26 May 1983 the #USA detonated the 8th atomic bomb, "Mini Jade", as part of Operation Phalanx. Phalanx was a series of nuclear weapons tests (#Kernwaffentests) with a total of 19 bombs in 1982/83, mostly in the test area in
Google, the Earth Fire Alliance, and Muon Space's Fire Sat aims to launch 52 satellites by 2029 to detect wildfires globally and will make the data accessible (Boone Ashworth/Wired)
https://www.wired.com/story/google-earth-fire-alliance-sp…
Today 56 years ago: On 27 May 1969 the #USA detonated the atomic bombs "Ipecac - 1" & "Ipecac - 2" as part of Operation Bowline. Bowline was a series of nuclear weapons tests (#Kernwaffentests) with a total of 58 bombs in 68/69 in the test area in
As Apple rolls out CarPlay Ultra, many carmakers are developing their own infotainment systems in hopes of generating more revenue from in-car services and data (Financial Times)
https://www.ft.com/content/cdd7c98a-10a7-437c-a85a-68f7f9be2b0b
Today 55 years ago: On 26 May 1970 the #USA tested the atomic bomb "Hudson Moon". Operation Mandrel was a series of 53 US nuclear weapons tests (#Kernwaffentests) carried out underground in 1969 and 1970, mainly at the Nevada Test Site in Nevada.
Today 55 years ago: On 26 May 1970 the #USA tested 3 atomic bombs, "Flask-Green", "Flask-Red" and "Flask-Yellow". Operation Mandrel was a series of 53 US nuclear weapons tests (#Kernwaffentests) carried out underground in 1969/70, mainly at the Nevada Test Site in Nevada…
Microsoft makes Windows 10's extended security updates free for an extra year for users who sync PC settings via a Microsoft Account and the Windows Backup app (Zac Bowden/Windows Central)
https://www.windowscentral.com/software-ap
eu_procurements_alt: EU national procurement networks (2008-2016)
These 234 networks represent the annual national public procurement markets of 26 European countries from 2008-2016, inclusive. Data is sourced from Tenders Electronic Daily (TED), the official procurement portal of the European Union.
This network has 17345 nodes and 21524 edges.
Tags: Economic, Commerce, Weighted, Temporal
Today 67 years ago: On 26 May 1958 the nuclear test Operation Hardtack I, "Magnolia", took place at Enewetak. This test was part of a series of 35 nuclear tests (#Atomtests) that the USA conducted in the summer of 1958 in the Marshall Islands (#Marshallinseln) in the Pacific.
At its Discover 2025 conference, HPE unveiled HPE CloudOps Software, GreenLake Intelligence, which can deploy AI agents across its hybrid cloud stack, and more (Larry Dignan/Constellation Research)
https://www.constellationr.com/…
Today 67 years ago: On 26 May 1958 the nuclear test Operation Hardtack I, "Yellowwood", took place at Enewetak. This test was part of a series of 35 nuclear tests (#Atomtests) that the USA conducted in the summer of 1958 in the Marshall Islands (#Marshallinseln) in the Pacific.
sp_colocation: Social co-locations (2018)
Network of colocations between people, based on the information on which RFID readers received information from the RFID tags. Namely, we define two individuals to be in co-presence if the same exact set of readers have received signals from both individuals during a 20s time window.
This network has 81 nodes and 150126 edges.
Tags: Social, Offline, Unweighted, Weighted, Temporal, Metadata
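The co-presence rule quoted above (identical reader sets within a 20s window) can be sketched as follows. The sightings are invented for illustration, and binning time into fixed, non-overlapping windows that start at t = 0 is an assumption; the dataset may define its windows differently.

```python
from collections import defaultdict
from itertools import combinations

WINDOW = 20  # seconds, as in the description

# Hypothetical sightings: (timestamp_seconds, person, reader_id).
sightings = [
    (0, "p1", "r1"), (5, "p1", "r2"),
    (3, "p2", "r1"), (9, "p2", "r2"),
    (4, "p3", "r1"),
    (25, "p1", "r3"), (30, "p3", "r3"),
]

# Collect, per window and per person, the set of readers that heard them.
readers = defaultdict(lambda: defaultdict(set))
for t, person, reader in sightings:
    readers[t // WINDOW][person].add(reader)

# Two people are co-present in a window when their reader sets are identical.
copresence = []
for window, by_person in sorted(readers.items()):
    for a, b in combinations(sorted(by_person), 2):
        if by_person[a] == by_person[b]:
            copresence.append((window, a, b))

print(copresence)
```

In the first window `p3` shares reader `r1` with the others but is not co-present with them, because the rule requires the exact same set of readers, not just an overlap.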
Today 73 years ago: On 25 May 1952, "Fox", part of Operation Tumbler–Snapper, was conducted at the Nevada test site. The aim was to test the initiation curve (#Einleitungskurve) and yield of nuclear weapons (#Kernwaffen). In total, this operation involved 8
Today 62 years ago: On 25 May 1963 the test "Clean Slate I" took place, part of Operation Roller Coaster, a series of 4 US-UK tests at the Nevada Test Site that examined the dispersal of radioactive particles and their containment in a "dirty #Bomb" scenario.
Study: only 32 countries, mostly in the Northern Hemisphere, host AI data centers, with the US, China, and the EU controlling 50% of the world's top facilities (New York Times)
https://www.nytimes.com…
Source: the US House's CAO informed congressional staffers that WhatsApp is now banned on their government devices as the app is deemed "a high-risk to users" (Andrew Solender/Axios)
https://www.axios.com/2025/06/23/whatsapp-house-cong…
faa_routes: FAA Preferred Routes (2010)
A network of air traffic routes, from the FAA (Federal Aviation Administration) National Flight Data Center (NFDC) preferred routes database (www.fly.faa.gov). Date of extraction is prior to 2010. Nodes represent airports or service centers, and a directed edge is the preferred route between airport i and airport j.
This network has 1226 nodes and 2615 edges.
Tags: Transportation, Airport, Unweighted
crime: Rosenfeld crime network (1991)
A network of associations among suspects, victims, and/or witnesses involved in crimes in St. Louis in the 1990s. Data are derived from police records, via snowball sampling from five initial homicides. Left nodes are people, right nodes are crime events, and edges connect people to particular crimes events they were associated with. Metadata includes names, genders, and roles (suspects, victims, and/or witnesses).
This network has 1380 nodes…
Inside the Vera C. Rubin Observatory, whose 3.2-gigapixel camera will produce 60PB of space image data over 10 years, to be analyzed using ML and deep learning (New York Times)
https://www.nytimes.com/2025/06/20/science…
Uber expands its data-labeling platform Uber AI Solutions to 30 countries, and adds new offerings like tools for developing AI agents and ready-to-use datasets (Richard Nieva/Forbes)
http://www.forbes.com/sites/richardnieva/2025/06/20/uber-scale-data-…
us_roads: United States roads (2000)
The road networks of the 50 US States and the District of Columbia based on US Census 2000 TIGER/Line Files. Edges are stretches of road and vertices are intersections of roads. The data sets were assembled by Dominik Schultes. The 'merged' network contains all the states merged together.
This network has 116920 nodes and 133415 edges.
Tags: Transportation, Roads, Unweighted
dbpedia_all: DBpedia network (v3.6)
A network among all entries in DBpedia, a project that extracts structured information from Wikipedia. Nodes represent entities in DBpedia and an edge connects two entities based on DBpedia's notion of their relatedness. The data is extracted from the version 3.6 of the database.
This network has 3966924 nodes and 13820853 edges.
Tags: Informational, Relatedness, Unweighted, Multigraph
citeseer: CiteSeer citations (2014)
Citations among papers indexed by the CiteSeer digital library. If a paper i cites a paper j also in this data set, then a directed edge connects i to j. (Papers not in the data set are excluded.) Self-loops may be present.
This network has 384413 nodes and 1751463 edges.
Tags: Informational, Citation, Unweighted
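The edge rule in this description (keep i -> j only when the cited paper j is itself in the data set) amounts to a simple filter. The reference lists below are hypothetical and only illustrate the rule.

```python
# Hypothetical reference lists: paper id -> ids of the papers it cites.
references = {
    "p1": ["p2", "p3", "x9"],  # x9 is not in the data set
    "p2": ["p3"],
    "p3": ["p3"],              # a self-citation becomes a self-loop
}

papers = set(references)

# Keep a directed edge i -> j only when the cited paper j is also in the data set.
edges = [(i, j) for i, cited in references.items() for j in cited if j in papers]

print(edges)
```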
dnc: DNC emails (2016)
A network representing the exchange of emails among members of the Democratic National Committee, in the email data leak released by WikiLeaks in 2016.
This network has 2029 nodes and 12085 edges.
Tags: Social, Communication, Unweighted, Multigraph
https://networks.skewed.de/net/dnc
dutch_school: Dutch school friendships (2003)
A series of snapshots of the friendships among freshmen at secondary school in The Netherlands, in 2003-2004. Friendship ties were surveyed four times, at three month intervals. The direction of an edge indicates that student i is friends with student j. Missing data is coded as 0 or 10. Metadata includes sex, age, ethnicity, and religion.
This network has 26 nodes and 170 edges.
Tags: Social, Offline, Unweighted, Metadata, Temp…
arxiv_citation: arXiv citation networks (1993-2003)
Citations among papers posted on arxiv.org under the hep-ph and hep-th categories, between 1993 and 2003. This time begins a few months after arXiv was launched. If a paper i cites a paper j also in this data set, then a directed edge connects i to j. (Papers not in the data set are excluded.) These data were originally released as part of the 2003 KDD Cup.
This network has 27770 nodes and 352807 edges.
Tags: Informational,…