Tootfinder

Opt-in global Mastodon full-text search. Join the index!

@Techmeme@techhub.social
2025-06-10 20:45:52

Twenty-seven states and DC sue 23andMe to oppose the sale of DNA data from its customers without their direct consent (Rylee Kirk/New York Times)
nytimes.com/2025/06/10/busines

@arXiv_csHC_bot@mastoxiv.page
2025-07-11 07:42:51

Dirty Data in the Newsroom: Comparing Data Preparation in Journalism and Data Science
Stephen Kasica, Charles Berret, Tamara Munzner
arxiv.org/abs/2507.07238

@kubikpixel@chaos.social
2025-06-12 05:25:25

Web 3.0 Requires Data Integrity
New integrity-focused standards are necessary to enable the trusted AI services of tomorrow.
🔐 cacm.acm.org/opinion/web-3-0-r

@datascience@genomic.social
2025-06-12 10:00:01

{nplyr} has helper functions to work on nested dataframes: #rstats #datascience

@lightweight@mastodon.nzoss.nz
2025-06-11 05:06:51

Wow. themarkup.org/pixel-hunt/2025/ My opinion of LinkedIn was already vanishingly low (one way to ensure I won't read something is to po…

@metacurity@infosec.exchange
2025-06-11 09:22:32

theguardian.com/society/2025/j
Public health bodies urged to launch period tracking apps to protect data

@arXiv_csCY_bot@mastoxiv.page
2025-06-12 07:24:11

Understanding and Improving Data Repurposing
J. Parsons, R. Lukyanenko, B. Greenwood, C. Cooper
arxiv.org/abs/2506.09073

@muz4now@mastodon.world
2025-07-11 12:28:23

The Duty Comes From the Data: Rethinking Platform Liability in the Age of Algorithmic Harm
musictechpolicy.com/2025/07/05

@NFL@darktundra.xyz
2025-06-11 19:54:46

NFL, Genius Sports extend, expand data deal espn.com/nfl/story/_/id/454935

@chiraag@mastodon.online
2025-06-10 13:25:43

From @… for @… :

@arXiv_csDC_bot@mastoxiv.page
2025-07-11 08:06:21

Analysing semantic data storage in Distributed Ledger Technologies for Data Spaces
Juan Cano-Benito, Andrea Cimmino, Sven Hertling, Heiko Paulheim, Raúl García-Castro
arxiv.org/abs/2507.07116

@Techmeme@techhub.social
2025-06-11 07:56:17

President Trump's spending bill could limit local control over zoning and environmental regulations for AI data centers, worrying state lawmakers (Molly Taft/Wired)
wired.com/story/a-political-ba

@gwire@mastodon.social
2025-06-10 23:51:40

Just scrolling through the "Tree Preservation Orders" MHCLG has published geo-data for.
planning.data.gov.uk/dataset/t

@netzschleuder@social.skewed.de
2025-07-11 07:00:05

arxiv_citation: arXiv citation networks (1993-2003)
Citations among papers posted on arxiv.org under the hep-ph and hep-th categories, between 1993 and 2003. This period begins a few months after arXiv was launched. If a paper i cites a paper j also in this data set, then a directed edge connects i to j. (Papers not in the data set are excluded.) These data were originally released as part of the 2003 KDD Cup.
This network has 27770 nodes and 352807 edges.
Tags: Informational,…

arxiv_citation: arXiv citation networks (1993-2003). 27770 nodes, 352807 edges. https://networks.skewed.de/net/arxiv_citation#HepTh
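
The edge rule in the description above (paper i cites paper j, both in the data set, gives a directed edge i → j; citations to papers outside the set are dropped) can be sketched in a few lines of Python. This is an illustration under assumptions, not netzschleuder's own loader: the paper IDs and the `citation_edges` helper are hypothetical.

```python
from typing import Iterable


def citation_edges(papers: set, citations: Iterable) -> list:
    """Return directed edges (i, j), meaning paper i cites paper j.

    Any citation whose citing or cited paper is not in `papers` is dropped,
    mirroring "Papers not in the data set are excluded."
    """
    return [(i, j) for i, j in citations if i in papers and j in papers]


# Hypothetical toy data: "outside" is a paper not in the data set.
papers = {"p1", "p2", "p3"}
citations = [("p1", "p2"), ("p2", "p3"), ("p3", "outside")]
edges = citation_edges(papers, citations)
print(edges)  # [('p1', 'p2'), ('p2', 'p3')]
```

In this framing the node count is the size of the paper set and the edge count is the length of the filtered list.
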
@newsie@darktundra.xyz
2025-06-10 13:07:09

Airlines Don't Want You to Know They Sold Your Flight Data to DHS 404media.co/airlines-dont-want

@mia@hcommons.social
2025-06-11 20:24:27

Very excited about this! Code to access GRIN will help lots of Google Books partners, and the example might open other doors, as well as the obvious benefits of access to data!
'Institutional Books 1.0: A 242B token dataset from Harvard Library's collections, refined for accuracy and usability' arxiv.org/abs/2506…

@blaise@mastodon.cloud
2025-06-10 19:07:17

It's almost like the only way to prevent your entire life from being used against you is for someone to pass a law...
404media.co/airlines-dont-want

@arXiv_csLG_bot@mastoxiv.page
2025-06-12 09:13:21

CFMI: Flow Matching for Missing Data Imputation
Vaidotas Simkus, Michael U. Gutmann
arxiv.org/abs/2506.09258 arxiv.or…

@arXiv_csDB_bot@mastoxiv.page
2025-06-11 07:27:43

RADAR: Benchmarking Language Models on Imperfect Tabular Data
Ken Gu, Zhihan Zhang, Kate Lin, Yuwei Zhang, Akshay Paruchuri, Hong Yu, Mehran Kazemi, Kumar Ayush, A. Ali Heydari, Maxwell A. Xu, Girish Narayanswamy, Yun Liu, Ming-Zher Poh, Yuzhe Yang, Mark Malhotra, Shwetak Patel, Hamid Palangi, Xuhai Xu, Daniel McDuff, Tim Althoff, Xin Liu

@memeorandum@universeodon.com
2025-06-11 22:31:03

Trump's DOJ makes its most sweeping demand for election data yet (NPR)
npr.org/2025/06/11/nx-s1-54260
memeorandum.com/250611/p142#a2

@servelan@newsie.social
2025-06-10 23:21:17

States sue to block 23andMe from auctioning genetic data in bankruptcy plan | Courthouse News Service
courthousenews.com/states-sue-

@metacurity@infosec.exchange
2025-06-11 13:44:19

Check out today's Metacurity for a concise round-up of the critical infosec developments you should know, including
--Cambridge researchers warn that private companies are harvesting period tracker data
--United Natural Foods says system restoration likely by June 15,
--Gabbard wants feds to use private sector for intel tech needs,
--States sue to stop sale of 23andMe DNA data,
--Microsoft issues at least 67 patches,
--Microsoft fixes zero day exploited…

@tante@tldr.nettime.org
2025-06-11 09:59:04

Calculating water/energy usage for "AI" per token is a bit problematic: A data center has a massive base load even if nobody uses it, just by sheer existence. And since we have no actual data for any of the popular platforms, all numbers floating around are problematic and not very useful.
Like how much power does one of those servers' NVIDIA cards really save if its utilization is only 50%? And are the overhead costs actually counted?

@arXiv_eessSY_bot@mastoxiv.page
2025-06-12 08:04:21

Data-Driven Nonlinear Regulation: Gaussian Process Learning
Telema Harry, Martin Guay, Shimin Wang, Richard D. Braatz
arxiv.org/abs/2506.09273

@arXiv_csCL_bot@mastoxiv.page
2025-06-12 08:37:51

Error-Guided Pose Augmentation: Enhancing Rehabilitation Exercise Assessment through Targeted Data Generation
Omar Sherif, Ali Hamdi
arxiv.org/abs/2506.09833

@privacity@social.linux.pizza
2025-05-09 23:11:41

Consent for Processing Personal Data in the Age of AI: Key Updates Across Asia-Pacific
fpf.org/blog/consent-for-proce

@noellabo@fedibird.com
2025-06-12 09:37:09

This thing... it moves!
(It's the technical proof of concept for building an iOS version of ぞーぺん (ZonePane). Still a long way to go!)
QT: fedibird.com/@takke/1146694283

@arXiv_csCR_bot@mastoxiv.page
2025-06-11 07:38:23

Lightweight Electronic Signatures and Reliable Access Control Included in Sensor Networks to Prevent Cyber Attacks from Modifying Patient Data
Mishall Al-Zubaidie
arxiv.org/abs/2506.08828

@arXiv_statME_bot@mastoxiv.page
2025-06-11 09:52:35

Gaussian copula correlation network analysis with application to multi-omics data
Ekaterina Tomilina (MaIAGE, GABI), Florence Jaffrézic (GABI), Gildas Mazo (MaIAGE)
arxiv.org/abs/2506.08586

@johl@mastodon.xyz
2025-06-09 13:28:02

A five-month investigation found that data centers in Mexico, Chile, South Africa, the Netherlands, and the U.S. overhyped their economic benefits and downplayed environmental damage, especially their water use, even in drought zones.
themaybe.org/research/data-cen

@arXiv_astrophHE_bot@mastoxiv.page
2025-07-11 09:57:21

Calibrated Lanthanide Atomic Data for Kilonova Radiative Transfer. I. Atomic Structure and Opacities
Andreas Flörs, Ricardo Ferreira da Silva, José P. Marques, Jorge M. Sampaio, Gabriel Martínez-Pinedo
arxiv.org/abs/2507.07785

@arXiv_csDL_bot@mastoxiv.page
2025-06-12 07:30:11

Linking Data Citation to Repository Visibility: An Empirical Study
Fakhri Momeni, Janete Saldanha Bach, Brigitte Mathiak, Peter Mutschke
arxiv.org/abs/2506.09530

@marcel@waldvogel.family
2025-06-09 10:03:45

Data Exfiltration in plain sight.
“Internet dead zones”, of course!
NOT!
thedailybeast.com/elon-musks-d

@arXiv_csNI_bot@mastoxiv.page
2025-06-12 07:49:31

Real-Time Network Traffic Forecasting with Missing Data: A Generative Model Approach
Lei Deng, Wenhan Xu, Jingwei Li, Danny H. K. Tsang
arxiv.org/abs/2506.09647

@roelgrif@mstdn.social
2025-06-11 14:38:59

RIVM update on sewage values and percentage positive.
In this 16th wave we once again seem to have reached a plateau of about 280. However, because of Whitsun, data processing is running about 3 days behind and the daily values vary quite a bit, so keep that in mind.
As a result, the data contain only 2 new days: June 4 and 5, covering 35% and 25% of the measuring stations respectively.
#qp2t

@arXiv_astrophCO_bot@mastoxiv.page
2025-07-11 08:55:41

Exploring non-cold dark matter in a scenario of dynamical dark energy with DESI DR2 data
Tian-Nuo Li, Peng-Ju Wu, Guo-Hong Du, Yan-Hong Yao, Jing-Fei Zhang, Xin Zhang
arxiv.org/abs/2507.07798

@arXiv_csPL_bot@mastoxiv.page
2025-06-11 07:49:04

Gradual Metaprogramming
Tianyu Chen, Darshal Shetty, Jeremy G. Siek, Chao-Hong Chen, Weixi Ma, Arnaud Venet, Rocky Liu
arxiv.org/abs/2506.09043

@arXiv_csRO_bot@mastoxiv.page
2025-07-11 09:12:11

Data-driven Kinematic Modeling in Soft Robots: System Identification and Uncertainty Quantification
Zhanhong Jiang, Dylan Shah, Hsin-Jung Yang, Soumik Sarkar
arxiv.org/abs/2507.07370

@arXiv_csCV_bot@mastoxiv.page
2025-06-10 19:11:31

This arxiv.org/abs/2506.06155 has been replaced.
initial toot: mastoxiv.page/@arXiv_csCV_…

@paulomalley@c.im
2025-05-12 04:30:26

Stop battling confusing Google Sheets charts when you've got different types of data! 🙅‍♀️ There's a much better way to show everything clearly.
My new video dives deep into Combo Charts, making even wildly different scales (like baby names & wind energy! 🌬️👶 – yes, really!) look perfectly clear with dual axes. It’s all about making your data understandable at a glance.
Check out the full tutorial for the how-to:

@arXiv_csAI_bot@mastoxiv.page
2025-07-11 07:35:41

Neurosymbolic Feature Extraction for Identifying Forced Labor in Supply Chains
Zili Wang, Frank Montabon, Kristin Yvonne Rozier
arxiv.org/abs/2507.07217

@cheryanne@aus.social
2025-06-10 06:33:16

High Signal: Data Science | Career | AI
Great Australian Pods Podcast Directory: #GreatAusPods

High Signal: Data Science | Career | AI
Screenshot of the podcast listing on the Great Australian Pods website
@arXiv_csGR_bot@mastoxiv.page
2025-07-11 08:53:21

Hi-d maps: An interactive visualization technique for multi-dimensional categorical data
Radi Muhammad Reza, Benjamin A Watson
arxiv.org/abs/2507.07890

@Techmeme@techhub.social
2025-07-11 16:10:56

Virtru, which offers data security services to clients like the US DOD and Salesforce, raised a $50M Series D led by Iconiq Capital at a $500M valuation (Allie Garfinkle/Fortune)
fortune.com/2025/07/11/exclusi

@pbloem@sigmoid.social
2025-07-11 18:21:41

Interesting stuff. This touches on a lot of work that was on my todo list. Especially estimating the "interestingness" of data by measuring the maximum compute needed to reach the optimal compression.
AIT is definitely becoming practically relevant.
arxiv.org/abs/2507.07995

@dawid@social.craftknight.com
2025-07-11 17:26:01

A really good vanlifer "day in the life" film :) not mine, of course, but it's one of the more authentic channels I follow, and this episode has a new POV format https://youtu.be/VTNR2vFPXcI

#vanlife #yt

@arXiv_grqc_bot@mastoxiv.page
2025-06-11 09:25:55

GW170817 Viable Einstein-Gauss-Bonnet Inflation Compatible with the Atacama Cosmology Telescope Data
S. D. Odintsov, V. K. Oikonomou
arxiv.org/abs/2506.08193

@netzschleuder@social.skewed.de
2025-07-11 06:00:04

arxiv_citation: arXiv citation networks (1993-2003)
Citations among papers posted on arxiv.org under the hep-ph and hep-th categories, between 1993 and 2003. This period begins a few months after arXiv was launched. If a paper i cites a paper j also in this data set, then a directed edge connects i to j. (Papers not in the data set are excluded.) These data were originally released as part of the 2003 KDD Cup.
This network has 27770 nodes and 352807 edges.
Tags: Informational,…

arxiv_citation: arXiv citation networks (1993-2003). 27770 nodes, 352807 edges. https://networks.skewed.de/net/arxiv_citation#HepTh
@arXiv_csSE_bot@mastoxiv.page
2025-06-11 08:07:15

MBTModelGenerator: A software tool for reverse engineering of Model-based Testing (MBT) models from clickstream data of web applications
Sasidhar Matta, Vahid Garousi
arxiv.org/abs/2506.08179

@arXiv_mathNA_bot@mastoxiv.page
2025-06-12 08:25:31

An Introduction to Solving the Least-Squares Problem in Variational Data Assimilation
I. Daužickaitė, M. A. Freitag, S. Gürol, A. S. Lawless, A. Ramage, J. A. Scott, J. M. Tabeart
arxiv.org/abs/2506.09211

@arXiv_csGT_bot@mastoxiv.page
2025-07-11 07:35:31

Incentive Mechanism for Mobile Crowd Sensing with Assumed Bid Cost Reverse Auction
Jowa Yangchin, Ningrinla Marchang
arxiv.org/abs/2507.07688

@stephane_klein@social.coop
2025-07-11 21:45:12

I discovered the terms "Wide data" and "Long data": #BuisnessIntelligence
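
For readers new to the two terms: "wide" data keeps one row per entity with one column per measured variable, while "long" data keeps one row per (entity, variable, value) triple. A minimal pure-Python sketch of the wide-to-long reshape follows; the store/month/sales table and the `wide_to_long` helper are hypothetical examples, not taken from the toot.

```python
def wide_to_long(rows, id_col, var_name="variable", value_name="value"):
    """Turn wide rows (one column per variable) into long rows (one row per value)."""
    tidy = []
    for row in rows:
        for col, val in row.items():
            if col == id_col:
                continue  # the identifier is repeated on every long row, not melted
            tidy.append({id_col: row[id_col], var_name: col, value_name: val})
    return tidy


# Hypothetical wide table: one row per store, one column per month.
wide = [
    {"store": "A", "jan": 10, "feb": 12},
    {"store": "B", "jan": 7, "feb": 9},
]
tidy = wide_to_long(wide, id_col="store", var_name="month", value_name="sales")
for row in tidy:
    print(row)
```

In pandas, `pd.melt(df, id_vars=["store"], var_name="month", value_name="sales")` performs an equivalent reshape on a DataFrame.
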

@lysander07@sigmoid.social
2025-05-11 13:16:51

Next stop in our NLP timeline is 2013: the introduction of low-dimensional dense word vectors, so-called "word embeddings", based on distributional semantics, e.g. word2vec by Mikolov et al. from Google, which enabled representation learning on text.
T. Mikolov et al. (2013). Efficient Estimation of Word Representations in Vector Space.

Slide from the Information Service Engineering 2025 lecture, lecture 02, Natural Language Processing 01, NLP Timeline. The timeline runs down the middle of the slide from top to bottom, with a marker at 2013. On the left, a 2D diagram displays vectors for "man" and "woman". An arrow leads from the point of "man" to the point of "woman". Above it, the point for "king" is also marked, and the same difference vector is transferred from "man -> woman" to "king - ?…
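
The "man → woman" offset applied to "king" on the slide is the vector-offset analogy popularized with word2vec: find the word whose vector is closest (by cosine similarity) to vec(king) - vec(man) + vec(woman). The sketch below uses hypothetical, hand-picked 2D vectors rather than trained embeddings, so it only demonstrates the arithmetic, not word2vec itself.

```python
import math

# Hypothetical toy 2D "embeddings" chosen by hand, NOT trained word2vec vectors.
embeddings = {
    "man": (1.0, 0.0),
    "woman": (1.0, 1.0),
    "king": (3.0, 0.2),
    "queen": (3.0, 1.2),
    "apple": (-1.0, 0.5),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def analogy(a, b, c, vectors):
    """Solve a : b :: c : ? by maximizing cosine(vec(b) - vec(a) + vec(c), vec(w))."""
    target = tuple(vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c]))
    # The three query words are excluded, as is usual for this test.
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(analogy("man", "woman", "king", embeddings))  # queen
```
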
@arXiv_physicsdataan_bot@mastoxiv.page
2025-07-10 13:10:53

Replaced article(s) found for physics.data-an. arxiv.org/list/physics.data-an
[1/1]:
- Orthogonal projections of hypercubes
Yoshiaki Horiike, Shin Fujishiro

@arXiv_mathOC_bot@mastoxiv.page
2025-06-12 09:38:51

A Saddle Point Algorithm for Robust Data-Driven Factor Model Problems
Shabnam Khodakaramzadeh, Soroosh Shafiee, Gabriel de Albuquerque Gleizer, Peyman Mohajerin Esfahani
arxiv.org/abs/2506.09776

@arXiv_qbioNC_bot@mastoxiv.page
2025-06-10 18:13:40

This arxiv.org/abs/2407.00976 has been replaced.
initial toot: mastoxiv.page/@arXiv_qbi…

@arXiv_csSC_bot@mastoxiv.page
2025-06-12 07:50:11

Gradient-Weighted, Data-Driven Normalization for Approximate Border Bases -- Concept and Computation
Hiroshi Kera, Achim Kehrein
arxiv.org/abs/2506.09529

@arXiv_csDB_bot@mastoxiv.page
2025-06-10 07:25:32

KramaBench: A Benchmark for AI Systems on Data-to-Insight Pipelines over Data Lakes
Eugenie Lai, Gerardo Vitagliano, Ziyu Zhang, Sivaprasad Sudhir, Om Chabra, Anna Zeng, Anton A. Zabreyko, Chenning Li, Ferdi Kossmann, Jialin Ding, Jun Chen, Markos Markakis, Matthew Russo, Weiyang Wang, Ziniu Wu, Michael J. Cafarella, Lei Cao, Samuel Madden, Tim Kraska

@netzschleuder@social.skewed.de
2025-07-11 04:00:05

arxiv_citation: arXiv citation networks (1993-2003)
Citations among papers posted on arxiv.org under the hep-ph and hep-th categories, between 1993 and 2003. This period begins a few months after arXiv was launched. If a paper i cites a paper j also in this data set, then a directed edge connects i to j. (Papers not in the data set are excluded.) These data were originally released as part of the 2003 KDD Cup.
This network has 27770 nodes and 352807 edges.
Tags: Informational,…

arxiv_citation: arXiv citation networks (1993-2003). 27770 nodes, 352807 edges. https://networks.skewed.de/net/arxiv_citation#HepTh
@Techmeme@techhub.social
2025-07-10 08:50:58

The Allen Institute for AI launches FlexOlmo, an LLM architecture that lets data owners control and remove their training data from a model even after training (Will Knight/Wired)
wired.com/story/flexolmo-ai-mo

@arXiv_csLG_bot@mastoxiv.page
2025-06-12 09:36:32

ErrorEraser: Unlearning Data Bias for Improved Continual Learning
Xuemei Cao, Hanlin Gu, Xin Yang, Bingjun Wei, Haoyang Liang, Xiangkun Wang, Tianrui Li
arxiv.org/abs/2506.09347

@arXiv_csCY_bot@mastoxiv.page
2025-06-10 16:25:59

This arxiv.org/abs/2502.07732 has been replaced.
initial toot: mastoxiv.page/@arXiv_csCY_…

@arXiv_csCL_bot@mastoxiv.page
2025-07-11 09:59:31

Conditional Unigram Tokenization with Parallel Data
Gianluca Vico, Jindřich Libovický
arxiv.org/abs/2507.07824

@arXiv_csDC_bot@mastoxiv.page
2025-06-11 07:29:23

Towards Provenance-Aware Earth Observation Workflows: the openEO Case Study
H. Omidi, L. Sacco, V. Hutter, G. Irsiegler, M. Claus, M. Schobben, A. Jacob, M. Schramm, S. Fiore
arxiv.org/abs/2506.08597

@newsie@darktundra.xyz
2025-07-10 13:43:39

Qantas says 5.7 million affected by breach, leaked info not enough to access frequent flyer accounts therecord.media/qantas-airline

@memeorandum@universeodon.com
2025-07-11 23:26:04

US customs duties top $100 billion for first time in a fiscal year (David Lawder/Reuters)
reuters.com/business/trumps-ta
memeorandum.com/250711/p103#a2

@arXiv_eessSY_bot@mastoxiv.page
2025-06-11 08:32:05

Learning event-triggered controllers for linear parameter-varying systems from data
Renjie Ma, Su Zhang, Wenjie Liu, Zhijian Hu, Peng Shi
arxiv.org/abs/2506.08366

@kubikpixel@chaos.social
2025-07-10 05:05:50

»Bitcoin Depot breach exposes data of nearly 27,000 crypto users:
Bitcoin Depot, an operator of Bitcoin ATMs, is notifying customers of a data breach incident that has exposed their sensitive information.«
Who ever believed in digital security with Bitcoin service providers?
🤷‍♂️

@arXiv_csCR_bot@mastoxiv.page
2025-06-10 16:25:09

This arxiv.org/abs/2410.16316 has been replaced.
initial toot: mastoxiv.page/@arXiv_csCR_…

@arXiv_astrophCO_bot@mastoxiv.page
2025-06-11 08:44:05

Integrated Galaxy Light from Stacking $10^5$ Random Pointings in the Dark Energy Survey Data
Jenna E. Moore, Seth H. Cohen, Philip Mauskopf, Evan Scannapieco
arxiv.org/abs/2506.08162

@arXiv_csSE_bot@mastoxiv.page
2025-06-10 17:18:39

This arxiv.org/abs/2506.02791 has been replaced.
initial toot: mastoxiv.page/@arXiv_csSE_…

@arXiv_physicsdataan_bot@mastoxiv.page
2025-07-11 12:37:40

Replaced article(s) found for physics.data-an. arxiv.org/list/physics.data-an
[1/1]:
- Fuzzy permutation time irreversibility for nonequilibrium analysis of complex system
Wenpo Yao

@Techmeme@techhub.social
2025-06-11 04:35:58

Salesforce is restricting third-party companies from long-term indexing and storing of Slack messages, which would hamper rival enterprise AI firms like Glean (The Information)
theinformation.com/articles/sa

@arXiv_csDB_bot@mastoxiv.page
2025-07-11 08:16:51

Algorithmic Complexity Attacks on All Learned Cardinality Estimators: A Data-centric Approach
Yingze Li, Xianglong Liu, Dong Wang, Zixuan Wang, Hongzhi Wang, Kaixing Zhang, Yiming Guan
arxiv.org/abs/2507.07438

@metacurity@infosec.exchange
2025-06-09 19:12:32

(Trying this again, only this time with the right threat group. It's only Monday 🤪)
I had always assumed Salt Typhoon hit Comcast.
nextgov.com/cybersecurity/2025

@arXiv_csLG_bot@mastoxiv.page
2025-06-12 09:57:51

Generalization Error Analysis for Attack-Free and Byzantine-Resilient Decentralized Learning with Data Heterogeneity
Haoxiang Ye, Tao Sun, Qing Ling
arxiv.org/abs/2506.09438

@arXiv_csCL_bot@mastoxiv.page
2025-07-10 09:53:11

Expediting data extraction using a large language model (LLM) and scoping review protocol: a methodological study within a complex scoping review
James Stewart-Evans, Emma Wilson, Tessa Langley, Andrew Prayle, Angela Hands, Karen Exley, Jo Leonardi-Bee
arxiv.org/abs/2507.06623

@netzschleuder@social.skewed.de
2025-07-10 13:00:13

citeseer: CiteSeer citations (2014)
Citations among papers indexed by the CiteSeer digital library. If a paper i cites a paper j also in this data set, then a directed edge connects i to j. (Papers not in the data set are excluded.) Self-loops may be present.
This network has 384413 nodes and 1751463 edges.
Tags: Informational, Citation, Unweighted

citeseer: CiteSeer citations (2014). 384413 nodes, 1751463 edges. https://networks.skewed.de/net/citeseer
@arXiv_csCR_bot@mastoxiv.page
2025-07-11 08:16:31

FedP3E: Privacy-Preserving Prototype Exchange for Non-IID IoT Malware Detection in Cross-Silo Federated Learning
Rami Darwish, Mahmoud Abdelsalam, Sajad Khorsandroo, Kaushik Roy
arxiv.org/abs/2507.07258

@metacurity@infosec.exchange
2025-07-10 14:11:41

Check out today's Metacurity for the critical infosec developments you should know, including
--UK's NCA arrested four people for M&S, Co-Op cyberattacks
--Russian hoops player Kasatkin busted in France in connection with ransomware,
--McDonald's employee chatbot was riddled with absurd flaws,
--Hackers stole $40m from GMX protocol,
--Customer data exposed in Bitcoin Depot breach,
--Hackers run scam messages in old Mt. Gox wallets,
--Ni…

@arXiv_csDC_bot@mastoxiv.page
2025-06-11 07:37:13

Mycelium: A Transformation-Embedded LSM-Tree
Holly Casaletto, Jeff Lefevre, Aldrin Montana, Peter Alvaro
arxiv.org/abs/2506.08923

@kubikpixel@chaos.social
2025-06-09 07:55:26

»Data Broker – Why data trading flourishes despite #DSGVO (GDPR):
With every click on the web, masses of #Daten (data) are generated. #Handel (trading) in them is lucrative, and a potential

@arXiv_csCL_bot@mastoxiv.page
2025-06-10 19:04:11

This arxiv.org/abs/2506.04929 has been replaced.
initial toot: mastoxiv.page/@arXiv_csCL_…

@netzschleuder@social.skewed.de
2025-06-11 01:00:06

cora: CORA citations (1998)
Citations among papers indexed in 1998 by CORA, an early computer science research paper search engine. If a paper i cites a paper j also in this data set, then a directed edge connects i to j. (Papers not in the data set are excluded.) Self-loops may be present. The dates of these snapshots are uncertain.
This network has 23166 nodes and 91500 edges.
Tags: Informational, Citation, Unweighted

cora: CORA citations (1998). 23166 nodes, 91500 edges. https://networks.skewed.de/net/cora
@arXiv_csLG_bot@mastoxiv.page
2025-07-09 10:22:32

Universal Embeddings of Tabular Data
Astrid Franz, Frederik Hoppe, Marianne Michaelis, Udo Göbel
arxiv.org/abs/2507.05904

@Techmeme@techhub.social
2025-06-11 11:56:19

Israeli data security startup Cyera raised $540M led by Georgian, Greenoaks, and Lightspeed at a $6B valuation, up from $3B in November 2024 after raising $300M (Steven Scheer/Reuters)
reuters.com/world/middle-east/

@metacurity@infosec.exchange
2025-06-10 14:39:36

Never a let-up in cybersecurity developments, so don't miss today's Metacurity for the most critical infosec developments you should know, including
--US grocery distributor United Natural Foods is the latest retail-related cyber victim
--M&S reopens website to shoppers,
--Google account phone numbers could have been brute-forced,
--TX and IL warn of breach-related data exposure,
--NHS blood supply still short a year after ransomware attack,
--C…

@arXiv_csCR_bot@mastoxiv.page
2025-06-09 08:13:02

PrivTru: A Privacy-by-Design Data Trustee Minimizing Information Leakage
Lukas Gehring, Florian Tschorsch
arxiv.org/abs/2506.06124

@netzschleuder@social.skewed.de
2025-06-11 00:00:03

product_space: Atlas of Economic Complexity export network
Two networks of economic products, where a pair of products are connected if they are exported at similar rates by the same countries. The data are a projection from a bipartite network of nations and the products they export. Edge weights represent a similarity score (called "proximity"). Data based on UN Comtrade worldwide trade patterns. SITC network based on the Standard International Trade Classification and HS …

product_space: Atlas of Economic Complexity export network. 774 nodes, 1779 edges. https://networks.skewed.de/net/product_space#SITC
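
A minimal sketch of projecting a bipartite country-product network onto products, assuming the standard product-space proximity: the smaller of the two conditional probabilities that a country exporting one product also exports the other (with "exports" taken to mean revealed comparative advantage of at least 1). Whether netzschleuder's edge weights are computed exactly this way is an assumption, the published network is far sparser than a full projection (so weak edges are presumably filtered, a step skipped here), and the countries and products below are hypothetical.

```python
from itertools import combinations


def proximity_projection(exports):
    """Project a bipartite country-product network onto products.

    exports: dict mapping country -> set of products it exports competitively
    (assumed to be pre-filtered to RCA >= 1; the RCA step itself is not shown).
    Returns {(p, q): phi} where phi = min(P(p | q), P(q | p)).
    """
    countries_by_product = {}
    for country, products in exports.items():
        for p in products:
            countries_by_product.setdefault(p, set()).add(country)

    weights = {}
    for p, q in combinations(sorted(countries_by_product), 2):
        shared = len(countries_by_product[p] & countries_by_product[q])
        if shared == 0:
            continue  # no common exporter, so no edge in the projection
        # P(p | q) = shared / |exporters of q|, and vice versa.
        weights[(p, q)] = min(shared / len(countries_by_product[q]),
                              shared / len(countries_by_product[p]))
    return weights


# Hypothetical toy data, not taken from UN Comtrade.
exports = {
    "X": {"coffee", "bananas"},
    "Y": {"coffee", "bananas", "cars"},
    "Z": {"cars", "chips"},
}
weights = proximity_projection(exports)
print(weights)
```

Taking the minimum of the two conditional probabilities keeps the score symmetric and stops a product exported by almost every country from looking close to everything.
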
@Techmeme@techhub.social
2025-07-10 20:05:46

LGND, which uses vector embeddings to analyze geospatial data and has an enterprise app to query it, raised a $9M seed led by Javelin Venture Partners (Tim De Chant/TechCrunch)
techcrunch.com/2025/07/10/lgnd

@arXiv_csCL_bot@mastoxiv.page
2025-06-10 18:59:31

This arxiv.org/abs/2506.00759 has been replaced.
initial toot: mastoxiv.page/@arXiv_csCL_…

@arXiv_csLG_bot@mastoxiv.page
2025-06-12 09:59:31

Learning Obfuscations Of LLM Embedding Sequences: Stained Glass Transform
Jay Roberts, Kyle Mylonakis, Sidhartha Roy, Kaan Kale
arxiv.org/abs/2506.09452

@Techmeme@techhub.social
2025-06-11 11:46:02

Will Cathcart says WhatsApp plans to support Apple in its legal case against the UK Home Office over weakening encryption, which may "set a dangerous precedent" (Zoe Kleinman/BBC)
bbc.com/news/articles/cgmjrn42

@Techmeme@techhub.social
2025-07-11 18:45:53

Sources: JPMorgan Chase told fintech companies it will start charging fees for access to customers' account data, which could drastically reshape the industry (Bloomberg)
bloomberg.com/news/articles/20

@arXiv_csCL_bot@mastoxiv.page
2025-07-10 10:07:31

FlexOlmo: Open Language Models for Flexible Data Use
Weijia Shi, Akshita Bhagia, Kevin Farhat, Niklas Muennighoff, Pete Walsh, Jacob Morrison, Dustin Schwenk, Shayne Longpre, Jake Poznanski, Allyson Ettinger, Daogao Liu, Margaret Li, Dirk Groeneveld, Mike Lewis, Wen-tau Yih, Luca Soldaini, Kyle Lo, Noah A. Smith, Luke Zettlemoyer, Pang Wei Koh, Hannaneh Hajishirzi, Ali Farhadi, Sewon Min

@metacurity@infosec.exchange
2025-07-09 12:10:11

So are sports gambling and horse-racing authorities the latest vertical to be hit by Scattered Spider? Yesterday, bookmakers Paddy Power and Betfair admitted to a cyberattack, an incident that follows an attack last month on the British Horseracing Authority.

@arXiv_csLG_bot@mastoxiv.page
2025-06-09 10:10:32

Synthetic Tabular Data: Methods, Attacks and Defenses
Graham Cormode, Samuel Maddock, Enayat Ullah, Shripad Gade
arxiv.org/abs/2506.06108

@netzschleuder@social.skewed.de
2025-07-12 04:00:03

sp_colocation: Social co-locations (2018)
Network of colocations between people, based on which RFID readers received signals from the RFID tags. Namely, we define two individuals to be in co-presence if the same exact set of readers have received signals from both individuals during a 20s time window.
This network has 81 nodes and 150126 edges.
Tags: Social, Offline, Unweighted, Weighted, Temporal, Metadata

sp_colocation: Social co-locations (2018). 81 nodes, 150126 edges. https://networks.skewed.de/net/sp_colocation#LH10
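
The co-presence rule quoted above can be sketched as follows. Assumptions not stated in the description: windows are fixed, non-overlapping 20 s bins starting at t = 0, and each reading is a hypothetical (timestamp, tag, reader) tuple. The output is one temporal edge per co-present pair per window, which is consistent with a small node count (81) carrying a large number of edges (150126).

```python
from collections import defaultdict
from itertools import combinations

WINDOW_S = 20  # window length from the dataset description


def copresence_edges(readings):
    """readings: iterable of (timestamp_seconds, tag_id, reader_id) tuples.

    Returns sorted (window_index, tag_a, tag_b) tuples: one per pair of tags
    whose set of receiving readers is identical within that 20 s window.
    """
    readers_seen = defaultdict(set)  # (window, tag) -> readers that heard the tag
    for t, tag, reader in readings:
        readers_seen[(int(t // WINDOW_S), tag)].add(reader)

    groups = defaultdict(list)  # (window, exact reader set) -> tags sharing it
    for (window, tag), readers in readers_seen.items():
        groups[(window, frozenset(readers))].append(tag)

    edges = []
    for (window, _), tags in groups.items():
        for a, b in combinations(sorted(tags), 2):
            edges.append((window, a, b))
    return sorted(edges)


# Hypothetical readings: alice and bob share readers {r1, r2} in window 0,
# carol is heard only by r1 there; carol and dave share {r3} in window 1.
readings = [
    (3, "alice", "r1"), (7, "alice", "r2"),
    (5, "bob", "r1"), (15, "bob", "r2"),
    (9, "carol", "r1"),
    (25, "carol", "r3"), (31, "dave", "r3"),
]
print(copresence_edges(readings))  # [(0, 'alice', 'bob'), (1, 'carol', 'dave')]
```
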
@arXiv_csLG_bot@mastoxiv.page
2025-06-12 10:03:01

Athena: Enhancing Multimodal Reasoning with Data-efficient Process Reward Models
Shuai Wang, Zhenhua Liu, Jiaheng Wei, Xuanwu Yin, Dong Li, Emad Barsoum
arxiv.org/abs/2506.09532

@Techmeme@techhub.social
2025-07-08 06:25:44

The data industry is consolidating, with Databricks' $1B Neon purchase, Salesforce's $8B Informatica deal, and more, fueled by the need for quality data for AI (Rebecca Szkutak/TechCrunch)
techcrunch.com/2025/07/07/ai-i