2025-11-03 17:23:00
The Tow Center releases a tracker that monitors lawsuits, deals, grants, and other developments between news publishers and AI companies (Klaudia Jaźwińska/Columbia Journalism Review)
https://www.cjr.org/analysis/lawsuit-license-openai-micros…
🚜 California farmland doused with 2.5 million pounds of PFAS pesticides each year, analysis finds
https://www.thenewlede.org/2025/11/california-farmland-doused-with-2-5-million-pounds-of-pfas-pesticides-each-y…
The government’s biggest coal sales in more than a decade are coming in a few days,
offering 600 million tons from publicly owned reserves next to strip mines in Montana and Wyoming.
The sales are a signature piece of Trump’s ambitions for companies to dig more coal from federal lands and burn it for electricity.
Yet most power plants served by those mines plan to quit burning coal altogether within 10 years, an Associated Press data analysis shows.
Three other mines po…
An analysis of Waymo's data covering ~100M driverless miles across four US cities: Waymo cars have far lower crash rates per million miles than human drivers (Jonathan Slotkin/New York Times)
https://www.nytimes.com/2025/12/02/o…
End-of-Year Threat Intelligence Sightings Forecast
This report presents an analysis of Threat Intelligence (TI) Sightings aggregated from several key data sources, including social platforms, code repositories, and specialized TI feeds. The primary objective is to visually track historical trends per source and provide a short-term adaptive forecast for a defined period (in days).
#opensource
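The post does not say which forecasting method the report uses. As a rough illustration of what a short-term "adaptive" forecast over daily sighting counts can look like, here is a minimal sketch that assumes Holt's linear (double) exponential smoothing. The function name `holt_forecast` and the `alpha`/`beta` defaults are hypothetical choices made for this example, not taken from the report.

```python
from typing import List


def holt_forecast(counts: List[float], horizon_days: int,
                  alpha: float = 0.5, beta: float = 0.3) -> List[float]:
    """Hypothetical helper: forecast `horizon_days` future daily values with
    Holt's linear (double) exponential smoothing. This is an assumed method,
    not necessarily the one used in the report.

    alpha weights recent observations when updating the level;
    beta weights recent level changes when updating the trend.
    """
    if len(counts) < 2:
        raise ValueError("need at least two observations")
    if horizon_days < 1:
        raise ValueError("horizon_days must be >= 1")
    # Initialise the level with the first observation and the trend
    # with the first day-over-day difference.
    level = counts[0]
    trend = counts[1] - counts[0]
    for y in counts[1:]:
        prev_level = level
        # The level adapts toward each new observation.
        level = alpha * y + (1 - alpha) * (level + trend)
        # The trend adapts toward the latest change in level.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # Extrapolate the final level along the final trend, clamped at zero
    # because sighting counts cannot be negative.
    return [max(0.0, level + h * trend) for h in range(1, horizon_days + 1)]
```

For example, `holt_forecast(daily_sightings, horizon_days=7)` would give a one-week outlook, where `daily_sightings` is a hypothetical list of per-day counts for a single source. The smoothing weights control how quickly the forecast "adapts". Larger values react faster to recent spikes at the cost of noisier projections.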
An analysis of Crunchbase and PitchBook data: in 2025 so far, 80 tech startups reached $1B valuations, many of them focused on AI with exceptions like Kalshi (Dominic-Madori Davis/TechCrunch)
https://techcrunch.com/2025/12/01/at-least-36-new-tech-…
Regional temperature records broken across the world in 2025 #environment
An analysis of 30 US data center proposals in 14 states: in most cases, local officials signed NDAs and worked with apparent shell companies to hide details (Natalie Kainz/NBC News)
https://www.nbcnews.com/tech/tech-news/dat
Analysis: New Data Suggests Russia Is Sustaining Mi-8 Output Despite Wartime Losses: https://benborges.xyz/2025/12/11/analysis-new-data-suggests-russia.html
ICE shift in tactics leads to soaring number of unjustifiable arrests
Government data shows that more than 60 percent of the people detained in at-large arrests since June did NOT have criminal convictions or pending charges.
-- even as authorities insist that immigration officers are focusing on violent criminals whom they describe as “the worst of the worst.”
🕶️ Community Analysis of Social Virtual Reality Based on Large-Scale Log Data of a Commercial Metaverse Platform
#vr
"This paper presents a comprehensive scientometric analysis of the long-term impact of [event] on the nation's scientific development."
*oh, interesting!*
"Using Scopus-indexed data..."
*closes tab*
Teaching students simple #WebScraping was always quite rewarding. It opens up numerous relevant, real-world data sources that are the foundation for any further analysis. Things already got more complicated with dynamic content loading, but now bot-exclusion mechanisms make it almost impossible in many cases. Is web scraping for the
An analysis of AI training datasets, compiled by The Atlantic, shows AI models were trained on hundreds of thousands of YouTube videos from news publishers (Andrew Deck/Nieman Lab)
https://www.niemanlab.org/2025/10/hundred…
A template for data analysis projects structured as R packages (or not) https://github.com/Pakillo/template by @…
RE: https://mastodon.social/@cheeaun/115415146417702654
After looking at this, I got curious about the limits on most servers.
So I did a little data analysis. Servers list from @…
Aixel: A Unified, Adaptive and Extensible System for AI-powered Data Analysis
Meihui Zhang, Liming Wang, Chi Zhang, Zhaojing Luo
https://arxiv.org/abs/2510.12642 https://…
Synthetic Series-Symbol Data Generation for Time Series Foundation Models
Wenxuan Wang, Kai Wu, Yujian Betterest Li, Dan Wang, Xiaoyu Zhang
https://arxiv.org/abs/2510.08445 http…
Inside the deportation machine (giftlink)
https://www.nytimes.com/interactive/2025/12/22/us/trump-immigration-deportation-network-ice-arrests.html?unlocked_art…
Mapping the Urban Mobility Intelligence Frontier: A Scientometric Analysis of Data-Driven Pedestrian Trajectory Prediction and Simulation
Junhao Xu, Hui Zeng
https://arxiv.org/abs/2510.10327
This is a good start but the subway should curve south down 19th Ave, meet up with Daly City BART and continue on the BART tracks down to Millbrae. That part is essential; a branch to Outer Richmond could be added later as a nice-to-have.
https://musubi3.github.io/sfmta-geary-subway…
High-dimensional Analysis of Synthetic Data Selection
Parham Rezaei, Filip Kovacevic, Francesco Locatello, Marco Mondelli
https://arxiv.org/abs/2510.08123 https://
The Data Enclave Advantage: A New Paradigm for Least-Privileged Data Access in a Zero-Trust World
Nico Bistolfi, Andreea Georgescu, Dave Hodson
https://arxiv.org/abs/2510.09494 …
MS-Mix: Unveiling the Power of Mixup for Multimodal Sentiment Analysis
Hongyu Zhu, Lin Chen, Mounim A. El-Yacoubi, Mingsheng Shang
https://arxiv.org/abs/2510.11579 https://
Geospatial Reasoning fuses weather, satellite and population
data with Gemini AI for risk analysis
It runs in Google's Trusted Tester program for early access
(That ain't you)
https://www.testingcatalog.com/google-laun…
PowerPlots: An Open Source Power Grid Visualization and Data Analysis Framework for Academic Research
Noah Rhodes
https://arxiv.org/abs/2510.05063 https://…
If there is any truth in these allegations, we really have to worry about what is going on in both AGS and GSOC.
The initial story was bad enough: the Irish police service being so incompetent that their statistics on homicides were wildly incorrect, and the whistleblower getting penalised; but this is batshit.
Most Cambodia & Laos tree cover loss in 2024 happened inside protected areas https://news.mongabay.com/short-article/2025/10/most-cambodia-laos-tree-cover-loss-in-2024-happened-inside-protected-areas/
…
Updated constraints on interacting dark energy: A comprehensive analysis using multiple CMB probes, DESI DR2, and supernovae observations
Tian-Nuo Li, Guo-Hong Du, Yun-He Li, Yichao Li, Jia-Le Ling, Jing-Fei Zhang, Xin Zhang
https://arxiv.org/abs/2510.11363
Multi-Physics-Enhanced Bayesian Inverse Analysis: Information Gain from Additional Fields
Lea J. Haeusel, Jonas Nitzler, Lea J. Köglmeier, Wolfgang A. Wall
https://arxiv.org/abs/2510.11095
Dataminr to acquire cybersecurity firm ThreatConnect for $290 million
https://cyberscoop.com/dataminr-threatconnect-acquisition/
CoDA: Agentic Systems for Collaborative Data Visualization
Zichen Chen, Jiefeng Chen, Sercan Ö. Arik, Misha Sra, Tomas Pfister, Jinsung Yoon
https://arxiv.org/abs/2510.03194
Monitoring 3D Lattice Structures in Additive Manufacturing Using Topological Data Analysis
Yulin An, Xueqi Zhao, Enrique del Castillo
https://arxiv.org/abs/2510.11740 https://…
Exploring Quantum Spacetime with Topological Data Analysis
J. van der Duin, R. Loll, M. Schiffer, A. Silva
https://arxiv.org/abs/2510.05693 https://arxiv.o…
First simultaneous global QCD analysis of kaon and pion parton distributions with lattice QCD constraints
P. C. Barry, Chueng-Ryong Ji, W. Melnitchouk, N. Sato, Fernanda Steffens
https://arxiv.org/abs/2510.11979
The impact of missing data on the construction of LISA Time Delay Interferometry Michelson variables
Ollie Burke, Martina Muratore, Graham Woan
https://arxiv.org/abs/2510.06406 …
SenWave: A Fine-Grained Multi-Language Sentiment Analysis Dataset Sourced from COVID-19 Tweets
Qiang Yang, Xiuying Chen, Changsheng Ma, Rui Yin, Xin Gao, Xiangliang Zhang
https://arxiv.org/abs/2510.08214
Analysis: of the 8,808 global data centers in October 2025, ~7,000 are in areas outside the optimal 18°C to 27°C temperature range; 600 are in areas above 27°C (Rest of World)
https://restofworld.org/2025/data-center-heat-map/
98% of Gaza's fruit trees and olive trees have been destroyed. 90% of greenhouses are damaged and 75% destroyed, according to an analysis of satellite images.
https://www.zmescience.com/science/news-sc
Energy Efficiency in Cloud-Based Big Data Processing for Earth Observation: Gap Analysis and Future Directions
Adhitya Bhawiyuga, Serkan Girgin, Rolf A. de By, Raul Zurita-Milla
https://arxiv.org/abs/2510.02882
Claude skills are a big deal™️
Thanks to skills, you can reduce your multi-agent setup to a single agent with skills, greatly reducing complexity and increasing speed of execution.
In fact, where in the past you might have had a number of agents, each specialized in, for example, data analysis, fetching data from a particular set of websites, or making that data available in a dashboard, you can now replace all of these agents with skills. (1/2)
I rewrote a data analysis pipeline, moving it from #python to #julialang. I am now in love with the threading support in Julia.
The task is very parallelizable but each thread needs random read access to a tens-of-GB dataset. In Python (with multiprocessing, shared stores, etc) data bookkeeping was a nightmar…
Sensitivity Analysis for Causal ML: A Use Case at Booking.com
Philipp Bach, Victor Chernozhukov, Carlos Cinelli, Lin Jia, Sven Klaassen, Nils Skotara, Martin Spindler
https://arxiv.org/abs/2510.09109
There is enough data to start publishing reports of my statistical analysis of the Italian Volleyball Serie A1 championship.
https://davideaversa.it/experiment/volley/seriea1w2025.html
Crosslisted article(s) found for physics.data-an. https://arxiv.org/list/physics.data-an/new
[1/1]:
- Ensemble-Based Data Assimilation for Material Model Characterization in High-Velocity Impact
Rong Jin, Guangyao Wang, Xingsheng Sun
Kayak: in principle, an application that may be well-served by "old" rules-based AI. Its function is supposed to be deterministic, needing more data ingestion & analysis than humans can tolerate.
But if they're calling it "AI" today, I'm sure it's an LLM/neural net gadget which will hallucinate flights & fares. Because we're playing out the theory that the #XRisk
Photometric Redshift Estimation for Rubin Observatory Data Preview 1 with Redshift Assessment Infrastructure Layers (RAIL)
T. Zhang, E. Charles, J. F. Crenshaw, S. J. Schmidt, P. Adari, J. Gschwend, S. Mau, B. Andrews, E. Aubourg, Y. Bains, K. Bechtol, A. Boucaud, D. Boutigny, P. Burchat, J. Chevalier, J. Chiang, H. -F. Chiang, D. Clowe, J. Cohen-Tanugi, C. Combet, A. Connolly, S. Dagoret-Campagne, P. N. Daly, F. Daruich, G. Daubard, J. De Vicente, H. Drass, K. Fanning, E. Gawiser, M. …
Sentiment Matters: An Analysis of 200 Human-SAV Interactions
Lirui Guo, Michael G. Burke, Wynita M. Griggs
https://arxiv.org/abs/2510.08202 https://arxiv.o…
Missing Data Imputation in the Context of Propensity Score Analysis: A Systematic Review
Saghar Garayemi, Reza Ali Akbari Khoei, Sarah Friedrich
https://arxiv.org/abs/2510.05857
Evaluation of Real-Time Preprocessing Methods in AI-Based ECG Signal Analysis
Jasmin Freudenberg, Kai Hahn, Christian Weber, Madjid Fathi
https://arxiv.org/abs/2510.12541 https:…
Joint neutrino oscillation analysis from the T2K and NOvA experiments: #neutrinos may hold the keys to why we exist: https://www.eurekalert.org/news-releases/1103650 - MSU scientists help merge data from two neutrino experiments to offer most precise look yet at elusive particles.
The Pitfalls of Continuous Heavy-Tailed Distributions in High-Frequency Data Analysis
Vladimír Holý
https://arxiv.org/abs/2510.09785 https://ar…
RevMine: An LLM-Assisted Tool for Code Review Mining and Analysis Across Git Platforms
Samah Kansab, Francis Bordeleau, Ali Tizghadam
https://arxiv.org/abs/2510.04796 https://…
Analysis: since 2023, data center power demands have delayed 15 coal plants' retirements; the Trump administration has ordered two power plants to remain open (Ariel Wittenberg/Politico)
https://www.politico.com/news/2025/11/27/a
Analysis of the Supernova Remnant IC 443 using H.E.S.S. Data
Alison M. W. Mitchell (for the H.E.S.S. Collaboration), Lukas Grosspietsch (for the H.E.S.S. Collaboration), Tina Wach (for the H.E.S.S. Collaboration)
https://arxiv.org/abs/2510.02843
Adapting HFMCA to Graph Data: Self-Supervised Learning for Generalizable fMRI Representations
Jakub Frac, Alexander Schmatz, Qiang Li, Guido Van Wingen, Shujian Yu
https://arxiv.org/abs/2510.05177
“This seems the best bang for your buck; it’s less per year than private school,” said the future mother.
UK IVF couples use legal loophole to rank embryos based on potential IQ, height and health
https://www.theguar…
Data or Language Supervision: What Makes CLIP Better than DINO?
Yiming Liu, Yuhui Zhang, Dhruba Ghosh, Ludwig Schmidt, Serena Yeung-Levy
https://arxiv.org/abs/2510.11835 https:/…
PromptLocate: Localizing Prompt Injection Attacks
Yuqi Jia, Yupei Liu, Zedian Shao, Jinyuan Jia, Neil Gong
https://arxiv.org/abs/2510.12252 https://arxiv.o…
Developing an information criterion for spatial data analysis through Bayesian generalized fused lasso
Yuko Kakikawa, Yoshiyuki Ninomiya
https://arxiv.org/abs/2510.11172 https:/…
👨🏿🌾 Traces of old farm chemicals contaminate water across the US
#chemicals
BeSTAD: Behavior-Aware Spatio-Temporal Anomaly Detection for Human Mobility Data
Junyi Xie, Jina Kim, Yao-Yi Chiang, Lingyi Zhao, Khurram Shafique
https://arxiv.org/abs/2510.12076
Bridging Imperative Process Models and Process Data Queries-Translation and Relaxation
Abdur Rehman Anwar Qureshi, Adrian Rebmann, Timotheus Kampik, Matthias Weidlich, Mathias Weske
https://arxiv.org/abs/2510.06414
Cost Analysis of Human-corrected Transcription for Predominately Oral Languages
Yacouba Diarra, Nouhoum Souleymane Coulibaly, Michael Leventhal
https://arxiv.org/abs/2510.12781 …
Replaced article(s) found for physics.data-an. https://arxiv.org/list/physics.data-an/new
[1/1]:
- Maximum entropy temporal networks
Paolo Barucca
A look at Meta's 2GW Hyperion data center in Louisiana, with the first phase opening in 2028; an analysis shows sales tax breaks on GPUs could total $3.3B (Jon Keegan/Sherwood News)
https://sherwood.news/tech/hyperion/
nCTEQ global analysis of nuclear PDFs
M. Klasen
https://arxiv.org/abs/2510.05880 https://arxiv.org/pdf/2510.05880…
Transformed ℓ1 Regularizations for Robust Principal Component Analysis: Toward a Fine-Grained Understanding
Kun Zhao, Haoke Zhang, Jiayi Wang, Yifei Lou
https://arxiv.org/abs/2510.03624
The Adoption Paradox: A Comparative Analysis of Veterinary AI Adoption in China and the North America
Shumin Li, Xiaoyun Lai
https://arxiv.org/abs/2510.11758 https://
Data-driven Practical Stabilization of Nonlinear Systems via Chain Policies: Sample Complexity and Incremental Learning
Roy Siegelmann, Enrique Mallada
https://arxiv.org/abs/2510.03982
"Your Doctor is Spying on You": An Analysis of Data Practices in Mobile Healthcare Applications
Luke Stevenson, Sanchari Das
https://arxiv.org/abs/2510.06015 https://
The α-regression for compositional data: a unified framework for standard, spatially-lagged, and geographically-weighted regression models
Michail Tsagris
https://arxiv.org/abs/2510.12663
Analysis: Oracle has moved $66B of debt for building AI data centers off its balance sheet using SPVs; Meta has moved $30B, xAI moved $20B, and CoreWeave $2.6B (Tabby Kinder/Financial Times)
https://www.ft.com/content/0ae9d6cd-6b94-4e22-a559-f047734bef83
TelecomTS: A Multi-Modal Observability Dataset for Time Series and Language Analysis
Austin Feng, Andreas Varvarigos, Ioannis Panitsas, Daniela Fernandez, Jinbiao Wei, Yuwei Guo, Jialin Chen, Ali Maatouk, Leandros Tassiulas, Rex Ying
https://arxiv.org/abs/2510.06063
Crosslisted article(s) found for physics.data-an. https://arxiv.org/list/physics.data-an/new
[1/1]:
- A mathematical theory for understanding when abstract representations emerge in neural networks
Bin Wang, W. Jeffrey Johnston, Stefano Fusi
Comparative Performance Analysis of Modern NoSQL Data Technologies: Redis, Aerospike, and Dragonfly
Deep Bodra, Sushil Khairnar
https://arxiv.org/abs/2510.08863 https://
Protected areas hit hard as Mekong countries’ forest cover shrank in 2024 https://news.mongabay.com/2025/10/protected-areas-hit-hard-as-mekong-countries-forest-cover-shrank-in-2024/
Parrot Analytics: movies made up ~50% of streaming revenue across Netflix, Prime Video, Disney+, and WBD in 2024, up from ~27% in 2022, based on engagement data (Alejandro Rojas/The Hollywood Reporter)
https://www.hollywoodreporter.com/business/b…
You Only Train Once: Differentiable Subset Selection for Omics Data
Daphné Chopard, Jorge da Silva Gonçalves, Irene Cannistraci, Thomas M. Sutter, Julia E. Vogt
https://arxiv.org/abs/2512.17678 https://arxiv.org/pdf/2512.17678 https://arxiv.org/html/2512.17678
arXiv:2512.17678v1 Announce Type: new
Abstract: Selecting compact and informative gene subsets from single-cell transcriptomic data is essential for biomarker discovery, improving interpretability, and cost-effective profiling. However, most existing feature selection approaches either operate as multi-stage pipelines or rely on post hoc feature attribution, making selection and prediction weakly coupled. In this work, we present YOTO (you only train once), an end-to-end framework that jointly identifies discrete gene subsets and performs prediction within a single differentiable architecture. In our model, the prediction task directly guides which genes are selected, while the learned subsets, in turn, shape the predictive representation. This closed feedback loop enables the model to iteratively refine both what it selects and how it predicts during training. Unlike existing approaches, YOTO enforces sparsity so that only the selected genes contribute to inference, eliminating the need to train additional downstream classifiers. Through a multi-task learning design, the model learns shared representations across related objectives, allowing partially labeled datasets to inform one another, and discovering gene subsets that generalize across tasks without additional training steps. We evaluate YOTO on two representative single-cell RNA-seq datasets, showing that it consistently outperforms state-of-the-art baselines. These results demonstrate that sparse, end-to-end, multi-task gene subset selection improves predictive performance and yields compact and meaningful gene subsets, advancing biomarker discovery and single-cell analysis.
As Cyber Threats Escalate, the National Vulnerability Database Is Falling Behind
The National Institute of Standards and Technology (NIST) is struggling.
It faces a growing backlog to process data in its vulnerability repository, which publicly shares information assessing and detailing mitigation solutions against new cyber exploits.
With nearly 1,800 new reported vulnerabilities sitting in a queue for analysis this year, delays in processing leave the United States increa…
Two new approaches to multiple canonical correlation analysis for repeated measures data
Tomasz Górecki, Mirosław Krzyśko, Felix Gnettner, Piotr Kokoszka
https://arxiv.org/abs/2510.04457
Early Multimodal Prediction of Cross-Lingual Meme Virality on Reddit: A Time-Window Analysis
Sedat Dogan, Nina Dethlefs, Debarati Chakraborty
https://arxiv.org/abs/2510.05761 ht…
An analysis of 47,000 publicly shared ChatGPT conversations: ~10% related to emotional or mental health, ChatGPT exhibits a "default to yes" behavior, and more (Washington Post)
https://www.washingtonpost.com/technology/2025/11/12/how-people-use-ch…
Replaced article(s) found for physics.data-an. https://arxiv.org/list/physics.data-an/new
[1/1]:
- Universal emergence of local Zipf-Mandelbrot law
Davide Cugini, André Timpanaro, Giacomo Livan, Giacomo Guarnieri
Efficient Mining of Low-Utility Sequential Patterns
Jian Zhu, Zhidong Lin, Wensheng Gan, Ruichu Cai, Zhifeng Hao, Philip S. Yu
https://arxiv.org/abs/2510.10243 https://
Threat intel company Dataminr plans to acquire cybersecurity threat intel provider ThreatConnect for $290M; Dataminr raised $85M in convertible funding in March (Greg Otto/CyberScoop)
https://cyberscoop.com/dataminr-threatconnect-acquisition/
Replaced article(s) found for physics.data-an. https://arxiv.org/list/physics.data-an/new
[1/1]:
- DeepOHeat-v1: Efficient Operator Learning for Fast and Trustworthy Thermal Simulation and Optimiz...
Xinling Yu, Ziyue Liu, Hai Li, Yixing Li, Xin Ai, Zhiyu Zeng, Ian Young, …
Crosslisted article(s) found for physics.data-an. https://arxiv.org/list/physics.data-an/new
[1/1]:
- Deep Learning of the Biswas-Chatterjee-Sen Model
Neto, Alencar, Brito, Alves, Lima, Macedo-Filho, Ferreira, Alves
Google adds Gemini's Deep Search to Google Finance, which will also have prediction market data from Kalshi and Polymarket for event analysis, first in the US (Aamir Siddiqui/Android Authority)
https://www.androidauthority.com/google-finance…
Replaced article(s) found for physics.data-an. https://arxiv.org/list/physics.data-an/new
[1/1]:
- Comparative Analysis of Richardson-Lucy Deconvolution and Data Unfolding with Mean Integrated Squ...
Nikolay D. Gagunashvili
Crosslisted article(s) found for physics.data-an. https://arxiv.org/list/physics.data-an/new
[1/1]:
- Functional Connectivity Networks for Transportation Delay Analysis: from Theory to Software
Carlson Moses Büth, Massimiliano Zanin