
2025-07-11 07:42:51
Dirty Data in the Newsroom: Comparing Data Preparation in Journalism and Data Science
Stephen Kasica, Charles Berret, Tamara Munzner
https://arxiv.org/abs/2507.07238
KramaBench: A Benchmark for AI Systems on Data-to-Insight Pipelines over Data Lakes
Eugenie Lai, Gerardo Vitagliano, Ziyu Zhang, Sivaprasad Sudhir, Om Chabra, Anna Zeng, Anton A. Zabreyko, Chenning Li, Ferdi Kossmann, Jialin Ding, Jun Chen, Markos Markakis, Matthew Russo, Weiyang Wang, Ziniu Wu, Michael J. Cafarella, Lei Cao, Samuel Madden, Tim Kraska
Gaussian copula correlation network analysis with application to multi-omics data
Ekaterina Tomilina (MaIAGE, GABI), Florence Jaffrézic (GABI), Gildas Mazo (MaIAGE)
https://arxiv.org/abs/2506.08586
Constraints on cosmology and baryonic feedback with joint analysis of Dark Energy Survey Year 3 lensing data and ACT DR6 thermal Sunyaev-Zel'dovich effect observations
S. Pandey, J. C. Hill, A. Alarcon, O. Alves, A. Amon, D. Anbajagane, F. Andrade-Oliveira, N. Battaglia, E. Baxter, K. Bechtol, M. R. Becker, G. M. Bernstein, J. Blazek, S. L. Bridle, E. Calabrese, H. Camacho, A. Campos, A. Carnero Rosell, M. Carrasco Kind, R. Cawthon, C. Chang, R. Chen, P. Chintalapati, A. Choi, J. C…
Reduction Techniques for Survival Analysis
Johannes Piller, Léa Orsini, Simon Wiegrebe, John Zobolas, Lukas Burk, Sophie Hanna Langbein, Philip Studener, Markus Goeswein, Andreas Bender
https://arxiv.org/abs/2508.05715
Analysis: Nvidia's $500B US spending plan has accelerated the US AI server ecosystem development, with at least eight suppliers unveiling new investment plans (Nikkei Asia)
https://asia.nikkei.com/Business/Technol…
PrivTru: A Privacy-by-Design Data Trustee Minimizing Information Leakage
Lukas Gehring, Florian Tschorsch
https://arxiv.org/abs/2506.06124 https://
GloBIAS: strengthening the foundations of BioImage Analysis
A. A. Corbat (BioImage Informatics Unit, Science for Life Laboratory and Department of Information Technology, Uppsala University, Sweden), C. G. Walther (German BioImaging, Gesellschaft für Mikroskopie und Bildanalyse e.V., Konstanz, Germany, University of Vienna, Vienna, Austria), L. R. de la Ballina (Centre for Cancer Cell Reprogramming, Institute of Clinical Medicine, Faculty of Medicine, University of Oslo, Montebe…
A Comprehensive Analysis of COVID-19 Detection Using Bangladeshi Data and Explainable AI
Shuvashis Sarker
https://arxiv.org/abs/2506.07234 https://
Replaced article(s) found for physics.data-an. https://arxiv.org/list/physics.data-an/new
[1/1]:
- Fuzzy permutation time irreversibility for nonequilibrium analysis of complex system
Wenpo Yao
Learning event-triggered controllers for linear parameter-varying systems from data
Renjie Ma, Su Zhang, Wenjie Liu, Zhijian Hu, Peng Shi
https://arxiv.org/abs/2506.08366
This https://arxiv.org/abs/2506.01348 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
For all the other passengers and crews of #Dreamliners around the globe, I hope that #AirIndia flight 171 indeed didn't crash due to general issues with the aircraft type. 🧐 #Boeing
“Th…
From July 1 if a new maintainer isn't found then tcpflow will be archived. From the GitHub page:
> tcpflow is a program that captures data transmitted as part of TCP connections (flows), and stores the data in a way that is convenient for protocol analysis and debugging.
https://github.com/simsong/tc…
Algorithmic Analysis of GTFS-RT vehicle position accuracy
Joshua Wong
https://arxiv.org/abs/2506.06479 https://arxiv.org/pdf/2506.064…
This year at #CCN2025 we will be showcasing our research on the classification of Mental Workload 🥵 Spatial Effects using Riemannian Manifold.
📅 When: Wednesday, August 13, 1:00 – 4:00 pm
📍 Where: CCN 2025 Conference Venue, de Brug & E-Hall
📋 What: Poster B152
This https://arxiv.org/abs/2410.08911 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csSE_…
Topological Sequence Analysis of Genomes: Delta Complex approaches
Jian Liu, Li Shen, Dong Chen, Guo-Wei Wei
https://arxiv.org/abs/2507.05452 https://
New #preprint by Alex Koplenig and me:
"Statistical errors undermine claims about the evolution of polysynthetic languages". #PNAS #Linguistics
Thermodynamic Analysis of Transverse Momentum Spectra in Pb-Pb Collisions at 2.76 TeV: Centrality Dependence of Temperature, Freezeout Parameters and Non-Extensitivity
M. Waqas, Hassan Ali Khan, Wolfgang Bietenholz, Muhammad Ajaz, Jihane Ben Slimane, Haifa I. Alrebdi, A. Haj Ismail
https://arxiv.org/abs/2507.07369
A Simple PTAS for Weighted $k$-means and Sensor Coverage
Akash Pareek, Supratim Shit
https://arxiv.org/abs/2508.06460 https://arxiv.org/pdf/2508.06460
Operator theoretic measure of causality in linear dynamical systems
Ankit Srivastava, Louis Cattafesta, Scott Dawson
https://arxiv.org/abs/2506.08118 https…
Analysis: Mail Online is among major news brands most-impacted by Google's AI Overviews, with 68.8% of the top 100 keyword searches resulting in no site visits (Charlotte Tobitt/Press Gazette)
https://press…
Brazil’s ANPD Preliminary Study on Generative AI highlights the dual nature of data protection law: balancing rights with technological innovation
https://fpf.org/blog/brazils-anpd-prel
Differential Attention for Multimodal Crisis Event Analysis
Nusrat Munia, Junfeng Zhu, Olfa Nasraoui, Abdullah-Al-Zubaer Imran
https://arxiv.org/abs/2507.05165
Investigating non-Keplerian motion in flare events with astrometric data
Fengting Xie, Qing-Hua Zhu, Xin Li
https://arxiv.org/abs/2507.07411 https://
A Unified Ontology for Scalable Knowledge Graph-Driven Operational Data Analytics in High-Performance Computing Systems
Junaid Ahmed Khan, Andrea Bartolini
https://arxiv.org/abs/2507.06107
Superposed Parameterised Quantum Circuits
Viktoria Patapovich, Mo Kordzanganeh, Alexey Melnikov
https://arxiv.org/abs/2506.08749 https://
A picture says more than 1000 words. How much more can an audio representation of your data tell you? #rstats
Exploring non-cold dark matter in a scenario of dynamical dark energy with DESI DR2 data
Tian-Nuo Li, Peng-Ju Wu, Guo-Hong Du, Yan-Hong Yao, Jing-Fei Zhang, Xin Zhang
https://arxiv.org/abs/2507.07798
The Open Cluster Chemical Abundances and Mapping Survey: VIII. Galactic Chemical Gradient and Azimuthal Analysis from SDSS/MWM DR19
Jonah M. Otto, Peter M Frinchaboy, Natalie R. Myers, James W. Johnson, John Donor, Ahabar Hossain, Szabolcs Mészáros, Katia Cunha, Binod Bhattarai, Gail Zasowski, Sarah R. Loebman, Alessa I. Wiggins, Adrian M. Price-Whelan, Taylor Spoo, Diogo Souto, Dmitry Bizyaev, Kaike Pan, Andrew K. Saydjari
Mediation Analysis for Sparse and Irregularly Spaced Longitudinal Outcomes with Application to the MrOS Sleep Study
Rui Ren, Haoyi Yang, Qian Xiao, Lingzhou Xue, Yuan Huang
https://arxiv.org/abs/2506.07953
On the computational feasibility of Bayesian end-to-end analysis of LiteBIRD simulations within Cosmoglobe
R. Aurvik, M. Galloway, E. Gjerløw, U. Fuskeland, A. Basyrov, M. Bortolami, M. Brilenkov, P. Campeti, H. K. Eriksen, L. T. Hergt, D. Herman, M. Monelli, L. Pagano, G. Puglisi, N. Raffuzzi, N. -O. Stutzer, R. M. Sullivan, H. Thommesen, D. J. Watts, I. K. Wehus, D. Adak, E. Allys, A. Anand, J. Aumont, C. Baccigalupi, M. Ballardini, A. J. Banday, R. B. Barreiro, N. Bartolo, S. Bas…
AI-Based Reconstruction from Inherited Personal Data: Analysis, Feasibility, and Prospects
Mark Zilberman
https://arxiv.org/abs/2507.03059 https://
Long-term exposure to outdoor air pollution linked to increased risk of dementia
https://www.cam.ac.uk/research/news/long-term-exposure-to-outdoor-air-pollution-linked-to-increased-risk-of-dementia
Air pol…
An Architecture for Privacy-Preserving Telemetry Scheme
Kenneth Odoh
https://arxiv.org/abs/2507.06350 https://arxiv.org/pdf/2507.0635…
Replaced article(s) found for physics.data-an. https://arxiv.org/list/physics.data-an/new
[1/1]:
- Orthogonal projections of hypercubes
Yoshiaki Horiike, Shin Fujishiro
Customs and Border Protection is asking companies to pitch tools for performing deep analysis on the contents of devices seized at the US border.
https://www.wired.com/story/cbp-wants-new-tech-to-search-for-hidden-data-on-seiz…
A flexible, GPU-accelerated approach for the joint characterization of LISA instrumental noise and Stochastic Gravitational Wave Backgrounds
Alessandro Santini, Martina Muratore, Jonathan Gair, Olaf Hartwig
https://arxiv.org/abs/2507.06300
This https://arxiv.org/abs/2207.03997 has been replaced.
link: https://scholar.google.com/scholar?q=a
QuantMCP: Grounding Large Language Models in Verifiable Financial Reality
Yifan Zeng
https://arxiv.org/abs/2506.06622 https://
IRAS 22272+5435 (V354 Lac): Multicolor Photometry of a Variable Carbon-rich post-AGB Star and a Dust Formation Episode
N. P. Ikonnikova, V. I. Shenavrin, G. V. Komissarova, M. A. Burlak
https://arxiv.org/abs/2507.06777
Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis
Xintong Hu, Yixuan Chen, Rui Yang, Wenxiang Guo, Changhao Pan
https://arxiv.org/abs/2507.06116
This https://arxiv.org/abs/2411.06564 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_ees…
"Customs and Border Protection is asking companies to pitch tools for performing deep analysis on the contents of devices seized at the US border."
https://www.wired.com/story/cbp-wants-new-tech-to-search-for-hidden-data-on-seized-phones/…
RCA Copilot: Transforming Network Data into Actionable Insights via Large Language Models
Alexander Shan, Jasleen Kaur, Rahul Singh, Tarun Banka, Raj Yavatkar, T. Sridhar
https://arxiv.org/abs/2507.03224
Towards Serverless Processing of Spatiotemporal Big Data Queries
Diana Baumann, Tim C. Rese, David Bermbach
https://arxiv.org/abs/2507.06005 https://
Progress and new challenges in image-based profiling
Erik Serrano, John Peters, Jesko Wagner, Rebecca E. Graham, Zhenghao Chen, Brian Feng, Gisele Miranda, Alexandr A. Kalinin, Loan Vulliard, Jenna Tomkinson, Cameron Mattson, Michael J. Lippincott, Ziqi Kang, Divya Sitani, Dave Bunten, Srijit Seal, Neil O. Carragher, Anne E. Carpenter, Shantanu Singh, Paula A. Marin Zapata, Juan C. Caicedo, Gregory P. Way
Transfer Learning and Explainable AI for Brain Tumor Classification: A Study Using MRI Data from Bangladesh
Shuvashis Sarker
https://arxiv.org/abs/2506.07228
Scalable Language Agnostic Taint Tracking using Explicit Data Dependencies
Sedick David Baker Effendi, Xavier Pinho, Andrei Michael Dreyer, Fabian Yamaguchi
https://arxiv.org/abs/2506.06247
A Novel, Human-in-the-Loop Computational Grounded Theory Framework for Big Social Data
Lama Alqazlan, Zheng Fang, Michael Castelle, Rob Procter
https://arxiv.org/abs/2506.06083
Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling
Haoyu Wu, Diankun Wu, Tianyu He, Junliang Guo, Yang Ye, Yueqi Duan, Jiang Bian
https://arxiv.org/abs/2507.07982
Research on integrated intelligent energy management system based on big data analysis and machine learning
Jinzhou Xu, Yadan Zhang, Paola Tapia
https://arxiv.org/abs/2508.05583
All-Sky Cosmic-Ray Anisotropy Update at Multiple Energies
Juan Carlos Díaz-Vélez (for the HAWC,IceCube Collaborations), Riya Yogesh Kore (for the HAWC,IceCube Collaborations), Paolo Desiati (for the HAWC,IceCube Collaborations), Ferris Wolf (for the HAWC,IceCube Collaborations)
https://arxiv.org/abs/2507.07070…
This https://arxiv.org/abs/2505.18344 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
KeyDroid: A Large-Scale Analysis of Secure Key Storage in Android Apps
Jenny Blessing, Ross J. Anderson, Alastair R. Beresford
https://arxiv.org/abs/2507.07927
Bayesian Bootstrap-based Gaussian Copula Model for Mixed Data with High Missing Rates
Seongmin Kim, Jeunghun Oh, Hungkuk Ko, Jeongmin Park, Jaeyong Lee
https://arxiv.org/abs/2507.06785
The z=1.03 Merging Cluster SPT-CL J0356-5337: New Strong Lensing Analysis with HST and MUSE
Grace Smith, Guillaume Mahler, Kate Napier, Keren Sharon, Matthew Bayliss, Bradford Benson, Lindsey Bleem, Benjamin Floyd, Vittorio Ghirardini, Michael D. Gladders, Gourav Khullar, Tim Schrabback
https://arxiv.org/abs/2507.07404
SkimROOT: Accelerating LHC Data Filtering with Near-Storage Processing
Narangerelt Batsoyol, Jonathan Guiang, Diego Davila, Aashay Arora, Philip Chang, Frank Würthwein, Steven Swanson
https://arxiv.org/abs/2506.04507
IRAS 16475-4609: A Young Compact HII Region Sculpting Its Molecular Environment
Felipe Navarete, Sean D. Points, Augusto Damineli
https://arxiv.org/abs/2506.08180
Testing time order and Leggett-Garg inequalities with noninvasive measurements on public quantum computers
Tomasz Rybotycki, Tomasz Białecki, Josep Batle, Bartłomiej Zglinicki, Adam Szereszewski, Wolfgang Belzig, Adam Bednorz
https://arxiv.org/abs/2507.07904
Stability and Extension of Steady and Ranging Persistence
Yann-Situ Gazull
https://arxiv.org/abs/2506.07911 https://arxiv.org/pdf/250…
Analysis: Community Notes submissions on X fell from a high of 120,000 in January to fewer than 60,000 in May; Musk now rarely mentions the feature (Joe Murphy/NBC News)
https://www.nbcnews.com/tech/social-media/x-twitter-community-not…
QUEST: Query Optimization in Unstructured Document Analysis
Zhaoze Sun, Qiyan Deng, Chengliang Chai, Kaisen Jin, Xinyu Guo, Han Han, Ye Yuan, Guoren Wang, Lei Cao
https://arxiv.org/abs/2507.06515
Group structure as a foundation for entropies
Henrik Jeldtoft Jensen, Piergiulio Tempesta
https://arxiv.org/abs/2507.06847 https://ar…
Determining $\alpha_s(m_Z)$ from the Heavy Jet Mass distribution
Miguel A. Benitez
https://arxiv.org/abs/2506.07723 https://arxiv.org…
Analysis: Nvidia has the biggest weight in the S&P 500 of any individual stock since 1981 and the highest P/E as the index's top stock since Microsoft in 1999 (Luke Kawa/Sherwood News)
https://sherwood.news/markets/nvidia-i
MatBYIB: A Matlab-based code for Bayesian inference of extreme mass-ratio inspiral binary with arbitrary eccentricity
Gen-Liang Li, Shu-Jie Zhao, Huai-Ke Guo, Jing-Yu Su, Zhen-Heng Lin
https://arxiv.org/abs/2506.05954
LISTEN: Lightweight Industrial Sound-representable Transformer for Edge Notification
Changheon Han, Yun Seok Kang, Yuseop Sim, Martin Byung-Guk Jun, Hyung Wook Park
https://arxiv.org/abs/2507.07879
Differentially Private Explanations for Clusters
Amir Gilad, Tova Milo, Kathy Razmadze, Ron Zadicario
https://arxiv.org/abs/2506.05900 https://
PAST: A multimodal single-cell foundation model for histopathology and spatial transcriptomics in cancer
Changchun Yang, Haoyang Li, Yushuai Wu, Yilan Zhang, Yifeng Jiao, Yu Zhang, Rihan Huang, Yuan Cheng, Yuan Qi, Xin Guo, Xin Gao
https://arxiv.org/abs/2507.06418
Submillimeter and Mid-Infrared Variability of Young Stellar Objects in the M17 HII Region
Zhiwei Chen, Doug Johnstone, Carlos Contreras Peña, Jeong-Eun Lee, Sheng-Yuan Liu, Gregory Herczeg, Steve Mairs, Geumsook Park, Kee-Tae Kim, Mi-Ryang Kim, Keping Qiu, Yao-Te Wang, Xu Zhang, Megan Reiter, the JCMT Transient Team
https://ar…
AgenticData: An Agentic Data Analytics System for Heterogeneous Data
Ji Sun, Guoliang Li, Peiyao Zhou, Yihui Ma, Jingzhe Xu, Yuan Li
https://arxiv.org/abs/2508.05002 https://
Is Your Training Pipeline Production-Ready? A Case Study in the Healthcare Domain
Daniel Lawand (University of São Paulo), Lucas Quaresma (University of São Paulo), Roberto Bolgheroni (University of São Paulo), Alfredo Goldman (University of São Paulo), Renato Cordeiro Ferreira (University of São Paulo, Jheronimus Academy of Data Science, Technical University of Eindhoven, Tilburg University)
Designing Parallel Algorithms for Community Detection using Arachne
Fuhuan Li, Zhihui Du, David A. Bader
https://arxiv.org/abs/2507.06471 https://
From Fads to Classics -- Analyzing Video Game Trend Evolutions through Steam Tags
Nicolas Grelier, Johannes Pfau, Nicolas Mathieu, Stéphane Kaufmann
https://arxiv.org/abs/2506.08881
This https://arxiv.org/abs/2503.09714 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_…
Replaced article(s) found for physics.data-an. https://arxiv.org/list/physics.data-an/new
[1/1]:
- A COMPASS to Model Comparison and Simulation-Based Inference in Galactic Chemical Evolution
Berkay Gunes, Sven Buder, Tobias Buck
State of the Ice Model in the IceCube Observatory
Dmitry Chirkin (for the IceCube Collaboration), Martin Rongen (for the IceCube Collaboration)
https://arxiv.org/abs/2507.06341
A neutrino data analysis of extra-dimensional theories with massive bulk fields
Philipp Eller, Manuel Ettengruber, Alan Zander
https://arxiv.org/abs/2508.04274 https://
Edge-assisted Parallel Uncertain Skyline Processing for Low-latency IoE Analysis
Chuan-Chi Lai, Yan-Lin Chen, Bo-Xin Liu, Chuan-Ming Liu
https://arxiv.org/abs/2508.04596 https:/…
An analysis of Crunchbase and PitchBook data shows that in 2025 so far at least 36 tech startups hit a $1B valuation, including seven in June and six in May (Dominic-Madori Davis/TechCrunch)
https://techcrunch.com/2025/07/06/7-new-tech-unicorns-w…
Replaced article(s) found for physics.data-an. https://arxiv.org/list/physics.data-an/new
[1/1]:
- Self-consistent gravity model for inferring node mass in flow networks
Daekyung Lee, Wonguk Cho, Heetae Kim, Gunn Kim, Hyeong-Chai Jeong, Beom Jun Kim
PROVSYN: Synthesizing Provenance Graphs for Data Augmentation in Intrusion Detection Systems
Yi Huang, Wajih UI Hassan, Yao Guo, Xiangqun Chen, Ding Li
https://arxiv.org/abs/2506.06226
Tensor Stochastic Regression for High-dimensional Time Series via CP Decomposition
Shibo Li, Yao Zheng
https://arxiv.org/abs/2506.06919 https://
Observational Insights on DBI K-essence Models Using Machine Learning and Bayesian Analysis
Samit Ganguly, Arijit Panda, Eduardo Guendelman, Debashis Gangopadhyay, Abhijit Bhattacharyya, Goutam Manna
https://arxiv.org/abs/2506.05674
Generative Neural Network for Simulating Radio Emission from Extensive Air Showers
Pranav Sampathkumar, Tim Huege, Andreas Haungs, Ralph Engel
https://arxiv.org/abs/2507.07713
Resonant leptogenesis in inverse see-saw framework with modular $S_4$ symmetry
Abhishek, V. Suryanarayana Mummidi
https://arxiv.org/abs/2507.06610 https://…
Layered, Overlapping, and Inconsistent: A Large-Scale Analysis of the Multiple Privacy Policies and Controls of U.S. Banks
Lu Xian, Van Tran, Lauren Lee, Meera Kumar, Yichen Zhang, Florian Schaub
https://arxiv.org/abs/2507.05415
A Statistical Framework for Co-Mediators of Zero-Inflated Single-Cell RNA-Seq Data
Seungjun Ahn, Zhigang Li
https://arxiv.org/abs/2507.06113 https://
[2025-07-10 Thu (UTC), no new articles found for physics.data-an Data Analysis, Statistics and Probability]
Computationally Intensive Research: Advancing a Role for Secondary Analysis of Qualitative Data
Kaveh Mohajeri, Amir Karami
https://arxiv.org/abs/2506.04230
When AIOps Become "AI Oops": Subverting LLM-driven IT Operations via Telemetry Manipulation
Dario Pasquini, Evgenios M. Kornaropoulos, Giuseppe Ateniese, Omer Akgul, Athanasios Theocharis, Petros Efstathopoulos
https://arxiv.org/abs/2508.06394
This https://arxiv.org/abs/2212.07917 has been replaced.
link: https://scholar.google.com/scholar?q=a
Akaike information criterion for segmented regression models
Kazuki Nakajima, Yoshiyuki Ninomiya
https://arxiv.org/abs/2506.08760 https://
[2025-07-11 Fri (UTC), no new articles found for physics.data-an Data Analysis, Statistics and Probability]
Galaxy Cluster Mass Estimation Through The Splashback Radius
Lucas Gabriel-Silva, Laerte Sodré Jr
https://arxiv.org/abs/2506.07425 https://
[2025-08-11 Mon (UTC), no new articles found for physics.data-an Data Analysis, Statistics and Probability]