Tootfinder

Opt-in global Mastodon full text search. Join the index!

@arXiv_csHC_bot@mastoxiv.page
2025-07-11 07:42:51

Dirty Data in the Newsroom: Comparing Data Preparation in Journalism and Data Science
Stephen Kasica, Charles Berret, Tamara Munzner
arxiv.org/abs/2507.07238

@cheryanne@aus.social
2025-06-10 06:33:16

High Signal: Data Science | Career | AI
Great Australian Pods Podcast Directory: #GreatAusPods

Screenshot of the podcast listing on the Great Australian Pods website
@netzschleuder@social.skewed.de
2025-06-11 01:00:06

cora: CORA citations (1998)
Citations among papers indexed by CORA, from 1998, an early computer science research paper search engine. If a paper i cites a paper j also in this data set, then a directed edge connects i to j. (Papers not in the data set are excluded.) Self-loops may be present. The dates of these snapshots are uncertain.
This network has 23166 nodes and 91500 edges.
Tags: Informational, Citation, Unweighted

cora: CORA citations (1998). 23166 nodes, 91500 edges. https://networks.skewed.de/net/cora
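The edge rule in this description can be sketched in Python: a directed edge i → j when paper i cites paper j, out-of-set papers dropped, self-loops allowed. This is a minimal illustration, not Netzschleuder's or graph-tool's loading code. The paper IDs and citation pairs are hypothetical, and collapsing repeated citations into one edge is an assumption based on the "Unweighted" tag.

```python
from collections import defaultdict


def build_citation_graph(papers, citations):
    """Directed edge i -> j when paper i cites paper j and both are in the data set."""
    in_dataset = set(papers)
    out_edges = defaultdict(set)  # a set per node: repeated citations collapse (unweighted)
    for i, j in citations:
        # Papers not in the data set are excluded, so skip any pair that touches one.
        if i in in_dataset and j in in_dataset:
            out_edges[i].add(j)  # i == j is kept as a self-loop
    return out_edges


def edge_count(graph):
    return sum(len(targets) for targets in graph.values())


# Hypothetical IDs, not taken from the CORA data.
papers = ["p1", "p2", "p3"]
citations = [("p1", "p2"), ("p1", "p2"), ("p2", "p3"), ("p3", "p3"), ("p1", "outside")]
graph = build_citation_graph(papers, citations)
print(edge_count(graph))    # 3
print(sorted(graph["p1"]))  # ['p2']
```

The duplicate `("p1", "p2")` pair adds nothing and the citation of `"outside"` is dropped, so three edges remain, one of them the self-loop on `p3`.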
@arXiv_csDB_bot@mastoxiv.page
2025-06-10 07:25:32

KramaBench: A Benchmark for AI Systems on Data-to-Insight Pipelines over Data Lakes
Eugenie Lai, Gerardo Vitagliano, Ziyu Zhang, Sivaprasad Sudhir, Om Chabra, Anna Zeng, Anton A. Zabreyko, Chenning Li, Ferdi Kossmann, Jialin Ding, Jun Chen, Markos Markakis, Matthew Russo, Weiyang Wang, Ziniu Wu, Michael J. Cafarella, Lei Cao, Samuel Madden, Tim Kraska

@v_i_o_l_a@openbiblio.social
2025-06-10 05:59:02

"Applications of the Critical Incident Technique in Library and Information Science Research: A Literature Review" doi.org/10.1515/libri-2024-006

@arXiv_csCV_bot@mastoxiv.page
2025-08-11 10:17:39

Effective Training Data Synthesis for Improving MLLM Chart Understanding
Yuwei Yang, Zeyu Zhang, Yunzhong Hou, Zhuowan Li, Gaowen Liu, Ali Payani, Yuan-Sen Ting, Liang Zheng
arxiv.org/abs/2508.06492

@arXiv_qbioPE_bot@mastoxiv.page
2025-07-11 07:54:21

Science at Risk: The Urgent Need for Institutional Support of Long-Term Ecological and Evolutionary Research in an Era of Data Manipulation and Disinformation
Vincent A. Viblanc (UMR ISEM), Elise Huchard (UMR ISEM), Gilles Pinay (CEFE), Elena Ormeño (CEFE), Céline Teplitsky (CEFE), François Criscuolo (IGE), Dominique Joly (IGE), David Renault (IGE), Cécile Callou (IGE), Françoise Gourmelon (IGE), Sandrine Anquetin (IGE), Bénédicte Augeard (OFB), Fabien…

@arXiv_csDL_bot@mastoxiv.page
2025-06-12 07:29:11

MetaInfoSci: An Integrated Web Tool for Scholarly Data Analysis
Kiran Sharmaa, Parul Khurana, Ziya Uddina
arxiv.org/abs/2506.09056

@arXiv_csCL_bot@mastoxiv.page
2025-08-11 10:02:49

Learning the Topic, Not the Language: How LLMs Classify Online Immigration Discourse Across Languages
Andrea Nasuto, Stefano Maria Iacus, Francisco Rowe, Devika Jain
arxiv.org/abs/2508.06435

@arXiv_astrophSR_bot@mastoxiv.page
2025-06-10 10:17:23

SDSS-V Milky Way Mapper (MWM): ASPCAP Stellar Parameters and Abundances in SDSS-V Data Release 19
Szabolcs Mészáros, Paula Jofré, Jennifer A. Johnson, Jonathan C. Bird, Andrew R. Casey, Katia Cunha, Nathan De Lee, Peter Frinchaboy, Guillaume Guiglion, Viola Hegedűs, Alex P. Ji, Juna A. Kollmeier, Melissa K. Ness, Jonah Otto, Marc H. Pinsonneault, Alexandre Roman-Lopes, Amaya Sinha, Ying-Yi Song, Guy S. Stringfellow, Keivan G. Stassun, Jamie Tayar, Andrew Tkachenko…

@arXiv_csSE_bot@mastoxiv.page
2025-06-11 07:54:45

A Metrics-Oriented Architectural Model to Characterize Complexity on Machine Learning-Enabled Systems
Renato Cordeiro Ferreira (University of São Paulo, Jheronimus Academy of Data Science, Technical University of Eindhoven, Tilburg University)
arxiv.org/abs/2506.08153

@cosmos4u@scicomm.xyz
2025-06-04 22:49:49

The most energetic transients - tidal disruptions of high-mass stars: #ExtremeNuclearTransients (ENTs) are the most energetic transients yet observed.

@arXiv_csHC_bot@mastoxiv.page
2025-08-11 07:38:29

Automated Visualization Makeovers with LLMs
Siddharth Gangwar, David A. Selby, Sebastian J. Vollmer
arxiv.org/abs/2508.05637 arxiv.org/pdf/…

@arXiv_csCY_bot@mastoxiv.page
2025-06-12 07:39:31

KI4Demokratie: An AI-Based Platform for Monitoring and Fostering Democratic Discourse
Rudy Alexandro Garrido Veliz, Till Nikolaus Schaland, Simon Bergmoser, Florian Horwege, Somya Bansal, Ritesh Nahar, Martin Semmann, Jörg Forthmann, Seid Muhie Yimam
arxiv.org/abs/2506.09947

@arXiv_csIR_bot@mastoxiv.page
2025-08-11 09:29:49

Fine-Tuning Vision-Language Models for Markdown Conversion of Financial Tables in Malaysian Audited Financial Reports
Jin Khye Tan (Faculty of Computer Science and Information Technology, Universiti Malaya), En Jun Choong, Ethan Jeremiah Chitty, Yan Pheng Choo, John Hsin Yang Wong, Chern Eu Cheah
arxiv.org/abs/2508.05669

@prachisrivas@masto.ai
2025-06-04 13:13:26

This looks very cool.
'OpenAIRE in collaboration with Area Science Park organizes a hands-on workshop titled “Where LEGO Meets FAIR Data,” designed to introduce the principles of FAIR data through a creative, interactive simulation using LEGO metaphors.'

@arXiv_csAI_bot@mastoxiv.page
2025-08-06 07:32:40

Large Language Model-based Data Science Agent: A Survey
Peiran Wang, Yaoning Yu, Ke Chen, Xianyang Zhan, Haohan Wang
arxiv.org/abs/2508.02744

@datascience@genomic.social
2025-05-31 10:00:00

Overview of statistical concepts in data science: could be useful in case you need some preformulated text to share with stakeholders... towardsdatascience.com/ultimat

@netzschleuder@social.skewed.de
2025-07-10 11:00:05

dblp_cite: DBLP citations (2014)
Citations among papers contained in the DBLP computer science bibliography. If a paper i cites a paper j also in this data set, then a directed edge connects i to j. (Papers not in the data set are excluded.) Self-loops may be present. This snapshot is from May 2014.
This network has 12590 nodes and 49759 edges.
Tags: Informational, Citation, Unweighted

dblp_cite: DBLP citations (2014). 12590 nodes, 49759 edges. https://networks.skewed.de/net/dblp_cite
@arXiv_condmatmtrlsci_bot@mastoxiv.page
2025-06-10 18:17:40

This arxiv.org/abs/2503.17945 has been replaced.
initial toot: mastoxiv.page/@a…

@arXiv_hepex_bot@mastoxiv.page
2025-06-10 17:41:50

This arxiv.org/abs/2412.04854 has been replaced.
initial toot: mastoxiv.page/@arXiv_hepe…

@arXiv_csDB_bot@mastoxiv.page
2025-06-09 07:32:02

Stream DaQ: Stream-First Data Quality Monitoring
Vasileios Papastergios, Anastasios Gounaris
arxiv.org/abs/2506.06147

@arXiv_physicssocph_bot@mastoxiv.page
2025-06-11 14:47:45

Replaced article(s) found for physics.soc-ph. arxiv.org/list/physics.soc-ph/
[1/1]:
Growth of Science and Women: Methodological Challenges of Using Structured Big Data

@arXiv_csCV_bot@mastoxiv.page
2025-07-10 08:05:31

Centralized Copy-Paste: Enhanced Data Augmentation Strategy for Wildland Fire Semantic Segmentation
Joon Tai Kim, Tianle Chen, Ziyu Dong, Nishanth Kunchala, Alexander Guller, Daniel Ospina Acero, Roger Williams, Mrinal Kumar
arxiv.org/abs/2507.06321

@arXiv_physicsbioph_bot@mastoxiv.page
2025-07-10 08:23:20

GloBIAS: strengthening the foundations of BioImage Analysis
A. A. Corbat (BioImage Informatics Unit, Science for Life Laboratory and Department of Information Technology, Uppsala University, Sweden), C. G. Walther (German BioImaging, Gesellschaft für Mikroskopie und Bildanalyse e.V., Konstanz, Germany, University of Vienna, Vienna, Austria), L. R. de la Ballina (Centre for Cancer Cell Reprogramming, Institute of Clinical Medicine, Faculty of Medicine, University of Oslo, Montebe…

@arXiv_qfinST_bot@mastoxiv.page
2025-06-10 09:46:13

DELPHYNE: A Pre-Trained Model for General and Financial Time Series
Xueying Ding, Aakriti Mittal, Achintya Gopal
arxiv.org/abs/2506.06288

@arXiv_physicscompph_bot@mastoxiv.page
2025-06-09 09:21:02

Mapping correlations and coherence: adjacency-based approach to data visualization and regularity discovery
Guang-Xing Li
arxiv.org/abs/2506.05758

@arXiv_statOT_bot@mastoxiv.page
2025-08-07 08:53:44

A Blueprint to Design Curriculum and Pedagogy for Introductory Data Science
Elijah Meyer, Mine Çetinkaya-Rundel
arxiv.org/abs/2508.03952

@arXiv_csSE_bot@mastoxiv.page
2025-06-09 08:28:32

MLOps with Microservices: A Case Study on the Maritime Domain
Renato Cordeiro Ferreira (Jheronimus Academy of Data Science, Technical University of Eindhoven, Tilburg University), Rowanne Trapmann (Jheronimus Academy of Data Science, Technical University of Eindhoven, Tilburg University), Willem-Jan van den Heuvel (Jheronimus Academy of Data Science, Technical University of Eindhoven, Tilburg University)

@arXiv_csCR_bot@mastoxiv.page
2025-06-09 07:36:22

Explainer-guided Targeted Adversarial Attacks against Binary Code Similarity Detection Models
Mingjie Chen (Zhejiang University), Tiancheng Zhu (Huazhong University of Science and Technology), Mingxue Zhang (The State Key Laboratory of Blockchain and Data Security, Zhejiang University, Hangzhou High-Tech Zone), Yiling He (University College London), Minghao Lin (University of Southern California), Penghui Li (Columbia University), Kui Ren (The State Key Laboratory of Blockchain and Data Security, Z…

Two of NASA's historic data-collecting missions (used by scientists and earthbound agriculturalists to track carbon dioxide and crop health) ❌ may be permanently grounded as the Trump administration looks to shrink the agency's spending.
When they launched over a decade ago, the satellites known as the "Orbiting Carbon Observatories" (OCOs) revolutionized the collection of carbon data and greenhouse gas science.
To put it simply, the …

@arXiv_csDL_bot@mastoxiv.page
2025-06-05 07:17:10

What Does Information Science Offer for Data Science Research?: A Review of Data and Information Ethics Literature
Brady D. Lund, Ting Wang
arxiv.org/abs/2506.03165

@arXiv_csLG_bot@mastoxiv.page
2025-07-09 09:46:12

Explainable Hierarchical Deep Learning Neural Networks (Ex-HiDeNN)
Reza T. Batley, Chanwook Park, Wing Kam Liu, Sourav Saha
arxiv.org/abs/2507.05498

@villavelius@mastodon.online
2025-07-26 11:48:10

"The Seven Capital Sins of Open Science"
1. Worshiping the 'age factor'
2. Ignoring the value of data reuse and complexity
3. Disrespecting other disciplines
4. Publishing data without a supplementary paper
5. Creating and maintaining a nightmare for machines
6. Refusing to support investment in general infrastructure
7. Creating data without a FAIR and explicit data stewardship plan.

@cosmos4u@scicomm.xyz
2025-07-08 01:32:24

47 Tuc in Rubin Data Preview 1 - Exploring Early LSST Data and Science Potential: #Rubin Observatory is Just Getting Started: universetoday.com/articles/glo (based on ComCam images just released in bulk).

@PaulWermer@sfba.social
2025-06-27 20:54:16

Please don't call it "politicized science". What #RFKjr and his ilk are doing is not science. Science does not discard data that disagrees with the researchers' desired outcomes.
They are purposefully ignoring and hiding any information that contradicts their beliefs. That is not science, that is censorship and lies in favor of an ideology.

@mia@hcommons.social
2025-07-17 15:45:23

#DH2025, we're delighted to share a Turing's Humanities and Data Science event in Oxford and online on 25th Sept with a panel asking: 'How far can data science and the humanities help to answer each other’s questions?'
Express your interest here:

@netzschleuder@social.skewed.de
2025-07-09 09:00:04

cora: CORA citations (1998)
Citations among papers indexed by CORA, from 1998, an early computer science research paper search engine. If a paper i cites a paper j also in this data set, then a directed edge connects i to j. (Papers not in the data set are excluded.) Self-loops may be present. The dates of these snapshots are uncertain.
This network has 23166 nodes and 91500 edges.
Tags: Informational, Citation, Unweighted

cora: CORA citations (1998). 23166 nodes, 91500 edges. https://networks.skewed.de/net/cora
@arXiv_statML_bot@mastoxiv.page
2025-06-06 07:39:49

Nonlinear Causal Discovery for Grouped Data
Konstantin Göbler, Tobias Windisch, Mathias Drton
arxiv.org/abs/2506.05120

@Techmeme@techhub.social
2025-06-22 09:01:15

Inside the Vera C. Rubin Observatory, whose 3.2-gigapixel camera will produce 60PB of space image data over 10 years, to be analyzed using ML and deep learning (New York Times)
nytimes.com/2025/06/20/science…
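For scale, the headline figures imply an average data rate. Assuming decimal petabytes and a perfectly uniform rate over the 10 years (real nightly volumes will vary with observing conditions and schedule), the arithmetic works out roughly as:

```latex
\frac{60\ \text{PB}}{10\ \text{yr}} = 6\ \text{PB/yr}
\approx \frac{6 \times 10^{15}\ \text{B}}{365.25\ \text{d}}
\approx 1.6 \times 10^{13}\ \text{B/d} \approx 16\ \text{TB/d}
```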

@arXiv_astrophGA_bot@mastoxiv.page
2025-06-06 07:29:20

The future of gravitational wave science unlocking LIGO potential: AI-driven data analysis and exploration
Yong Xiao, Li, Zin Nandar Win, He Wang, Hla Myo Tun, Win Thu Zar
arxiv.org/abs/2506.04584

@arXiv_csIT_bot@mastoxiv.page
2025-06-09 07:45:42

On Inverse Problems, Parameter Estimation, and Domain Generalization
Deborah Pereg
arxiv.org/abs/2506.06024 arxiv.org…

@rae@bne.social
2025-06-05 07:02:19

Next investigation should be into the councillors themselves
Warrnambool council abandons peer-reviewed flood study, citing 'supposed science' - ABC News
abc.net.au/news/2025-06-05/reg

@arXiv_csHC_bot@mastoxiv.page
2025-06-09 07:56:32

A Novel, Human-in-the-Loop Computational Grounded Theory Framework for Big Social Data
Lama Alqazlan, Zheng Fang, Michael Castelle, Rob Procter
arxiv.org/abs/2506.06083

@arXiv_csDC_bot@mastoxiv.page
2025-06-04 07:29:17

D-Rex: Heterogeneity-Aware Reliability Framework and Adaptive Algorithms for Distributed Storage
Maxime Gonthier (University of Chicago, Argonne National Laboratory), Dante D. Sanchez-Gallegos (Universidad Carlos III de Madrid), Haochen Pan (University of Chicago), Bogdan Nicolae (Argonne National Laboratory), Sicheng Zhou (Southern University of Science and Technology), Hai Duc Nguyen (University of Chicago, Argonne National Laboratory), Valerie Hayot-Sasson (University of Chicago, Ar…

@arXiv_mathNA_bot@mastoxiv.page
2025-07-08 10:28:51

Bilinear Quadratic Output Systems and Balanced Truncation
Heike Faßbender (Institute for Numerical Analysis, TU Braunschweig), Serkan Gugercin (Department of Mathematics and Division of Computational Modeling and Data Analytics, Academy of Data Science, Virginia Tech), Till Peters (Institute for Numerical Analysis, TU Braunschweig)

@bthalpin@mastodon.social
2025-05-26 11:57:45

We're recruiting on our MSc Sociology and Data Analytics for next September at the University of Limerick @…

@arXiv_astrophIM_bot@mastoxiv.page
2025-06-02 07:29:19

The SPHEREx Sky Simulator: Science Data Modeling for the First All-Sky Near-Infrared Spectral Survey
Brendan P. Crill, Yoonsoo P. Bach, Sean A. Bryan, Jean Choppin de Janvry, Ari J. Cukierman, C. Darren Dowell, Spencer W. Everett, Candice Fazar, Tatiana Goldina, Zhaoyu Huai, Howard Hui, Woong-Seob Jeong, Jae Hwan Kang, Phillip M. Korngut, Jae Joon Lee, Daniel C. Masters, Chi H. Nguyen, Jeonghyun Pyo, Teresa Symons, Yujin Yang, Michael Zemcov, Rachel Akeson, Matthew L. N. Ashby, James J…

@primonatura@mstdn.social
2025-06-12 17:00:38

"Violinist composes music from moth flight data to highlight insect decline"
#Music #Insects #Animals

@arXiv_statCO_bot@mastoxiv.page
2025-06-10 18:26:10

This arxiv.org/abs/2502.06753 has been replaced.
link: scholar.google.com/scholar?q=a

@arXiv_csNI_bot@mastoxiv.page
2025-06-02 07:20:23

Trustworthy Provenance for Big Data Science: a Modular Architecture Leveraging Blockchain in Federated Settings
Nicola Giuseppe Marchioro, Yannis Velegrakis, Valentine Anantharaj, Ian Foster, Sandro Luigi Fiore
arxiv.org/abs/2505.24675

@kurtsh@mastodon.social
2025-06-13 21:45:24

Microsoft has once again been named a Leader in the 2025 Gartner® Magic Quadrant™ for Data Science and Machine Learning (DSML) Platforms.
azure.microsoft.com/en-us/blog

@arXiv_statAP_bot@mastoxiv.page
2025-08-11 09:12:20

The Impact of Carbon Targets on Firms' Carbon Performance
Xichen Sun, Xingzhi Jia, Rogelio Oliva
arxiv.org/abs/2508.05811 arxiv.org/pdf…

@blakes7bot@mas.torpidity.net
2025-06-06 06:19:21

#Blakes7 Series B, Episode 06 - Trial
THANIA: We reserve our opening declaration, sir.
SAMOR: Very well. Enter prosecution data. [A clerk presses some buttons.]
blake.torpidity.net/m/206/53

Claude 3.7 describes the image as: "This image appears to be from a science fiction television production, featuring a person in a sleek black uniform with a high collar and military-style design. The setting has a minimalist, futuristic aesthetic with vertical white panels forming the background, giving it a sterile, possibly spaceship or high-tech facility appearance. The subject is shown in profile, wearing their dark hair slicked back, which complements the austere, utilitarian style of the…
@arXiv_mathST_bot@mastoxiv.page
2025-06-05 07:27:43

Observable Covariance and Principal Observable Analysis for Data on Metric Spaces
Ece Karacam, Washington Mio, Osman Berat Okutan
arxiv.org/abs/2506.04003

@gedankenstuecke@scholar.social
2025-05-30 22:14:08

lol, basically every single example in this post shows how the LLM is just generating context that's not in the actual image. But somehow this is sold as being better than "classical" computer vision.
I don't know, folks, if I actually wanted to do "data science", with focus on the "science" bit, I'd be disturbed by that. 🤷‍♂️
fosstodon.org/@Posit/114597245

@netzschleuder@social.skewed.de
2025-08-07 21:00:04

cora: CORA citations (1998)
Citations among papers indexed by CORA, from 1998, an early computer science research paper search engine. If a paper i cites a paper j also in this data set, then a directed edge connects i to j. (Papers not in the data set are excluded.) Self-loops may be present. The dates of these snapshots are uncertain.
This network has 23166 nodes and 91500 edges.
Tags: Informational, Citation, Unweighted

cora: CORA citations (1998). 23166 nodes, 91500 edges. https://networks.skewed.de/net/cora
@arXiv_csDS_bot@mastoxiv.page
2025-08-08 08:51:02

Online Sparsification of Bipartite-Like Clusters in Graphs
Joyentanuj Das, Suranjan De, He Sun
arxiv.org/abs/2508.05437 arxiv.org/pdf/2508.…

@arXiv_astrophSR_bot@mastoxiv.page
2025-07-03 09:13:10

47 Tuc in Rubin Data Preview 1: Exploring Early LSST Data and Science Potential
Yumi Choi, Knut A. G. Olsen, Jeffrey L. Carlin, Yuankun (David) Wang, Fred Moolekamp, Abi Saha, Ian Sullivan, Colin T. Slater, Douglas L. Tucker, Christina L. Adair, Peter S. Ferguson, Yijung Kang, Karla Peña Ramírez, Markus Rabus

@awinkler@openbiblio.social
2025-08-01 11:07:30

Grist for @…'s mill:
This paper examines definitions of 'reuse' and concludes that use = reuse, so for simplicity one can just speak of "use". So also: usage scenarios, usable data. A pity about the R in

@arXiv_statOT_bot@mastoxiv.page
2025-06-05 07:39:59

Pivoting the paradigm: the role of spreadsheets in K-12 data science
Oren Tirschwell, Nicholas Jon Horton
arxiv.org/abs/2506.03232

@tiotasram@kolektiva.social
2025-08-04 15:49:00

Should we teach vibe coding? Here's why not.
Should AI coding be taught in undergrad CS education?
1/2
I teach undergraduate computer science labs, including for intro and more-advanced core courses. I don't publish (non-negligible) scholarly work in the area, but I've got years of craft expertise in course design, and I do follow the academic literature to some degree. In other words, I'm not the world's leading expert, but I have spent a lot of time thinking about course design, and consider myself competent at it, with plenty of direct experience in what knowledge & skills I can expect from students as they move through the curriculum.
I'm also strongly against most uses of what's called "AI" these days (specifically, generative deep neural networks as supplied by our current cadre of techbros). There are a surprising number of completely orthogonal reasons to oppose the use of these systems, and a very limited number of reasonable exceptions (overcoming accessibility barriers is an example). On the grounds of environmental and digital-commons-pollution costs alone, using specifically the largest/newest models is unethical in most cases.
But as any good teacher should, I constantly question these evaluations, because I worry about the impact on my students should I eschew teaching relevant tech for bad reasons (and even for good reasons). I also want to make my reasoning clear to students, who should absolutely question me on this. That inspired me to ask a simple question: ignoring for one moment the ethical objections (which we shouldn't, of course; they're very stark), at what level in the CS major could I expect to teach a course about programming with AI assistance, and expect students to succeed at a more technically demanding final project than a course at the same level where students were banned from using AI? In other words, at what level would I expect students to actually benefit from AI coding "assistance?"
To be clear, I'm assuming that students aren't using AI in other aspects of coursework: the topic of using AI to "help you study" is a separate one (TL;DR its gross value is not negative, but it's mostly not worth the harm to your metacognitive abilities, which AI-induced changes to the digital commons are making more important than ever).
So what's my answer to this question?
If I'm being incredibly optimistic, senior year. Slightly less optimistic, second year of a masters program. Realistic? Maybe never.
The interesting bit for you-the-reader is: why is this my answer? (Especially given that students would probably self-report significant gains at lower levels.) To start with, [this paper where experienced developers thought that AI assistance sped up their work on real tasks when in fact it slowed it down] (arxiv.org/abs/2507.09089) is informative. There are a lot of differences in task between experienced devs solving real bugs and students working on a class project, but it's important to understand that we shouldn't have a baseline expectation that AI coding "assistants" will speed things up in the best of circumstances, and we shouldn't trust self-reports of productivity (or the AI hype machine in general).
Now we might imagine that coding assistants will be better at helping with a student project than at helping with fixing bugs in open-source software, since it's a much easier task. For many programming assignments that have a fixed answer, we know that many AI assistants can just spit out a solution based on prompting them with the problem description (there's another elephant in the room here to do with learning outcomes regardless of project success, but we'll ignore this one too; my focus here is on project complexity reach, not learning outcomes). My question is about more open-ended projects, not assignments with an expected answer. Here's a second study (by one of my colleagues) about novices using AI assistance for programming tasks. It showcases how difficult it is to use AI tools well, and some of these stumbling blocks that novices in particular face.
But what about intermediate students? Might there be some level where the AI is helpful because the task is still relatively simple and the students are good enough to handle it? The problem with this is that as task complexity increases, so does the likelihood of the AI generating (or copying) code that uses more complex constructs which a student doesn't understand. Let's say I have second year students writing interactive websites with JavaScript. Without a lot of care, the AI is likely to suggest code that depends on several different frameworks the students don't know how to deploy, from React to jQuery, without actually setting up or including those frameworks, and of course those students would be way out of their depth trying to do that. This is a general problem: each programming class carefully limits the specific code frameworks and constructs it expects students to know based on the material it covers. There is no feasible way to limit an AI assistant to a fixed set of constructs or frameworks, using current designs. There are alternate designs where this would be possible (like AI search through adaptation from a controlled library of snippets) but those would be entirely different tools.
So what happens on a sizeable class project where the AI has dropped in buggy code, especially if it uses code constructs the students don't understand? Best case, they understand that they don't understand and re-prompt, or quickly ask an instructor or TA for help, who helps them get rid of the stuff they don't understand and re-prompt or manually add stuff they do. Average case: they waste several hours and/or sweep the bugs partly under the rug, resulting in a project with significant defects. Students in their second and even third years of a CS major still have a lot to learn about debugging, and usually have significant gaps in their knowledge of even their most comfortable programming language. I do think that, regardless of AI, we as teachers need to get better at teaching debugging skills, but the knowledge gaps are inevitable because there's just too much to know. In Python, for example, the LLM is going to spit out yields, async functions, try/finally, maybe even something like a while/else, or with recent training data, the walrus operator. I can't expect even a fraction of 3rd year students who have worked with Python since their first year to know about all these things, and based on how students approach projects where they have studied all the relevant constructs but have forgotten some, I'm not optimistic seeing these things will magically become learning opportunities. Student projects are better off working with a limited subset of full programming languages that the students have actually learned, and using AI coding assistants as currently designed makes this impossible. Beyond that, even when the "assistant" just introduces bugs using syntax the students understand, even through their 4th year many students struggle to understand the operation of moderately complex code they've written themselves, let alone code written by someone else. Having access to an AI that will confidently offer incorrect explanations for bugs will make this worse.
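To make that list of constructs concrete, here is a short, hypothetical Python snippet (not from the post; all function names are illustrative) that packs a generator with `yield`, an `async` function, `try`/`finally`, `while`/`else`, and the walrus operator (Python 3.8+) into a few small functions, the kind of density that can land in an intro-level project unannounced.

```python
import asyncio


def chunks(items, size):
    # Generator function: `yield` hands back one slice at a time, lazily.
    for start in range(0, len(items), size):
        yield items[start:start + size]


def index_of_first_negative(values):
    i = 0
    while i < len(values):
        if values[i] < 0:
            break
        i += 1
    else:
        # while/else: runs only when the loop ends without hitting `break`.
        return None
    return i


async def double_later(x):
    await asyncio.sleep(0)  # placeholder for real asynchronous I/O
    return x * 2


def describe(data, events):
    try:
        # Walrus operator: bind n and test it in one expression.
        if (n := len(data)) > 3:
            return f"long ({n} items)"
        return "short"
    finally:
        # try/finally: this line runs even though both branches `return` early.
        events.append("describe finished")


print(list(chunks([1, 2, 3, 4, 5], 2)))     # [[1, 2], [3, 4], [5]]
print(index_of_first_negative([3, 1, -4]))  # 2
print(index_of_first_negative([3, 1, 4]))   # None
print(asyncio.run(double_later(21)))        # 42
events = []
print(describe([1, 2, 3, 4, 5], events), events)  # long (5 items) ['describe finished']
```

Each function is only a few lines, yet reading them correctly depends on knowing five separate language features.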
To be sure, a small minority of students will be able to overcome these problems, but that minority is the group that has a good grasp of the fundamentals and has broadened their knowledge through self-study, which earlier AI-reliant classes would make less likely to happen. In any case, I care about the average student, since we already have plenty of stuff about our institutions that makes life easier for a favored few while being worse for the average student (note that our construction of that favored few as the "good" students is a large part of this problem).
To summarize: because AI assistants introduce excess code complexity and difficult-to-debug bugs, they'll slow down rather than speed up project progress for the average student on moderately complex projects. On a fixed deadline, they'll result in worse projects, or necessitate less ambitious project scoping to ensure adequate completion, and I expect this remains broadly true through 4-6 years of study in most programs (don't take this as an endorsement of AI "assistants" for masters students; we've ignored a lot of other problems along the way).
There's a related problem: solving open-ended project assignments well ultimately depends on deeply understanding the problem, and AI "assistants" allow students to put a lot of code in their file without spending much time thinking about the problem or building an understanding of it. This is awful for learning outcomes, but also bad for project success. Getting students to see the value of thinking deeply about a problem is a thorny pedagogical puzzle at the best of times, and allowing the use of AI "assistants" makes the problem much, much worse. This is another area I hope to see (or even drive) pedagogical improvement in, for what it's worth.
1/2

@cosmos4u@scicomm.xyz
2025-07-02 23:16:46

Now settled into low-Earth orbit, #SPHEREx (Spectro-Photometer for the History of the Universe, Epoch of Reionization, and Ices Explorer) has begun delivering its sky survey data to a public archive on a weekly basis, allowing anyone to use the data to probe the secrets of the cosmos: science.nasa.gov/open-science/

@servelan@newsie.social
2025-07-20 01:47:30

23andMe's Data Sold to Nonprofit Run by Its Co-Founder - 'And I Still Don't Trust It' - Slashdot
science.slashdot.org/story/25/

@arXiv_mathDS_bot@mastoxiv.page
2025-07-08 10:24:20

Retrodicting Chaotic Systems: An Algorithmic Information Theory Approach
Kamal Dingle, Boumediene Hamzi, Marcus Hutter, Houman Owhadi
arxiv.org/abs/2507.04780

@arXiv_csSE_bot@mastoxiv.page
2025-06-10 08:53:42

Is Your Training Pipeline Production-Ready? A Case Study in the Healthcare Domain
Daniel Lawand (University of São Paulo), Lucas Quaresma (University of São Paulo), Roberto Bolgheroni (University of São Paulo), Alfredo Goldman (University of São Paulo), Renato Cordeiro Ferreira (University of São Paulo, Jheronimus Academy of Data Science, Technical University of Eindhoven, Tilburg University)

@arXiv_csDB_bot@mastoxiv.page
2025-07-03 07:40:30

Data Agent: A Holistic Architecture for Orchestrating Data AI Ecosystems
Zhaoyan Sun, Jiayi Wang, Xinyang Zhao, Jiachi Wang, Guoliang Li
arxiv.org/abs/2507.01599

@arXiv_csIR_bot@mastoxiv.page
2025-07-03 08:44:00

A Data Science Approach to Calcutta High Court Judgments: An Efficient LLM and RAG-powered Framework for Summarization and Similar Cases Retrieval
Puspendu Banerjee, Aritra Mazumdar, Wazib Ansar, Saptarsi Goswami, Amlan Chakrabarti
arxiv.org/abs/2507.01058

@arXiv_condmatmtrlsci_bot@mastoxiv.page
2025-07-08 11:41:50

CEMP: a platform unifying high-throughput online calculation, databases and predictive models for clean energy materials
Jifeng Wang, Jiazhe Ju, Ying Wang
arxiv.org/abs/2507.04423

@netzschleuder@social.skewed.de
2025-08-06 23:00:04

dblp_cite: DBLP citations (2014)
Citations among papers contained in the DBLP computer science bibliography. If a paper i cites a paper j also in this data set, then a directed edge connects i to j. (Papers not in the data set are excluded.) Self-loops may be present. This snapshot is from May 2014.
This network has 12590 nodes and 49759 edges.
Tags: Informational, Citation, Unweighted

dblp_cite: DBLP citations (2014). 12590 nodes, 49759 edges. https://networks.skewed.de/net/dblp_cite
@arXiv_csLG_bot@mastoxiv.page
2025-06-05 11:00:02

This arxiv.org/abs/2505.24603 has been replaced.
initial toot: mastoxiv.page/@arXiv_csLG_…

@mia@hcommons.social
2025-07-18 14:12:16

Shared on Bluesky from a different #DH2025 'Nwulite Obodo Open Data License — Made for sharing African datasets equitably' datasciencelawlab.africa/nwuli

@Techmeme@techhub.social
2025-07-23 15:13:19

Meta debuts a prototype wristband to read electrical signals from forearm muscles, letting users control devices without touch, trained on 10K people's EMG data (Cade Metz/New York Times)
nytimes.com/2025/07/23/science

@arXiv_hepex_bot@mastoxiv.page
2025-06-09 08:54:02

Challenging Spontaneous Quantum Collapse with XENONnT
E. Aprile, J. Aalbers, K. Abe, S. Ahmed Maouloud, L. Althueser, B. Andrieu, E. Angelino, D. Antón Martin, S. R. Armbruster, F. Arneodo, L. Baudis, M. Bazyk, L. Bellagamba, R. Biondi, A. Bismark, K. Boese, A. Brown, G. Bruno, R. Budnik, C. Cai, C. Capelli, J. M. R. Cardoso, A. P. Cimental Chávez, A. P. Colijn, J. Conrad, J. J. Cuenca-García, C. Curceanu, V. D'Andrea, L. C. Daniel Garcia, M. P. Decowski, A. Deist…

@arXiv_physicssocph_bot@mastoxiv.page
2025-06-06 07:35:49

Edge interventions can mitigate demographic and prestige disparities in the Computer Science coauthorship network
Kate Barnes, Mia Ellis-Einhorn, Carolina Chávez-Ruelas, Nayera Hasan, Mohammad Fanous, Blair D. Sullivan, Sorelle Friedler, Aaron Clauset
arxiv.org/abs/2506.04435

@arXiv_mathNA_bot@mastoxiv.page
2025-08-07 08:41:14

The Ubiquitous Sparse Matrix-Matrix Products
Aydın Buluç
arxiv.org/abs/2508.04077 arxiv.org/pdf/2508.04077

@netzschleuder@social.skewed.de
2025-07-05 22:00:04

cs_department: Aarhus Computer Science department relationships
Multiplex network consisting of 5 edge types corresponding to online and offline relationships (Facebook, leisure, work, co-authorship, lunch) between employees of the Computer Science department at Aarhus. Data hosted by Manlio De Domenico.
This network has 61 nodes and 620 edges.
Tags: Social, Relationships, Multilayer, Unweighted

cs_department: Aarhus Computer Science department relationships. 61 nodes, 620 edges. https://networks.skewed.de/net/cs_department
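
A multiplex network like this one keeps a separate edge set for each relationship type over one shared set of nodes. The following is a minimal sketch that assumes edges arrive as (layer, u, v) triples. It treats each layer as undirected, which is an assumption because the toot does not say whether edges are directed. The layer labels are hypothetical lowercase names for the five relationship types listed in the toot, and the node ids and edges are toy data.

```python
from collections import defaultdict

# Hypothetical labels for the five relationship types named in the toot.
LAYERS = ("facebook", "leisure", "work", "coauthor", "lunch")

def build_multiplex(triples):
    """Group (layer, u, v) triples into one undirected edge set per layer."""
    layers = {name: set() for name in LAYERS}
    for layer, u, v in triples:
        if layer not in layers:
            raise ValueError(f"unknown layer: {layer}")
        # Assumed undirected: store each edge in canonical order so (u, v) == (v, u).
        layers[layer].add((min(u, v), max(u, v)))
    return layers

def aggregate(layers):
    """Collapse all layers into one weighted graph whose weight is the
    number of layers in which the pair is connected."""
    weights = defaultdict(int)
    for edges in layers.values():
        for edge in edges:
            weights[edge] += 1
    return dict(weights)

if __name__ == "__main__":
    triples = [
        ("work", 1, 2), ("lunch", 2, 1), ("lunch", 1, 3),
        ("coauthor", 1, 2), ("facebook", 3, 4),
    ]
    layers = build_multiplex(triples)
    print({name: len(edges) for name, edges in layers.items()})
    print(aggregate(layers))
```

Keeping the layers separate preserves which relationship produced each tie. The aggregate view is one common way to flatten a multiplex network into a single weighted graph.
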
@arXiv_statAP_bot@mastoxiv.page
2025-06-09 09:50:52

Analysis of points outcome in ATP Grand Slam Tennis using big data and machine learning
Martin Illum (Department of Applied Mathematics and Computer Science, Technical University of Denmark, Richard Petersens Plads, Denmark), Hans Christian Bechsøfft Mikkelsen (Department of Applied Mathematics and Computer Science, Technical University of Denmark, Richard Petersens Plads, Denmark), Emil Hovad (Department of Applied Mathematics and Computer Science, Technical University of Denmark, …

@arXiv_csAI_bot@mastoxiv.page
2025-06-24 11:54:00

Airalogy: AI-empowered universal data digitization for research automation
Zijie Yang, Qiji Zhou, Fang Guo, Sijie Zhang, Yexun Xi, Jinglei Nie, Yudian Zhu, Liping Huang, Chou Wu, Yonghe Xia, Xiaoyu Ma, Yingming Pu, Panzhong Lu, Junshu Pan, Mingtao Chen, Tiannan Guo, Yanmei Dou, Hongyu Chen, Anping Zeng, Jiaxing Huang, Tian Xu, Yue Zhang

@arXiv_csCR_bot@mastoxiv.page
2025-07-01 11:15:43

Poisoning Attacks to Local Differential Privacy for Ranking Estimation
Pei Zhan (School of Cyber Science and Technology, Shandong University, State Key Laboratory of Cryptography and Digital Economy Security, Shandong University, Qingdao, China), Peng Tang (School of Cyber Science and Technology, Shandong University, State Key Laboratory of Cryptography and Digital Economy Security, Shandong University, Qingdao, China), Yangzhuo Li (School of Cyber Science and Technology, Shandong Univ…

@arXiv_csDB_bot@mastoxiv.page
2025-07-29 09:44:21

Data Cleaning of Data Streams
Valerie Restat, Niklas Rodenhausen, Carina Antonin, Uta Störl
arxiv.org/abs/2507.20839 arxiv.org/pdf/2…

@arXiv_astrophGA_bot@mastoxiv.page
2025-06-03 07:45:53

Millimeter-wave observations of Euclid Deep Field South using the South Pole Telescope: A data release of temperature maps and catalogs
M. Archipley, A. Hryciuk, L. E. Bleem, K. Kornoelje, M. Klein, A. J. Anderson, B. Ansarinejad, M. Aravena, L. Balkenhol, P. S. Barry, K. Benabed, A. N. Bender, B. A. Benson, F. Bianchini, S. Bocquet, F. R. Bouchet, E. Camphuis, M. G. Campitiello, J. E. Carlstrom, J. Cathey, C. L. Chang, S. C. Chapman, P. Chaubal, P. M. Chichura, A. Chokshi, T. -L. Chou…

@mia@hcommons.social
2025-07-19 09:23:06

#DH2025 thanks @flochiff.bsky.social for sharing this link as I wanted to follow up on Pandore! 'Pandore: automating text-processing workflows for humanities researchers' from Sorbonne Université and ObTIC - Observatoire des textes, des idées et des corpus

@netzschleuder@social.skewed.de
2025-06-05 04:00:03

faculty_hiring: Faculty hiring networks (Comp. Sci., Business, History)
Three networks of faculty hiring in Computer Science Departments, Business Schools, and History Departments. Each node is a PhD-granting institution in the respective field, and a directed edge (i,j) indicates that a person received their PhD from node i and was tenure-track faculty at node j during the time of collection (2011-2013). All data were collected from public faculty rosters at the sampled institutions.
Thi…

faculty_hiring: Faculty hiring networks (Comp. Sci., Business, History). 206 nodes, 4988 edges. https://networks.skewed.de/net/faculty_hiring#computer_science
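
Under the edge convention in this toot, each directed edge (i, j) records one person who earned a PhD at institution i and held a tenure-track position at institution j. An institution's out-degree therefore counts the faculty it placed, and its in-degree counts the faculty it hired. The following is a minimal sketch on hypothetical institution names. It assumes one edge per person. The self-hire count is included only for illustration, because the toot does not state whether self-loops occur in these networks.

```python
from collections import Counter

def placement_stats(edges):
    """Summarize a faculty hiring edge list.

    Each (i, j) pair is assumed to represent one person who received a PhD
    from institution i and held a tenure-track position at institution j.
    """
    placed = Counter(i for i, _ in edges)   # out-degree: graduates placed as faculty
    hired = Counter(j for _, j in edges)    # in-degree: faculty hired
    self_hires = sum(1 for i, j in edges if i == j)  # hypothetical self-loop count
    return placed, hired, self_hires

if __name__ == "__main__":
    # Hypothetical institutions, not taken from the data set.
    edges = [("A", "B"), ("A", "C"), ("A", "A"), ("B", "C"), ("C", "B")]
    placed, hired, self_hires = placement_stats(edges)
    print(placed["A"], hired["C"], self_hires)  # 3 2 1
```
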
@cosmos4u@scicomm.xyz
2025-07-03 23:04:17

Carbonate formation and fluctuating habitability on Mars: #Mars Science Laboratory Curiosity rover data may explain why planet was likely harsh desert for most of recent past.

@arXiv_csHC_bot@mastoxiv.page
2025-07-23 08:44:52

Buckaroo: A Direct Manipulation Visual Data Wrangler
Annabelle Warner, Andrew McNutt, Paul Rosen, El Kindi Rezig
arxiv.org/abs/2507.16073

@netzschleuder@social.skewed.de
2025-06-04 05:00:04

cora: CORA citations (1998)
Citations among papers indexed by CORA, from 1998, an early computer science research paper search engine. If a paper i cites a paper j also in this data set, then a directed edge connects i to j. (Papers not in the data set are excluded.) Self-loops may be present. The dates of these snapshots are uncertain.
This network has 23166 nodes and 91500 edges.
Tags: Informational, Citation, Unweighted

cora: CORA citations (1998). 23166 nodes, 91500 edges. https://networks.skewed.de/net/cora
@arXiv_csDB_bot@mastoxiv.page
2025-06-24 08:48:30

Learning Lineage Constraints for Data Science Operations
Jinjin Zhao
arxiv.org/abs/2506.18252 arxiv.org/pdf/2506.1825…

@arXiv_csDB_bot@mastoxiv.page
2025-07-21 07:55:20

Towards Next Generation Data Engineering Pipelines
Kevin M. Kramer, Valerie Restat, Sebastian Strasser, Uta Störl, Meike Klettke
arxiv.org/abs/2507.13892

@netzschleuder@social.skewed.de
2025-06-02 15:00:04

faculty_hiring: Faculty hiring networks (Comp. Sci., Business, History)
Three networks of faculty hiring in Computer Science Departments, Business Schools, and History Departments. Each node is a PhD-granting institution in the respective field, and a directed edge (i,j) indicates that a person received their PhD from node i and was tenure-track faculty at node j during the time of collection (2011-2013). All data were collected from public faculty rosters at the sampled institutions.
Thi…

faculty_hiring: Faculty hiring networks (Comp. Sci., Business, History). 145 nodes, 4538 edges. https://networks.skewed.de/net/faculty_hiring#history
@netzschleuder@social.skewed.de
2025-08-03 10:00:04

sp_infectious: Art exhibit dynamic contacts (2011)
This dataset contains the daily dynamic contact networks collected during the Infectious SocioPatterns event that took place at the Science Gallery in Dublin, Ireland, during the artscience exhibition INFECTIOUS: STAY AWAY. Each file in the downloadable package contains a tab-separated list representing the active contacts during 20-second intervals of one day of data collection. Each line has the form “t i j”, where i and j are the a…

sp_infectious: Art exhibit dynamic contacts (2011). 10972 nodes, 415912 edges. https://networks.skewed.de/net/sp_infectious
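
The file layout described in this toot is tab-separated "t i j" lines, one per active contact in a 20-second interval. It can be read into per-interval snapshots with the standard library. The following is a minimal sketch under stated assumptions. It assumes t is an integer timestamp in seconds and that i and j are integer node identifiers. The toot's description of i and j is truncated, so that reading is an assumption. It also treats contacts as undirected. The function names and sample lines are hypothetical.

```python
import csv
import io
from collections import defaultdict

INTERVAL_SECONDS = 20  # contact resolution stated in the data set description

def read_contacts(stream):
    """Parse tab-separated "t i j" lines into {t: set of (i, j) contacts}.

    Assumes t is an integer timestamp in seconds and i, j are integer node
    identifiers; contacts are assumed undirected and stored smaller id first.
    """
    snapshots = defaultdict(set)
    for row in csv.reader(stream, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        t, i, j = (int(x) for x in row[:3])
        snapshots[t].add((min(i, j), max(i, j)))
    return snapshots

def contact_durations(snapshots):
    """Estimate total contact time per pair by counting each active
    interval as INTERVAL_SECONDS."""
    totals = defaultdict(int)
    for contacts in snapshots.values():
        for pair in contacts:
            totals[pair] += INTERVAL_SECONDS
    return dict(totals)

if __name__ == "__main__":
    # Hypothetical sample in the "t i j" layout; not real SocioPatterns data.
    sample = "0\t1\t2\n20\t2\t1\n20\t3\t4\n40\t1\t2\n"
    snaps = read_contacts(io.StringIO(sample))
    print(contact_durations(snaps))  # {(1, 2): 60, (3, 4): 20}
```

Multiplying interval counts by 20 seconds gives only a coarse duration estimate. A contact seen in one snapshot may have lasted anywhere within that interval.
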