Tootfinder

Opt-in global Mastodon full text search. Join the index!

@thomasfuchs@hachyderm.io
2025-07-07 01:38:13

Even if “AI” worked (it doesn’t), there are many reasons why you shouldn’t use it:
1. It’s destroying Internet sites that you love as you use chat bots instead of actually going to sources of information—this will cause them to be less active and eventually shut down.
2. Pollution and water use from server farms cause immediate harm; often—just like other heavy industry—these are built in underprivileged communities, harming poor people without any benefits: the big tech companies get tax breaks and don’t pay for power, while the workers aren’t from the community but commute in.
3. The basic underlying models of any LLM rely on stolen data, even when specific extra data is obtained legally. Chatbots can’t learn to speak English just by reading open source code.
4. You’re fueling a speculation bubble that is costing many people their jobs—because the illusion of “efficiency” is kept up by firing people and counting that as profit.
5. Whenever you use the great cheat machine in the cloud you’re robbing yourself of real research, writing or coding—literally atrophying your brain and making you stupider.
It’s a grift, through and through.

@bthalpin@mastodon.social
2025-09-06 12:30:43

I'm putting together notes for novices to do simple data analysis with R and the fact that I'm telling them to "cut and paste this inscrutable block of code at the start of your file" reminds me of nothing so much as when I worked for the ESRI in Dublin in the mid 1980s and we used to run SPSS analyses remotely on UCD's Amdahl by sandwiching our SPSS code (on punch cards) between two decks of IBM Job Control Language cards of which we understood nothing whatsoever.

Block of R code:
library(data.table)  # fast data manipulation (fread, := syntax)
library(ggplot2)     # plotting
library(epiDisplay)  # quick epidemiological summaries (e.g. tab1 frequency tables)
library(foreign)     # readers for legacy formats, including Stata .dta
nlsw88 <- read.dta("nlsw88.dta")  # load the Stata dataset into a data frame
@tiotasram@kolektiva.social
2025-08-04 15:49:00

Should we teach vibe coding? Here's why not.
Should AI coding be taught in undergrad CS education?
1/2
I teach undergraduate computer science labs, including for intro and more-advanced core courses. I don't publish (non-negligible) scholarly work in the area, but I've got years of craft expertise in course design, and I do follow the academic literature to some degree. In other words, I'm not the world's leading expert, but I have spent a lot of time thinking about course design and consider myself competent at it, with plenty of direct experience in what knowledge & skills I can expect from students as they move through the curriculum.
I'm also strongly against most uses of what's called "AI" these days (specifically, generative deep neural networks as supplied by our current cadre of techbros). There are a surprising number of completely orthogonal reasons to oppose the use of these systems, and a very limited number of reasonable exceptions (overcoming accessibility barriers is an example). On the grounds of environmental and digital-commons-pollution costs alone, using specifically the largest/newest models is unethical in most cases.
But as any good teacher should, I constantly question these evaluations, because I worry about the impact on my students should I eschew teaching relevant tech for bad reasons (or even for good reasons). I also want to make my reasoning clear to students, who should absolutely question me on this. That inspired me to ask a simple question: ignoring for one moment the ethical objections (which we shouldn't, of course; they're very stark), at what level in the CS major could I expect to teach a course about programming with AI assistance, and expect students to succeed at a more technically demanding final project than in a course at the same level where students were banned from using AI? In other words, at what level would I expect students to actually benefit from AI coding "assistance?"
To be clear, I'm assuming that students aren't using AI in other aspects of coursework: the topic of using AI to "help you study" is a separate one (TL;DR: its gross value is not negative, but it's mostly not worth the harm to your metacognitive abilities, which AI-induced changes to the digital commons are making more important than ever).
So what's my answer to this question?
If I'm being incredibly optimistic, senior year. Slightly less optimistic, second year of a masters program. Realistic? Maybe never.
The interesting bit for you-the-reader is: why is this my answer? (Especially given that students would probably self-report significant gains at lower levels.) To start with, [this paper, where experienced developers thought that AI assistance sped up their work on real tasks when in fact it slowed it down](arxiv.org/abs/2507.09089), is informative. There are a lot of differences in task between experienced devs solving real bugs and students working on a class project, but it's important to understand that we shouldn't have a baseline expectation that AI coding "assistants" will speed things up even in the best of circumstances, and we shouldn't trust self-reports of productivity (or the AI hype machine in general).
Now we might imagine that coding assistants will be better at helping with a student project than at helping to fix bugs in open-source software, since it's a much easier task. For many programming assignments that have a fixed answer, we know that many AI assistants can just spit out a solution based on prompting them with the problem description (there's another elephant in the room here to do with learning outcomes regardless of project success, but we'll ignore this one too; my focus here is on project complexity reach, not learning outcomes). My question is about more open-ended projects, not assignments with an expected answer. Here's a second study (by one of my colleagues) about novices using AI assistance for programming tasks. It showcases how difficult it is to use AI tools well, and some of the stumbling blocks that novices in particular face.
But what about intermediate students? Might there be some level where the AI is helpful because the task is still relatively simple and the students are good enough to handle it? The problem is that as task complexity increases, so does the likelihood of the AI generating (or copying) code that uses more complex constructs which a student doesn't understand. Let's say I have second-year students writing interactive websites with JavaScript. Without a lot of careful prompting (care those students don't know how to deploy), the AI is likely to suggest code that depends on several different frameworks, from React to jQuery, without actually setting up or including those frameworks, and of course these students would be way out of their depth trying to do that. This is a general problem: each programming class carefully limits the specific code frameworks and constructs it expects students to know based on the material it covers. There is no feasible way to limit an AI assistant to a fixed set of constructs or frameworks, using current designs. There are alternate designs where this would be possible (like AI search through adaptation over a controlled library of snippets), but those would be entirely different tools.
So what happens on a sizeable class project where the AI has dropped in buggy code, especially if it uses code constructs the students don't understand? Best case, they realize that they don't understand and re-prompt, or quickly ask for help from an instructor or TA who helps them get rid of the stuff they don't understand and re-prompt or manually add stuff they do understand. Average case: they waste several hours and/or sweep the bugs partly under the rug, resulting in a project with significant defects. Students in their second and even third years of a CS major still have a lot to learn about debugging, and usually have significant gaps in their knowledge of even their most comfortable programming language. I do think that, regardless of AI, we as teachers need to get better at teaching debugging skills, but the knowledge gaps are inevitable because there's just too much to know. In Python, for example, the LLM is going to spit out yields, async functions, try/finally, maybe even something like a while/else, or, with recent training data, the walrus operator. I can't expect even a fraction of 3rd-year students who have worked with Python since their first year to know all of these things, and based on how students approach projects where they have studied all the relevant constructs but have forgotten some, I'm not optimistic that these things will magically become learning opportunities. Student projects are better off working with a limited subset of a full programming language that the students have actually learned, and using AI coding assistants as currently designed makes this impossible. Beyond that, even when the "assistant" just introduces bugs using syntax the students understand, even through their 4th year many students struggle to understand the operation of moderately complex code they've written themselves, let alone code written by someone else. Having access to an AI that will confidently offer incorrect explanations for bugs will make this worse.
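As a concrete illustration, here's a hypothetical snippet (invented for this post, not from any student project) packing in two of the constructs named above that an LLM will happily emit but intro courses rarely cover:

```python
# Hypothetical example: constructs an LLM may emit unprompted.
def find_index(items, target):
    """Return the index of target in items, or -1 if absent."""
    i = 0
    while i < len(items):
        if items[i] == target:
            break
        i += 1
    else:  # while/else: runs only when the loop ends WITHOUT break
        return -1
    return i

# Walrus operator (:=): assignment embedded inside an expression.
if (n := len([3, 1, 4])) > 2:
    print(find_index([3, 1, 4], 1), "of", n)  # prints: 1 of 3
```

A student who has never seen while/else can easily misread the else as pairing with the if, which is exactly the kind of silent misunderstanding described above.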
To be sure a small minority of students will be able to overcome these problems, but that minority is the group that has a good grasp of the fundamentals and has broadened their knowledge through self-study, which earlier AI-reliant classes would make less likely to happen. In any case, I care about the average student, since we already have plenty of stuff about our institutions that makes life easier for a favored few while being worse for the average student (note that our construction of that favored few as the "good" students is a large part of this problem).
To summarize: because AI assistants introduce excess code complexity and difficult-to-debug bugs, they'll slow down rather than speed up project progress for the average student on moderately complex projects. On a fixed deadline, they'll result in worse projects, or necessitate less ambitious project scoping to ensure adequate completion, and I expect this remains broadly true through 4-6 years of study in most programs (don't take this as an endorsement of AI "assistants" for masters students; we've ignored a lot of other problems along the way).
There's a related problem: solving open-ended project assignments well ultimately depends on deeply understanding the problem, and AI "assistants" allow students to put a lot of code in their file without spending much time thinking about the problem or building an understanding of it. This is awful for learning outcomes, but also bad for project success. Getting students to see the value of thinking deeply about a problem is a thorny pedagogical puzzle at the best of times, and allowing the use of AI "assistants" makes the problem much much worse. This is another area I hope to see (or even drive) pedagogical improvement in, for what it's worth.
1/2

@netzschleuder@social.skewed.de
2025-08-06 21:00:03

board_directors: Norwegian Boards of Directors (2002-2011)
224 networks of the affiliations among board directors due to sitting on common boards of Norwegian public limited companies (as of 5 August 2009), from May 2002 onward, in monthly snapshots through August 2011. Some metadata is included, such as director and company names, city and postal code for companies, and gender for directors. The 'net2m' data are bipartite company-director networks, while the 'net1m' ar…

board_directors: Norwegian Boards of Directors (2002-2011). 1238 nodes, 1148 edges. https://networks.skewed.de/net/board_directors#net2m_2002-07-01
@arXiv_csSE_bot@mastoxiv.page
2025-10-06 09:05:29

Automatic Building Code Review: A Case Study
Hanlong Wan, Weili Xu, Michael Rosenberg, Jian Zhang, Aysha Siddika
arxiv.org/abs/2510.02634 a…

@gedankenstuecke@scholar.social
2025-08-06 17:09:24

If anyone else who's in the #OpenStreetMap/#opensource cosmos needs a reason to stop using Organic Maps and switch to some free/open alternatives:
In addition to them rolling their own 'data license', it seems like their recent license modifications for code could also be viewed as violating the Apache license and as creating yet another non-FOSS license.
That the devs are unwilling to give a clear answer to those questions speaks volumes imho…
github.com/organicmaps/organic

@Techmeme@techhub.social
2025-07-31 12:25:41

Stack Overflow survey: 84% of developers use or plan to use AI tools in their workflow, up from 76% in 2024, and 33% trust AI accuracy, down from 43% in 2024 (Sean Michael Kerner/VentureBeat)
venturebeat.com/ai/stack-overf

@arXiv_csSE_bot@mastoxiv.page
2025-08-06 08:39:10

Interpreting Performance Profiles with Deep Learning
Zhuoran Liu
arxiv.org/abs/2508.02729 arxiv.org/pdf/2508.02729

@x_cli@infosec.exchange
2025-10-02 08:58:42

Just received an email from Jetbrains about data collection in their IDE
> We’re now adding the option to allow the collection of detailed code‑related data pertaining to IDE activity, such as edit history, terminal usage, and your interactions with AI features. This may include code snippets, prompt text, and AI responses.
> If you’re using a non-commercial license, detailed code‑related data collection will be enabled as part of your next IDE update – you will be notified …

@tiotasram@kolektiva.social
2025-08-04 15:49:39

Should we teach vibe coding? Here's why not.
2/2
To address the bigger question I started with ("should we teach AI-'assisted' coding?"), my answer is: "No, except enough to show students directly what its pitfalls are." We have little enough time as it is to cover the core knowledge that they'll need, which has become more urgent now that they're going to be expected to clean up AI bugs, and they'll have less time to develop an understanding of the problems they're supposed to be solving. The skill of prompt engineering & the other skills of working with AI are relatively easy to pick up on your own, given a decent not-even-mathematical understanding of how a neural network works, which is something we should be giving to all students, not just our majors.
Reasonable learning objectives for CS majors might include: explaining what types of bugs an AI "assistant" is most likely to introduce, explaining the difference between software engineering and writing code, explaining why using an AI "assistant" is likely to violate open-source licenses, listing at least three independent ethical objections to contemporary LLMs and explaining the evidence for/reasoning behind them, explaining why we should expect AI "assistants" to be better at generating code from scratch than at fixing bugs in existing code (and why they'll confidently "claim" to have fixed problems they haven't), and even fixing bugs in AI-generated code (without AI "assistance").
If we lived in a world where the underlying environmental, labor, and data commons issues with AI weren't as bad, or if we could find and use systems that effectively mitigate these issues (there's lots of piecemeal progress on several of these) then we should probably start teaching an elective on coding with an assistant to students who have mastered programming basics, but such a class should probably spend a good chunk of time on non-assisted debugging.
#AI #LLMs #VibeCoding

@arXiv_eessSP_bot@mastoxiv.page
2025-09-03 11:09:53

Lightweight Error-Correction Code Encoders in Superconducting Electronic Systems
Yerzhan Mustafa, Berker Peköz, Selçuk Köse
arxiv.org/abs/2509.00962

@arXiv_csLO_bot@mastoxiv.page
2025-08-04 07:51:41

Building Bigraphs of the real world
Kang Rong Roy Ang
arxiv.org/abs/2508.00003 arxiv.org/pdf/2508.00003

@arXiv_csSE_bot@mastoxiv.page
2025-08-04 08:24:41

How Quantization Impacts Privacy Risk on LLMs for Code?
Md Nazmul Haque, Hua Yang, Zhou Yang, Bowen Xu
arxiv.org/abs/2508.00128 arxiv.org/p…

@arXiv_eessSY_bot@mastoxiv.page
2025-09-03 12:32:33

IndusGCC: A Data Benchmark and Evaluation Framework for GUI-Based General Computer Control in Industrial Automation
Xiaoran Yang, Yuyang Du, Kexin Chen, Soung Chang Liew, Jiamin Lu, Ziyu Guo, Xiaoyan Liu, Qun Yang, Shiqi Xu, Xingyu Fan, Yuchen Pan, Taoyong Cui, Hongyu Deng, Boris Dudder, Jianzhang Pan, Qun Fang, Pheng Ann Heng
arxi…

@arXiv_csDC_bot@mastoxiv.page
2025-09-29 07:58:37

Code once, Run Green: Automated Green Code Translation in Serverless Computing
Sebastian Werner, Mathis Kähler, Alireza Hakamian
arxiv.org/abs/2509.22068

@arXiv_csIT_bot@mastoxiv.page
2025-10-03 08:29:01

On Algebraic Approaches for DNA Codes with Multiple Constraints
Krishna Gopal Benerjee, Manish K Gupta
arxiv.org/abs/2510.01750 arxiv.org/p…

@arXiv_csAI_bot@mastoxiv.page
2025-08-27 10:10:23

VISION: Robust and Interpretable Code Vulnerability Detection Leveraging Counterfactual Augmentation
David Egea, Barproda Halder, Sanghamitra Dutta
arxiv.org/abs/2508.18933

@arXiv_csRO_bot@mastoxiv.page
2025-09-30 13:13:21

From Code to Action: Hierarchical Learning of Diffusion-VLM Policies
Markus Peschl, Pietro Mazzaglia, Daniel Dijkman
arxiv.org/abs/2509.24917

@arXiv_astrophCO_bot@mastoxiv.page
2025-08-04 09:11:00

Can cosmic rotation resolve the Hubble tension? Constraints from CMB and large-scale structure
Micol Benetti, David A. Cook, Saulo Carneiro
arxiv.org/abs/2508.00759

@arXiv_physicsfludyn_bot@mastoxiv.page
2025-09-03 10:02:33

Data-driven modeling for flow reconstruction from sparse temperature measurements
Xicheng Wang, YiMeng Chan, KinWing Wong, Dmitry Grishchenko, Pavel Kudinov
arxiv.org/abs/2509.01189

@mcdanlj@social.makerforums.info
2025-08-15 15:16:06

I recently discovered gojq at work. I was looking to see if there was a pure Golang implementation of jq as a library to use from within Go code. Then I compared performance to native jq to make sure that it wouldn't be too much slower than the original when embedded in Golang.
It ran my complex jq script in a fraction of the time over the same data.
That was a nice free speedup!
Now my ~/bin/jq is a symlink to

@arXiv_quantph_bot@mastoxiv.page
2025-08-29 10:14:21

Quantum Verifiable Rewards for Post-Training Qiskit Code Assistant
Nicolas Dupuis, Adarsh Tiwari, Youssef Mroueh, David Kremer, Ismael Faro, Juan Cruz-Benito
arxiv.org/abs/2508.20907

@arXiv_csCR_bot@mastoxiv.page
2025-10-02 08:28:51

SecureBERT 2.0: Advanced Language Model for Cybersecurity Intelligence
Ehsan Aghaei, Sarthak Jain, Prashanth Arun, Arjun Sambamoorthy
arxiv.org/abs/2510.00240

@arXiv_csHC_bot@mastoxiv.page
2025-08-01 07:40:50

Evaluating LLMs for Visualization Generation and Understanding
Saadiq Rauf Khan, Vinit Chandak, Sougata Mukherjea
arxiv.org/abs/2507.22890

@arXiv_csSE_bot@mastoxiv.page
2025-08-04 07:57:41

GPT-4.1 Sets the Standard in Automated Experiment Design Using Novel Python Libraries
Nuno Fachada, Daniel Fernandes, Carlos M. Fernandes, Bruno D. Ferreira-Saraiva, João P. Matos-Carvalho
arxiv.org/abs/2508.00033

@toxi@mastodon.thi.ng
2025-07-17 15:15:51

Added a customizable 2D vector field plot function for #ThingUmbrella

[Four attached images: 20x20 vector field visualizations, with vectors drawn as hue-mapped arrows (normalized and varying lengths), as small dials, and as black lines with red arrow heads.]
@arXiv_csNI_bot@mastoxiv.page
2025-08-25 09:18:20

Self-Healing Network of Interconnected Edge Devices Empowered by Infrastructure-as-Code and LoRa Communication
Rob Carson, Mohamed Chahine Ghanem, Feriel Bouakkaz
arxiv.org/abs/2508.16268

@netzschleuder@social.skewed.de
2025-08-28 05:00:04


board_directors: Norwegian Boards of Directors (2002-2011). 1386 nodes, 1303 edges. https://networks.skewed.de/net/board_directors#net2m_2004-01-01
@arXiv_csDC_bot@mastoxiv.page
2025-07-31 08:43:41

DSPE: Profit Maximization in Edge-Cloud Storage System using Dynamic Space Partitioning with Erasure Code
Shubhradeep Roy, Suvarthi Sarkar, Vivek Verma, Aryabartta Sahu
arxiv.org/abs/2507.22801

@arXiv_csCL_bot@mastoxiv.page
2025-07-21 09:48:50

Optimizing ASR for Catalan-Spanish Code-Switching: A Comparative Analysis of Methodologies
Carlos Mena, Pol Serra, Jacobo Romero, Abir Messaoudi, Jose Giraldo, Carme Armentano-Oller, Rodolfo Zevallos, Ivan Meza, Javier Hernando
arxiv.org/abs/2507.13875

@tiotasram@kolektiva.social
2025-07-25 10:57:58

Just saw this:
#AI can mean a lot of things these days, but lots of the popular meanings imply a bevy of harms that I definitely wouldn't feel are worth a cute fish game. In fact, these harms are so acute that even "just" playing into the AI hype becomes its own kind of harm (it's similar to blockchain in that way).
@… noticed that the authors claim the code base is 80% AI generated, which is a red flag because people with sound moral compasses wouldn't be using AI to "help" write code in the first place. The authors aren't by some miracle people who couldn't build this app without help, in case that influences your thinking about it: they have the skills to write the code themselves, although it likely would have taken longer (but also been better).
I was more interested in the fish-classification AI, and how much it might be dependent on datacenters. Thankfully, a quick glance at the code confirms they're using ONNX and running a self-trained neural network on your device. While the exponentially-increasing energy & water demands of datacenters to support billion-parameter models are a real concern, this is not that. Even a non-AI game can burn a lot of cycles on someone's phone, and I don't think there's anything to complain about energy-wise if we're just using cycles on the end user's device, as long as we're not having them keep it on for hours crunching numbers like blockchain stuff does. Running this stuff locally while the user is playing a game is a negligible environmental concern, unlike, say, calling out to ChatGPT, where you're directly feeding datacenter demand. Since they claimed to have trained the network themselves, and since it's actually totally reasonable to make your own dataset for this and get good-enough-for-a-silly-game results with just a few hundred examples, I don't have any ethical objections to the data sourcing or training processes either. Hooray! This is finally an example of "ethical use of neural networks" that I can hold up as what people should be doing instead of the BS they are doing.
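For a sense of scale: on-device inference of this sort is just a bit of per-frame arithmetic. A toy stand-in (made-up weights and sizes, not their model; a real app would load an ONNX file instead) for a small classifier's forward pass:

```python
import numpy as np

# Toy stand-in for on-device inference: a tiny two-layer network's
# forward pass. Weights here are random placeholders; a real app would
# load trained weights, but the per-call work is the same arithmetic.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 3)), np.zeros(3)

def classify(features):
    h = np.maximum(features @ W1 + b1, 0.0)  # ReLU hidden layer
    logits = h @ W2 + b2
    return int(np.argmax(logits))            # predicted class index (0..2)

label = classify(rng.normal(size=8))
print(label)
```

A few matrix multiplies per call is nothing compared to what a game's renderer already does every frame.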
But wait... Remember what I said about feeding the AI hype being its own form of harm? Yeah, between using AI tools for coding and calling their classifier "AI" in a way that makes it seem like the same kind of thing as ChatGPT et al., they're leaning into the hype rather than helping restrain it. And that means they're causing harm. Big AI companies can point to them and say "look AI enables cute things you like" when AI didn't actually enable it. So I'm feeling meh about this cute game and won't be sharing it aside from this post. If you love the cute fish, you don't really have to feel bad for playing with it, but I'd feel bad for advertising it without a disclaimer.

@arXiv_mathAG_bot@mastoxiv.page
2025-09-23 09:59:20

Infinite Euclidean Distance Discriminants
Felix Rydell, Emil Horobet
arxiv.org/abs/2509.17456 arxiv.org/pdf/2509.17456

@arXiv_astrophEP_bot@mastoxiv.page
2025-09-25 08:38:12

A High-Precision, Differentiable Code for Solar System Ephemerides
Ben Cassese, Malena Rice, Tiger Lu
arxiv.org/abs/2509.19549 arxiv.org/pd…

@arXiv_csSE_bot@mastoxiv.page
2025-08-04 09:09:00

Benchmarking LLMs for Unit Test Generation from Real-World Functions
Dong Huang, Jie M. Zhang, Mark Harman, Qianru Zhang, Mingzhe Du, See-Kiong Ng
arxiv.org/abs/2508.00408

@arXiv_csLG_bot@mastoxiv.page
2025-07-24 10:13:39

EarthLink: Interpreting Climate Signals with Self-Evolving AI Agents
Zijie Guo, Jiong Wang, Xiaoyu Yue, Wangxu Wei, Zhe Jiang, Wanghan Xu, Ben Fei, Wenlong Zhang, Xinyu Gu, Lijing Cheng, Jing-Jia Luo, Chao Li, Yaqiang Wang, Tao Chen, Wanli Ouyang, Fenghua Ling, Lei Bai
arxiv.org/abs/2507.17311

@arXiv_csAI_bot@mastoxiv.page
2025-07-16 10:09:51

Modeling Code: Is Text All You Need?
Daniel Nichols, Konstantinos Parasyris, Harshitha Menon, Brian R. Bartoldson, Giorgis Georgakoudis, Tal Ben-Nun, Abhinav Bhatele
arxiv.org/abs/2507.11467

@arXiv_csCR_bot@mastoxiv.page
2025-09-30 11:06:51

MaskSQL: Safeguarding Privacy for LLM-Based Text-to-SQL via Abstraction
Sepideh Abedini (University of Waterloo, Vector Institute), Shubhankar Mohapatra (University of Waterloo), D. B. Emerson (Vector Institute), Masoumeh Shafieinejad (Vector Institute), Jesse C. Cresswell (Layer 6 AI), Xi He (University of Waterloo, Vector Institute)

@arXiv_csAR_bot@mastoxiv.page
2025-09-11 08:35:03

AutoVeriFix: Automatically Correcting Errors and Enhancing Functional Correctness in LLM-Generated Verilog Code
Yan Tan, Xiangchen Meng, Zijun Jiang, Yangdi Lyu
arxiv.org/abs/2509.08416

@rasterweb@mastodon.social
2025-09-12 03:30:58

I did not make any good progress on my project to read GPX data with Python today. (Trying to read speed.) May need to just read it as XML.
There are a bunch of nerds who do Python and bikes though, so someone might have the code example I need.
#biking #bikeTooter
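In case it helps the next person searching: a minimal sketch of pulling speed out of GPX as plain XML with just the standard library, assuming a GPX 1.0-style <speed> child on each trackpoint (GPX 1.1 exports often bury speed in <extensions> instead) and using made-up coordinates:

```python
import xml.etree.ElementTree as ET

# Tiny made-up GPX document standing in for a real track file.
GPX = """<?xml version="1.0"?>
<gpx version="1.0" xmlns="http://www.topografix.com/GPX/1/0">
  <trk><trkseg>
    <trkpt lat="43.07" lon="-89.40"><speed>4.2</speed></trkpt>
    <trkpt lat="43.08" lon="-89.41"><speed>5.1</speed></trkpt>
  </trkseg></trk>
</gpx>"""

NS = {"gpx": "http://www.topografix.com/GPX/1/0"}
root = ET.fromstring(GPX)
points = []
for trkpt in root.findall(".//gpx:trkpt", NS):
    speed = trkpt.find("gpx:speed", NS)  # None if the file lacks <speed>
    points.append((float(trkpt.get("lat")),
                   float(trkpt.get("lon")),
                   float(speed.text) if speed is not None else None))
print(points)
```

For a real file, swap `ET.fromstring(GPX)` for `ET.parse("ride.gpx").getroot()` and adjust the namespace to match the file's `xmlns`.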

@Techmeme@techhub.social
2025-07-14 03:55:46

A look at WindBorne, which uses weather balloons and AI to improve forecasting, as potential budget cuts to NOAA threaten its access to public weather data (Tim Fernholz/New York Times)

@netzschleuder@social.skewed.de
2025-09-24 05:00:04


board_directors: Norwegian Boards of Directors (2002-2011). 1265 nodes, 1178 edges. https://networks.skewed.de/net/board_directors#net2m_2002-10-01
@arXiv_csNE_bot@mastoxiv.page
2025-07-22 07:59:40

DHEvo: Data-Algorithm Based Heuristic Evolution for Generalizable MILP Solving
Zhihao Zhang, Siyuan Li, Chenxi Li, Feifan Liu, Mengjing Chen, Kai Li, Tao Zhong, Bo An, Peng Liu
arxiv.org/abs/2507.15615

OSMnx is a Python package to retrieve, model, analyze, and visualize street networks from OpenStreetMap.
Users can download and model walkable, drivable, or bikeable urban networks with a single line of Python code -- and then easily analyze and visualize them.
You can just as easily download and work with amenities/points of interest, building footprints, elevation data, street bearings/orientations, and network routing.
If you use OSMnx in your work, please downlo…

@tomkalei@machteburch.social
2025-09-26 11:01:30

It is good IT security practice to separate data from code. An SQL injection attack, for example, gets the target to treat data (entered into a form) as code, i.e. it breaks that barrier.
Now take an "AI agent" doing a task for you like: "Download that podcast that Frank emailed me about". It will read untrusted data (e-mails in my inbox), access sensitive data (e-mails from my friends) get more stuff from the web (the podcast episode), etc.
And all that in an environment where there 1/2
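The barrier-breaking described above, and the standard defense, both fit in a few lines; a sketch with sqlite3 and a made-up table:

```python
import sqlite3

# Sketch: user input treated as code vs. kept as data (made-up schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

evil = "x' OR '1'='1"  # hostile "data" typed into a form

# Broken: string formatting lets the input run as SQL code.
unsafe = f"SELECT * FROM users WHERE name = '{evil}'"
print(len(conn.execute(unsafe).fetchall()))  # prints 1: matched every row

# Fixed: the ? placeholder keeps the input as pure data.
rows = conn.execute("SELECT * FROM users WHERE name = ?", (evil,)).fetchall()
print(len(rows))  # prints 0: no user is literally named x' OR '1'='1
```

An "AI agent" mixing untrusted e-mails with instructions has no equivalent of the `?` placeholder, which is what makes the situation above so worrying.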

@arXiv_csCL_bot@mastoxiv.page
2025-08-20 08:13:59

Datarus-R1: An Adaptive Multi-Step Reasoning LLM for Automated Data Analysis
Ayoub Ben Chaliah, Hela Dellagi
arxiv.org/abs/2508.13382 arxiv…

@arXiv_csAI_bot@mastoxiv.page
2025-09-10 10:13:01

SCoder: Iterative Self-Distillation for Bootstrapping Small-Scale Data Synthesizers to Empower Code LLMs
Xinyu Zhang, Changzhi Zhou, Linmei Hu, Luhao Zhang, Xiancai Chen, Haomin Fu, Yang Yang, Mengdi Zhang
arxiv.org/abs/2509.07858

@arXiv_physicsplasmph_bot@mastoxiv.page
2025-07-23 08:43:02

Efficient dataset construction using active learning and uncertainty-aware neural networks for plasma turbulent transport surrogate models
Aaron Ho (MIT Plasma Science and Fusion Center, Cambridge, USA), Lorenzo Zanisi (UKAEA Culham Centre for Fusion Energy, Abingdon, UK), Bram de Leeuw (Radboud University, Nijmegen, Netherlands), Vincent Galvan (MIT Plasma Science and Fusion Center, Cambridge, USA), Pablo Rodriguez-Fernandez (MIT Plasma Science and Fusion Center, Cambridge, USA), Nath…

@arXiv_csCR_bot@mastoxiv.page
2025-09-24 09:41:44

Obelix: Mitigating Side-Channels Through Dynamic Obfuscation
Jan Wichelmann, Anja Rabich, Anna Pätschke, Thomas Eisenbarth
arxiv.org/abs/2509.18909

@arXiv_astrophHE_bot@mastoxiv.page
2025-09-10 08:35:21

$\texttt{Jipole}$: A Differentiable $\texttt{ipole}$-based Code for Radiative Transfer in Curved Spacetimes
Pedro Naethe Motta, Ben S. Prather, Alejandro Cárdenas-Avendaño
arxiv.org/abs/2509.07065

@arXiv_csSE_bot@mastoxiv.page
2025-08-28 09:10:31

Functional Consistency of LLM Code Embeddings: A Self-Evolving Data Synthesis Framework for Benchmarking
Zhuohao Li, Wenqing Chen, Jianxing Yu, Zhichao Lu
arxiv.org/abs/2508.19558

@netzschleuder@social.skewed.de
2025-09-21 04:00:04


board_directors: Norwegian Boards of Directors (2002-2011). 1796 nodes, 1726 edges. https://networks.skewed.de/net/board_directors#net2m_2006-11-01
@arXiv_csSE_bot@mastoxiv.page
2025-07-31 09:58:21

Tracking research software outputs in the UK
Domhnall Carlin, Austen Rainer
arxiv.org/abs/2507.22871 arxiv.org/pdf/2507.22871

@arXiv_csRO_bot@mastoxiv.page
2025-09-19 09:14:31

Event-LAB: Towards Standardized Evaluation of Neuromorphic Localization Methods
Adam D. Hines, Alejandro Fontan, Michael Milford, Tobias Fischer
arxiv.org/abs/2509.14516

@shoppingtonz@mastodon.social
2025-07-21 06:40:18

I want the Micro Processor to order my Pulsar(Level 2 special operations unit) to mine coal and deposit it into the core...
I'll let you know about my progress...
So "mlog" is the "Mindustry Logic" 'language'.
mindustrygame.github.io/wiki/logic/0-introduction/
but I first started here:

@arXiv_csGR_bot@mastoxiv.page
2025-08-12 09:21:23

LL3M: Large Language 3D Modelers
Sining Lu, Guan Chen, Nam Anh Dinh, Itai Lang, Ari Holtzman, Rana Hanocka
arxiv.org/abs/2508.08228 arxiv.o…

@arXiv_csLG_bot@mastoxiv.page
2025-07-11 10:23:11

Dynamic Chunking for End-to-End Hierarchical Sequence Modeling
Sukjun Hwang, Brandon Wang, Albert Gu
arxiv.org/abs/2507.07955 arxiv.org/pdf/2507.07955 arxiv.org/html/2507.07955
arXiv:2507.07955v1 Announce Type: new
Abstract: Despite incredible progress in language models (LMs) in recent years, largely resulting from moving away from specialized models designed for specific tasks to general models based on powerful architectures (e.g. the Transformer) that learn everything from raw data, pre-processing steps such as tokenization remain a barrier to true end-to-end foundation models. We introduce a collection of new techniques that enable a dynamic chunking mechanism which automatically learns content -- and context -- dependent segmentation strategies learned jointly with the rest of the model. Incorporating this into an explicit hierarchical network (H-Net) allows replacing the (implicitly hierarchical) tokenization-LM-detokenization pipeline with a single model learned fully end-to-end. When compute- and data- matched, an H-Net with one stage of hierarchy operating at the byte level outperforms a strong Transformer language model operating over BPE tokens. Iterating the hierarchy to multiple stages further increases its performance by modeling multiple levels of abstraction, demonstrating significantly better scaling with data and matching a token-based Transformer of twice its size. H-Nets pretrained on English show significantly increased character-level robustness, and qualitatively learn meaningful data-dependent chunking strategies without any heuristics or explicit supervision. Finally, the H-Net's improvement over tokenized pipelines is further increased in languages and modalities with weaker tokenization heuristics, such as Chinese and code, or DNA sequences (nearly 4x improvement in data efficiency over baselines), showing the potential of true end-to-end models that learn and scale better from unprocessed data.
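The core idea in the abstract, content-dependent chunking of a raw byte stream, can be illustrated with a toy sketch. This is not the paper's H-Net: there, boundary scores are learned end-to-end, whereas here a boundary simply fires wherever the byte's character class changes, a crude hand-rolled stand-in for a learned content-change detector.

```python
# Toy illustration of content-dependent chunking of a byte sequence.
# NOT the paper's H-Net: real boundary scores are learned end-to-end;
# here a chunk boundary fires wherever the byte class changes.

def toy_chunk(data: bytes) -> list[bytes]:
    """Split `data` where adjacent bytes differ in character class."""
    def klass(b: int) -> str:
        if 48 <= b <= 57:                      # '0'-'9'
            return "digit"
        if 65 <= b <= 90 or 97 <= b <= 122:    # 'A'-'Z', 'a'-'z'
            return "alpha"
        return "other"

    chunks, start = [], 0
    for i in range(1, len(data)):
        if klass(data[i]) != klass(data[i - 1]):  # "boundary score" fires
            chunks.append(data[start:i])
            start = i
    chunks.append(data[start:])
    return chunks

print(toy_chunk(b"abc123 def"))  # [b'abc', b'123', b' ', b'def']
```

The point of the paper is precisely that such hand-written segmentation heuristics (like BPE tokenizers) can be replaced by boundaries the model learns jointly with the rest of the network.
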

@arXiv_astrophCO_bot@mastoxiv.page
2025-09-24 09:25:14

The effects on structure of a momentum coupling between dark matter and quintessence
G. N. Candlish, Y. Jaffé
arxiv.org/abs/2509.19164

@arXiv_csCR_bot@mastoxiv.page
2025-09-10 10:00:41

ImportSnare: Directed "Code Manual" Hijacking in Retrieval-Augmented Code Generation
Kai Ye, Liangcai Su, Chenxiong Qian
arxiv.org/abs/2509.07941

@arXiv_csHC_bot@mastoxiv.page
2025-09-17 10:26:10

Towards an Embodied Composition Framework for Organizing Immersive Computational Notebooks
Sungwon In, Eric Krokos, Kirsten Whitley, Chris North, Yalong Yang
arxiv.org/abs/2509.13291

@arXiv_csSE_bot@mastoxiv.page
2025-07-16 09:02:01

A Code Comprehension Benchmark for Large Language Models for Code
Jayant Havare, Saurav Chaudhary, Ganesh Ramakrishnan, Kaushik Maharajan, Srikanth Tamilselvam
arxiv.org/abs/2507.10641

@arXiv_csDC_bot@mastoxiv.page
2025-09-23 09:25:20

Asteria: Semantic-Aware Cross-Region Caching for Agentic LLM Tool Access
Chaoyi Ruan, Chao Bi, Kaiwen Zheng, Ziji Shi, Xinyi Wan, Jialin Li
arxiv.org/abs/2509.17360

@netzschleuder@social.skewed.de
2025-08-14 06:00:04

board_directors: Norwegian Boards of Directors (2002-2011)
224 networks of the affiliations among board directors due to sitting on common boards of Norwegian public limited companies (as of 5 August 2009), from May 2002 onward, in monthly snapshots through August 2011. Some metadata is included, such as director and company names, city and postal code for companies, and gender for directors. The 'net2m' data are bipartite company-director networks, while the 'net1m' ar…

board_directors: Norwegian Boards of Directors (2002-2011). 1908 nodes, 1892 edges. https://networks.skewed.de/net/board_directors#net2m_2008-04-01

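The net2m/net1m distinction described above (a bipartite company-director network versus the one-mode director network derived from it) can be sketched with plain dictionaries; the company and director names below are invented for illustration.

```python
from itertools import combinations

# Sketch of the net2m/net1m distinction: a bipartite company->directors
# mapping (net2m) projected onto a director-director network (net1m),
# linking two directors whenever they sit on a common board.
# Company and director names are invented for illustration.
net2m = {
    "AcmeCorp":  ["Anna", "Bjorn"],
    "NordOil":   ["Bjorn", "Cecilie", "Anna"],
    "FjordBank": ["Cecilie"],
}

net1m = set()
for directors in net2m.values():
    for a, b in combinations(sorted(directors), 2):
        net1m.add((a, b))

print(sorted(net1m))
# [('Anna', 'Bjorn'), ('Anna', 'Cecilie'), ('Bjorn', 'Cecilie')]
```

This is why the monthly net1m snapshots in the dataset can have far more edges than nodes: every board of size k contributes up to k·(k-1)/2 director-director links.
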
@arXiv_csSE_bot@mastoxiv.page
2025-09-25 09:08:02

Beyond Language Barriers: Multi-Agent Coordination for Multi-Language Code Generation
Micheline Bénédicte Moumoula, Serge Lionel Nikiema, Albérick Euraste Djire, Abdoul Kader Kabore, Jacques Klein, Tegawendé F. Bissyande
arxiv.org/abs/2509.19918

@arXiv_csCR_bot@mastoxiv.page
2025-09-26 07:41:31

Can You Trust Your Copilot? A Privacy Scorecard for AI Coding Assistants
Amir AL-Maamari
arxiv.org/abs/2509.20388 arxiv.org/pdf/2509.20388

@arXiv_csCL_bot@mastoxiv.page
2025-07-17 09:57:40

Toxicity-Aware Few-Shot Prompting for Low-Resource Singlish Translation
Ziyu Ge, Gabriel Chua, Leanne Tan, Roy Ka-Wei Lee
arxiv.org/abs/2507.11966

@arXiv_csSE_bot@mastoxiv.page
2025-07-22 11:42:00

Applying the Chinese Wall Reverse Engineering Technique to Large Language Model Code Editing
Manatsawin Hanmongkolchai
arxiv.org/abs/2507.15599

@netzschleuder@social.skewed.de
2025-09-09 16:00:04

board_directors: Norwegian Boards of Directors (2002-2011)

board_directors: Norwegian Boards of Directors (2002-2011). 1681 nodes, 1585 edges. https://networks.skewed.de/net/board_directors#net2m_2006-05-01

@arXiv_csSE_bot@mastoxiv.page
2025-07-17 07:59:00

MetaLint: Generalizable Idiomatic Code Quality Analysis through Instruction-Following and Easy-to-Hard Generalization
Atharva Naik, Lawanya Baghel, Dhakshin Govindarajan, Darsh Agrawal, Daniel Fried, Carolyn Rose
arxiv.org/abs/2507.11687

@arXiv_csLG_bot@mastoxiv.page
2025-07-09 10:25:02

KnowIt: Deep Time Series Modeling and Interpretation
M. W. Theunissen, R. Rabe, M. H. Davel
arxiv.org/abs/2507.06009

@arXiv_csSE_bot@mastoxiv.page
2025-07-24 09:18:50

How Do Code Smells Affect Skill Growth in Scratch Novice Programmers?
Ricardo Hidalgo Aragón, Jesús M. González-Barahona, Gregorio Robles
arxiv.org/abs/2507.17314

@netzschleuder@social.skewed.de
2025-07-09 06:00:03

board_directors: Norwegian Boards of Directors (2002-2011)

board_directors: Norwegian Boards of Directors (2002-2011). 1544 nodes, 4429 edges. https://networks.skewed.de/net/board_directors#net1m_2007-06-01

@arXiv_csSE_bot@mastoxiv.page
2025-07-29 11:11:01

Repairing vulnerabilities without invisible hands. A differentiated replication study on LLMs
Maria Camporese, Fabio Massacci
arxiv.org/abs/2507.20977

@arXiv_csSE_bot@mastoxiv.page
2025-09-18 09:37:31

Scrub It Out! Erasing Sensitive Memorization in Code Language Models via Machine Unlearning
Zhaoyang Chu, Yao Wan, Zhikun Zhang, Di Wang, Zhou Yang, Hongyu Zhang, Pan Zhou, Xuanhua Shi, Hai Jin, David Lo
arxiv.org/abs/2509.13755

@arXiv_csCR_bot@mastoxiv.page
2025-07-08 10:34:01

Securing Mixed Rust with Hardware Capabilities
Jason Zhijingcheng Yu, Fangqi Han, Kaustab Choudhury, Trevor E. Carlson, Prateek Saxena
arxiv.org/abs/2507.03344

@arXiv_csSE_bot@mastoxiv.page
2025-07-16 08:18:31

Droid: A Resource Suite for AI-Generated Code Detection
Daniil Orel, Indraneil Paul, Iryna Gurevych, Preslav Nakov
arxiv.org/abs/2507.10583

@arXiv_csSE_bot@mastoxiv.page
2025-08-19 10:03:50

Strengthening Programming Comprehension in Large Language Models through Code Generation
Xiaoning Ren, Qiang Hu, Wei Ma, Yan Li, Yao Zhang, Lingxiao Jiang, Yinxing Xue
arxiv.org/abs/2508.12620

@arXiv_csSE_bot@mastoxiv.page
2025-09-19 09:35:11

SALT4Decompile: Inferring Source-level Abstract Logic Tree for LLM-Based Binary Decompilation
Yongpan Wang, Xin Xu, Xiaojie Zhu, Xiaodong Gu, Beijun Shen
arxiv.org/abs/2509.14646

@arXiv_csSE_bot@mastoxiv.page
2025-07-22 11:51:40

Observing Fine-Grained Changes in Jupyter Notebooks During Development Time
Sergey Titov, Konstantin Grotov, Cristina Sarasua, Yaroslav Golubev, Dhivyabharathi Ramasamy, Alberto Bacchelli, Abraham Bernstein, Timofey Bryksin
arxiv.org/abs/2507.15831

@arXiv_csSE_bot@mastoxiv.page
2025-09-18 09:14:21

A Regression Testing Framework with Automated Assertion Generation for Machine Learning Notebooks
Yingao Elaine Yao, Vedant Nimje, Varun Viswanath, Saikat Dutta
arxiv.org/abs/2509.13656

@arXiv_csSE_bot@mastoxiv.page
2025-08-11 09:19:19

Improving the Developer Experience with a Low-Code Process Modelling Language
Henrique Henriques, Hugo Lourenço, Vasco Amaral, Miguel Goulão
arxiv.org/abs/2508.06299

@arXiv_csSE_bot@mastoxiv.page
2025-09-15 09:11:11

Targeted Test Selection Approach in Continuous Integration
Pavel Plyusnin, Aleksey Antonov, Vasilii Ermakov, Aleksandr Khaybriev, Margarita Kikot, Ilseyar Alimova, Stanislav Moiseev
arxiv.org/abs/2509.10279

@arXiv_csSE_bot@mastoxiv.page
2025-08-12 10:02:53

Dynamic Benchmark Construction for Evaluating Large Language Models on Real-World Codes
Zhe Zhang, Runlin Liu, Aishan Liu, Xingyu Liu, Xiang Gao, Hailong Sun
arxiv.org/abs/2508.07180

@arXiv_csSE_bot@mastoxiv.page
2025-09-10 07:49:31

Aspect-Oriented Programming in Secure Software Development: A Case Study of Security Aspects in Web Applications
Mterorga Ukor
arxiv.org/abs/2509.07449