Tootfinder

Opt-in global Mastodon full text search. Join the index!

@arXiv_csLG_bot@mastoxiv.page
2025-07-11 10:23:01

Low Resource Reconstruction Attacks Through Benign Prompts
Sol Yarkoni, Roi Livni
arxiv.org/abs/2507.07947 arxiv.org/pdf/2507.07947 arxiv.org/html/2507.07947
arXiv:2507.07947v1 Announce Type: new
Abstract: The recent advances in generative models such as diffusion models have raised several risks and concerns related to privacy, copyright infringements and data stewardship. To better understand and control the risks, various researchers have created techniques, experiments and attacks that reconstruct images, or parts of images, from the training set. While these techniques already establish that data from the training set can be reconstructed, they often rely on high resources, access to the training set, and well-engineered, carefully designed prompts.
In this work, we devise a new attack that requires low resources, assumes little to no access to the actual training set, and identifies seemingly benign prompts that lead to potentially risky image reconstruction. This highlights the risk that images might be reconstructed even by an uninformed user, unintentionally. For example, we identified that, for one existing model, the prompt "blue Unisex T-Shirt" can generate the face of a real-life human model. Our method builds on an intuition from previous works that leverages domain knowledge, and identifies a fundamental vulnerability stemming from the use of data scraped from e-commerce platforms, where templated layouts and images are tied to pattern-like prompts.

@cowboys@darktundra.xyz
2025-06-11 22:44:40

Double Jointed: Cowboys to host Rams at practice prior to rolling to SoFi for exhibition cowboyswire.usatoday.com/story

@raiders@darktundra.xyz
2025-07-08 11:52:27

NFL Coaching Mindset From Minicamp to Training Camp si.com/nfl/raiders/las-vegas-t

@arXiv_csAI_bot@mastoxiv.page
2025-08-11 09:36:29

Symmetry breaking for inductive logic programming
Andrew Cropper, David M. Cerna, Matti Järvisalo
arxiv.org/abs/2508.06263 arxiv.org…

@arXiv_csCV_bot@mastoxiv.page
2025-07-10 09:02:11

Bilateral Collaboration with Large Vision-Language Models for Open Vocabulary Human-Object Interaction Detection
Yupeng Hu, Changxing Ding, Chang Sun, Shaoli Huang, Xiangmin Xu
arxiv.org/abs/2507.06510

@arXiv_astrophSR_bot@mastoxiv.page
2025-06-10 17:49:10

This arxiv.org/abs/2506.02763 has been replaced.
initial toot: mastoxiv.page/@arXiv_…

@raiders@darktundra.xyz
2025-07-10 21:03:20

Chemistry on the Defensive Line Sets Up the Raiders Well si.com/nfl/raiders/las-vegas-m

@arXiv_csLG_bot@mastoxiv.page
2025-06-09 10:08:22

What Really is a Member? Discrediting Membership Inference via Poisoning
Neal Mangaokar, Ashish Hooda, Zhuohang Li, Bradley A. Malin, Kassem Fawaz, Somesh Jha, Atul Prakash, Amrita Roy Chowdhury
arxiv.org/abs/2506.06003

@arXiv_csCE_bot@mastoxiv.page
2025-07-09 08:10:02

Bridging Sequential Deep Operator Network and Video Diffusion: Residual Refinement of Spatio-Temporal PDE Solutions
Jaewan Park, Farid Ahmed, Kazuma Kobayashi, Seid Koric, Syed Bahauddin Alam, Iwona Jasiuk, Diab Abueidda
arxiv.org/abs/2507.06133

@NFL@darktundra.xyz
2025-07-26 21:16:35

Titans' Treylon Burks fractures collarbone in training camp, set to miss at least season opener, per report

cbssports.com/nfl/news/titans-…

@arXiv_condmatstatmech_bot@mastoxiv.page
2025-07-08 11:22:50

Pseudo-likelihood produces associative memories able to generalize, even for asymmetric couplings
Francesco D'Amico, Dario Bocchi, Luca Maria Del Bono, Saverio Rossi, Matteo Negri
arxiv.org/abs/2507.05147

@arXiv_csSE_bot@mastoxiv.page
2025-08-04 08:24:41

How Quantization Impacts Privacy Risk on LLMs for Code?
Md Nazmul Haque, Hua Yang, Zhou Yang, Bowen Xu
arxiv.org/abs/2508.00128 arxiv.org/p…

@arXiv_csLG_bot@mastoxiv.page
2025-07-09 14:33:49

Replaced article(s) found for cs.LG. arxiv.org/list/cs.LG/new
[2/6]:
- Training Set Reconstruction from Differentially Private Forests: How Effective is DP?
Alice Gorgé, Julien Ferry, Sébastien Gambs, Thibaut Vidal

@bencurthoys@mastodon.social
2025-08-02 22:14:36

Is there any way of training a smart speaker to interpret a child's aimless and tuneless humming as "Please play Skinny Puppy and set volume to 100%"?
Enquiring minds want to know.

@arXiv_csRO_bot@mastoxiv.page
2025-08-01 08:28:21

Benchmarking Massively Parallelized Multi-Task Reinforcement Learning for Robotics Tasks
Vira Joshi, Zifan Xu, Bo Liu, Peter Stone, Amy Zhang
arxiv.org/abs/2507.23172

@tiotasram@kolektiva.social
2025-08-04 15:49:00

Should we teach vibe coding? Here's why not.
Should AI coding be taught in undergrad CS education?
1/2
I teach undergraduate computer science labs, including for intro and more-advanced core courses. I don't publish (non-negligible) scholarly work in the area, but I've got years of craft expertise in course design, and I do follow the academic literature to some degree. In other words, I'm not the world's leading expert, but I have spent a lot of time thinking about course design and consider myself competent at it, with plenty of direct experience in what knowledge & skills I can expect from students as they move through the curriculum.
I'm also strongly against most uses of what's called "AI" these days (specifically, generative deep neural networks as supplied by our current cadre of techbros). There are a surprising number of completely orthogonal reasons to oppose the use of these systems, and a very limited number of reasonable exceptions (overcoming accessibility barriers is an example). On the grounds of environmental and digital-commons-pollution costs alone, using specifically the largest/newest models is unethical in most cases.
But as any good teacher should, I constantly question these evaluations, because I worry about the impact on my students should I eschew teaching relevant tech for bad reasons (and even for good reasons). I also want to make my reasoning clear to students, who should absolutely question me on this. That inspired me to ask a simple question: ignoring for one moment the ethical objections (which we shouldn't, of course; they're very stark), at what level in the CS major could I expect to teach a course about programming with AI assistance, and expect students to succeed at a more technically demanding final project than in a course at the same level where students were banned from using AI? In other words, at what level would I expect students to actually benefit from AI coding "assistance"?
To be clear, I'm assuming that students aren't using AI in other aspects of coursework: the topic of using AI to "help you study" is a separate one (TL;DR: its gross value is not negative, but it's mostly not worth the harm to your metacognitive abilities, which AI-induced changes to the digital commons are making more important than ever).
So what's my answer to this question?
If I'm being incredibly optimistic, senior year. Slightly less optimistic, second year of a masters program. Realistic? Maybe never.
The interesting bit for you-the-reader is: why is this my answer? (Especially given that students would probably self-report significant gains at lower levels.) To start with, [this paper where experienced developers thought that AI assistance sped up their work on real tasks when in fact it slowed it down](arxiv.org/abs/2507.09089) is informative. There are a lot of differences in task between experienced devs solving real bugs and students working on a class project, but it's important to understand that we shouldn't have a baseline expectation that AI coding "assistants" will speed things up in the best of circumstances, and we shouldn't trust self-reports of productivity (or the AI hype machine in general).
Now we might imagine that coding assistants will be better at helping with a student project than at helping with fixing bugs in open-source software, since it's a much easier task. For many programming assignments that have a fixed answer, we know that many AI assistants can just spit out a solution based on prompting them with the problem description (there's another elephant in the room here to do with learning outcomes regardless of project success, but we'll ignore that one too; my focus here is on the reach of project complexity, not learning outcomes). My question is about more open-ended projects, not assignments with an expected answer. Here's a second study (by one of my colleagues) about novices using AI assistance for programming tasks. It showcases how difficult it is to use AI tools well, and some of the stumbling blocks that novices in particular face.
But what about intermediate students? Might there be some level where the AI is helpful because the task is still relatively simple and the students are good enough to handle it? The problem is that as task complexity increases, so does the likelihood of the AI generating (or copying) code that uses more complex constructs which a student doesn't understand. Let's say I have second-year students writing interactive websites with JavaScript. Without a lot of care, which those students don't know how to apply, the AI is likely to suggest code that depends on several different frameworks, from React to jQuery, without actually setting up or including those frameworks, and of course these students would be way out of their depth trying to do that. This is a general problem: each programming class carefully limits the specific code frameworks and constructs it expects students to know based on the material it covers. There is no feasible way to limit an AI assistant to a fixed set of constructs or frameworks using current designs. There are alternate designs where this would be possible (like AI search through adaptation from a controlled library of snippets), but those would be entirely different tools.
So what happens on a sizeable class project where the AI has dropped in buggy code, especially if it uses code constructs the students don't understand? Best case, they realize that they don't understand it and re-prompt, or quickly ask for help from an instructor or TA who helps them get rid of the stuff they don't understand and re-prompt or manually add stuff they do. Average case: they waste several hours and/or sweep the bugs partly under the rug, resulting in a project with significant defects. Students in their second and even third years of a CS major still have a lot to learn about debugging, and usually have significant gaps in their knowledge of even their most comfortable programming language. I do think that regardless of AI we as teachers need to get better at teaching debugging skills, but the knowledge gaps are inevitable because there's just too much to know. In Python, for example, the LLM is going to spit out yields, async functions, try/finally, maybe even something like a while/else, or, with recent training data, the walrus operator. I can't expect even a fraction of 3rd-year students who have worked with Python since their first year to know about all these things, and based on how students approach projects where they have studied all the relevant constructs but have forgotten some, I'm not optimistic that seeing these things will magically become learning opportunities. Student projects are better off working with a limited subset of a full programming language that the students have actually learned, and using AI coding assistants as currently designed makes this impossible. Beyond that, even when the "assistant" just introduces bugs using syntax the students understand, even through their 4th year many students struggle to understand the operation of moderately complex code they've written themselves, let alone code written by someone else. Having access to an AI that will confidently offer incorrect explanations for bugs will make this worse.
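To make that list concrete, here's a small sketch of my own (illustrative only, not actual LLM output) that packs those constructs into a few plausible-looking lines:

import asyncio

def read_lines(path):
    # "yield" makes this a generator; students who only know "return" get lost here
    with open(path) as f:
        while (line := f.readline()):  # walrus operator, Python 3.8+
            yield line.strip()

def find(items, target):
    i = 0
    while i < len(items):
        if items[i] == target:
            break
        i += 1
    else:  # while/else: this branch runs only if the loop never hit "break"
        print("no match")
    return i

async def process(path):
    # async/await: easy to paste, hard to debug without event-loop knowledge
    try:
        for line in read_lines(path):
            await asyncio.sleep(0)  # stand-in for real asynchronous work
    finally:
        print("cleanup")  # try/finally: cleanup runs even if an error occurs

None of this is exotic to a working developer, but each feature is a separate thing a student must have studied before they can debug it.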
To be sure, a small minority of students will be able to overcome these problems, but that minority is the group that has a good grasp of the fundamentals and has broadened its knowledge through self-study, which earlier AI-reliant classes would make less likely to happen. In any case, I care about the average student, since we already have plenty of things about our institutions that make life easier for a favored few while being worse for the average student (note that our construction of that favored few as the "good" students is a large part of this problem).
To summarize: because AI assistants introduce excess code complexity and difficult-to-debug bugs, they'll slow down rather than speed up project progress for the average student on moderately complex projects. On a fixed deadline, they'll result in worse projects, or necessitate less ambitious project scoping to ensure adequate completion, and I expect this remains broadly true through 4-6 years of study in most programs (don't take this as an endorsement of AI "assistants" for masters students; we've ignored a lot of other problems along the way).
There's a related problem: solving open-ended project assignments well ultimately depends on deeply understanding the problem, and AI "assistants" allow students to put a lot of code in their file without spending much time thinking about the problem or building an understanding of it. This is awful for learning outcomes, but also bad for project success. Getting students to see the value of thinking deeply about a problem is a thorny pedagogical puzzle at the best of times, and allowing the use of AI "assistants" makes the problem much, much worse. This is another area I hope to see (or even drive) pedagogical improvement in, for what it's worth.
1/2

@arXiv_qbioQM_bot@mastoxiv.page
2025-06-04 13:54:56

This arxiv.org/abs/2506.00593 has been replaced.
initial toot: mastoxiv.page/@arXiv_qbi…

@arXiv_condmatmtrlsci_bot@mastoxiv.page
2025-06-06 09:54:18

This arxiv.org/abs/2411.14608 has been replaced.
initial toot: mastoxiv.page/@a…

@arXiv_csCR_bot@mastoxiv.page
2025-06-04 07:32:03

Privacy Leaks by Adversaries: Adversarial Iterations for Membership Inference Attack
Jing Xue, Zhishen Sun, Haishan Ye, Luo Luo, Xiangyu Chang, Ivor Tsang, Guang Dai
arxiv.org/abs/2506.02711

@grork@mastodon.social
2025-07-02 23:09:09

Liquid Glass will set back CUA (computer use agent) models on macOS & iOS by months to years. The UI has changed, and drastically increased in its visual complexity.
How long will it take to train in the visual differences with enough training data?
Strategic play for Apple? Or long-term blunder?

@NFL@darktundra.xyz
2025-07-21 19:16:27

Buccaneers' Chris Godwin Jr. set to miss start of training camp as he rehabs from 2024 ankle injury

cbssports.com/nfl/news/buccane

@arXiv_csNI_bot@mastoxiv.page
2025-08-05 09:08:40

On Effectiveness of Graph Neural Network Architectures for Network Digital Twins (NDTs)
Iulisloi Zacarias, Oussama Ben Taarit, Admela Jukan
arxiv.org/abs/2508.02373

@arXiv_astrophSR_bot@mastoxiv.page
2025-08-06 08:09:50

Spectroscopic ages for 4 million main-sequence dwarf stars from LAMOST DR10 estimated with data-driven approach
Jia-Hui Wang, Maosheng Xiang, Meng Zhang, Jiwei Xie, Jian Ge, Jinghua Zhang, Lanya Mou, Jifeng Liu
arxiv.org/abs/2508.03019

@karlauerbach@sfba.social
2025-07-27 19:42:26

One of our town's former residents has died...
nytimes.com/2025/07/27/arts/mu

@arXiv_csSD_bot@mastoxiv.page
2025-05-30 07:23:39

ZeroSep: Separate Anything in Audio with Zero Training
Chao Huang, Yuesheng Ma, Junxuan Huang, Susan Liang, Yunlong Tang, Jing Bi, Wenqiang Liu, Nima Mesgarani, Chenliang Xu
arxiv.org/abs/2505.23625

@arXiv_statME_bot@mastoxiv.page
2025-06-03 16:55:00

This arxiv.org/abs/2112.13738 has been replaced.
link: scholar.google.com/scholar?q=a

@metacurity@infosec.exchange
2025-07-12 12:19:58

Each week, Metacurity offers our free and paid subscribers a weekly digest of the best long-form (and longish) infosec-related pieces we couldn't properly fit into our daily news crush.
This week's selection covers
--Satellite jamming and spoofing set back global shipping,
--AI is inherently janky,
--Salvadoran gender freedom advocates fight oppression with digital training,
--A new sector arises to fight AI false positives in Chinese universities,

@arXiv_physicsoptics_bot@mastoxiv.page
2025-06-04 07:48:29

Inverse design for robust inference in integrated computational spectrometry
Wenchao Ma, Rapha\"el Pestourie, Zin Lin, Steven G. Johnson
arxiv.org/abs/2506.02194

@arXiv_csCL_bot@mastoxiv.page
2025-06-17 09:29:43

Maximally-Informative Retrieval for State Space Model Generation
Evan Becker, Benjamin Bowman, Matthew Trager, Tian Yu Liu, Luca Zancato, Wei Xia, Stefano Soatto
arxiv.org/abs/2506.12149

@Nathan@social.lostinok.com
2025-07-27 21:33:22

A sad passing of a pillar of my sense of humor. My father gave me a copy of That Was the Year That Was when I was about 10 years old and it tickled me in a way that little has since, save maybe Monty Python. May you rest in peace with no pigeons to be found. 🐦

@AthanSpod@social.linux.pizza
2025-07-27 17:02:30

This seems to be breaking news, with the New York Times the only source I can find:
#tomlehrer

@cowboys@darktundra.xyz
2025-06-25 16:44:35

2025 Dallas Cowboys training camp schedule: Full list of official dates si.com/nfl/cowboys/news/2025-d

@arXiv_statML_bot@mastoxiv.page
2025-07-14 09:00:12

MIRRAMS: Towards Training Models Robust to Missingness Distribution Shifts
Jihye Lee, Minseo Kang, Dongha Kim
arxiv.org/abs/2507.08280

@arXiv_eessAS_bot@mastoxiv.page
2025-06-05 07:24:12

Sound Field Reconstruction Using Physics-Informed Boundary Integral Networks
Stefano Damiano, Toon van Waterschoot
arxiv.org/abs/2506.03917

@arXiv_eessIV_bot@mastoxiv.page
2025-06-24 09:38:20

CT Radiomics-Based Explainable Machine Learning Model for Accurate Differentiation of Malignant and Benign Endometrial Tumors: A Two-Center Study
Tingrui Zhang, Honglin Wu, Zekun Jiang, Yingying Wang, Rui Ye, Huiming Ni, Chang Liu, Jin Cao, Xuan Sun, Rong Shao, Xiaorong Wei, Yingchun Sun
arxiv.org/abs/2506.18106

@arXiv_mathOC_bot@mastoxiv.page
2025-06-17 12:18:33

Counterexample-Guided Synthesis of Robust Discrete-Time Control Barrier Functions
Erfan Shakhesi, Alexander Katriniok, W. P. M. H. Heemels
arxiv.org/abs/2506.13011

@arXiv_csRO_bot@mastoxiv.page
2025-06-04 14:06:25

This arxiv.org/abs/2505.07802 has been replaced.
initial toot: mastoxiv.page/@arXiv_csRO_…

@arXiv_physicschemph_bot@mastoxiv.page
2025-05-29 07:33:39

Machine Learning Interatomic Potentials: library for efficient training, model development and simulation of molecular systems
Christoph Brunken, Olivier Peltre, Heloise Chomet, Lucien Walewski, Manus McAuliffe, Valentin Heyraud, Solal Attias, Martin Maarand, Yessine Khanfir, Edan Toledo, Fabio Falcioni, Marie Bluntzer, Silvia Acosta-Gutiérrez, Jules Tilly

@arXiv_quantph_bot@mastoxiv.page
2025-06-25 10:16:00

Conservative quantum offline model-based optimization
Kristian Sotirov, Annie E. Paine, Savvas Varsamopoulos, Antonio A. Gentile, Osvaldo Simeone
arxiv.org/abs/2506.19714

@arXiv_qbioQM_bot@mastoxiv.page
2025-06-03 07:56:45

Look mom, no experimental data! Learning to score protein-ligand interactions from simulations
Michael Brocidiacono, James Wellnitz, Konstantin I. Popov, Alexander Tropsha
arxiv.org/abs/2506.00593

@raiders@darktundra.xyz
2025-07-23 01:13:42

This Veteran Looks to Set Tone at Raiders’ Training Camp si.com/nfl/raiders/news/elando

@spamless@mastodon.social
2025-06-16 10:28:32

I just hit a #calisthenics goal: 100 decline push-ups in one set. I started training this level of difficulty—30-cm elevation at the feet, (ersatz) parallettes to facilitate full range of motion—last September after reaching previous targets. Two sets of 100 are coming!

@arXiv_physicsedph_bot@mastoxiv.page
2025-07-01 08:01:33

Quantum Workshop for IT-Professionals
Bettina Just, J\"org Hettel, Gerhard Hellstern
arxiv.org/abs/2506.22525 ar…

@arXiv_eessSP_bot@mastoxiv.page
2025-07-28 09:22:01

Machine Learning based Radio Environment Map Estimation for Indoor Visible Light Communication
Helena Serpi, Christina (Tanya) Politi
arxiv.org/abs/2507.19149

@arXiv_physicscompph_bot@mastoxiv.page
2025-07-01 08:32:43

Data-Driven Surrogate Modeling of DSMC Solutions Using Deep Neural Networks
Ehsan Roohi, Ahmad Shoja-sani
arxiv.org/abs/2506.22453

@arXiv_csAI_bot@mastoxiv.page
2025-07-31 09:12:31

MetaAgent: Automatically Constructing Multi-Agent Systems Based on Finite State Machines
Yaolun Zhang, Xiaogeng Liu, Chaowei Xiao
arxiv.org/abs/2507.22606

@arXiv_csCR_bot@mastoxiv.page
2025-07-01 09:51:03

A Practical and Secure Byzantine Robust Aggregator
De Zhang Lee, Aashish Kolluri, Prateek Saxena, Ee-Chien Chang
arxiv.org/abs/2506.23183

@arXiv_csNE_bot@mastoxiv.page
2025-06-12 07:43:11

A Topological Improvement of the Overall Performance of Sparse Evolutionary Training: Motif-Based Structural Optimization of Sparse MLPs Project
Xiaotian Chen, Hongyun Liu, Seyed Sahand Mohammadi Ziabari
arxiv.org/abs/2506.09204

@arXiv_csCV_bot@mastoxiv.page
2025-07-29 12:16:31

GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset
Yuhan Wang, Siwei Yang, Bingchen Zhao, Letian Zhang, Qing Liu, Yuyin Zhou, Cihang Xie
arxiv.org/abs/2507.21033

@arXiv_astrophIM_bot@mastoxiv.page
2025-07-28 07:58:01

Recommendations to overcome language barriers in the Vera C. Rubin Observatory Research Ecosystem
José Antonio Alonso Pavón, Andrés Alejandro Plazas Malagón
arxiv.org/abs/2507.18682

@gwire@mastodon.social
2025-07-15 07:25:46

One aspect of copyright is you may be licensed to have a copy of a file, but that doesn't grant the right to sub-license. So, technically, you can't use a service to transfer that file or (often) store it on cloud services - because they usually require you grant a perpetual license.
In practice this has been ignored as nobody expected it to be an issue.
Maybe until the question of AI training?

@cowboys@darktundra.xyz
2025-07-02 10:31:54

Cowboys Headlines: Oxnard votes on camp's future; who are most important Cowboys in 2025? cowboyswire.usatoday.com/story

@raiders@darktundra.xyz
2025-07-23 20:33:24

Raiders feeding off Pete Carroll's positive energy as training camp opens nytimes.com/athletic/6511510/2

@arXiv_physicsclassph_bot@mastoxiv.page
2025-07-02 09:12:40

Physics. Tasks With Solutions
Lidiia L. Chinarova, Ivan L. Andronov, Nina V. Savchuk, Serhii I. Iovchev, Hanna M. Akopian
arxiv.org/abs/2507.00064

@arXiv_csSD_bot@mastoxiv.page
2025-07-02 09:22:19

MambAttention: Mamba with Multi-Head Attention for Generalizable Single-Channel Speech Enhancement
Nikolai Lund K\"uhne, Jesper Jensen, Jan {\O}stergaard, Zheng-Hua Tan
arxiv.org/abs/2507.00966

@raiders@darktundra.xyz
2025-07-23 15:22:37

Raiders Wide Receiver Position Battles in Training Camp si.com/nfl/raiders/las-vegas-p

@arXiv_astrophSR_bot@mastoxiv.page
2025-06-04 07:47:05

Homogeneous Stellar Atmospheric Parameters and 22 Elemental Abundances for FGK Stars Derived From LAMOST Low-resolution Spectra with DD-Payne
Meng Zhang, Maosheng Xiang, Yuan-Sen Ting, Anish Maynur Amarsi, Hua-Wei Zhang, Jianrong Shi, Haibo Yuan, Haining Li, Jiahui Wang, Yaqian Wu, Tianmin Wu, Lanya Mou, Hong-liang Yan, Jifeng Liu

@arXiv_econEM_bot@mastoxiv.page
2025-06-23 08:08:39

Leave No One Undermined: Policy Targeting with Regret Aversion
Toru Kitagawa, Sokbae Lee, Chen Qiu
arxiv.org/abs/2506.16430

@arXiv_condmatmtrlsci_bot@mastoxiv.page
2025-08-01 09:28:31

Machine learning Landau free energy potentials
Mauro Pulzone, Natalya S. Fedorova, Hugo Aramberri, Jorge Íñiguez-González
arxiv.org/abs/2507.23369

@arXiv_mathOC_bot@mastoxiv.page
2025-05-30 10:08:58

This arxiv.org/abs/2406.10065 has been replaced.
initial toot: mastoxiv.page/@arXiv_mat…

@arXiv_qbioQM_bot@mastoxiv.page
2025-05-28 07:37:28

Mathematical Modelling and Optimisation of Athletic Performance: Tapering and Periodisation
David Ceddia, Howard Bondell, Peter Taylor
arxiv.org/abs/2505.20859

@arXiv_eessIV_bot@mastoxiv.page
2025-07-22 08:49:30

NuSeC: A Dataset for Nuclei Segmentation in Breast Cancer Histopathology Images
Refik Samet, Nooshin Nemati, Emrah Hancer, Serpil Sak, Bilge Ayca Kirmizi
arxiv.org/abs/2507.14272

@arXiv_csCL_bot@mastoxiv.page
2025-06-27 09:58:19

Bridging Offline and Online Reinforcement Learning for LLMs
Jack Lanchantin, Angelica Chen, Janice Lan, Xian Li, Swarnadeep Saha, Tianlu Wang, Jing Xu, Ping Yu, Weizhe Yuan, Jason E Weston, Sainbayar Sukhbaatar, Ilia Kulikov
arxiv.org/abs/2506.21495 arxiv.org/pdf/2506.21495 arxiv.org/html/2506.21495
arXiv:2506.21495v1 Announce Type: new
Abstract: We investigate the effectiveness of reinforcement learning methods for finetuning large language models when transitioning from offline to semi-online to fully online regimes for both verifiable and non-verifiable tasks. Our experiments cover training on verifiable math as well as non-verifiable instruction following with a set of benchmark evaluations for both. Across these settings, we extensively compare online and semi-online Direct Preference Optimization and Group Reward Policy Optimization objectives, and surprisingly find similar performance and convergence between these variants, which all strongly outperform offline methods. We provide a detailed analysis of the training dynamics and hyperparameter selection strategies to achieve optimal results. Finally, we show that multi-tasking with verifiable and non-verifiable rewards jointly yields improved performance across both task types.

@arXiv_physicsplasmph_bot@mastoxiv.page
2025-07-23 08:43:02

Efficient dataset construction using active learning and uncertainty-aware neural networks for plasma turbulent transport surrogate models
Aaron Ho (MIT Plasma Science and Fusion Center, Cambridge, USA), Lorenzo Zanisi (UKAEA Culham Centre for Fusion Energy, Abingdon, UK), Bram de Leeuw (Radboud University, Nijmegen, Netherlands), Vincent Galvan (MIT Plasma Science and Fusion Center, Cambridge, USA), Pablo Rodriguez-Fernandez (MIT Plasma Science and Fusion Center, Cambridge, USA), Nath…

@cowboys@darktundra.xyz
2025-06-13 18:03:35

Tyler Booker grateful for guidance, all set for Cowboys' training camp: 'I'm not gonna hold the offense back' dallascowboys.com/news/tyler-b

@arXiv_eessSY_bot@mastoxiv.page
2025-07-17 09:12:50

Inductance Estimation for High-Power Multilayer Rectangle Planar Windings
Theofilos Papadopoulos, Antonios Antonopoulos
arxiv.org/abs/2507.12082

@arXiv_eessAS_bot@mastoxiv.page
2025-05-30 09:57:31

This arxiv.org/abs/2505.21527 has been replaced.
initial toot: mastoxiv.page/@arXiv_ees…

@arXiv_statAP_bot@mastoxiv.page
2025-06-17 12:20:33

Enforcing tail calibration when training probabilistic forecast models
Jakob Benjamin Wessel, Maybritt Schillinger, Frank Kwasniok, Sam Allen
arxiv.org/abs/2506.13687

@arXiv_csCL_bot@mastoxiv.page
2025-07-25 10:13:12

System Report for CCL25-Eval Task 10: SRAG-MAV for Fine-Grained Chinese Hate Speech Recognition
Jiahao Wang, Ramen Liu, Longhui Zhang, Jing Li
arxiv.org/abs/2507.18580

@cowboys@darktundra.xyz
2025-07-16 22:21:27

2025 Training Camp Preview: 4 Cowboys with a ton to prove include two pending free agents cowboyswire.usatoday.com/story

@arXiv_physicsmedph_bot@mastoxiv.page
2025-06-19 09:59:17

Improved Image Reconstruction and Diffusion Parameter Estimation Using a Temporal Convolutional Network Model of Gradient Trajectory Errors
Jonathan B. Martin, Hannah E. Alderson, John C. Gore, Mark D. Does, Kevin D. Harkins
arxiv.org/abs/2506.14995

@arXiv_csNE_bot@mastoxiv.page
2025-06-17 09:46:40

A Synthetic Pseudo-Autoencoder Invites Examination of Tacit Assumptions in Neural Network Design
Assaf Marron
arxiv.org/abs/2506.12076

@tiotasram@kolektiva.social
2025-07-31 16:25:48

LLM coding is the opposite of DRY
An important principle in software engineering is DRY: Don't Repeat Yourself. We recognize that having the same code copied in more than one place is bad for several reasons:
1. It makes the entire codebase harder to read.
2. It increases maintenance burden, since any problems in the duplicated code need to be solved in more than one place.
3. Because it becomes possible for the copies to drift apart if changes to one aren't transferred to the other (maybe the person making the change has forgotten there was a copy), it makes the code more error-prone and harder to debug.
All modern programming languages make it almost entirely unnecessary to repeat code: we can move the repeated code into a "function" or "module" and then reference it from all the different places it's needed. At a larger scale, someone might write an open-source "library" of such functions or modules, and instead of re-implementing that functionality ourselves, we can use their code, with an acknowledgement. Using another person's library this way is complicated, because now you're dependent on them: if they stop maintaining it or introduce bugs, you've inherited a problem, but still, you could always copy their project and maintain your own version, and it would not be much more work than if you had implemented stuff yourself from the start. It's a little more complicated than this, but the basic principle holds, and it's a foundational one for software development in general and the open-source movement in particular. The network of "citations" as open-source software builds on other open-source software and people contribute patches to each other's projects is a lot of what makes the movement into a community, and it can lead to collaborations that drive further development. So the DRY principle is important at both small and large scales.
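A minimal sketch of that refactor (my example; the names are made up):

# Before: the same rule copy-pasted at two sites, free to drift apart:
#   user = {"name": raw_user.strip().lower()}
#   author = {"name": raw_author.strip().lower()}

# After: one definition, referenced from every place that needs it.
def normalize_name(raw):
    # single home for this rule; a bug fixed here is fixed for every caller
    return raw.strip().lower()

user = {"name": normalize_name("  Ada LOVELACE ")}
author = {"name": normalize_name("Grace Hopper")}
print(user, author)  # {'name': 'ada lovelace'} {'name': 'grace hopper'}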
Unfortunately, the current crop of hyped-up LLM coding systems from the big players is antithetical to DRY at all scales:
- At the library scale, they train on open-source software but then (with some unknown frequency) replicate parts of it line-for-line *without* any citation [1]. The person who was using the LLM has no way of knowing that this happened, or even any way to check for it. In theory the LLM company could build a system for this, but it's not likely to be profitable unless the courts actually start punishing these license violations, which doesn't seem likely based on results so far and the difficulty of finding out that the violations are happening. By creating these copies (and also mash-ups, along with lots of less-problematic stuff), the LLM users (enabled and encouraged by the LLM-peddlers) are directly undermining the DRY principle. If we get what the big AI companies claim to want, which is a massive shift towards machine-authored code, DRY at the library scale will effectively be dead, with each new project simply re-implementing the functionality it needs instead of ever using a library. This might seem to have some upside, since dependency hell is a thing, but the downside in terms of comprehensibility and therefore maintainability, correctness, and security will be massive. The eventual lack of new high-quality DRY-respecting code to train the models on will only make this problem worse.
- At the module & function level, AI is probably prone to re-writing rather than re-using the functions or needs, especially with a workflow where a human prompts it for many independent completions. This part I don't have direct evidence for, since I don't use LLM coding models myself except in very specific circumstances because it's not generally ethical to do so. I do know that when it tries to call existing functions, it often guesses incorrectly about the parameters they need, which I'm sure is a headache and source of bugs for the vibe coders out there. An AI could be designed to take more context into account and use existing lookup tools to get accurate function signatures and use them when generating function calls, but even though that would probably significantly improve output quality, I suspect it's the kind of thing that would be seen as too-baroque and thus not a priority. Would love to hear I'm wrong about any of this, but I suspect the consequences are that any medium-or-larger sized codebase written with LLM tools will have significant bloat from duplicate functionality, and will have places where better use of existing libraries would have made the code simpler. At a fundamental level, a principle like DRY is not something that current LLM training techniques are able to learn, and while they can imitate it from their training sets to some degree when asked for large amounts of code, when prompted for many smaller chunks, they're asymptotically likely to violate it.
I think this is an important critique in part because it cuts against the argument that "LLMs are the modern compilers, if you reject them you're just like the people who wanted to keep hand-writing assembly code, and you'll be just as obsolete." Compilers actually represented a great win for abstraction, encapsulation, and DRY in general, and they supported and are integral to open source development, whereas LLMs are set to do the opposite.
[1] To see what this looks like in action in prose, see the example on page 30 of the NYTimes copyright complaint against OpenAI. #AI #GenAI #LLMs #VibeCoding

@cowboys@darktundra.xyz
2025-07-27 16:01:51

Cowboys give $52M, 4-year deal to player who 'set a new record for worst' TE season ever cowboyswire.usatoday.com/story

@raiders@darktundra.xyz
2025-07-15 11:03:50

Who Will Be Raiders Breakout Star? si.com/nfl/raiders/training-ca

@arXiv_mathOC_bot@mastoxiv.page
2025-07-23 09:12:02

Learning Acceleration Algorithms for Fast Parametric Convex Optimization with Certified Robustness
Rajiv Sambharya, Jinho Bok, Nikolai Matni, George Pappas
arxiv.org/abs/2507.16264

@arXiv_eessIV_bot@mastoxiv.page
2025-06-13 08:56:10

Generalist Models in Medical Image Segmentation: A Survey and Performance Comparison with Task-Specific Approaches
Andrea Moglia (Politecnico di Milano), Matteo Leccardi (Politecnico di Milano), Matteo Cavicchioli (Politecnico di Milano), Alice Maccarini (Università di Pavia), Marco Marcon (Politecnico di Milano), Luca Mainardi (Politecnico di Milano), Pietro Cerveri (Politecnico di Milano, Università di Pavia)

@arXiv_eessAS_bot@mastoxiv.page
2025-07-21 08:08:00

Unifying Listener Scoring Scales: Comparison Learning Framework for Speech Quality Assessment and Continuous Speech Emotion Recognition
Cheng-Hung Hu, Yusuke Yasuda, Akifumi Yoshimoto, Tomoki Toda
arxiv.org/abs/2507.13626

@raiders@darktundra.xyz
2025-07-23 02:41:47

“You Either Have A Philosophy Or You Don’t”—Raiders HC Pete Carroll On ‘Competition Wednesday’ raiderramble.com/2025/07/22/yo

@cowboys@darktundra.xyz
2025-07-24 19:03:58

Schottenheimer on fight in Cowboys' camp: 'We have to have discipline' dallascowboys.com/news/schotte

@cowboys@darktundra.xyz
2025-06-19 17:55:31

Comparing the Cowboys’ revamped defensive line against the NFC East insidethestar.com/comparing-th