Tootfinder

Opt-in global Mastodon full text search. Join the index!

@arXiv_csLG_bot@mastoxiv.page
2025-07-11 10:23:01

Low Resource Reconstruction Attacks Through Benign Prompts
Sol Yarkoni, Roi Livni
arxiv.org/abs/2507.07947 arxiv.org/pdf/2507.07947 arxiv.org/html/2507.07947
arXiv:2507.07947v1 Announce Type: new
Abstract: The recent advances in generative models such as diffusion models have raised several risks and concerns related to privacy, copyright infringements and data stewardship. To better understand and control the risks, various researchers have created techniques, experiments and attacks that reconstruct images, or parts of images, from the training set. While these techniques already establish that data from the training set can be reconstructed, they often rely on high resources, access to the training set, and well-engineered, carefully designed prompts.
In this work, we devise a new attack that requires low resources, assumes little to no access to the actual training set, and identifies seemingly benign prompts that lead to potentially risky image reconstruction. This highlights the risk that images might be reconstructed even by an uninformed user, unintentionally. For example, we identified that, for one existing model, the prompt "blue Unisex T-Shirt" can generate the face of a real-life human model. Our method builds on an intuition from previous works that leverages domain knowledge, and identifies a fundamental vulnerability stemming from the use of data scraped from e-commerce platforms, where templated layouts and images are tied to pattern-like prompts.

@cowboys@darktundra.xyz
2025-06-11 22:44:40

Double Jointed: Cowboys to host Rams at practice prior to rolling to SoFi for exhibition cowboyswire.usatoday.com/story

@raiders@darktundra.xyz
2025-07-08 11:52:27

NFL Coaching Mindset From Minicamp to Training Camp si.com/nfl/raiders/las-vegas-t

@arXiv_csAI_bot@mastoxiv.page
2025-08-11 09:36:29

Symmetry breaking for inductive logic programming
Andrew Cropper, David M. Cerna, Matti Järvisalo
arxiv.org/abs/2508.06263 arxiv.org…

@arXiv_csCV_bot@mastoxiv.page
2025-07-10 09:02:11

Bilateral Collaboration with Large Vision-Language Models for Open Vocabulary Human-Object Interaction Detection
Yupeng Hu, Changxing Ding, Chang Sun, Shaoli Huang, Xiangmin Xu
arxiv.org/abs/2507.06510

@arXiv_astrophSR_bot@mastoxiv.page
2025-06-10 17:49:10

This arxiv.org/abs/2506.02763 has been replaced.
initial toot: mastoxiv.page/@arXiv_…

@raiders@darktundra.xyz
2025-07-10 21:03:20

Chemistry on the Defensive Line Sets Up the Raiders Well si.com/nfl/raiders/las-vegas-m

@arXiv_csLG_bot@mastoxiv.page
2025-06-09 10:08:22

What Really is a Member? Discrediting Membership Inference via Poisoning
Neal Mangaokar, Ashish Hooda, Zhuohang Li, Bradley A. Malin, Kassem Fawaz, Somesh Jha, Atul Prakash, Amrita Roy Chowdhury
arxiv.org/abs/2506.06003

@arXiv_csCE_bot@mastoxiv.page
2025-07-09 08:10:02

Bridging Sequential Deep Operator Network and Video Diffusion: Residual Refinement of Spatio-Temporal PDE Solutions
Jaewan Park, Farid Ahmed, Kazuma Kobayashi, Seid Koric, Syed Bahauddin Alam, Iwona Jasiuk, Diab Abueidda
arxiv.org/abs/2507.06133

@NFL@darktundra.xyz
2025-07-26 21:16:35

Titans' Treylon Burks fractures collarbone in training camp, set to miss at least season opener, per report

cbssports.com/nfl/news/titans-…

@arXiv_condmatstatmech_bot@mastoxiv.page
2025-07-08 11:22:50

Pseudo-likelihood produces associative memories able to generalize, even for asymmetric couplings
Francesco D'Amico, Dario Bocchi, Luca Maria Del Bono, Saverio Rossi, Matteo Negri
arxiv.org/abs/2507.05147

@arXiv_csSE_bot@mastoxiv.page
2025-08-04 08:24:41

How Quantization Impacts Privacy Risk on LLMs for Code?
Md Nazmul Haque, Hua Yang, Zhou Yang, Bowen Xu
arxiv.org/abs/2508.00128 arxiv.org/p…

@arXiv_csLG_bot@mastoxiv.page
2025-07-09 14:33:49

Replaced article(s) found for cs.LG. arxiv.org/list/cs.LG/new
[2/6]:
- Training Set Reconstruction from Differentially Private Forests: How Effective is DP?
Alice Gorgé, Julien Ferry, Sébastien Gambs, Thibaut Vidal

@bencurthoys@mastodon.social
2025-08-02 22:14:36

Is there any way of training a smart speaker to interpret a child's aimless and tuneless humming as "Please play Skinny Puppy and set volume to 100%"?
Enquiring minds want to know.

@arXiv_csRO_bot@mastoxiv.page
2025-08-01 08:28:21

Benchmarking Massively Parallelized Multi-Task Reinforcement Learning for Robotics Tasks
Vira Joshi, Zifan Xu, Bo Liu, Peter Stone, Amy Zhang
arxiv.org/abs/2507.23172

@tiotasram@kolektiva.social
2025-08-04 15:49:00

Should we teach vibe coding? Here's why not.
Should AI coding be taught in undergrad CS education?
1/2
I teach undergraduate computer science labs, including for intro and more-advanced core courses. I don't publish (non-negligible) scholarly work in the area, but I've got years of craft expertise in course design, and I do follow the academic literature to some degree. In other words, I'm not the world's leading expert, but I have spent a lot of time thinking about course design and consider myself competent at it, with plenty of direct experience in what knowledge & skills I can expect from students as they move through the curriculum.
I'm also strongly against most uses of what's called "AI" these days (specifically, generative deep neural networks as supplied by our current cadre of techbros). There are a surprising number of completely orthogonal reasons to oppose the use of these systems, and a very limited number of reasonable exceptions (overcoming accessibility barriers is an example). On the grounds of environmental and digital-commons-pollution costs alone, using specifically the largest/newest models is unethical in most cases.
But as any good teacher should, I constantly question these evaluations, because I worry about the impact on my students should I eschew teaching relevant tech for bad reasons (and even for good reasons). I also want to make my reasoning clear to students, who should absolutely question me on this. That inspired me to ask a simple question: ignoring for one moment the ethical objections (which we shouldn't, of course; they're very stark), at what level in the CS major could I expect to teach a course about programming with AI assistance, and expect students to succeed at a more technically demanding final project than in a course at the same level where students were banned from using AI? In other words, at what level would I expect students to actually benefit from AI coding "assistance"?
To be clear, I'm assuming that students aren't using AI in other aspects of coursework: the topic of using AI to "help you study" is a separate one (TL;DR: its gross value is not negative, but it's mostly not worth the harm to your metacognitive abilities, which AI-induced changes to the digital commons are making more important than ever).
So what's my answer to this question?
If I'm being incredibly optimistic, senior year. Slightly less optimistic, second year of a masters program. Realistic? Maybe never.
The interesting bit for you-the-reader is: why is this my answer? (Especially given that students would probably self-report significant gains at lower levels.) To start with, [this paper where experienced developers thought that AI assistance sped up their work on real tasks when in fact it slowed it down](arxiv.org/abs/2507.09089) is informative. There are a lot of differences in task between experienced devs solving real bugs and students working on a class project, but it's important to understand that we shouldn't have a baseline expectation that AI coding "assistants" will speed things up in the best of circumstances, and we shouldn't trust self-reports of productivity (or the AI hype machine in general).
Now we might imagine that coding assistants will be better at helping with a student project than at helping with fixing bugs in open-source software, since it's a much easier task. For many programming assignments that have a fixed answer, we know that many AI assistants can just spit out a solution based on prompting them with the problem description (there's another elephant in the room here to do with learning outcomes regardless of project success, but we'll ignore that one too; my focus here is on the reach of project complexity, not learning outcomes). My question is about more open-ended projects, not assignments with an expected answer. Here's a second study (by one of my colleagues) about novices using AI assistance for programming tasks. It showcases how difficult it is to use AI tools well, and some of the stumbling blocks that novices in particular face.
But what about intermediate students? Might there be some level where the AI is helpful because the task is still relatively simple and the students are good enough to handle it? The problem is that as task complexity increases, so does the likelihood of the AI generating (or copying) code that uses more complex constructs which a student doesn't understand. Let's say I have second-year students writing interactive websites with JavaScript. Without a lot of care, which those students don't know how to apply, the AI is likely to suggest code that depends on several different frameworks, from React to jQuery, without actually setting up or including those frameworks, and of course these students would be way out of their depth trying to do that. This is a general problem: each programming class carefully limits the specific code frameworks and constructs it expects students to know based on the material it covers. There is no feasible way to limit an AI assistant to a fixed set of constructs or frameworks using current designs. There are alternate designs where this would be possible (like AI search through adaptation from a controlled library of snippets), but those would be entirely different tools.
So what happens on a sizeable class project where the AI has dropped in buggy code, especially if it uses code constructs the students don't understand? Best case, they realize that they don't understand it and re-prompt, or quickly ask for help from an instructor or TA who helps them get rid of the stuff they don't understand and re-prompt or manually add stuff they do. Average case: they waste several hours and/or sweep the bugs partly under the rug, resulting in a project with significant defects. Students in their second and even third years of a CS major still have a lot to learn about debugging, and usually have significant gaps in their knowledge of even their most comfortable programming language. I do think that regardless of AI we as teachers need to get better at teaching debugging skills, but the knowledge gaps are inevitable because there's just too much to know. In Python, for example, the LLM is going to spit out yields, async functions, try/finally, maybe even something like a while/else, or, with recent training data, the walrus operator. I can't expect even a fraction of 3rd-year students who have worked with Python since their first year to know about all these things, and based on how students approach projects where they have studied all the relevant constructs but have forgotten some, I'm not optimistic that seeing these things will magically become learning opportunities. Student projects are better off working with a limited subset of a full programming language that the students have actually learned, and using AI coding assistants as currently designed makes this impossible. Beyond that, even when the "assistant" just introduces bugs using syntax the students understand, even through their 4th year many students struggle to understand the operation of moderately complex code they've written themselves, let alone code written by someone else. Having access to an AI that will confidently offer incorrect explanations for bugs will make this worse.
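To make that list concrete, here's a small sketch of my own (illustrative only, not actual LLM output) that packs those constructs into a few plausible-looking lines:

import asyncio

def read_lines(path):
    # "yield" makes this a generator; students who only know "return" get lost here
    with open(path) as f:
        while (line := f.readline()):  # walrus operator, Python 3.8+
            yield line.strip()

def find(items, target):
    i = 0
    while i < len(items):
        if items[i] == target:
            break
        i += 1
    else:  # while/else: this branch runs only if the loop never hit "break"
        print("no match")
    return i

async def process(path):
    # async/await: easy to paste, hard to debug without event-loop knowledge
    try:
        for line in read_lines(path):
            await asyncio.sleep(0)  # stand-in for real asynchronous work
    finally:
        print("cleanup")  # try/finally: cleanup runs even if an error occurs

None of this is exotic to a working developer, but each feature is a separate thing a student must have studied before they can debug it.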
To be sure, a small minority of students will be able to overcome these problems, but that minority is the group that has a good grasp of the fundamentals and has broadened its knowledge through self-study, which earlier AI-reliant classes would make less likely to happen. In any case, I care about the average student, since we already have plenty of things about our institutions that make life easier for a favored few while being worse for the average student (note that our construction of that favored few as the "good" students is a large part of this problem).
To summarize: because AI assistants introduce excess code complexity and difficult-to-debug bugs, they'll slow down rather than speed up project progress for the average student on moderately complex projects. On a fixed deadline, they'll result in worse projects, or necessitate less ambitious project scoping to ensure adequate completion, and I expect this remains broadly true through 4-6 years of study in most programs (don't take this as an endorsement of AI "assistants" for masters students; we've ignored a lot of other problems along the way).
There's a related problem: solving open-ended project assignments well ultimately depends on deeply understanding the problem, and AI "assistants" allow students to put a lot of code in their file without spending much time thinking about the problem or building an understanding of it. This is awful for learning outcomes, but also bad for project success. Getting students to see the value of thinking deeply about a problem is a thorny pedagogical puzzle at the best of times, and allowing the use of AI "assistants" makes the problem much, much worse. This is another area I hope to see (or even drive) pedagogical improvement in, for what it's worth.
1/2

@arXiv_qbioQM_bot@mastoxiv.page
2025-06-04 13:54:56

This arxiv.org/abs/2506.00593 has been replaced.
initial toot: mastoxiv.page/@arXiv_qbi…

@arXiv_condmatmtrlsci_bot@mastoxiv.page
2025-06-06 09:54:18

This arxiv.org/abs/2411.14608 has been replaced.
initial toot: mastoxiv.page/@a…

@arXiv_csCR_bot@mastoxiv.page
2025-06-04 07:32:03

Privacy Leaks by Adversaries: Adversarial Iterations for Membership Inference Attack
Jing Xue, Zhishen Sun, Haishan Ye, Luo Luo, Xiangyu Chang, Ivor Tsang, Guang Dai
arxiv.org/abs/2506.02711

@grork@mastodon.social
2025-07-02 23:09:09

Liquid Glass will set back CUA (computer use agent) models on macOS & iOS by months to years. The UI has changed, and drastically increased in its visual complexity.
How long will it take to train in the visual differences with enough training data?
Strategic play for Apple? Or long-term blunder?

@NFL@darktundra.xyz
2025-07-21 19:16:27

Buccaneers' Chris Godwin Jr. set to miss start of training camp as he rehabs from 2024 ankle injury

cbssports.com/nfl/news/buccane

@arXiv_csNI_bot@mastoxiv.page
2025-08-05 09:08:40

On Effectiveness of Graph Neural Network Architectures for Network Digital Twins (NDTs)
Iulisloi Zacarias, Oussama Ben Taarit, Admela Jukan
arxiv.org/abs/2508.02373

@arXiv_astrophSR_bot@mastoxiv.page
2025-08-06 08:09:50

Spectroscopic ages for 4 million main-sequence dwarf stars from LAMOST DR10 estimated with data-driven approach
Jia-Hui Wang, Maosheng Xiang, Meng Zhang, Jiwei Xie, Jian Ge, Jinghua Zhang, Lanya Mou, Jifeng Liu
arxiv.org/abs/2508.03019

@karlauerbach@sfba.social
2025-07-27 19:42:26

One of our town's former residents has died...
nytimes.com/2025/07/27/arts/mu

@arXiv_csSD_bot@mastoxiv.page
2025-05-30 07:23:39

ZeroSep: Separate Anything in Audio with Zero Training
Chao Huang, Yuesheng Ma, Junxuan Huang, Susan Liang, Yunlong Tang, Jing Bi, Wenqiang Liu, Nima Mesgarani, Chenliang Xu
arxiv.org/abs/2505.23625

@arXiv_statME_bot@mastoxiv.page
2025-06-03 16:55:00

This arxiv.org/abs/2112.13738 has been replaced.
link: scholar.google.com/scholar?q=a

@metacurity@infosec.exchange
2025-07-12 12:19:58

Each week, Metacurity offers our free and paid subscribers a weekly digest of the best long-form (and longish) infosec-related pieces we couldn't properly fit into our daily news crush.
This week's selection covers
--Satellite jamming and spoofing set back global shipping,
--AI is inherently janky,
--Salvadoran gender freedom advocates fight oppression with digital training,
--A new sector arises to fight AI false positives in Chinese universities,

@arXiv_physicsoptics_bot@mastoxiv.page
2025-06-04 07:48:29

Inverse design for robust inference in integrated computational spectrometry
Wenchao Ma, Rapha\"el Pestourie, Zin Lin, Steven G. Johnson
arxiv.org/abs/2506.02194

@arXiv_csCL_bot@mastoxiv.page
2025-06-17 09:29:43

Maximally-Informative Retrieval for State Space Model Generation
Evan Becker, Benjamin Bowman, Matthew Trager, Tian Yu Liu, Luca Zancato, Wei Xia, Stefano Soatto
arxiv.org/abs/2506.12149

@Nathan@social.lostinok.com
2025-07-27 21:33:22

A sad passing of a pillar of my sense of humor. My father gave me a copy of That Was the Year That Was when I was about 10 years old and it tickled me in a way that little has since, save maybe Monty Python. May you rest in peace with no pigeons to be found. 🐦

@AthanSpod@social.linux.pizza
2025-07-27 17:02:30

This seems to be breaking news, with the New York Times the only source I can find:
#tomlehrer

@cowboys@darktundra.xyz
2025-06-25 16:44:35

2025 Dallas Cowboys training camp schedule: Full list of official dates si.com/nfl/cowboys/news/2025-d

@arXiv_statML_bot@mastoxiv.page
2025-07-14 09:00:12

MIRRAMS: Towards Training Models Robust to Missingness Distribution Shifts
Jihye Lee, Minseo Kang, Dongha Kim
arxiv.org/abs/2507.08280

@arXiv_eessAS_bot@mastoxiv.page
2025-06-05 07:24:12

Sound Field Reconstruction Using Physics-Informed Boundary Integral Networks
Stefano Damiano, Toon van Waterschoot
arxiv.org/abs/2506.03917

@arXiv_eessIV_bot@mastoxiv.page
2025-06-24 09:38:20

CT Radiomics-Based Explainable Machine Learning Model for Accurate Differentiation of Malignant and Benign Endometrial Tumors: A Two-Center Study
Tingrui Zhang, Honglin Wu, Zekun Jiang, Yingying Wang, Rui Ye, Huiming Ni, Chang Liu, Jin Cao, Xuan Sun, Rong Shao, Xiaorong Wei, Yingchun Sun
arxiv.org/abs/2506.18106

@arXiv_mathOC_bot@mastoxiv.page
2025-06-17 12:18:33

Counterexample-Guided Synthesis of Robust Discrete-Time Control Barrier Functions
Erfan Shakhesi, Alexander Katriniok, W. P. M. H. Heemels
arxiv.org/abs/2506.13011

@arXiv_csRO_bot@mastoxiv.page
2025-06-04 14:06:25

This arxiv.org/abs/2505.07802 has been replaced.
initial toot: mastoxiv.page/@arXiv_csRO_…

@arXiv_physicschemph_bot@mastoxiv.page
2025-05-29 07:33:39

Machine Learning Interatomic Potentials: library for efficient training, model development and simulation of molecular systems
Christoph Brunken, Olivier Peltre, Heloise Chomet, Lucien Walewski, Manus McAuliffe, Valentin Heyraud, Solal Attias, Martin Maarand, Yessine Khanfir, Edan Toledo, Fabio Falcioni, Marie Bluntzer, Silvia Acosta-Gutiérrez, Jules Tilly

@arXiv_quantph_bot@mastoxiv.page
2025-06-25 10:16:00

Conservative quantum offline model-based optimization
Kristian Sotirov, Annie E. Paine, Savvas Varsamopoulos, Antonio A. Gentile, Osvaldo Simeone
arxiv.org/abs/2506.19714

@arXiv_qbioQM_bot@mastoxiv.page
2025-06-03 07:56:45

Look mom, no experimental data! Learning to score protein-ligand interactions from simulations
Michael Brocidiacono, James Wellnitz, Konstantin I. Popov, Alexander Tropsha
arxiv.org/abs/2506.00593

@raiders@darktundra.xyz
2025-07-23 01:13:42

This Veteran Looks to Set Tone at Raiders’ Training Camp si.com/nfl/raiders/news/elando

@spamless@mastodon.social
2025-06-16 10:28:32

I just hit a #calisthenics goal: 100 decline push-ups in one set. I started training this level of difficulty—30-cm elevation at the feet, (ersatz) parallettes to facilitate full range of motion—last September after reaching previous targets. Two sets of 100 are coming!

@arXiv_physicsedph_bot@mastoxiv.page
2025-07-01 08:01:33

Quantum Workshop for IT-Professionals
Bettina Just, J\"org Hettel, Gerhard Hellstern
arxiv.org/abs/2506.22525 ar…

@arXiv_eessSP_bot@mastoxiv.page
2025-07-28 09:22:01

Machine Learning based Radio Environment Map Estimation for Indoor Visible Light Communication
Helena Serpi, Christina (Tanya) Politi
arxiv.org/abs/2507.19149

@arXiv_physicscompph_bot@mastoxiv.page
2025-07-01 08:32:43

Data-Driven Surrogate Modeling of DSMC Solutions Using Deep Neural Networks
Ehsan Roohi, Ahmad Shoja-sani
arxiv.org/abs/2506.22453

@arXiv_csAI_bot@mastoxiv.page
2025-07-31 09:12:31

MetaAgent: Automatically Constructing Multi-Agent Systems Based on Finite State Machines
Yaolun Zhang, Xiaogeng Liu, Chaowei Xiao
arxiv.org/abs/2507.22606

@arXiv_csCR_bot@mastoxiv.page
2025-07-01 09:51:03

A Practical and Secure Byzantine Robust Aggregator
De Zhang Lee, Aashish Kolluri, Prateek Saxena, Ee-Chien Chang
arxiv.org/abs/2506.23183

@arXiv_csNE_bot@mastoxiv.page
2025-06-12 07:43:11

A Topological Improvement of the Overall Performance of Sparse Evolutionary Training: Motif-Based Structural Optimization of Sparse MLPs Project
Xiaotian Chen, Hongyun Liu, Seyed Sahand Mohammadi Ziabari
arxiv.org/abs/2506.09204

@arXiv_csCV_bot@mastoxiv.page
2025-07-29 12:16:31

GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset
Yuhan Wang, Siwei Yang, Bingchen Zhao, Letian Zhang, Qing Liu, Yuyin Zhou, Cihang Xie
arxiv.org/abs/2507.21033

@arXiv_astrophIM_bot@mastoxiv.page
2025-07-28 07:58:01

Recommendations to overcome language barriers in the Vera C. Rubin Observatory Research Ecosystem
José Antonio Alonso Pavón, Andrés Alejandro Plazas Malagón
arxiv.org/abs/2507.18682

@gwire@mastodon.social
2025-07-15 07:25:46

One aspect of copyright is you may be licensed to have a copy of a file, but that doesn't grant the right to sub-license. So, technically, you can't use a service to transfer that file or (often) store it on cloud services - because they usually require you grant a perpetual license.
In practice this has been ignored as nobody expected it to be an issue.
Maybe until the question of AI training?

@cowboys@darktundra.xyz
2025-07-02 10:31:54

Cowboys Headlines: Oxnard votes on camp's future; who are most important Cowboys in 2025? cowboyswire.usatoday.com/story

@raiders@darktundra.xyz
2025-07-23 20:33:24

Raiders feeding off Pete Carroll's positive energy as training camp opens nytimes.com/athletic/6511510/2

@arXiv_physicsclassph_bot@mastoxiv.page
2025-07-02 09:12:40

Physics. Tasks With Solutions
Lidiia L. Chinarova, Ivan L. Andronov, Nina V. Savchuk, Serhii I. Iovchev, Hanna M. Akopian
arxiv.org/abs/2507.00064

@arXiv_csSD_bot@mastoxiv.page
2025-07-02 09:22:19

MambAttention: Mamba with Multi-Head Attention for Generalizable Single-Channel Speech Enhancement
Nikolai Lund K\"uhne, Jesper Jensen, Jan {\O}stergaard, Zheng-Hua Tan
arxiv.org/abs/2507.00966

@raiders@darktundra.xyz
2025-07-23 15:22:37

Raiders Wide Receiver Position Battles in Training Camp si.com/nfl/raiders/las-vegas-p

@arXiv_astrophSR_bot@mastoxiv.page
2025-06-04 07:47:05

Homogeneous Stellar Atmospheric Parameters and 22 Elemental Abundances for FGK Stars Derived From LAMOST Low-resolution Spectra with DD-Payne
Meng Zhang, Maosheng Xiang, Yuan-Sen Ting, Anish Maynur Amarsi, Hua-Wei Zhang, Jianrong Shi, Haibo Yuan, Haining Li, Jiahui Wang, Yaqian Wu, Tianmin Wu, Lanya Mou, Hong-liang Yan, Jifeng Liu

@arXiv_econEM_bot@mastoxiv.page
2025-06-23 08:08:39

Leave No One Undermined: Policy Targeting with Regret Aversion
Toru Kitagawa, Sokbae Lee, Chen Qiu
arxiv.org/abs/2506.16430

@arXiv_condmatmtrlsci_bot@mastoxiv.page
2025-08-01 09:28:31

Machine learning Landau free energy potentials
Mauro Pulzone, Natalya S. Fedorova, Hugo Aramberri, Jorge Íñiguez-González
arxiv.org/abs/2507.23369

@arXiv_mathOC_bot@mastoxiv.page
2025-05-30 10:08:58

This arxiv.org/abs/2406.10065 has been replaced.
initial toot: mastoxiv.page/@arXiv_mat…

@arXiv_qbioQM_bot@mastoxiv.page
2025-05-28 07:37:28

Mathematical Modelling and Optimisation of Athletic Performance: Tapering and Periodisation
David Ceddia, Howard Bondell, Peter Taylor
arxiv.org/abs/2505.20859

@arXiv_eessIV_bot@mastoxiv.page
2025-07-22 08:49:30

NuSeC: A Dataset for Nuclei Segmentation in Breast Cancer Histopathology Images
Refik Samet, Nooshin Nemati, Emrah Hancer, Serpil Sak, Bilge Ayca Kirmizi
arxiv.org/abs/2507.14272

@arXiv_csCL_bot@mastoxiv.page
2025-06-27 09:58:19

Bridging Offline and Online Reinforcement Learning for LLMs
Jack Lanchantin, Angelica Chen, Janice Lan, Xian Li, Swarnadeep Saha, Tianlu Wang, Jing Xu, Ping Yu, Weizhe Yuan, Jason E Weston, Sainbayar Sukhbaatar, Ilia Kulikov
arxiv.org/abs/2506.21495 arxiv.org/pdf/2506.21495 arxiv.org/html/2506.21495
arXiv:2506.21495v1 Announce Type: new
Abstract: We investigate the effectiveness of reinforcement learning methods for finetuning large language models when transitioning from offline to semi-online to fully online regimes for both verifiable and non-verifiable tasks. Our experiments cover training on verifiable math as well as non-verifiable instruction following with a set of benchmark evaluations for both. Across these settings, we extensively compare online and semi-online Direct Preference Optimization and Group Reward Policy Optimization objectives, and surprisingly find similar performance and convergence between these variants, which all strongly outperform offline methods. We provide a detailed analysis of the training dynamics and hyperparameter selection strategies to achieve optimal results. Finally, we show that multi-tasking with verifiable and non-verifiable rewards jointly yields improved performance across both task types.

@arXiv_physicsplasmph_bot@mastoxiv.page
2025-07-23 08:43:02

Efficient dataset construction using active learning and uncertainty-aware neural networks for plasma turbulent transport surrogate models
Aaron Ho (MIT Plasma Science and Fusion Center, Cambridge, USA), Lorenzo Zanisi (UKAEA Culham Centre for Fusion Energy, Abingdon, UK), Bram de Leeuw (Radboud University, Nijmegen, Netherlands), Vincent Galvan (MIT Plasma Science and Fusion Center, Cambridge, USA), Pablo Rodriguez-Fernandez (MIT Plasma Science and Fusion Center, Cambridge, USA), Nath…

@cowboys@darktundra.xyz
2025-06-13 18:03:35

Tyler Booker grateful for guidance, all set for Cowboys' training camp: 'I'm not gonna hold the offense back' dallascowboys.com/news/tyler-b

@arXiv_eessSY_bot@mastoxiv.page
2025-07-17 09:12:50

Inductance Estimation for High-Power Multilayer Rectangle Planar Windings
Theofilos Papadopoulos, Antonios Antonopoulos
arxiv.org/abs/2507.12082

@arXiv_eessAS_bot@mastoxiv.page
2025-05-30 09:57:31

This arxiv.org/abs/2505.21527 has been replaced.
initial toot: mastoxiv.page/@arXiv_ees…

@arXiv_statAP_bot@mastoxiv.page
2025-06-17 12:20:33

Enforcing tail calibration when training probabilistic forecast models
Jakob Benjamin Wessel, Maybritt Schillinger, Frank Kwasniok, Sam Allen
arxiv.org/abs/2506.13687

@arXiv_csCL_bot@mastoxiv.page
2025-07-25 10:13:12

System Report for CCL25-Eval Task 10: SRAG-MAV for Fine-Grained Chinese Hate Speech Recognition
Jiahao Wang, Ramen Liu, Longhui Zhang, Jing Li
arxiv.org/abs/2507.18580

@cowboys@darktundra.xyz
2025-07-16 22:21:27

2025 Training Camp Preview: 4 Cowboys with a ton to prove include two pending free agents cowboyswire.usatoday.com/story

@arXiv_physicsmedph_bot@mastoxiv.page
2025-06-19 09:59:17

Improved Image Reconstruction and Diffusion Parameter Estimation Using a Temporal Convolutional Network Model of Gradient Trajectory Errors
Jonathan B. Martin, Hannah E. Alderson, John C. Gore, Mark D. Does, Kevin D. Harkins
arxiv.org/abs/2506.14995

@arXiv_csNE_bot@mastoxiv.page
2025-06-17 09:46:40

A Synthetic Pseudo-Autoencoder Invites Examination of Tacit Assumptions in Neural Network Design
Assaf Marron
arxiv.org/abs/2506.12076

@tiotasram@kolektiva.social
2025-07-31 16:25:48

LLM coding is the opposite of DRY
An important principle in software engineering is DRY: Don't Repeat Yourself. We recognize that having the same code copied in more than one place is bad for several reasons:
1. It makes the entire codebase harder to read.
2. It increases maintenance burden, since any problems in the duplicated code need to be solved in more than one place.
3. Because it becomes possible for the copies to drift apart if changes to one aren't transferred to the other (maybe the person making the change has forgotten there was a copy), it makes the code more error-prone and harder to debug.
All modern programming languages make it almost entirely unnecessary to repeat code: we can move the repeated code into a "function" or "module" and then reference it from all the different places it's needed. At a larger scale, someone might write an open-source "library" of such functions or modules, and instead of re-implementing that functionality ourselves, we can use their code, with an acknowledgement. Using another person's library this way is complicated, because now you're dependent on them: if they stop maintaining it or introduce bugs, you've inherited a problem, but still, you could always copy their project and maintain your own version, and it would not be much more work than if you had implemented stuff yourself from the start. It's a little more complicated than this, but the basic principle holds, and it's a foundational one for software development in general and the open-source movement in particular. The network of "citations" as open-source software builds on other open-source software and people contribute patches to each other's projects is a lot of what makes the movement into a community, and it can lead to collaborations that drive further development. So the DRY principle is important at both small and large scales.
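A minimal sketch of that refactor (my example; the names are made up):

# Before: the same rule copy-pasted at two sites, free to drift apart:
#   user = {"name": raw_user.strip().lower()}
#   author = {"name": raw_author.strip().lower()}

# After: one definition, referenced from every place that needs it.
def normalize_name(raw):
    # single home for this rule; a bug fixed here is fixed for every caller
    return raw.strip().lower()

user = {"name": normalize_name("  Ada LOVELACE ")}
author = {"name": normalize_name("Grace Hopper")}
print(user, author)  # {'name': 'ada lovelace'} {'name': 'grace hopper'}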
Unfortunately, the current crop of hyped-up LLM coding systems from the big players is antithetical to DRY at all scales:
- At the library scale, they train on open-source software but then (with some unknown frequency) replicate parts of it line-for-line *without* any citation [1]. The person who was using the LLM has no way of knowing that this happened, or even any way to check for it. In theory the LLM company could build a system for this, but it's not likely to be profitable unless the courts actually start punishing these license violations, which doesn't seem likely based on results so far and the difficulty of finding out that the violations are happening. By creating these copies (and also mash-ups, along with lots of less-problematic stuff), the LLM users (enabled and encouraged by the LLM-peddlers) are directly undermining the DRY principle. If we get what the big AI companies claim to want, which is a massive shift towards machine-authored code, DRY at the library scale will effectively be dead, with each new project simply re-implementing the functionality it needs instead of ever using a library. This might seem to have some upside, since dependency hell is a thing, but the downside in terms of comprehensibility and therefore maintainability, correctness, and security will be massive. The eventual lack of new high-quality DRY-respecting code to train the models on will only make this problem worse.
- At the module & function level, AI is probably prone to re-writing rather than re-using the functions or needs, especially with a workflow where a human prompts it for many independent completions. This part I don't have direct evidence for, since I don't use LLM coding models myself except in very specific circumstances because it's not generally ethical to do so. I do know that when it tries to call existing functions, it often guesses incorrectly about the parameters they need, which I'm sure is a headache and source of bugs for the vibe coders out there. An AI could be designed to take more context into account and use existing lookup tools to get accurate function signatures and use them when generating function calls, but even though that would probably significantly improve output quality, I suspect it's the kind of thing that would be seen as too-baroque and thus not a priority. Would love to hear I'm wrong about any of this, but I suspect the consequences are that any medium-or-larger sized codebase written with LLM tools will have significant bloat from duplicate functionality, and will have places where better use of existing libraries would have made the code simpler. At a fundamental level, a principle like DRY is not something that current LLM training techniques are able to learn, and while they can imitate it from their training sets to some degree when asked for large amounts of code, when prompted for many smaller chunks, they're asymptotically likely to violate it.
I think this is an important critique in part because it cuts against the argument that "LLMs are the modern compilers, if you reject them you're just like the people who wanted to keep hand-writing assembly code, and you'll be just as obsolete." Compilers actually represented a great win for abstraction, encapsulation, and DRY in general, and they supported and are integral to open source development, whereas LLMs are set to do the opposite.
[1] To see what this looks like in action in prose, see the example on page 30 of the NYTimes copyright complaint against OpenAI. #AI #GenAI #LLMs #VibeCoding

@cowboys@darktundra.xyz
2025-07-27 16:01:51

Cowboys give $52M, 4-year deal to player who 'set a new record for worst' TE season ever cowboyswire.usatoday.com/story

@raiders@darktundra.xyz
2025-07-15 11:03:50

Who Will Be Raiders Breakout Star? si.com/nfl/raiders/training-ca

@arXiv_mathOC_bot@mastoxiv.page
2025-07-23 09:12:02

Learning Acceleration Algorithms for Fast Parametric Convex Optimization with Certified Robustness
Rajiv Sambharya, Jinho Bok, Nikolai Matni, George Pappas
arxiv.org/abs/2507.16264

@arXiv_eessIV_bot@mastoxiv.page
2025-06-13 08:56:10

Generalist Models in Medical Image Segmentation: A Survey and Performance Comparison with Task-Specific Approaches
Andrea Moglia (Politecnico di Milano), Matteo Leccardi (Politecnico di Milano), Matteo Cavicchioli (Politecnico di Milano), Alice Maccarini (Università di Pavia), Marco Marcon (Politecnico di Milano), Luca Mainardi (Politecnico di Milano), Pietro Cerveri (Politecnico di Milano, Università di Pavia)

@arXiv_eessAS_bot@mastoxiv.page
2025-07-21 08:08:00

Unifying Listener Scoring Scales: Comparison Learning Framework for Speech Quality Assessment and Continuous Speech Emotion Recognition
Cheng-Hung Hu, Yusuke Yasuda, Akifumi Yoshimoto, Tomoki Toda
arxiv.org/abs/2507.13626

@raiders@darktundra.xyz
2025-07-23 02:41:47

“You Either Have A Philosophy Or You Don’t”—Raiders HC Pete Carroll On ‘Competition Wednesday’ raiderramble.com/2025/07/22/yo

@cowboys@darktundra.xyz
2025-07-24 19:03:58

Schottenheimer on fight in Cowboys' camp: 'We have to have discipline' dallascowboys.com/news/schotte

@cowboys@darktundra.xyz
2025-06-19 17:55:31

Comparing the Cowboys’ revamped defensive line against the NFC East insidethestar.com/comparing-th