Tootfinder

@grifferz@social.bitfolk.com
2025-07-22 21:12:30

Have I missed something, or is there no way with OpenSSH to exempt certain netblocks from the MaxStartups setting?
There's PerSourcePenaltyExemptList, but that's specifically for the "penalties" things, which are separate from MaxStartups, right?
https://

@arXiv_quantph_bot@mastoxiv.page
2025-09-22 09:57:41

Robust self-testing of quantum steering assemblages via operator inequalities
Beata Zjawin
https://arxiv.org/abs/2509.15699 https://arxiv.org/pdf/2509.1569…

Robust self-testing of quantum steering assemblages via operator inequalities
Robust self-testing provides a framework for certifying quantum resources under experimental imperfections. Improving robustness bounds for quantum resources such as quantum states, steering assemblages, and measurements is a constant effort that ensures relevance in the experimental realm. Despite progress in analytic self-testing methods for quantum states and measurements, extending these techniques to device-independent certification of steering assemblages has remained an open challenge, w…

@arXiv_csAI_bot@mastoxiv.page
2025-09-22 08:03:21

Stress Testing Deliberative Alignment for Anti-Scheming Training
Bronson Schoen, Evgenia Nitishinskaya, Mikita Balesni, Axel Højmark, Felix Hofstätter, Jérémy Scheurer, Alexander Meinke, Jason Wolfe, Teun van der Weij, Alex Lloyd, Nicholas Goldowsky-Dill, Angela Fan, Andrei Matveiakin, Rusheb Shah, Marcus Williams, Amelia Glaese, Boaz Barak, Wojciech Zaremba, Marius Hobbhahn

Stress Testing Deliberative Alignment for Anti-Scheming Training
Highly capable AI systems could secretly pursue misaligned goals -- what we call "scheming". Because a scheming AI would deliberately try to hide its misaligned goals and actions, measuring and mitigating scheming requires different strategies than are typically used in ML. We propose that assessing anti-scheming interventions requires at least (1) testing propensity to scheme on far out-of-distribution (OOD) tasks, (2) evaluating whether lack of scheming is driven by situational awareness, and…

@Techmeme@techhub.social
2025-08-19 15:55:45

Functionize, which offers a cloud platform that uses AI to speed up software testing, raised a $41M Series B, bringing its total funding to $67M (Maria Deutscher/SiliconANGLE)
https://siliconangle.com/2025/08/19/functionize-nabs-41m-speed-software-testing…

Functionize nabs $41M to speed up software testing with AI - SiliconANGLE

@arXiv_csPL_bot@mastoxiv.page
2025-08-21 07:38:39

Tuning Random Generators: Property-Based Testing as Probabilistic Programming
Ryan Tjoa, Poorva Garg, Harrison Goldstein, Todd Millstein, Benjamin Pierce, Guy Van den Broeck
https://arxiv.org/abs/2508.14394

Tuning Random Generators: Property-Based Testing as Probabilistic Programming
Property-based testing validates software against an executable specification by evaluating it on randomly generated inputs. The standard way that PBT users generate test inputs is via generators that describe how to sample test inputs through random choices. To achieve a good distribution over test inputs, users must tune their generators, i.e., decide on the weights of these individual random choices. Unfortunately, it is very difficult to understand how to choose individual generator weights…

@benb@osintua.eu
2025-08-19 13:15:57

Joint training and NUCLEAR testing: what are Russia and Belarus preparing? #shorts: https://benborges.xyz/2025/08/19/joint-training-and-nuclear-testing.html

@arXiv_csSE_bot@mastoxiv.page
2025-07-21 08:46:40

Testing Autonomous Driving Systems -- What Really Matters and What Doesn't
Changwen Li, Joseph Sifakis, Rongjie Yan, Jian Zhang
https://arxiv.org/abs/2507.13661

Testing Autonomous Driving Systems -- What Really Matters and What Doesn't
Despite extensive research, the testing of autonomous driving systems (ADS) landscape remains fragmented, and there is currently no basis for an informed technical assessment of the importance and contribution of the current state of the art. This paper attempts to address this problem by exploring two complementary aspects. First, it proposes a framework for comparing existing test methods in terms of their intrinsic effectiveness and validity. It shows that many methods do not meet both of …

@arXiv_statME_bot@mastoxiv.page
2025-07-22 11:11:30

Testing Homogeneity in a heteroscedastic contaminated normal mixture
Xiaoqing Niu, Pengfei Li, Yuejiao Fu
https://arxiv.org/abs/2507.15630 https://

Testing Homogeneity in a heteroscedastic contaminated normal mixture
Large-scale simultaneous hypothesis testing appears in many areas such as microarray studies, genome-wide association studies, brain imaging, disease mapping and astronomical surveys. A well-known inference method is to control the false discovery rate. One popular approach is to model the $z$-scores derived from the individual $t$-tests and then use this model to control the false discovery rate. We propose a new class of contaminated normal mixtures for modelling $z$-scores. We further design…

@vyskocilm@witter.cz
2025-09-19 09:42:12

Finally tried "testing/synctest" and I must say this is A PERFECT way of writing tests for async and cancelable code. No more flaky checking of random time ranges. No more selecting timeouts small enough to not kill the testing at all, but big enough that they'll work on a busy CI machine.
I put sleeps 1s, 2s, 5s, 10s in a loop and the test itself finishes in 0s. All asserts are for the exact time too.

Testing Time (and other asynchronicities) - The Go Programming Language
A discussion of testing asynchronous code and an exploration of the `testing/synctest` package. Based on the GopherCon Europe 2025 talk with the same title.

@arXiv_csDS_bot@mastoxiv.page
2025-07-22 08:55:00

Characterizing and Testing Configuration Stability in Two-Dimensional Threshold Cellular Automata
Yonatan Nakar, Dana Ron
https://arxiv.org/abs/2507.14569 …

Characterizing and Testing Configuration Stability in Two-Dimensional Threshold Cellular Automata
We consider the problems of characterizing and testing the stability of cellular automata configurations that evolve on a two-dimensional torus according to threshold rules with respect to the von-Neumann neighborhood. While stable configurations for Threshold-1 (OR) and Threshold-5 (AND) are trivial (and hence easily testable), the other threshold rules exhibit much more diverse behaviors. We first characterize the structure of stable configurations with respect to the Threshold-2 (similarly, …

@arXiv_csCL_bot@mastoxiv.page
2025-08-22 10:16:01

LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries
Ming Yin, Dinghan Shen, Silei Xu, Jianbing Han, Sixun Dong, Mian Zhang, Yebowen Hu, Shujian Liu, Simin Ma, Song Wang, Sathish Reddy Indurthi, Xun Wang, Yiran Chen, Kaiqiang Song
https://arxiv.org/abs/2508.15760

LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries
Tool calling has emerged as a critical capability for AI agents to interact with the real world and solve complex tasks. While the Model Context Protocol (MCP) provides a powerful standardized framework for tool integration, there is a significant gap in benchmarking how well AI agents can effectively solve multi-step tasks using diverse MCP tools in realistic, dynamic scenarios. In this work, we present LiveMCP-101, a benchmark of 101 carefully curated real-world queries, refined through itera…

@seeingwithsound@mas.to
2025-09-19 16:29:37

HUN-REN RCNS and its consortium receive prestigious European funding for the first human testing of a prosthesis designed to restore vision https://www.ttk.hun-ren.hu/en/recent-news/

HUN-REN RCNS and its Consortium Receive Prestigious European Funding for the First Human Testing of a Prosthesis Designed to Restore Vision
Over the past five years, the Belgian startup ReVision Implant has been developing a groundbreaking brain prosthesis that aims to partially restore vision to the blind. A consortium was formed to carry out the first human testing of the technology, consisting of the manufacturing startup and three research groups, including the

@thomastraynor@social.linux.pizza
2025-09-22 16:06:01

Hmmm, I wonder how long before it leaves beta? Almost the time when I nuke WIN10 and upgrade.
https://mxlinux.org/blog/mx-25-infinity-beta-1-isos-now-available-for-testing-purposes/

@arXiv_quantph_bot@mastoxiv.page
2025-08-22 10:01:21

Robust Self-Testing of Multiqudit Supersinglet Slater States via Constant Number of Binary Measurements
Arturo Konderak, Wojciech Bruzda, Remigiusz Augusiak
https://arxiv.org/abs/2508.15546

Robust Self-Testing of Multiqudit Supersinglet Slater States via Constant Number of Binary Measurements
Self-testing is a powerful device-independent technique that enables one to deduce the forms of both the quantum state and the measurements involved in a physical experiment based solely on observed correlations. Although numerous schemes for self-testing multipartite entangled states have been proposed, they are typically difficult to implement experimentally, as their complexity increases significantly with the number of subsystems or the local dimension. In this work, we introduce the first …

@arXiv_econEM_bot@mastoxiv.page
2025-07-22 09:09:10

Testing Clustered Equal Predictive Ability with Unknown Clusters
Oguzhan Akgun, Alain Pirotte, Giovanni Urga, Zhenlin Yang
https://arxiv.org/abs/2507.14621

Testing Clustered Equal Predictive Ability with Unknown Clusters
We develop new tests of clustered equal predictive ability (C-EPA) in panels where the clusters are unknown and estimated by a Panel Kmeans algorithm. This algorithm differs from the standard Kmeans algorithm by employing the time series variation of the panel rather than relying merely on time averages of observations. To address the challenge of testing hypotheses that depend on data-driven cluster estimates, we adopt a selective conditional inference framework. Specifically, we derive a Wald…

@arXiv_physicsinsdet_bot@mastoxiv.page
2025-07-21 08:24:00

A Simple Apparatus for Testing PMT Humidity Tolerance
A. Germer, K. Park, C. Skuse, C. Yang, D. S. Parno
https://arxiv.org/abs/2507.13545 https://

A Simple Apparatus for Testing PMT Humidity Tolerance
We report on a low-cost apparatus to extend a photomultiplier tube (PMT) testing setup to operations at high humidity and/or at an elevated temperature. This setup allows a determination of whether a PMT can successfully operate for an extended period of time in a high-humidity environment, such as the waterline of a water Cherenkov detector.

@Dragofix@veganism.social
2025-07-18 01:08:44

Brazil’s Chamber of Deputies Approves Bill Banning Cosmetic Testing on Live Vertebrates https://vegconomist.com/politics-law/brazil-approves-bill-banning-cosmetic-testing-live-vertebrates/

Brazil's Chamber of Deputies Approves Bill Banning Cosmetic Testing on Live Vertebrates - vegconomist - the vegan business magazine
Brazil's Chamber of Deputies has approved the Senate's substitute text for a bill that would ban the testing of cosmetics, personal hygiene products, and

@arXiv_hepph_bot@mastoxiv.page
2025-08-21 08:27:10

Testing the dark side of neutrino oscillations with the solar neutrino fog at Dark Matter experiments
Julia Gehrlein, Tanmay Kushwaha
https://arxiv.org/abs/2508.14166 https://…

Testing the dark side of neutrino oscillations with the solar neutrino fog at Dark Matter experiments
The recent detection of the solar neutrino background at Dark Matter direct detection experiments paves the way to fully explore an important degeneracy in neutrino oscillations in the presence of new interactions, named the LMA-Dark degeneracy. This degeneracy makes it impossible to determine the neutrino mass ordering in oscillation experiments if neutrinos have new vectorial interactions with matter. As the composition of solar neutrinos at the Earth consists of all three neutrino flavors, t…

@Mediagazer@mstdn.social
2025-07-21 20:45:42

Source: Netflix is using Runway AI's video generation tools for production; Disney is testing out the tools and talked with Runway about possible uses for them (Rachel Metz/Bloomberg)
https://www.bloomberg.com/news/articles/20

@fanf@mendeddrum.org
2025-08-17 11:42:03

from my link log —
Mix-testing: revealing a new class of compiler concurrency bugs.
https://johnwickerson.wordpress.com/2024/06/28/mix-testing-revealing-a-new-class-of-compiler-bugs/
saved 2024-06-29

Mix-testing: revealing a new class of compiler bugs
I’m delighted that Luke Geeson’s work on “mix testing” (a collaboration with James Brotherston, Wilco Dijkstra, Alastair Donaldson, Lee Smith, Tyler Sorensen, and myself) will app…

@arXiv_qbioPE_bot@mastoxiv.page
2025-07-21 08:09:10

Methodological considerations for semialgebraic hypothesis testing with incomplete U-statistics
David Barnhill, Marina Garrote-López, Elizabeth Gross, Max Hill, Bryson Kagy, John A. Rhodes, Joy Z. Zhang
https://arxiv.org/abs/2507.13531

Methodological considerations for semialgebraic hypothesis testing with incomplete U-statistics
Recently, Sturma, Drton, and Leung proposed a general-purpose stochastic method for hypothesis testing in models defined by polynomial equality and inequality constraints. Notably, the method remains theoretically valid even near irregular points, such as singularities and boundaries, where traditional testing approaches often break down. In this paper, we evaluate its practical performance on a collection of biologically motivated models from phylogenetics. While the method performs remarkably…

@davidaugust@mastodon.online
2025-07-17 19:27:57

President Epstein List has an insufficiency? Sounds right.
#USpol

Trump underwent medical testing for leg swelling, hand bruises: White House
The testing showed results in the normal limit, Leavitt said.

@arXiv_astrophCO_bot@mastoxiv.page
2025-07-21 09:15:10

Testing Signatures of Phantom Crossing through Full-Shape Galaxy Clustering Analysis
Emanuelly Silva, Rafael C. Nunes
https://arxiv.org/abs/2507.13989 http…

Testing Signatures of Phantom Crossing through Full-Shape Galaxy Clustering Analysis
Recent observations of baryon acoustic oscillations (BAO) from the Dark Energy Spectroscopic Instrument (DESI) survey, when combined with measurements of the cosmic microwave background (CMB) and Type Ia supernovae (SNIa), provide compelling evidence for a phantom crossing at late times, along with statistically significant deviations from the standard $Λ$CDM model. In this work, we investigate the role of redshift-space galaxy clustering data by employing the pre-reconstruction full-shape (FS…

@arXiv_astrophEP_bot@mastoxiv.page
2025-07-21 08:24:30

Testing the Origin of Hot Jupiters with Ariel
Lina D'Aoust, Ben Coull-Neveu, Eve J. Lee, Nicolas B. Cowan
https://arxiv.org/abs/2507.13446 https://

Testing the Origin of Hot Jupiters with Ariel
In spite of their long detection history, the origin of hot Jupiters remains to be resolved. While multiple lines of dynamical evidence suggest high-eccentricity migration is most likely, conflicts remain when considering hot Jupiters as a population in the context of warm and cold Jupiters. Here, we turn to atmospheric signatures as an alternative means to test the origin theory of hot Jupiters, focusing on population level trends that arise from post-formation pollution, motivated by the upcoming Arie…

@karlauerbach@sfba.social
2025-07-23 00:54:15

Wow, yet another Thunderbird update.
It seems that Thunderbird, which ought to be rather stable by now, is getting updated more often than Chrome.
Either somebody is writing crummy code, not testing, or has a low acceptance hurdle for proposed "enhancements."

@arXiv_csSE_bot@mastoxiv.page
2025-07-22 11:22:20

Deep Learning Framework Testing via Heuristic Guidance Based on Multiple Model Measurements
Yinglong Zou, Juan Zhai, Chunrong Fang, Yanzhou Mu, Jiawei Liu, Zhenyu Chen
https://arxiv.org/abs/2507.15181

Deep Learning Framework Testing via Heuristic Guidance Based on Multiple Model Measurements
Deep learning frameworks serve as the foundation for developing and deploying deep learning applications. To enhance the quality of deep learning frameworks, researchers have proposed numerous testing methods using deep learning models as test inputs. However, existing methods predominantly measure model bug detection effectiveness as heuristic indicators, presenting three critical limitations: Firstly, existing methods fail to quantitatively measure model's operator combination variety, potent…

@arXiv_astrophIM_bot@mastoxiv.page
2025-08-21 09:17:09

Development and testing of integrated readout electronics for next generation SiSeRO (Single electron Sensitive Read Out) devices
Tanmoy Chattopadhyay, Haley R. Stueber, Abigail Y. Pan, Sven Herrmann, Peter Orel, Kevan Donlon, Steven W. Allen, Marshall W. Bautz, Michael Cooper, Catherine E. Grant, Beverly LaMarr, Christopher Leitz, Andrew Malonis, Eric D. Miller, R. Glenn Morris, Gregory Prigozhin, Ilya Prigozhin, Artem Poliszczuk, Keith Warner, Daniel R. Wilkins

Development and testing of integrated readout electronics for next generation SiSeRO (Single electron Sensitive Read Out) devices
The first generation of Single electron Sensitive Read Out (SiSeRO) amplifiers, employed as on-chip charge detectors for charge-coupled devices (CCDs) have demonstrated excellent noise and spectral performance: a responsivity of around 800 pA per electron, an equivalent noise charge (ENC) of 3.2 electrons root mean square (RMS), and a full width half maximum (FWHM) energy resolution of 130 eV at 5.9 keV for a readout speed of 625 Kpixel/s. Repetitive Non Destructive Readout (RNDR) has also been…

@teledyn@mstdn.ca
2025-07-22 19:29:50

Of course, I don't know 'hardware', as you can tell from my technical description, but I have a sample from another tuning peg gear, and the peg and gear for testing, I get to Home Hardware and they have loose bolts of small dimension. I quickly learn that #6 is too large, #4 is too small and they have no #5's where the thread matches.
But you know what works? Do you remember those little chrome bolts with the hex-wrench heads that used to hold expansion cards in the ibm-pc? Perfect match, only 5mm too long, easily compensated by buying a matching nut and my 53-years owned pawnshop 5-string, my first banjo, is back in action!

@gevoel@mastodon.green
2025-09-21 16:23:28

It is often said that the Dutch government is copying Trump, but it is the other way around.
A Trump Administration Playbook: No Data, No Problem https://www.nytimes.com/2025/09/18/climate/trump-federal-data-climate-change-health.ht…

Trump Administration Stopping Efforts to Collect Scientific Data
A pattern of getting rid of statistics has emerged that echoes the president’s first term, when he suggested if the nation stopped testing for Covid, it would have few cases.

@benb@osintua.eu
2025-07-18 22:43:01

🔥 Ukraine will become a testing ground for the weapons of the future: tests right on the front line: https://benborges.xyz/2025/07/18/ukraine-will-become-a-testing.html

@arXiv_statME_bot@mastoxiv.page
2025-07-22 08:20:00

Hypothesis testing for quantitative trait locus effects in both location and scale in genetic backcross studies
Guanfu Liu, Pengfei Li, Yukun Liu, Xiaolong Pu
https://arxiv.org/abs/2507.14253

Hypothesis testing for quantitative trait locus effects in both location and scale in genetic backcross studies
Testing the existence of a quantitative trait locus (QTL) effect is an important task in QTL mapping studies. Most studies concentrate on the case where the phenotype distributions of different QTL groups follow normal distributions with the same unknown variance. In this paper we make a more general assumption that the phenotype distributions come from a location-scale distribution family. We derive the limiting distribution of the LRT for the existence of the QTL effect in both location and s…

@arXiv_csLG_bot@mastoxiv.page
2025-08-18 09:45:30

DiCriTest: Testing Scenario Generation for Decision-Making Agents Considering Diversity and Criticality
Qitong Chu, Yufeng Yue, Danya Yao, Huaxin Pei
https://arxiv.org/abs/2508.11514

DiCriTest: Testing Scenario Generation for Decision-Making Agents Considering Diversity and Criticality
The growing deployment of decision-making agents in dynamic environments increases the demand for safety verification. While critical testing scenario generation has emerged as an appealing verification methodology, effectively balancing diversity and criticality remains a key challenge for existing methods, particularly due to local optima entrapment in high-dimensional scenario spaces. To address this limitation, we propose a dual-space guided testing framework that coordinates scenario param…

@Techmeme@techhub.social
2025-08-21 12:40:49

Google rolls out AI Mode to 180 countries and territories in English, after testing in the US, UK, and India, and plans to add more languages and regions "soon" (Abner Li/9to5Google)
https://9to5google.com/2025/08/21/google-ai-mode-countries-agentic/

Google AI Mode expanding to 180 countries, getting agentic restaurant finder, more
Google is now bringing AI Mode to many more countries around the world, while AI Ultra subscribers will get to use an agentic capability.

@arXiv_csCR_bot@mastoxiv.page
2025-08-20 07:51:40

MCPSecBench: A Systematic Security Benchmark and Playground for Testing Model Context Protocols
Yixuan Yang, Daoyuan Wu, Yufan Chen
https://arxiv.org/abs/2508.13220 https://

MCPSecBench: A Systematic Security Benchmark and Playground for Testing Model Context Protocols
Large Language Models (LLMs) are increasingly integrated into real-world applications via the Model Context Protocol (MCP), a universal, open standard for connecting AI agents with data sources and external tools. While MCP enhances the capabilities of LLM-based agents, it also introduces new security risks and expands their attack surfaces. In this paper, we present the first systematic taxonomy of MCP security, identifying 17 attack types across 4 primary attack surfaces. We introduce MCPSecB…

@arXiv_csRO_bot@mastoxiv.page
2025-07-21 09:34:40

AGENTS-LLM: Augmentative GENeration of Challenging Traffic Scenarios with an Agentic LLM Framework
Yu Yao, Salil Bhatnagar, Markus Mazzola, Vasileios Belagiannis, Igor Gilitschenski, Luigi Palmieri, Simon Razniewski, Marcel Hallgarten
https://arxiv.org/abs/2507.13729

AGENTS-LLM: Augmentative GENeration of Challenging Traffic Scenarios with an Agentic LLM Framework
Rare, yet critical, scenarios pose a significant challenge in testing and evaluating autonomous driving planners. Relying solely on real-world driving scenes requires collecting massive datasets to capture these scenarios. While automatic generation of traffic scenarios appears promising, data-driven models require extensive training data and often lack fine-grained control over the output. Moreover, generating novel scenarios from scratch can introduce a distributional shift from the original …

@selea@social.linux.pizza
2025-08-18 10:16:05

I wish I had mämmi
#finland #testing

@cowboys@darktundra.xyz
2025-09-22 18:22:44

CeeDee Lamb injury diagnosis confirmed by Dallas Cowboys EVP Stephen Jones https://www.si.com/nfl/cowboys/news/ceedee-lamb-injury-diagnosis-confirmed-dallas-cowboys-evp-stephen-jones

CeeDee Lamb injury diagnosis confirmed by Dallas Cowboys EVP Stephen Jones
Dallas Cowboys star receiver CeeDee Lamb had his injury diagnosis confirmed, but will continue to undergo further testing.

@malik@Mastodon.Social
2025-08-18 21:58:46

Plague and cholera in nicely even measure?
https://mastodon.online/@9to5Mac/115051912057222137

9to5Mac (@9to5Mac@mastodon.online)
Attached: 1 image WhatsApp starts testing AI-powered writing suggestions on iOS https://9to5mac.com/2025/08/18/whatsapp-starts-testing-ai-powered-writing-suggestions-on-ios/?utm_source=dlvr.it&utm_medium=mastodon

@arXiv_grqc_bot@mastoxiv.page
2025-08-20 09:48:00

Testing the Generalized Second Law in $(2+1)$-Dimensional Cosmology: Holographic Entropy Bounds and Observational Constraints
Praveen Kumar Dhankar, Aritra Sanyal, Safiqul Islam, Farook Rahaman, Behnam Pourhassan
https://arxiv.org/abs/2508.13227

Testing the Generalized Second Law in $(2+1)$-Dimensional Cosmology: Holographic Entropy Bounds and Observational Constraints
We investigate the validity of the Generalized Second Law (GSL) of thermodynamics in a $(2+1)$-dimensional holographic cosmological model with a negative cosmological constant. Adopting a horizon thermodynamics framework, we examine two prominent entropy bounds, the Fischler--Susskind (FS) bound and the Hubble Entropy (HE) bound, in both expanding and contracting universes, including the effects of quantum entropy corrections. Our theoretical analysis shows that the FS bound is intrinsically in…

@arXiv_statME_bot@mastoxiv.page
2025-09-22 08:49:31

Leveraging the group structure of hypotheses for more powerful multiple testing with FDR control for the filtered rejection set
Marina Bogomolov, Shinjini Nandi
https://arxiv.org/abs/2509.15444

Leveraging the group structure of hypotheses for more powerful multiple testing with FDR control for the filtered rejection set
Modern biological studies often involve testing many hypotheses organized in a group or a hierarchical structure, such as a directed acyclic graph (DAG). In these studies, researchers often wish to control the false discovery rate (FDR) after filtering the discoveries to obtain interpretable results. For addressing this goal, Katsevich, Sabatti, and Bogomolov (2023, Journal of the American Statistical Association, 118(541), 165-176) developed a general method, Focused BH, that guarantees FDR co…

@arXiv_eessIV_bot@mastoxiv.page
2025-07-22 09:51:40

A Study of Anatomical Priors for Deep Learning-Based Segmentation of Pheochromocytoma in Abdominal CT
Tanjin Taher Toma, Tejas Sudharshan Mathai, Bikash Santra, Pritam Mukherjee, Jianfei Liu, Wesley Jong, Darwish Alabyad, Vivek Batheja, Abhishek Jha, Mayank Patel, Darko Pucar, Jayadira del Rivero, Karel Pacak, Ronald M. Summers
https://

A Study of Anatomical Priors for Deep Learning-Based Segmentation of Pheochromocytoma in Abdominal CT
Accurate segmentation of pheochromocytoma (PCC) in abdominal CT scans is essential for tumor burden estimation, prognosis, and treatment planning. It may also help infer genetic clusters, reducing reliance on expensive testing. This study systematically evaluates anatomical priors to identify configurations that improve deep learning-based PCC segmentation. We employed the nnU-Net framework to evaluate eleven annotation strategies for accurate 3D segmentation of pheochromocytoma, introducing a …

@arXiv_astrophCO_bot@mastoxiv.page
2025-07-21 08:59:30

Testing the cosmic distance duality relation with baryon acoustic oscillations and supernovae data
Tian-Nuo Li, Guo-Hong Du, Peng-Ju Wu, Jing-Zhao Qi, Jing-Fei Zhang, Xin Zhang
https://arxiv.org/abs/2507.13811

Testing the cosmic distance duality relation with baryon acoustic oscillations and supernovae data
One of the most fundamental relationships in modern cosmology is the cosmic distance duality relation (CDDR), which describes the relationship between the angular diameter distance ($D_{\rm A}$) and the luminosity distance ($D_{\rm L}$), and is expressed as: $η(z)=D_{\rm L}(z)(1+z)^{-2}/D_{\rm A}(z)=1$. In this work, we conduct a comprehensive test of the CDDR by combining baryon acoustic oscillation (BAO) data from the SDSS and DESI surveys with type Ia supernova (SN) data from PantheonPlus a…

@Jeff@mastodon.opencloud.lu
2025-08-21 21:51:06

The International Institute for Strategic Studies
The Scale of Russian Sabotage Operations Against Europe’s Critical Infrastructure
19 August 2025
"IISS has created the most comprehensive open-source database of suspected and confirmed Russian sabotage operations targeting Europe."
site:

map of Europe with visualised data on #sabotage.

https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/TQ0FMQ

#dataverse #harvard

The Scale of Russian Sabotage Operations Against Europe’s Critical Infrastructure
This IISS paper assesses Russia’s unconventional war on Europe, focusing on sabotage of critical infrastructure, from military sites and energy grids to communications and undersea cables, testing the resilience of European governments and societies and challenging NATO/EU deterrence.

@arXiv_csAI_bot@mastoxiv.page
2025-08-19 11:17:40

Bayesian Optimization-based Search for Agent Control in Automated Game Testing
Carlos Celemin
https://arxiv.org/abs/2508.13121 https://arxiv.org/pdf/2508.1…

Bayesian Optimization-based Search for Agent Control in Automated Game Testing
This work introduces an automated testing approach that employs agents controlling game characters to detect potential bugs within a game level. Harnessing the power of Bayesian Optimization (BO) to execute sample-efficient search, the method determines the next sampling point by analyzing the data collected so far and calculates the data point that will maximize information acquisition. To support the BO process, we introduce a game testing-specific model built on top of a grid map, that featu…

@arXiv_csSE_bot@mastoxiv.page
2025-08-21 07:57:10

You Don't Know Until You Click: Automated GUI Testing for Production-Ready Software Evaluation
Yutong Bian, Xianhao Lin, Yupeng Xie, Tianyang Liu, Mingchen Zhuge, Siyuan Lu, Haoming Tang, Jinlin Wang, Jiayi Zhang, Jiaqi Chen, Xiangru Tang, Yongxin Ni, Sirui Hong, Chenglin Wu
https://arxiv.org/abs/2508.14104

You Don't Know Until You Click: Automated GUI Testing for Production-Ready Software Evaluation
Large Language Models (LLMs) and code agents in software development are rapidly evolving from generating isolated code snippets to producing full-fledged software applications with graphical interfaces, interactive logic, and dynamic behaviors. However, current benchmarks fall short in evaluating such production-ready software, as they often rely on static checks or binary pass/fail scripts, failing to capture the interactive behaviors and runtime dynamics that define real-world usability - qu…

@Sustainable2050@mastodon.energy
2025-09-19 16:22:08

Estonia's Foreign Minister:
"Russia’s increasingly extensive testing of boundaries and growing aggressiveness must be met with a swift increase in political and economic pressure."
https://www.pravda.com.ua/eng/news/2025/09/19/7531620/

Three Russian MiG-31 jets violate Estonian airspace for 12 minutes
The Estonian Foreign Ministry has summoned Russia’s chargé d’affaires in Estonia to lodge a protest and hand over a note regarding the violation of Estonian airspace that occurred on 19 September.

@arXiv_qfinTR_bot@mastoxiv.page
2025-07-22 08:10:00

A Comparative Analysis of Statistical and Machine Learning Models for Outlier Detection in Bitcoin Limit Order Books
Ivan Letteri
https://arxiv.org/abs/2507.14960 https://

A Comparative Analysis of Statistical and Machine Learning Models for Outlier Detection in Bitcoin Limit Order Books
The detection of outliers within cryptocurrency limit order books (LOBs) is of paramount importance for comprehending market dynamics, particularly in highly volatile and nascent regulatory environments. This study conducts a comprehensive comparative analysis of robust statistical methods and advanced machine learning techniques for real-time anomaly identification in cryptocurrency LOBs. Within a unified testing environment, named AITA Order Book Signal (AITA-OBS), we evaluate the efficacy of…

@arXiv_astrophHE_bot@mastoxiv.page
2025-07-22 09:46:50

Unfolding the Atmospheric Muon Flux with IceCube: Investigating Stopping Muons and High-Energy Prompt Contributions
Pascal Gutjahr (for the IceCube Collaboration), Lucas Witthaus (for the IceCube Collaboration)
https://arxiv.org/abs/2507.14525

Unfolding the Atmospheric Muon Flux with IceCube: Investigating Stopping Muons and High-Energy Prompt Contributions
Atmospheric muons produced in cosmic-ray air showers are classified as conventional muons from pion and kaon decays and prompt muons from heavy hadron decays. Conventional muons dominate at lower energies, and the prompt component becomes dominant at PeV energies and above. Precisely measuring the atmospheric muon flux from a few GeV to several PeV is valuable for advancing our understanding of cosmic-ray interactions and testing hadronic interaction models. Low-energy muons that stop within th…

@mrwedders@social.linux.pizza
2025-07-22 19:34:51

New drives have suitably pleased me after a bit of testing, time to start swapping them in... 12 hours per drive doesn't seem horrendous.

@michabbb@social.vivaldi.net
2025-09-20 06:53:09

📊 Versatile use cases include summarizing articles, explaining complex concepts, testing knowledge, modifying recipes, comparing products, and making informed decisions
✍️ Get key takeaways from articles, pages, or discussion threads without leaving your current browsing session, maintaining focus and workflow efficiency
🔍 Ask questions about content you're reading and receive relevant answers and explanations using the current page's information for accurate context

@primonatura@mstdn.social
2025-08-21 18:00:41

"How Switzerland’s Solar Train Tracks Could Reshape Renewable Energy"
#Switzerland #Trains #SolarPower #Energy

How Switzerland’s Solar Train Tracks Could Reshape Renewable Energy | Happy Eco News
Switzerland is testing solar train tracks to generate clean energy using unused rail space. These removable panels could power 300,000 homes, create jobs, and reduce land use. Countries like South Korea and Japan are watching closely. If successful, the system could offer a new way to meet global energy demands.

@arXiv_mathST_bot@mastoxiv.page
2025-09-18 08:42:41

Optimal Transport Based Testing in Factorial Design
Michel Groppe, Linus Niemöller, Shayan Hundrieser, David Ventzke, Anna Blob, Sarah Köster, Axel Munk
https://arxiv.org/abs/2509.13970

Optimal Transport Based Testing in Factorial Design
We introduce a general framework for testing statistical hypotheses for probability measures supported on finite spaces, which is based on optimal transport (OT). These tests are inspired by the analysis of variance (ANOVA) and its nonparametric counterparts. They allow for testing linear relationships in factorial designs between discrete probability measures and are based on pairwise comparisons of the OT distance and corresponding barycenters. To this end, we derive under the null hypotheses…

@grahamperrin@bsd.cafe
2025-09-20 04:48:17

FreeBSD: pkgbase: major upgrades
<https://gist.github.com/grahamperrin/a58edbb8587af513a154ac01d922f611#file-pkgbase-major-upgrade-md-readme>
― a rough guide, for alpha testing on AMD64.
From <

@arXiv_csET_bot@mastoxiv.page
2025-08-22 07:44:00

Distributed Shared Layered Storage Quantum Simulator: A novel quantum simulation system for efficient scaling and cost optimization
Mingyang Yu, Haorui Yang, Donglin Wang, Desheng Kong, Ji Du, Yulong Fu, Wei Wang, Jing Xu
https://arxiv.org/abs/2508.15542

Distributed Shared Layered Storage Quantum Simulator: A novel quantum simulation system for efficient scaling and cost optimization
Quantum simulators are essential tools for developing and testing quantum algorithms. However, the high-frequency traversal characteristic of quantum simulators represents an unprecedented demand in the history of IT, and existing distributed technologies are unable to meet this requirement, resulting in a single-node bottleneck in quantum simulators. To overcome this limitation, this paper introduces a novel Distributed Shared Layered Storage Quantum Simulator (DSLSQS). By leveraging an innovati…

@Techmeme@techhub.social
2025-08-22 14:45:51

NYC grants Waymo its first permit, which extends through late September, to test up to eight of its autonomous vehicles in Manhattan and Downtown Brooklyn (Samantha Subin/CNBC)
https://www.cnbc.com/2025/08/22/waymo-permit-new-york-city-nyc-rides.html

Waymo granted first permit to begin testing autonomous vehicles in New York City
Waymo is getting one step closer to starting its engine in New York City.

@arXiv_csAI_bot@mastoxiv.page
2025-09-19 07:30:31

From Capabilities to Performance: Evaluating Key Functional Properties of LLM Architectures in Penetration Testing
Lanxiao Huang, Daksh Dave, Ming Jin, Tyler Cody, Peter Beling
https://arxiv.org/abs/2509.14289

From Capabilities to Performance: Evaluating Key Functional Properties of LLM Architectures in Penetration Testing
Large language models (LLMs) are increasingly used to automate or augment penetration testing, but their effectiveness and reliability across attack phases remain unclear. We present a comprehensive evaluation of multiple LLM-based agents, from single-agent to modular designs, across realistic penetration testing scenarios, measuring empirical performance and recurring failure patterns. We also isolate the impact of five core functional capabilities via targeted augmentations: Global Context Me…

@arXiv_csCR_bot@mastoxiv.page
2025-09-22 07:34:01

Synergizing Static Analysis with Large Language Models for Vulnerability Discovery and beyond
Vaibhav Agrawal, Kiarash Ahi
https://arxiv.org/abs/2509.15433 https://

Synergizing Static Analysis with Large Language Models for Vulnerability Discovery and beyond
This report examines the synergy between Large Language Models (LLMs) and Static Application Security Testing (SAST) to improve vulnerability discovery. Traditional SAST tools, while effective for proactive security, are limited by high false-positive rates and a lack of contextual understanding. Conversely, LLMs excel at code analysis and pattern recognition but can be prone to inconsistencies and hallucinations. By integrating these two technologies, a more intelligent and efficient system is…

@arXiv_csCL_bot@mastoxiv.page
2025-08-21 10:01:10

Evaluating Multilingual and Code-Switched Alignment in LLMs via Synthetic Natural Language Inference
Samir Abdaljalil, Erchin Serpedin, Khalid Qaraqe, Hasan Kurban
https://arxiv.org/abs/2508.14735

Evaluating Multilingual and Code-Switched Alignment in LLMs via Synthetic Natural Language Inference
Large language models (LLMs) are increasingly applied in multilingual contexts, yet their capacity for consistent, logically grounded alignment across languages remains underexplored. We present a controlled evaluation framework for multilingual natural language inference (NLI) that generates synthetic, logic-based premise-hypothesis pairs and translates them into a typologically diverse set of languages. This design enables precise control over semantic relations and allows testing in both mon…

@arXiv_csRO_bot@mastoxiv.page
2025-07-22 11:14:10

Robots for Kiwifruit Harvesting and Pollination
Jamie Bell
https://arxiv.org/abs/2507.15484 https://arxiv.org/pdf/2507.15484

Robots for Kiwifruit Harvesting and Pollination
This research was a part of a project that developed mobile robots that performed targeted pollen spraying and automated harvesting in pergola structured kiwifruit orchards. Multiple kiwifruit detachment mechanisms were designed and field testing of one of the concepts showed that the mechanism could reliably pick kiwifruit. Furthermore, this kiwifruit detachment mechanism was able to reach over 80 percent of fruit in the cluttered kiwifruit canopy, whereas the previous state of the art mechani…

@arXiv_statME_bot@mastoxiv.page
2025-07-22 08:14:40

On the Testing of complete causal mediation and its applications
Yichin Tsai, Wan-Tzu Chang, Jia Jyun Sie, Cathy SJ Fann, Iebin Lian
https://arxiv.org/abs/2507.14246

On the Testing of complete causal mediation and its applications
The Complete Mediation Test (CMT) serves as a specialized approach of mediation analysis to assess whether an independent variable A, influences an outcome variable Y exclusively through a mediator M, without any direct effect. An application of CMT lies in Mendelian Randomization (MR) studies, where it can be used to investigate non-pleiotropy, that is, to test whether genetic variants impact a disease outcome solely through their effect on a target exposure variable. Traditionally, CMT has re…

@arXiv_csSE_bot@mastoxiv.page
2025-08-19 10:11:40

RUM: Rule+LLM-Based Comprehensive Assessment on Testing Skills
Yue Wang, Zhenyu Chen, Yuan Zhao, Chunrong Fang, Ziyuan Wang, Song Huang
https://arxiv.org/abs/2508.12922 https://…

RUM: Rule+LLM-Based Comprehensive Assessment on Testing Skills
Over the past eight years, the META method has served as a multidimensional testing skill assessment system in the National College Student Contest on Software Testing, successfully assessing over 100,000 students' testing skills. However, META is primarily limited to the objective assessment of test scripts, lacking the ability to automatically assess subjective aspects such as test case and test report. To address this limitation, this paper proposes RUM, a comprehensive assessment approach t…

@arXiv_eessIV_bot@mastoxiv.page
2025-07-22 08:49:30

NuSeC: A Dataset for Nuclei Segmentation in Breast Cancer Histopathology Images
Refik Samet, Nooshin Nemati, Emrah Hancer, Serpil Sak, Bilge Ayca Kirmizi
https://arxiv.org/abs/2507.14272

NuSeC: A Dataset for Nuclei Segmentation in Breast Cancer Histopathology Images
The NuSeC dataset was created by selecting 4 images of 1024×1024 pixels from the slides of each of 25 patients, giving a total of 100 images. To allow a consistent comparative analysis between the methods that researchers develop on the NuSeC dataset in the future, we split the dataset into a training set (75%) and a testing set (25%). In detail, an image is randomly selected from 4 images of each patient…
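
The split described above can be sketched in a few lines of Python. This is only an illustration under a stated assumption: the abstract is cut off, so the sketch assumes that the one randomly selected image per patient is the one held out for testing, which reproduces the 75/25 ratio. The names `split_nusec`, `patient_id`, and `image_idx` are hypothetical and do not come from the dataset.

```python
import random


def split_nusec(n_patients=25, images_per_patient=4, seed=0):
    """Hold out one randomly chosen image per patient for testing.

    Assumed scheme: the source abstract is truncated before it says
    what happens to the randomly selected image.
    """
    rng = random.Random(seed)
    train, test = [], []
    for patient_id in range(n_patients):
        held_out = rng.randrange(images_per_patient)
        for image_idx in range(images_per_patient):
            target = test if image_idx == held_out else train
            target.append((patient_id, image_idx))
    return train, test


train, test = split_nusec()
print(len(train), len(test))  # 75 25
```

Splitting per patient rather than over the pooled 100 images keeps every patient represented in both sets, which may or may not match the authors' intent.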

@benb@osintua.eu
2025-09-16 13:28:33

Testing NATO, Russia's 'salami slicing tactics' now threaten Poland, Baltic States: https://benborges.xyz/2025/09/16/testing-nato-russias-salami-slicing.html

Tracking information about the Russian War against Ukraine — Support the OSINT Ukraine Archive the 🇷🇺 War against Ukraine 🇺🇦
Tracking information about the Russian War against Ukraine

@arXiv_grqc_bot@mastoxiv.page
2025-07-22 10:58:50

Constraint on gravitational-wave polarizations for space-based detectors with time-delay interferometry
Tong Jiang, Chunyu Zhang
https://arxiv.org/abs/2507.14870

Constraint on gravitational-wave polarizations for space-based detectors with time-delay interferometry
Probing extra polarizations in gravitational waves (GWs) with space-based detectors is the most direct method for testing theories of gravity. In this paper, by employing the second-generation time-delay interferometry (TDI) to cancel out the laser frequency noise in a rotating and flexing configuration with arm lengths varying linearly in time, we study the detectors' constraint ability on extra polarizations, and explore the impacts of TDI on the constraint of polarizations. Working in the pa…

@arXiv_quantph_bot@mastoxiv.page
2025-08-20 10:10:00

Adversarially robust quantum state learning and testing
Maryam Aliakbarpour, Vladimir Braverman, Nai-Hui Chia, Yuhan Liu
https://arxiv.org/abs/2508.13959 https://

Adversarially robust quantum state learning and testing
Quantum state learning is a fundamental problem in physics and computer science. As near-term quantum devices are error-prone, it is important to design error-resistant algorithms. Apart from device errors, other unexpected factors could also affect the algorithm, such as careless human read-out error, or even a malicious hacker deliberately altering the measurement results. Thus, we want our algorithm to work even in the worst case when things go against our favor. We consider the practical …

@Techmeme@techhub.social
2025-07-16 14:40:48

Microsoft begins testing a Windows 11 feature for sharing the entire desktop with Copilot Vision; it requires first entering a special mode in the Copilot app (Zac Bowden/Windows Central)
https://www.

Microsoft begins testing sharing your desktop with Copilot on Windows 11 — allows AI to view and chat about what's on your screen
First announced earlier this year, sharing your desktop with Copilot on Windows 11 is now in testing with insiders across all preview channels.

@Dragofix@veganism.social
2025-08-20 11:05:06

Brazil Bans Animal-Tested Cosmetics in Major Win for Compassion https://www.speciesunite.com/petition-updates/brazil-bans-animal-tested-cosmetics-in-major-win-for-compassion?utm_source=Mastodon

Brazil Bans Animal-Tested Cosmetics in Major Win for Compassion — Species Unite
On July 30, 2025, Brazil’s president, Luiz Inácio Lula da Silva, approved a ban on animal testing for cosmetics, personal hygiene products, and perfumes in the major South American nation. The bill was signed into law during a ceremony attended by Marina Silva, Brazil’s Minister of the Environment a

@arXiv_statME_bot@mastoxiv.page
2025-07-22 10:59:20

Multiple Hypothesis Testing To Estimate The Number Of Communities in Stochastic Block Models
Chetkar Jha, Mingyao Li, Ian Barnett
https://arxiv.org/abs/2507.15471

Multiple Hypothesis Testing To Estimate The Number Of Communities in Stochastic Block Models
Clustering of single-cell RNA sequencing (scRNA-seq) datasets can give key insights into the biological functions of cells. Therefore, it is not surprising that network-based community detection methods (one of the better clustering methods) are increasingly being used for the clustering of scRNA-seq datasets. The main challenge in implementing network-based community detection methods for scRNA-seq datasets is that these methods a priori require the true number of communities or blocks f…

@Mediagazer@mstdn.social
2025-08-19 18:45:41

Testing AI tools for journalism work: LLMs worked well for short summaries but poorly for long ones, and scientific research tools lacked depth and consistency (Hilke Schellmann/Columbia Journalism Review)
https://www.cjr.org/analysis/i-tested-how-well-ai-tool…

I Tested How Well AI Tools Work for Journalism
Some tools were sufficient for summarizing meetings. For research, the results were a disaster.

@arXiv_csAI_bot@mastoxiv.page
2025-08-19 11:12:10

EvolMathEval: Towards Evolvable Benchmarks for Mathematical Reasoning via Evolutionary Testing
Shengbo Wang, Mingwei Liu, Zike Li, Anji Li, Yanlin Wang, Xin Peng, Zibin Zheng
https://arxiv.org/abs/2508.13003

EvolMathEval: Towards Evolvable Benchmarks for Mathematical Reasoning via Evolutionary Testing
The rapid advancement of LLMs poses a significant challenge to existing mathematical reasoning benchmarks. These benchmarks commonly suffer from issues such as score saturation, temporal decay, and data contamination. To address this challenge, this paper introduces EvolMathEval, an automated mathematical benchmark generation and evolution framework based on evolutionary testing. By dynamically generating unique evaluation instances ab initio, the framework fundamentally eliminates the risk of …

@arXiv_csCR_bot@mastoxiv.page
2025-09-17 09:50:00

xOffense: An AI-driven autonomous penetration testing framework with offensive knowledge-enhanced LLMs and multi agent systems
Phung Duc Luong, Le Tran Gia Bao, Nguyen Vu Khai Tam, Dong Huu Nguyen Khoa, Nguyen Huu Quyen, Van-Hau Pham, Phan The Duy
https://arxiv.org/abs/2509.13021

xOffense: An AI-driven autonomous penetration testing framework with offensive knowledge-enhanced LLMs and multi agent systems
This work introduces xOffense, an AI-driven, multi-agent penetration testing framework that shifts the process from labor-intensive, expert-driven manual efforts to fully automated, machine-executable workflows capable of scaling seamlessly with computational infrastructure. At its core, xOffense leverages a fine-tuned, mid-scale open-source LLM (Qwen3-32B) to drive reasoning and decision-making in penetration testing. The framework assigns specialized agents to reconnaissance, vulnerability sc…

@arXiv_statME_bot@mastoxiv.page
2025-08-22 09:25:21

An adaptive procedure for detecting replicated signals with $k$-family-wise error rate control
Ninh Tran
https://arxiv.org/abs/2508.15363 https://arxiv.org…

An adaptive procedure for detecting replicated signals with $k$-family-wise error rate control
Partial conjunction (PC) hypothesis testing is widely used to assess the replicability of scientific findings across multiple comparable studies. In high-throughput meta-analyses, testing a large number of PC hypotheses with k-family-wise error rate (k-FWER) control often suffers from low statistical power due to the multiplicity burden. The state-of-the-art AdaFilter-Bon procedure by Wang et al. (2022, Ann. Stat., 50(4), 1890-1909) alleviates this problem by filtering out hypotheses unlikely t…

@arXiv_quantph_bot@mastoxiv.page
2025-08-22 10:02:41

Neutralization of Levitated Charged Nanodiamond: Towards matter-wave interferometry with massive objects
Sela Liran, Or Dobkowski, Rafael Benjaminov, Peter Skakunenko, Michael Averbukh, Yaniv Bar-Haim, David Groswasser, Joshua H. Baraban, Ron Folman
https://arxiv.org/abs/2508.15625

Neutralization of Levitated Charged Nanodiamond: Towards matter-wave interferometry with massive objects
Quantum mechanics (QM) and general relativity (GR), also known as the theory of gravity, are the two pillars of modern physics. A matter-wave interferometer with a massive particle can test numerous fundamental ideas, including the spatial superposition principle - a foundational concept in QM - in completely new regimes, as well as the interface between QM and GR, e.g., testing the quantization of gravity. Consequently, there exists an intensive effort to realize such an interferometer. While…

@arXiv_csSE_bot@mastoxiv.page
2025-08-19 10:01:50

XAMT: Cross-Framework API Matching for Testing Deep Learning Libraries
Bin Duan, Ruican Dong, Naipeng Dong, Dan Dongseong Kim, Guowei Yang
https://arxiv.org/abs/2508.12546 https…

XAMT: Cross-Framework API Matching for Testing Deep Learning Libraries
Deep learning powers critical applications such as autonomous driving, healthcare, and finance, where the correctness of underlying libraries is essential. Bugs in widely used deep learning APIs can propagate to downstream systems, causing serious consequences. While existing fuzzing techniques detect bugs through intra-framework testing across hardware backends (CPU vs. GPU), they may miss bugs that manifest identically across backends and thus escape detection under these strategies. To addre…

@Techmeme@techhub.social
2025-08-20 14:30:57

Google hires NBA star Stephen Curry as a "performance advisor" for its Health, Pixel, and Cloud products, including testing Fitbit's new personal health coach (Jess Weatherbed/The Verge)
https://www.theverge.com/news/762146/google-pixel-stephen-curry…

Google signs Stephen Curry to pitch its Pixel, health, and AI gear
Google has brought NBA star Stephen Curry on board to help shape up the company’s hardware, features, and AI services.

@arXiv_quantph_bot@mastoxiv.page
2025-08-21 09:24:10

Strong Confinement of a Nanoparticle in a Needle Paul Trap: Towards Matter-Wave Interferometry with Nanodiamonds
Peter Skakunenko, Daniel Folman, Yaniv Bar-Haim, Ron Folman
https://arxiv.org/abs/2508.14272

Strong Confinement of a Nanoparticle in a Needle Paul Trap: Towards Matter-Wave Interferometry with Nanodiamonds
Quantum mechanics (QM) and general relativity (GR), also known as the theory of gravity, are the two pillars of modern physics. A matter-wave interferometer with a massive particle can test numerous fundamental ideas, including the spatial superposition principle - a foundational concept in QM - in completely new regimes, as well as the interface between QM and GR, e.g., testing the quantization of gravity. Consequently, there exists an intensive effort to realize such an interferometer. While…

@arXiv_csSE_bot@mastoxiv.page
2025-08-18 08:38:50

ORFuzz: Fuzzing the "Other Side" of LLM Safety -- Testing Over-Refusal
Haonan Zhang, Dongxia Wang, Yi Liu, Kexin Chen, Jiashui Wang, Xinlei Ying, Long Liu, Wenhai Wang
https://arxiv.org/abs/2508.11222

ORFuzz: Fuzzing the "Other Side" of LLM Safety -- Testing Over-Refusal
Large Language Models (LLMs) increasingly exhibit over-refusal - erroneously rejecting benign queries due to overly conservative safety measures - a critical functional flaw that undermines their reliability and usability. Current methods for testing this behavior are demonstrably inadequate, suffering from flawed benchmarks and limited test generation capabilities, as highlighted by our empirical user study. To the best of our knowledge, this paper introduces the first evolutionary testing fra…

@arXiv_csCR_bot@mastoxiv.page
2025-08-19 11:35:00

Reducing False Positives with Active Behavioral Analysis for Cloud Security
Dikshant, Verma
https://arxiv.org/abs/2508.12584 https://arxiv.org/pdf/2508.12…

Reducing False Positives with Active Behavioral Analysis for Cloud Security
Rule-based cloud security posture management (CSPM) solutions are known to produce many false positives because of their limited contextual understanding and dependence on static heuristics testing. This paper introduces a validation-driven methodology that integrates active behavioral testing into CSPM solutions to evaluate the exploitability of policy violations in real time. The proposed system employs lightweight and automated probes, built from open-source tool…

@arXiv_quantph_bot@mastoxiv.page
2025-08-21 10:03:30

Design of high-efficiency UHV loading of nanodiamonds into a Paul trap: Towards Matter-Wave Interferometry with Massive Objects
Rafael Benjaminov, Sela Liran, Or Dobkowski, Yaniv Bar-Haim, Michael Averbukh, Ron Folman
https://arxiv.org/abs/2508.14722

Design of high-efficiency UHV loading of nanodiamonds into a Paul trap: Towards Matter-Wave Interferometry with Massive Objects
Quantum mechanics (QM) and general relativity (GR), also known as the theory of gravity, are the two pillars of modern physics. A matter-wave interferometer with a massive particle can test numerous fundamental ideas, including the spatial superposition principle - a foundational concept in QM - in completely new regimes, as well as the interface between QM and GR, e.g., testing the quantization of gravity. Consequently, there exists an intensive effort to realize such an interferometer. While…

@Techmeme@techhub.social
2025-08-20 22:05:59

Inside Google's Reliability Labs, where it stress tests Pixel phones and watches; Google claims the Pixel 10 Pro Fold can withstand 10 years of folding (Julian Chokkattu/Wired)
https://www.wired.com/story/google-reliability-labs-exclusive-look/

An Exclusive Look at Reliability Labs, Where Google Stress Tests Pixel Hardware
Google runs an array of tests on its Pixel phones and smartwatches throughout the development cycle. Here's a peek behind the curtain of what that testing looks like.

@arXiv_csSE_bot@mastoxiv.page
2025-08-21 10:30:24

Crosslisted article(s) found for cs.SE. https://arxiv.org/list/cs.SE/new
[1/1]:
- Tuning Random Generators: Property-Based Testing as Probabilistic Programming
Ryan Tjoa, Poorva Garg, Harrison Goldstein, Todd Millstein, Benjamin Pierce, Guy Van den Broeck

@arXiv_quantph_bot@mastoxiv.page
2025-08-22 09:59:51

Quantum control of Nitrogen-Vacancy spin in Diamonds: Towards matter-wave interferometry with massive objects
N. Levi, O. Feldman, Y. Rosenzweig, D. Groswasser, A. Elgarat, M. Gal-Katizri, R. Folman
https://arxiv.org/abs/2508.15504

Quantum control of Nitrogen-Vacancy spin in Diamonds: Towards matter-wave interferometry with massive objects
Quantum mechanics (QM) and General relativity (GR), also known as the theory of gravity, are the two pillars of modern physics. A matter-wave interferometer with a massive particle can test numerous fundamental ideas, including the spatial superposition principle - a foundational concept in QM - in previously unexplored regimes. It also opens the possibility of probing the interface between QM and GR, such as testing the quantization of gravity. Consequently, there exists an intensive effort to…

@Techmeme@techhub.social
2025-08-14 01:40:59

India-based ride-hailing app Rapido starts testing its food delivery service Ownly in Bengaluru, marking its first serious move to challenge Swiggy and Zomato (Jagmeet Singh/TechCrunch)
https://techcrunch.com/2025/08/13/indias-rapido-beg…

India's Rapido begins testing food delivery to take on Swiggy, Zomato | TechCrunch
Rapido's beta food delivery service has popped up in three key localities in Bengaluru before a broader rollout.

@arXiv_statME_bot@mastoxiv.page
2025-08-19 09:58:00

Unified Conformalized Multiple Testing with Full Data Efficiency
Yuyang Huo, Xiaoyang Wu, Changliang Zou, Haojie Ren
https://arxiv.org/abs/2508.12085 https://

Unified Conformalized Multiple Testing with Full Data Efficiency
Conformalized multiple testing offers a model-free way to control predictive uncertainty in decision-making. Existing methods typically use only part of the available data to build score functions tailored to specific settings. We propose a unified framework that puts data utilization at the center: it uses all available data (null, alternative, and unlabeled) to construct scores and calibrate p-values through a full permutation strategy. This unified use of all available data significantly impro…

@arXiv_quantph_bot@mastoxiv.page
2025-08-21 10:02:40

Trapping and cooling of nanodiamonds in a Paul trap under ultra-high vacuum: Towards matter-wave interferometry with massive objects
Omer Feldman, Ben Baruch Shultz, Maria Muretova, Or Dobkowski, Yonathan Japha, David Grosswasser, Ron Folman
https://arxiv.org/abs/2508.14687

Trapping and cooling of nanodiamonds in a Paul trap under ultra-high vacuum: Towards matter-wave interferometry with massive objects
Quantum mechanics (QM) and General relativity (GR), also known as the theory of gravity, are the two pillars of modern physics. A matter-wave interferometer with a massive particle can test numerous fundamental ideas, including the spatial superposition principle - a foundational concept in QM - in previously unexplored regimes. It also opens the possibility of probing the interface between QM and GR, such as testing the quantization of gravity. Consequently, there exists an intensive effort to…

@arXiv_csSE_bot@mastoxiv.page
2025-08-22 09:19:11

A Novel Mutation Based Method for Detecting FPGA Logic Synthesis Tool Bugs
Yi Zhang, He Jiang, Xiaochen Li, Shikai Guo, Peiyu Zou, Zun Wang
https://arxiv.org/abs/2508.15536 http…

A Novel Mutation Based Method for Detecting FPGA Logic Synthesis Tool Bugs
FPGA (Field-Programmable Gate Array) logic synthesis tools are key components in the EDA (Electronic Design Automation) toolchain. They convert hardware designs written in description languages such as Verilog into gate-level representations for FPGAs. However, defects in these tools may lead to unexpected behaviors and pose security risks. Therefore, it is crucial to harden these tools through testing. Although several methods have been proposed to automatically test FPGA logic synthesis tools…

@Techmeme@techhub.social
2025-09-15 11:35:42

Israel-based Terra Security, which offers an AI-driven penetration testing platform, raised a $30M Series A led by Felicis, bringing its total funding to $38M (Meir Orbach/CTech)
https://www.calcalistech.com/ctechnews/article/awdq1yv5k

AI startup Terra Security raises $30M Series A for continuous penetration testing
Israeli startup aims to scale offensive security testing beyond manual and costly methods.

@arXiv_statME_bot@mastoxiv.page
2025-09-18 09:13:51

Bridging Control Variates and Regression Adjustment in A/B Testing: From Design-Based to Model-Based Frameworks
Yu Zhang, Bokui Wan, Yongli Qin
https://arxiv.org/abs/2509.13944 …

Bridging Control Variates and Regression Adjustment in A/B Testing: From Design-Based to Model-Based Frameworks
A/B testing serves as the gold standard for large-scale, data-driven decision making in online businesses. To mitigate metric variability and enhance testing sensitivity, control variates and regression adjustment have emerged as prominent variance reduction techniques, leveraging pre-experiment data to improve estimator performance. Over the past decade, these methods have spawned numerous derivatives, yet their theoretical connections and comparative properties remain underexplored. In this p…
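
To make the idea of a control variate concrete, here is a minimal Python sketch of one common form, often called CUPED: each in-experiment metric value is shifted by a multiple of its centered pre-experiment counterpart. It is not taken from the paper, whose own estimators fall outside the truncated abstract. The function names and the simulated data are hypothetical, and real A/B analyses usually estimate the coefficient across both arms rather than on a single sample as done here.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def control_variate_adjust(metric, pre_metric):
    """Return metric values adjusted by the pre-experiment covariate."""
    # theta is the coefficient that minimizes the variance of the adjusted metric.
    theta = covariance(metric, pre_metric) / variance(pre_metric)
    pre_mean = mean(pre_metric)
    return [y - theta * (x - pre_mean) for y, x in zip(metric, pre_metric)]


# Simulated data (an assumption for illustration): the in-experiment metric
# is the pre-experiment metric plus independent noise.
rng = random.Random(0)
pre = [rng.gauss(10, 2) for _ in range(1000)]
post = [x + rng.gauss(0, 1) for x in pre]

adjusted = control_variate_adjust(post, pre)
print(variance(post), variance(adjusted))
```

Because the covariate is centered, the adjustment leaves the sample mean unchanged while removing the part of the variance that the pre-experiment data explains.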

@arXiv_csSE_bot@mastoxiv.page
2025-07-17 08:09:00

Extremal Testing for Network Software using LLMs
Rathin Singha, Harry Qian, Srinath Saikrishnan, Tracy Zhao, Ryan Beckett, Siva Kesava Reddy Kakarla, George Varghese
https://arxiv.org/abs/2507.11898

Extremal Testing for Network Software using LLMs
Physicists often manually consider extreme cases when testing a theory. In this paper, we show how to automate extremal testing of network software using LLMs in two steps: first, ask the LLM to generate input constraints (e.g., DNS name length limits); then ask the LLM to generate tests that violate the constraints. We demonstrate how easy this process is by generating extremal tests for HTTP, BGP and DNS implementations, each of which uncovered new bugs. We show how this methodology extends t…
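
The two steps can be illustrated with a small Python sketch for the DNS name length example. In the paper an LLM produces both the constraints and the violating tests. Here both are hard-coded by hand, so this only shows the shape of the resulting test cases. The limits are the usual ones derived from RFC 1035, and `is_valid_name` is a hypothetical toy validator standing in for the implementation under test.

```python
MAX_LABEL_LEN = 63   # RFC 1035 limit on a single label, in octets
MAX_NAME_LEN = 253   # usual limit on the dotted text form of a name


def is_valid_name(name):
    """Toy validator standing in for the implementation under test."""
    if not name or len(name) > MAX_NAME_LEN:
        return False
    return all(1 <= len(label) <= MAX_LABEL_LEN for label in name.split("."))


def extremal_names():
    """Yield (description, name) pairs at and just beyond each limit."""
    yield "label at limit", "a" * MAX_LABEL_LEN + ".example"
    yield "label over limit", "a" * (MAX_LABEL_LEN + 1) + ".example"
    yield "empty label", "a..example"
    # Three full-size labels plus a shorter one: 3*63 + 61 + 3 dots = 253.
    full = ["a" * MAX_LABEL_LEN] * 3
    yield "name at limit", ".".join(full + ["a" * 61])
    yield "name over limit", ".".join(full + ["a" * 62])


results = {desc: is_valid_name(name) for desc, name in extremal_names()}
for desc, ok in results.items():
    print(f"{desc}: {'accepted' if ok else 'rejected'}")
```

Pairing each violating input with its just-valid neighbor helps separate off-by-one errors from a check that is missing entirely.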

@arXiv_quantph_bot@mastoxiv.page
2025-07-17 10:15:10

Modulator-free, self-testing quantum random number generator
Ana Blázquez-Coído, Fadri Grünenfelder, Anthony Martin, Raphael Houlmann, Hugo Zbinden, Davide Rusca
https://arxiv.org/abs/2507.12346

Modulator-free, self-testing quantum random number generator
Quantum random number generators (QRNGs) use the inherent unpredictability of quantum mechanics to generate true randomness, as opposed to classical random number generators. However, ensuring the authenticity of this randomness still requires robust verification. Self-testing QRNGs address this need by enabling the validation of the randomness produced based on the observed data from the experiment while requiring few assumptions. In this work, we present a practical, self-testing QRNG designe…

@arXiv_statME_bot@mastoxiv.page
2025-07-22 11:13:40

A powerful procedure that controls the false discovery rate with directional information
Zhaoyang Tian, Kun Liang, Pengfei Li
https://arxiv.org/abs/2507.15631

A powerful procedure that controls the false discovery rate with directional information
In many multiple testing applications in genetics, the signs of test statistics provide useful directional information, such as whether genes are potentially up- or down-regulated between two experimental conditions. However, most existing procedures that control the false discovery rate (FDR) are $p$-value based and ignore such directional information. We introduce a novel procedure, the signed-knockoff procedure, to utilize the directional information and control the FDR in finite samples. We…

@arXiv_csSE_bot@mastoxiv.page
2025-09-18 09:14:21

A Regression Testing Framework with Automated Assertion Generation for Machine Learning Notebooks
Yingao Elaine Yao, Vedant Nimje, Varun Viswanath, Saikat Dutta
https://arxiv.org/abs/2509.13656

A Regression Testing Framework with Automated Assertion Generation for Machine Learning Notebooks
Notebooks have become the de-facto choice for data scientists and machine learning engineers for prototyping and experimenting with machine learning (ML) pipelines. Notebooks provide an interactive interface for code, data, and visualization. However, notebooks provide very limited support for testing. Thus, during continuous development, many subtle bugs that do not lead to crashes often go unnoticed and cause silent errors that manifest as performance regressions. To address this, we introd…

@Techmeme@techhub.social
2025-09-16 22:25:59

Google is testing a Windows desktop app that brings Mac's Spotlight-like search bar to PC users, allowing them to search local files, Google Drive, and the web (Emma Roth/The Verge)
https://www.theverge.com/news/778940/google-app-windows-launch

Google’s new Windows desktop app brings a Spotlight-like search bar to PC
Google is testing a new app on Windows that you can use to search your files, Drive, and the web.

@arXiv_statME_bot@mastoxiv.page
2025-07-21 08:39:00

Controlling IER and EER in replicated regular two-level factorial experiments
Pengfei Li, Oludotun J. Akinlawon, Shengli Zhao
https://arxiv.org/abs/2507.13621

Replicated regular two-level factorial experiments are very useful for industry. The goal of these experiments is to identify active effects that affect the mean and variance of the response. Hypothesis testing procedures are widely used for this purpose. However, the existing methods give results that are either too anticonservative or conservative in controlling the individual and experimentwise error rates (IER and EER). In this paper, we propose a Monte Carlo method and an exact-variance …

@arXiv_csSE_bot@mastoxiv.page
2025-09-19 09:36:21

Wireless Communication Performance Testing: From Laboratory Environment to Research Vessel
Andrei-Raoul Morariu, Andreas Strandberg, Bogdan Iancu, Jerker Bjorkqvist
https://arxiv.org/abs/2509.14740

This study investigates signal transmission within a shared spectrum, focusing on measurements conducted both in laboratory and outdoor environments. The objective was to demonstrate how laboratory objects obstructing the line of sight can attenuate the signal between a transmitter (Tx) and a receiver (Rx). Additionally, we examined the impact of distance and placement in various locations aboard an electric research boat on signal transmission efficiency. These findings contribute to understan…

@arXiv_csSE_bot@mastoxiv.page
2025-09-19 09:33:51

Evaluating the Effectiveness of Coverage-Guided Fuzzing for Testing Deep Learning Library APIs
Feiran Qin, M. M. Abid Naziri, Hengyu Ai, Saikat Dutta, Marcelo d'Amorim
https://arxiv.org/abs/2509.14626

Deep Learning (DL) libraries such as PyTorch provide the core components to build major AI-enabled applications. Finding bugs in these libraries is important and challenging. Prior approaches have tackled this by performing either API-level fuzzing or model-level fuzzing, but they do not use coverage guidance, which limits their effectiveness and efficiency. This raises an intriguing question: can coverage-guided fuzzing (CGF), in particular frameworks like LibFuzzer, be effectively applied to …

@arXiv_statME_bot@mastoxiv.page
2025-08-18 08:53:20

Two-Sample Testing with Missing Data via Energy Distance: Weighting and Imputation Approaches
Danijel G. Aleksić, Bojana Milošević
https://arxiv.org/abs/2508.11421

In this paper, we address the problem of two-sample testing in the presence of missing data under a variety of missingness mechanisms. Our focus is on the well-known energy distance-based two-sample test. In addition to the standard complete-case approach, we propose a modification of the test statistic that incorporates all available data, utilizing appropriate weights. The asymptotic null distribution of the test statistic is derived and two resampling procedures for approximating the corresp…

@Techmeme@techhub.social
2025-08-18 20:10:44

Nvidia, Discord, and Epic Games are testing game demos on Discord servers, letting users try a game without downloading it or signing up, starting with Fortnite (Sean Hollister/The Verge)
https://www.theverge.com/news/760894/play-

‘Play Instantly on Discord’: Fortnite will be Nvidia and Discord’s first instant game demo
Try before you try before you buy.
