
2025-07-08 10:49:10
Improving Deep Learning Framework Testing with Model-Level Metamorphic Testing
Yanzhou Mu, Juan Zhai, Chunrong Fang, Xiang Chen, Zhixiang Cao, Peiran Yang, Kexin Zhao, An Guo, Zhenyu Chen
https://arxiv.org/abs/2507.04354
YouTube rolls out a tool to let some creators upload different thumbnails for each video dubbed into a different language, to help expand their global audience (Dan Whateley/Business Insider)
https://www.businessinsider.com/youtube-te
Reinforcement Learning for Automated Cybersecurity Penetration Testing
Daniel López-Montero, José L. Álvarez-Aldana, Alicia Morales-Martínez, Marta Gil-López, Juan M. Auñón García
https://arxiv.org/abs/2507.02969
Testing for Renamability to Classes of Clause Sets
Albert Brandl, Christian G. Fermüller, Gernot Salzer
https://arxiv.org/abs/2507.05044 https://
Search-based Selection of Metamorphic Relations for Optimized Robustness Testing of Large Language Models
Sangwon Hyun, Shaukat Ali, M. Ali Babar
https://arxiv.org/abs/2507.05565 …
Optimal structure learning and conditional independence testing
Ming Gao, Yuhao Wang, Bryon Aragam
https://arxiv.org/abs/2507.05689 https://
Apropos of yet another conversation today, I’m a big fan of using automation in WCAG testing.
But I also know WCAG well enough to understand the limitations (and lies) of the tools.
https://adrianroselli.com/2025/04/automated-wcag-testing-is-grrreat.htm…
MateInfoUB: A Real-World Benchmark for Testing LLMs in Competitive, Multilingual, and Multimodal Educational Tasks
Dumitran Adrian Marius, Theodor-Pierre Moroianu, Buca Mihnea-Vicentiu
https://arxiv.org/abs/2507.03162
@… has stopped adding OCSP URLs to newly issued certificates (as of a few hours ago). Somewhat to my surprise, I haven't heard about any issues caused by this yet (I would've preferred if they had provided better opportunities for testing). But if you observe unexpected behavior with anything related to TLS certificates in the upcoming days, you may …
Testing Hypotheses of Covariate Effects on Topics of Discourse
Gabriel Phelan, David A. Campbell
https://arxiv.org/abs/2506.05570 https://
Testing the ubiquitous presence of very high energy emission in gamma-ray bursts with the MAGIC telescopes
S. Abe, J. Abhir, A. Abhishek, V. A. Acciari, A. Aguasca-Cabot, I. Agudo, T. Aniello, S. Ansoldi, L. A. Antonelli, A. Arbet Engels, C. Arcaro, T. T. H. Arnesen, K. Asano, A. Babic, C. Bakshi, U. Barres de Almeida, J. A. Barrio, L. Barrios-Jimenez, I. Batkovic, J. Baxter, J. Becerra Gonzalez, W. Bednarek, E. Bernardini, J. Bernete, A. Berti, J. Besenrieder, C. Bigongiari, A. Biland…
Question for any folk that do reader/audience/play testing.
I find that the more fidelity (detail, production value) a project has, the less accurate the user's identification of an issue. I try to interpret accordingly.
Do you know of any studies regarding this or personal experiences (that confirm or refute this)? I have studies on how people don't know what they want, etc. But keen on the connection btw fidelity & inaccuracy
Thank you! :)
Other relate…
Testing OH Megamaser Identification Methods in HI Surveys: Updated Source-Flagging Algorithms and New Detections in ALFALFA
Hayley Roberts, Jeremy Darling, Kelley M. Hess, Andrew J. Baker, Elizabeth A. K. Adams, Helga Dénes
https://arxiv.org/abs/2506.06115
On void formation during the simulated tensile testing of polymer-filler particle composites
John J. Karnes, Supun S. Mohottalalage, Amitesh Maiti, Andrew P. Saab, Todd H. Weisgraber
https://arxiv.org/abs/2507.05547
Testing 4.4.0
ASSURE: Metamorphic Testing for AI-powered Browser Extensions
Xuanqi Gao, Juan Zhai, Shiqing Ma, Siyi Xie, Chao Shen
https://arxiv.org/abs/2507.05307 https…
Test Automatically - Test improvements against current setup with intelligent #AB testing and performance comparison 🎛️ Control What Goes Live - Review auto-generated fixes, approve changes, and roll back if needed with full control ✍️ Version Everything - Complete version
Replicable Distribution Testing
Ilias Diakonikolas, Jingyi Gao, Daniel Kane, Sihan Liu, Christopher Ye
https://arxiv.org/abs/2507.02814 https://
Enter, Exit, Page Fault, Leak: Testing Isolation Boundaries for Microarchitectural Leaks
Oleksii Oleksenko, Flavien Solt, Cédric Fournet, Jana Hofmann, Boris Köpf, Stavros Volos
https://arxiv.org/abs/2507.06039
Optimizing Cloud-to-GPU Throughput for Deep Learning With Earth Observation Data
Akram Zaytar, Caleb Robinson, Girmaw Abebe Tadesse, Tammy Glazer, Gilles Hacheme, Anthony Ortiz, Rahul M Dodhia, Juan M Lavista Ferres
https://arxiv.org/abs/2506.06235
TigAug: Data Augmentation for Testing Traffic Light Detection in Autonomous Driving Systems
You Lu, Dingji Wang, Kaifeng Huang, Bihuan Chen, Xin Peng
https://arxiv.org/abs/2507.05932
IPv6 woes solved, I now have (I think, still testing) fully working IPv6 again.
The fix was incredibly braindead and stupid: turning on the IPv4 DHCP server on the LAN side of the Comcast CPE (i.e. on the point to point link between the CPE and my border router).
This would be completely unnecessary in a sane world, since I'm never going to DHCPDISCOVER on this subnet (the CPE and border router both have static world routable IPv4 addresses).
But apparently turning on t…
PyGemini: Unified Software Development towards Maritime Autonomy Systems
Kjetil Vasstein, Christian Le, Simon Lervåg Breivik, Trygve Maukon Myhr, Annette Stahl, Edmund Førland Brekke
https://arxiv.org/abs/2506.06262
Let's find out, then: does a post with only the hashtag
#test
simply end up on
https://feddit.org/c/testing@kbin.earth?
Kernel Trace Distance: Quantum Statistical Metric between Measures through RKHS Density Operators
Arturo Castellanos, Anna Korba, Pavlo Mozharovskyi, Hicham Janati
https://arxiv.org/abs/2507.06055
Identity Testing for Circuits with Exponentiation Gates
Jiatu Li, Mengdi Wu
https://arxiv.org/abs/2506.04529 https://arxiv.org/pdf/25…
On Fault Tolerance of Data Storage Systems: A Holistic Perspective
Mai Zheng, Duo Zhang, Ahmed Dajani
https://arxiv.org/abs/2507.03849 https://
Why I Don’t Use #Mocking Frameworks and Why You Might Not Need Them Either by @…
An experimental approach: Converting verbal expressions to numerical scales
Zsombor Szádoczki, Sándor Bozóki, László Sipos, Zsófia Galambosi
https://arxiv.org/abs/2507.04539
"There was a recent 2024 study that showed us that individuals who survive an acute COVID-19 infection (...) on average will lose somewhere in the neighbourhood of two to six IQ points per infection."
"although your immune system can take on [a COVID] infection, you want to avoid testing it as much as possible because your body is sustaining damage with each infection that it survives."
Seriously, wear a mask 😷 .
New limits on CPT-Symmetry Violation in Charm mesons
W. Krzemien, M. Kmiec, A. Szabelski, W. Wislicki
https://arxiv.org/abs/2507.05457 https://
Neural Substitute Solver for Efficient Edge Inference of Power Electronic Hybrid Dynamics
Jialin Zheng, Haoyu Wang, Yangbin Zeng, Han Xu, Di Mou, Hong Li, Sergio Vazquez, Leopoldo G. Franquelo
https://arxiv.org/abs/2507.03144
What is going on in the land of Fedora Linux?
First, the latest kernel in Fedora 42 recently went from "working" to "broken" with regard to the video screen on Intel(R) Core(TM) Ultra 7 155H on an ASUSTeK COMPUTER INC. NUC14RVK-B/NUC14RVBU7
Second, the update repositories now all have invalid checksums. (Update: this is now corrected.)
Fedora is Red Hat is IBM - you would think that they could do some regression testing and maybe do a better locking of t…
This might also explain why #OpenAI pushed ChatGPT to market before it was worth touching. If you can’t afford testing at scale, why not do a huge free public release so that you can call the last 80% of the dev effort “product support”, which should be fully expensable.
NOTE: I AM NOT A TAX EXPERT AND THE ABOVE COULD BE ABSURD.
MalVol-25: A Diverse, Labelled and Detailed Volatile Memory Dataset for Malware Detection and Response Testing and Validation
Dipo Dunsin, Mohamed Chahine Ghanem, Eduardo Almeida Palmieri
https://arxiv.org/abs/2507.03993
"The immigration detention system has become a testing ground for authoritarianism." - @silkys13.bsky.social
https://bsky.app/profile/democracynow.org/post/3lqpltpcg522i
User testing at Xerox PARC and Apple in the 1970s–80s, by Larry Tesler:
https://dl.acm.org/doi/pdf/10.1145/1044774.1534167
Thanks, @…
A Generalized Graph Signal Processing Framework for Multiple Hypothesis Testing over Networks
Xingchao Jian, Martin Gölz, Feng Ji, Wee Peng Tay, Abdelhak M. Zoubir
https://arxiv.org/abs/2506.03496
Good article summarizing a lot of things relevant to continued COVID-19 caution:
https://www.cbc.ca/radio/quirks/beyond-long-covid-1.7485888
Key points:
COVID-19 weakens the immune system:
"""
So it's not just about infecting you and causing respiratory illness and fever and all of the things that we usually get with the viral infection. This virus also specifically causes your immune system to become weaker.
"""
It damages blood vessels:
"""
In addition to SARS-CoV-2's ability to dysregulate the immune system and suppress the immune system, the spike protein itself is very damaging to blood vessel structures as well as red blood cells and platelets themselves.
"""
The folk idea that infections make our immune system stronger and stronger like a muscle just isn't true (or at least, doesn't apply to COVID-19 because of how, unlike most other viruses, it damages the immune system):
"""
For the longest time in the field of immunology, there was the sort of adage that your immune system needs to be tested every now and again to stay strong. That's an old-fashioned idea.
The more new-fashioned and evidence-based idea is that, although your immune system can take on [a COVID] infection, you want to avoid testing it as much as possible because your body is sustaining damage with each infection that it survives.
"""
Check out today's Metacurity for the most critical infosec developments you might have missed over the weekend, including
--German police ID Trickbot's "Stern,"
--BitMEX thwarts Lazarus Group attack,
--Shin Bet thwarted 85 Iranian cyberattacks aimed at civilians,
--Vibe coding app Lovable failed to fix critical flaw,
--China's quantum satellite Micius has a severe security flaw,
--Russia's GRU Unit 29155 has a hacker team,
--…
Parting Thoughts on Performance Testing - #dotNet
https://improveandrepeat.com/2025/04/parting-thoughts-on-performance-testing/
Replaced article(s) found for physics.ins-det. https://arxiv.org/list/physics.ins-det/new
[1/1]:
- Development, Characterization, and Testing of a Bias Supply for SiPMs in the CMVD Experiment
Chattopadhyay, Saraf, Majumder, Bheesette, Shinde
@… thanks for testing! fair point about overscroll, I don’t think there is an `overscroll-behavior` value that makes sense for the component to add though — this seems more like an app-level option
Testing Hypotheses regarding Covariance and Correlation matrices with the R package CovCorTest
Paavo Sattler, Svenja Jedhoff
https://arxiv.org/abs/2507.03406
AI for the Routine, Humans for the Complex: Accuracy-Driven Data Labelling with Mixed Integer Linear Programming
Mohammad Hossein Amini, Mehrdad Sabetzadeh, Shiva Nejati
https://arxiv.org/abs/2507.04990
Hybrid Approach to Directed Fuzzing
Darya Parygina, Timofey Mezhuev, Daniil Kuts
https://arxiv.org/abs/2507.04855 https://arxiv.org/p…
Vivodyne, which uses AI and robots to grow human tissues in the lab for drug discovery and development, raised a $40M Series A, taking its total funding to $78M (Katherine Davis/Axios)
https://www.axios.com/pro/biotech-deals/2025/05…
Won’t someone think of the jackbooted thugs?
“…ICE officers themselves…facing inhumane conditions…showing symptoms of respiratory infections…can’t get proper testing for diagnoses…at risk of rocket attacks from Yemeni terrorists…lack body armor or other protective gear. …Trump…said…judge was ‘literally putting ICE agents’ lives in danger.’
…not quite true…judge never said they had to stay there…officials are ‘manufacturing the…chaos they decry.’”
»Pay up or stop scraping – Cloudflare program charges bots for each crawl:
Cloudflare now beta testing pay-per-crawl feature to stop endless AI scraping.
Cloudflare is now experimenting with tools that will allow content creators to charge a fee to AI crawlers to scrape their websites.«
This is certainly a good idea, but on the other hand, the competition is trying to eliminate each other. I'm curious… 🍿😎
The most effective way to reduce immigration is to tank the economy. The USA is seriously testing that, it seems. The populations of many countries in Europe would like their governments to follow suit, convinced as they are that there is too much wealth in their countries.
💉 Olympic anti-doping lab puts U.S. meat supply to the test
#food
Testing, Evaluation, Verification and Validation (TEVV) of Digital Twins: A Comprehensive Framework
Gabriella Waters
https://arxiv.org/abs/2507.04555 https…
Practical Short-Length Coding Schemes for Binary Distributed Hypothesis Testing
Ismaila Salihou Adamou, Elsa Dupraz, Reza Asvadi, Tad Matsumoto
https://arxiv.org/abs/2506.01747
Are Depth-2 Regular Expressions Hard to Intersect?
Rocco Ascone, Giulia Bernardini, Alessio Conte, Veronica Guerrini, Giulia Punzi
https://arxiv.org/abs/2507.03593
Machine Learning Based Stress Testing Framework for Indian Financial Market Portfolios
Vidya Sagar G, Shifat Ali, Siddhartha P. Chakrabarty
https://arxiv.org/abs/2507.02011
Hidden Prompts in Manuscripts Exploit AI-Assisted Peer Review
Zhicheng Lin
https://arxiv.org/abs/2507.06185 https://arxiv.org/pdf/250…
Goodness-of-fit testing for the stationary density of a size-structured PDE
Van Ha Hoang, Phu Thanh Nguyen, Thanh Mai Pham Ngoc, Vincent Rivoirard, Viet Chi Tran
https://arxiv.org/abs/2506.05103
Testing Determinism and the Role of Time in Atoms
Mark G. Raizen
https://arxiv.org/abs/2507.01203 https://arxiv.org/pdf/2507.01203
DevMuT: Testing Deep Learning Framework via Developer Expertise-Based Mutation
Yanzhou Mu, Juan Zhai, Chunrong Fang, Xiang Chen, Zhixiang Cao, Peiran Yang, Yinglong Zou, Tao Zheng, Zhenyu Chen
https://arxiv.org/abs/2507.04360
Day 5
TL;DR: Continued work on backend security — role-based access is now fully wired up.
✅ Got fine-grained role-based access control fully working today.
• Roles loaded from PostgreSQL
• Injected into JWT during login
• Validated via a custom `@Roles()` decorator and `RolesGuard`
• Authenticated via `@UseGuards(JwtAuthGuard)` globally
• Introduced `@Public()` decorator to bypass guards for public endpoints
• Swagger supports Bearer token for testing …
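The decision logic behind the list above can be sketched without any framework. This is only an illustration under assumptions: `JwtPayload`, `RouteMeta`, and `canActivate` are hypothetical names that are not from the post, `isPublic` stands in for the `@Public()` marker, `requiredRoles` stands in for what `@Roles()` would attach to a route, and token verification itself is assumed to have happened already.

```typescript
// Hypothetical, framework-free sketch of a role check; all names are illustrative.
type Role = string;

interface JwtPayload {
  sub: string;    // user id
  roles: Role[];  // roles assumed to be copied from the database at login
}

interface RouteMeta {
  isPublic?: boolean;      // stands in for a @Public() marker
  requiredRoles?: Role[];  // stands in for the roles a @Roles() decorator would list
}

// Returns true when the request may proceed.
// `payload` is null when no valid token was presented.
function canActivate(meta: RouteMeta, payload: JwtPayload | null): boolean {
  if (meta.isPublic) return true;           // public endpoints skip all checks
  if (payload === null) return false;       // protected route without a token
  const required = meta.requiredRoles || [];
  if (required.length === 0) return true;   // authenticated, no role restriction
  // The user needs at least one of the required roles.
  return required.some((role) => payload.roles.indexOf(role) !== -1);
}

const admin: JwtPayload = { sub: "u1", roles: ["admin"] };
const viewer: JwtPayload = { sub: "u2", roles: ["viewer"] };

console.log(canActivate({ requiredRoles: ["admin"] }, admin));   // true
console.log(canActivate({ requiredRoles: ["admin"] }, viewer));  // false
console.log(canActivate({ isPublic: true }, null));              // true
```

Checking the public marker first is what would let a globally applied guard coexist with open endpoints; whether the original project orders its checks this way is not stated in the post.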
A Vehicle-in-the-Loop Simulator with AI-Powered Digital Twins for Testing Automated Driving Controllers
Zengjie Zhang, Giannis Badakis, Michalis Galanis, Adem Bavarşi, Edwin van Hassel, Mohsen Alirezaei, Sofie Haesaert
https://arxiv.org/abs/2507.02313
A Note on Inferential Decisions, Errors and Path-Dependency
Kangda K. Wren
https://arxiv.org/abs/2507.05634 https://arxiv.org/pdf/250…
Hyperelastic characterization via deep indentation
Mohammad Shojaeifard, Mattia Bacca
https://arxiv.org/abs/2506.05371 https://arxiv.…
Development and Testing of a Low Cost Ultrasonic Leak Detector
Senol Gulgonul
https://arxiv.org/abs/2506.04862 https://arxiv.org/pdf/…
LLM-Guided Scenario-based GUI Testing
Shengcheng Yu, Yuchen Ling, Chunrong Fang, Quan Zhou, Chunyang Chen, Shaomin Zhu, Zhenyu Chen
https://arxiv.org/abs/2506.05079
Testing gravity with wide binaries -- 3D velocities and distances of wide binaries from Gaia and HARPS
R. Saglia, L. Pasquini, F. Patat, H. -G. Ludwig, R. Giribaldi, I. Leao, J. R. de Medeiros, Michael T. Murphy
https://arxiv.org/abs/2506.05049
This https://arxiv.org/abs/2504.09567 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_sta…
Some cool footage and discussion of full-scale EV fire testing by UL/FSRI. This should have been done a long time ago, given how many EVs are on the roads now, but better late than never I guess.
Video: https://www.youtube.com/watch?v=K6j3GtcAfE0
Project website:
An Investigation into Maintenance Support for Neural Networks
Fatema Tuz Zohra, Brittany Johnson
https://arxiv.org/abs/2507.05245 https://
“Check / Uncheck all in a Table”
https://adrianroselli.com/2025/07/check-uncheck-all-in-a-table.html
TL;DR: Unless you have user testing results saying otherwise, maybe put a check-all checkbox outside the table.
The rest of this thread ha…
LIFT: Automating Symbolic Execution Optimization with Large Language Models for AI Networks
Ruoxi Wang, Kun Li, Minghui Xu, Yue Zhang, Kaidi Xu, Chunchi Liu, Yinhao Xiao, Xiuzhen Cheng
https://arxiv.org/abs/2507.04931
This https://arxiv.org/abs/2505.01892 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
New aspects of quantum topological data analysis: Betti number estimation, and testing and tracking of homology and cohomology classes
Nhat A. Nghiem, Junseo Lee
https://arxiv.org/abs/2506.01432
Testing for large-dimensional covariance matrix under differential privacy
Shiwei Sang, Yicheng Zeng, Xuehu Zhu, Shurong Zheng
https://arxiv.org/abs/2506.02410
UniAud: A Unified Auditing Framework for High Auditing Power and Utility with One Training Run
Ruixuan Liu, Li Xiong
https://arxiv.org/abs/2507.04457 https…
Sources say Alibaba, Tencent, Baidu, and other Chinese companies are testing domestic alternatives as they deal with a dwindling stockpile of Nvidia processors (Financial Times)
https://www.ft.com/content/bb1315e8-27df-4a93-a4dc-11e2883fdde3
Evaluating the Evaluators: Trust in Adversarial Robustness Tests
Antonio Emanuele Cinà, Maura Pintor, Luca Demetrio, Ambra Demontis, Battista Biggio, Fabio Roli
https://arxiv.org/abs/2507.03450
This https://arxiv.org/abs/2501.09015 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_sta…
A Systematization of Security Vulnerabilities in Computer Use Agents
Daniel Jones, Giorgio Severi, Martin Pouliot, Gary Lopez, Joris de Gruyter, Santiago Zanella-Beguelin, Justin Song, Blake Bullwinkel, Pamela Cortez, Amanda Minnich
https://arxiv.org/abs/2507.05445
Solsmith: Solidity Random Program Generator for Compiler Testing
Lantian Li, Zhihao Liu, Zhongxing Yu
https://arxiv.org/abs/2506.03909 https://
The Impact of LLM-Assistants on Software Developer Productivity: A Systematic Literature Review
Amr Mohamed, Maram Assi, Mariam Guizani
https://arxiv.org/abs/2507.03156
FuzzFeed: An Automatic Approach to Weakest Precondition Generation using LLMs and Fuzzing
Daragh King, Vasileios Koutavas, Laura Kovacs
https://arxiv.org/abs/2507.05272
iPanda: An Intelligent Protocol Testing and Debugging Agent for Conformance Testing
Xikai Sun, Fan Dang, Kebin Liu, Xin Miao, Zihao Yang, Haimo Lu, Yawen Zheng, Yunhao Liu
https://arxiv.org/abs/2507.00378
On the Surprising Efficacy of LLMs for Penetration-Testing
Andreas Happe, Jürgen Cito
https://arxiv.org/abs/2507.00829 https://
Meta-Fair: AI-Assisted Fairness Testing of Large Language Models
Miguel Romero-Arjona, José A. Parejo, Juan C. Alonso, Ana B. Sánchez, Aitor Arrieta, Sergio Segura
https://arxiv.org/abs/2507.02533
Temac: Multi-Agent Collaboration for Automated Web GUI Testing
Chenxu Liu, Zhiyu Gu, Guoquan Wu, Ying Zhang, Jun Wei, Tao Xie
https://arxiv.org/abs/2506.00520
This https://arxiv.org/abs/2411.05185 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCR_…
The Impact of Software Testing with Quantum Optimization Meets Machine Learning
Gopichand Bandarupalli
https://arxiv.org/abs/2506.02090 https://
Automated Web Application Testing: End-to-End Test Case Generation with Large Language Models and Screen Transition Graphs
Nguyen-Khang Le, Quan Minh Bui, Minh Ngoc Nguyen, Hiep Nguyen, Trung Vo, Son T. Luu, Shoshin Nomura, Minh Le Nguyen
https://arxiv.org/abs/2506.02529
VISCA: Inferring Component Abstractions for Automated End-to-End Testing
Parsa Alian, Martin Tang, Ali Mesbah
https://arxiv.org/abs/2506.04161 https://
Coverage-Guided Testing for Deep Learning Models: A Comprehensive Survey
Hongjing Guo, Chuanqi Tao, Zhiqiu Huang, Weiqin Zou
https://arxiv.org/abs/2507.00496
STCLocker: Deadlock Avoidance Testing for Autonomous Driving Systems
Mingfei Cheng, Renzhi Wang, Xiaofei Xie, Yuan Zhou, Lei Ma
https://arxiv.org/abs/2506.23995
TESTQUEST: A Web Gamification Tool to Improve Locators and Page Objects Quality
Dario Olianas, Diego Clerissi, Maurizio Leotta, Filippo Ricca
https://arxiv.org/abs/2505.24756
Software Fairness Testing in Practice
Ronnie de Souza Santos, Matheus de Morais Leca, Reydne Santos, Cleyton Magalhaes
https://arxiv.org/abs/2506.17095 htt…
This https://arxiv.org/abs/2405.04860 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csSE_…