2025-12-27 03:14:49
They are breaking things of huge value to everyone.
Trump weather balloon cuts have Michigan meteorologists debating forecasts | Interlochen Public Radio https://www.interlochenpublicradio.org/2025-12-20/trump-weat…
🚀 Enable via new Plugins section at https://openrouter.ai/settings/plugins - opt-in toggle activates automatic healing layer for all structured output requests
💡 Key insight: A 2% defect rate dropping to 1% means cutting defects, bugs, and support tickets in half - reliability at the margins is wh…
"users who leave their SSDs unpowered for over a year are risking the integrity of their data. The reliability of QLC NAND has improved over the years, so you should probably consider 2–3 years of unpowered usage as the guardrails. Without power, the voltage stored in the NAND cells can be lost, either resulting in missing data or completely useless drives."
Massachusetts is about to see major savings from clean energy. A new analysis shows the state's SMART 3.0 solar-plus-storage program could save ratepayers $313 million annually by 2030.
The key? Pushing out inefficient natural gas plants, cutting reliance on fossil fuels during winter, and slashing 1.6 million metric tons of CO2 per year.
[OT] Dilbert? Salesforce steps back from AI: Executives reveal overconfidence in LLMs, pivot to deterministic automation https://opentools.ai/news/salesforce-steps-back-from-ai-executives-reveal-overconfidence-in-ll…
What happens when you pair solar panels with mini nuclear reactors? Chinese researchers just cracked the code.
Their new microgrid framework combines photovoltaics with small modular reactors, using AI to balance both in real time. The results are striking: 18.7% lower costs, 37.1% fewer emissions, and 98% reliability.
The secret? Smart coordination between battery storage and hydrogen production that adapts on the fly.
Sources: Resolve AI, which is developing an autonomous site reliability engineering tool, raised a Series A at multiple valuation tiers, including at $1B (Marina Temkin/TechCrunch)
https://techcrunch.com/2025/12/19/ex-splunk-execs-…
from my link log —
A distributed systems reliability glossary.
https://antithesis.com/resources/reliability_glossary/
saved 2025-12-03 https://…
Dispersion-Aware Modeling Framework for Parallel Optical Computing
Ziqi Wei, Yuanjian Wan, Yuhu Cheng, Xiao Yu, Peng Xie
https://arxiv.org/abs/2511.18897 https://arxiv.org/pdf/2511.18897 https://arxiv.org/html/2511.18897
arXiv:2511.18897v1 Announce Type: new
Abstract: Optical computing represents a groundbreaking technology that leverages the unique properties of photons, with innate parallelism standing as its most compelling advantage. Parallel optical computing like cascaded Mach-Zehnder interferometers (MZIs) based offers powerful computational capabilities but also introduces new challenges, particularly concerning dispersion due to the introduction of new frequencies. In this work, we extend existing theories of cascaded MZI systems to develop a generalized model tailored for wavelength-multiplexed parallel optical computing. Our comprehensive model incorporates component dispersion characteristics into a wavelength-dependent transfer matrix framework and is experimentally validated. We propose a computationally efficient compensation strategy that reduces global dispersion error within a 40 nm range from 0.22 to 0.039 using edge-spectrum calibration. This work establishes a fundamental framework for dispersion-aware model and error correction in MZI-based parallel optical computing chips, advancing the reliability of multi-wavelength photonic processors.
toXiv_bot_toot
Want a used car that works?
You’d be wise to not get an older Tesla.
In Consumer Reports’ latest ranking for used cars,
the Elon Musk-run automaker came dead last in terms of reliability,
trailing the top spot by more than forty points on a scale of 0 to 100
https://futurism.co…
EReLiFM: Evidential Reliability-Aware Residual Flow Meta-Learning for Open-Set Domain Generalization under Noisy Labels
Kunyu Peng, Di Wen, Kailun Yang, Jia Fu, Yufan Chen, Ruiping Liu, Jiamin Wu, Junwei Zheng, M. Saquib Sarfraz, Luc Van Gool, Danda Pani Paudel, Rainer Stiefelhagen
https://arxiv.org/abs/2510.12687
These AWS & Cloudflare mega-outages are honestly embarrassing as an industry. Eugh. What are we doing???
We have so many tools & processes for ensuring reliability, but somehow two vendors can each single-handedly wipe everything out anytime.
Critical States Identification in Power System via Lattice Partition and Its Application in Reliability Assessment
Han Hu, Wenjie Wan, Feiyu Chen, Xiaoyu Liu, Bo Yu, Kequan Zhao
https://arxiv.org/abs/2510.09420
38 coastal, remote, and island communities are getting a lifeline for their fragile energy grids.
Through the Energy Technology Innovation Partnership Project, they're designing microgrids, exploring local renewable generation, and hardening systems against extreme weather. The goal: reliable, affordable power that can withstand the next storm.
Discover the power of property-based testing in R with the #quickcheck package! Seamlessly integrates with #testthat and offers a variety of generators for atomic vectors, lists, and tibbles. Perfect for ensuring your code's reliability. Check it out:
Revisiting Metric Reliability for Fine-grained Evaluation of Machine Translation and Summarization in Indian Languages
Amir Hossein Yari, Kalmit Kulkarni, Ahmad Raza Khan, Fajri Koto
https://arxiv.org/abs/2510.07061
Reliability: Tesla is the worst brand on the market, according to the US consumer organization "Consumer Reports".
https://www.consumerreports.org/cars/which-brands-make-the-best-used-cars-a2811658468/
Reliability of Single-Level Equality-Constrained Inverse Optimal Control
Filip Bečanović (University of Belgrade - School of Electrical Engineering), Kosta Jovanović (University of Belgrade - School of Electrical Engineering), Vincent Bonnet (LAAS-CNRS)
https://arxiv.org/abs/2510.08406
Algorithmic analysis of a complex reliability system subject to multiple events with a preventive maintenance strategy and a Bernoulli vacation policy through MMAPs
Juan Eloy Ruiz-Castro, Hugo Alaín Zapata-Ceballos
https://arxiv.org/abs/2510.11506
Giddy up.
✅ PowerToys 0.95 is here: new Light Switch utility, faster Command Palette, and Peek with Spacebar - Windows Command Line
https://devblogs.microsoft.com/commandline/powertoys-0-95-is-her…
Assurance of Frontier AI Built for National Security
Matteo Pistillo, Charlotte Stix
https://arxiv.org/abs/2510.08792 https://arxiv.org/pdf/2510.08792
Walk the Talk: Is Your Log-based Software Reliability Maintenance System Really Reliable?
Minghua He, Tong Jia, Chiming Duan, Pei Xiao, Lingzhe Zhang, Kangjin Wang, Yifan Wu, Ying Li, Gang Huang
https://arxiv.org/abs/2509.24352
Corrigendum to "Degree-Based Approximations for Network Reliability Polynomials". Comment on J. Complex Networks 2025, 13, cnaf001
Xinhan Liu, Piet Van Mieghem
https://arxiv.org/abs/2510.06247
Calibratable Disambiguation Loss for Multi-Instance Partial-Label Learning
Wei Tang, Yin-Fang Yang, Weijia Zhang, Min-Ling Zhang
https://arxiv.org/abs/2512.17788 https://arxiv.org/pdf/2512.17788 https://arxiv.org/html/2512.17788
arXiv:2512.17788v1 Announce Type: new
Abstract: Multi-instance partial-label learning (MIPL) is a weakly supervised framework that extends the principles of multi-instance learning (MIL) and partial-label learning (PLL) to address the challenges of inexact supervision in both instance and label spaces. However, existing MIPL approaches often suffer from poor calibration, undermining classifier reliability. In this work, we propose a plug-and-play calibratable disambiguation loss (CDL) that simultaneously improves classification accuracy and calibration performance. The loss has two instantiations: the first one calibrates predictions based on probabilities from the candidate label set, while the second one integrates probabilities from both candidate and non-candidate label sets. The proposed CDL can be seamlessly incorporated into existing MIPL and PLL frameworks. We provide a theoretical analysis that establishes the lower bound and regularization properties of CDL, demonstrating its superiority over conventional disambiguation losses. Experimental results on benchmark and real-world datasets confirm that our CDL significantly enhances both classification and calibration performance.
PricingLogic: Evaluating LLMs Reasoning on Complex Tourism Pricing Tasks
Yunuo Liu, Dawei Zhu, Zena Al-Khalili, Dai Cheng, Yanjun Chen, Dietrich Klakow, Wei Zhang, Xiaoyu Shen
https://arxiv.org/abs/2510.12409
Proceedings of the International Workshop on Verification of Scientific Software
Stephen F. Siegel, Ganesh Gopalakrishnan
https://arxiv.org/abs/2510.12314 https://
Joyride: Rethinking Linux's network stack design for better performance, security, and reliability
Yanlin Du, Ruslan Nikolaev
https://arxiv.org/abs/2509.25015 https://
The National Security Strategy of the US is worth reading. It clearly shows that the USA is no longer an ally of Europe and that NATO should be worried about the reliability of the US. It also shows how completely unhinged and extreme right the US government has become.
https://www.
RAG-Pull: Imperceptible Attacks on RAG Systems for Code Generation
Vasilije Stambolic, Aritra Dhar, Lukas Cavigelli
https://arxiv.org/abs/2510.11195 https://
Quantifying spike train synchrony and directionality: Measures and Applications
Thomas Kreuz
https://arxiv.org/abs/2510.07140 https://arxiv.org/pdf/2510.07…
New FDA turmoil throws agency's reliability into question (Axios)
https://www.axios.com/2025/11/04/fda-staff-exits-george-tidmarsh-hhs
http://www.memeorandum.com/251104/p122#a251104p122
OpenAI says GPT-5.2 Thinking hallucinates less than GPT-5.1 and has improved reliability for agentic AI needs; pre-release testers include Notion, Box, Shopify (Hayden Field/The Verge)
https://www.theverge.com/ai-artificial-intelligence/842529/open…
Efficient Group Lasso Regularized Rank Regression with Data-Driven Parameter Determination
Meixia Lin, Meijiao Shi, Yunhai Xiao, Qian Zhang
https://arxiv.org/abs/2510.11546 http…
Between Knowledge and Care: Evaluating Generative AI-Based IUI in Type 2 Diabetes Management Through Patient and Physician Perspectives
Yibo Meng, Ruiqi Chen, Zhiming Liu, Xiaolan Ding, Yan Guan
https://arxiv.org/abs/2510.10048
A Deep Multi-Task Learning Approach to Impulsive Noise Parameter Estimation
Abdullahi Mohammad, Bdah Eya, Bassant Selim
https://arxiv.org/abs/2510.12179 https://
Proactive and Reactive Autoscaling Techniques for Edge Computing
Suhrid Gupta, Muhammed Tawfiqul Islam, Rajkumar Buyya
https://arxiv.org/abs/2510.10166 https://
log scales like "number of nines" are fun when you start to get to poor levels of performance.
... in a room full of people who would have laughed at them if they had said "let's get up to three nines of reliability",
a team I work with took a KPI to "get loss rates down below 20%" and received wise nods from most of the engineers in the room.
...but that's the equivalent of "let's get up to 0.5 nines"
that's a …
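The arithmetic behind the "number of nines" remark can be sketched as follows. This assumes the common convention that nines = -log10(failure rate), which the post does not spell out; the function names are hypothetical. Under that assumption a 20% loss rate comes out at roughly 0.7 nines, in the same ballpark as the post's informal "0.5 nines".

```python
import math


def nines(success_rate: float) -> float:
    """Number of nines for a success rate in [0, 1).

    Assumed convention: nines = -log10(1 - success_rate),
    so 0.999 -> 3 nines and 0.9 -> 1 nine.
    """
    if not 0.0 <= success_rate < 1.0:
        raise ValueError("success_rate must be in [0, 1)")
    return -math.log10(1.0 - success_rate)


def success_rate_for(n: float) -> float:
    """Inverse mapping: the success rate that corresponds to n nines."""
    return 1.0 - 10.0 ** (-n)


if __name__ == "__main__":
    # "Three nines" versus the KPI of a 20% loss rate from the post.
    for label, rate in [("three nines", 0.999), ("20% loss", 0.80)]:
        print(f"{label}: {nines(rate):.2f} nines")
    # What exactly 0.5 nines would mean under this convention.
    print(f"0.5 nines: {success_rate_for(0.5):.1%} success")
```

Read the other way round, exactly 0.5 nines under this convention would mean a loss rate of about 32%, so the KPI in the post sits slightly above that.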
Reliability Sensitivity with Response Gradient
Siu-Kui Au, Zi-Jun Cao
https://arxiv.org/abs/2510.09315 https://arxiv.org/pdf/2510.09315
Part of what makes the Steam Deck great is its standby reliability.
Because most desktop/console games are designed for long play sessions, they are difficult to consume on the go. Valve, by making standby seamless, lets these games be paused and picked up again later.
Exploratory Semantic Reliability Analysis of Wind Turbine Maintenance Logs using Large Language Models
Max Malyi, Jonathan Shek, Andre Biscaya
https://arxiv.org/abs/2509.22366 h…
The study involved leading laboratories across multiple countries testing identical samples of gut microbiome bacteria. Results revealed startling inconsistencies, with accuracy measures varying dramatically between laboratories – despite analysing the same samples.
MHRA-led study reveals major inconsistencies in global microbiome research
Power Reserve Capacity from Virtual Power Plants with Reliability and Cost Guarantees
Lorenzo Zapparoli, Blazhe Gjorgiev, Giovanni Sansavini
https://arxiv.org/abs/2510.04815 htt…
Evaluating Hallucinations in Multimodal LLMs with Spoken Queries under Diverse Acoustic Conditions
Hansol Park, Hoseong Ahn, Junwon Moon, Yejin Lee, Kyuhong Shim
https://arxiv.org/abs/2510.08581
The age and metallicity dependence of the near-infrared absolute magnitude and colour of red clump stars
Hiroki Onozato, Yoshifusa Ita, Yoshikazu Nakada
https://arxiv.org/abs/2510.09168
Optimizing Cross-Domain Transfer for Universal Machine Learning Interatomic Potentials
Jaesun Kim, Jinmu You, Yutack Park, Yunsung Lim, Yujin Kang, Jisu Kim, Haekwan Jeon, Deokgi Hong, Seung Yul Lee, Saerom Choi, Yongdeok Kim, Jae W. Lee, Seungwu Han
https://arxiv.org/abs/2510.11241
Analyzing Data Quality and Decay in Mega-Constellations: A Physics-Informed Machine Learning Approach
Katarina Dyreby, Francisco Caldas, Cláudia Soares
https://arxiv.org/abs/2510.11242
The Landscape of problematic papers in the field of non-coding RNA
Ying Lou, Zhengyi Zhou, Guosheng Wang, Zhesi Shen, Menghui Li
https://arxiv.org/abs/2509.24511 https://…
Visible Light Communication for Vehicular Networks: A Tutorial
Pedro E. Gória Silva, Eduardo S. Lima, Jules M. Moualeu, Mohamed Korium, Pedro H. J. Nardelli
https://arxiv.org/abs/2510.11123
Few Shot Semi-Supervised Learning for Abnormal Stop Detection from Sparse GPS Trajectories
Muhammad Ayub Sabir, Junbiao Pang, Jiaqi Wu, Fatima Ashraf
https://arxiv.org/abs/2510.12686
Embedding-Aware Noise Modeling of Quantum Annealing
Seon-Geun Jeong, Mai Dinh Cong, Dae-Il Noh, Quoc-Viet Pham, Won-Joo Hwang
https://arxiv.org/abs/2510.04594 https://
Stand Up, NAO! Increasing the Reliability of Stand-Up Motions Through Error Compensation in Position Control
Philip Reichenberg, Tim Laue
https://arxiv.org/abs/2510.02129 https:…
MMOT: The First Challenging Benchmark for Drone-based Multispectral Multi-Object Tracking
Tianhao Li, Tingfa Xu, Ying Wang, Haolin Qin, Xu Lin, Jianan Li
https://arxiv.org/abs/2510.12565
TripScore: Benchmarking and rewarding real-world travel planning with fine-grained evaluation
Yincen Qu, Huan Xiao, Feng Li, Hui Zhou, Xiangying Dai
https://arxiv.org/abs/2510.09011
Defects4C: Benchmarking Large Language Model Repair Capability with C/C++ Bugs
Jian Wang, Xiaofei Xie, Qiang Hu, Shangqing Liu, Jiongchi Yu, Jiaolong Klong, Yi Li
https://arxiv.org/abs/2510.11059
📈 Build dashboards: Visualize input/output token usage, sessions & conversations, total costs in USD, terminal type distribution (#VSCode, Apple Terminal), requests per user & tool type usage (Read, Edit, LS, Bash)
🎯 Real insights: Measure ROI & productivity gains, spot performance bottlenecks & reliability issues, track adoption trends & user trust via accept/reject …
Bias-Aware AI Chatbot for Engineering Advising at the University of Maryland A. James Clark School of Engineering
Prarthana P. Kartholy, Thandi M. Labor, Neil N. Panchal, Sean H. Wang, Hillary N. Owusu
https://arxiv.org/abs/2510.09636
Crosslisted article(s) found for cs.LG. https://arxiv.org/list/cs.LG/new
[4/4]:
- EReLiFM: Evidential Reliability-Aware Residual Flow Meta-Learning for Open-Set Domain Generalizat...
Peng, Wen, Yang, Fu, Chen, Liu, Wu, Zheng, Sarfraz, Van Gool, Paudel, Stiefelhagen
Shallow Robustness, Deep Vulnerabilities: Multi-Turn Evaluation of Medical LLMs
Blazej Manczak, Eric Lin, Francisco Eiras, James O' Neill, Vaikkunth Mugunthan
https://arxiv.org/abs/2510.12255
Based on Deep Neural Networks: A Machine Learning-Assisted Channel Estimation Method for MIMO Systems
Haoran He
https://arxiv.org/abs/2510.11891 https://ar…
I never could have been literate in Chinese. Not that I could not read those 2 characters as different, but I’d never be able to write characters with that fine a distinction with any sort of reliability. American cursive was bad enough. https://mastodon.social/@mcc/115651531673910237
Inducing State Anxiety in LLM Agents Reproduces Human-Like Biases in Consumer Decision-Making
Ziv Ben-Zion, Zohar Elyoseph, Tobias Spiller, Teddy Lazebnik
https://arxiv.org/abs/2510.06222
IntersectioNDE: Learning Complex Urban Traffic Dynamics based on Interaction Decoupling Strategy
Enli Lin, Ziyuan Yang, Qiujing Lu, Jianming Hu, Shuo Feng
https://arxiv.org/abs/2510.11534
RL Is a Hammer and LLMs Are Nails: A Simple Reinforcement Learning Recipe for Strong Prompt Injection
Yuxin Wen, Arman Zharmagambetov, Ivan Evtimov, Narine Kokhlikyan, Tom Goldstein, Kamalika Chaudhuri, Chuan Guo
https://arxiv.org/abs/2510.04885
LLMs are All You Need? Improving Fuzz Testing for MOJO with Large Language Models
Linghan Huang, Peizhou Zhao, Huaming Chen
https://arxiv.org/abs/2510.10179 https://
Auto-Prompt Ensemble for LLM Judge
Jiajie Li, Huayi Zhang, Peng Lin, Jinjun Xiong, Wei Xu
https://arxiv.org/abs/2510.06538 https://arxiv.org/pdf/2510.06538…
Automated Neural Architecture Design for Industrial Defect Detection
Yuxi Liu, Yunfeng Ma, Yi Tang, Min Liu, Shuai Jiang, Yaonan Wang
https://arxiv.org/abs/2510.06669 https://…
Beating Harmful Stereotypes Through Facts: RAG-based Counter-speech Generation
Greta Damo, Elena Cabrio, Serena Villata
https://arxiv.org/abs/2510.12316 https://
What Do Temporal Graph Learning Models Learn?
Abigail J. Hayes, Tobias Schumacher, Markus Strohmaier
https://arxiv.org/abs/2510.09416 https://arxiv.org/pdf…
Latent-Feature-Informed Neural ODE Modeling for Lightweight Stability Evaluation of Black-box Grid-Tied Inverters
Jialin Zheng, Zhong Liu, Xiaonan Lu
https://arxiv.org/abs/2510.09826
Ultra-Reliable Risk-Aggregated Sum Rate Maximization via Model-Aided Deep Learning
Hassaan Hashmi, Spyridon Pougkakiotis, Dionysis Kalogerias
https://arxiv.org/abs/2509.26311 ht…
OBsmith: Testing JavaScript Obfuscator using LLM-powered sketching
Shan Jiang, Chenguang Zhu, Sarfraz Khurshid
https://arxiv.org/abs/2510.10066 https://arx…
"It feels like hard work trying to talk to it": Understanding Older Adults' Experiences of Encountering and Repairing Conversational Breakdowns with AI Systems
Niharika Mathur, Tamara Zubatiy, Agata Rozga, Elizabeth Mynatt
https://arxiv.org/abs/2510.06690
P2P: A Poison-to-Poison Remedy for Reliable Backdoor Defense in LLMs
Shuai Zhao, Xinyi Wu, Shiqian Zhao, Xiaobao Wu, Zhongliang Guo, Yanhao Jia, Anh Tuan Luu
https://arxiv.org/abs/2510.04503
ExpertAgent: Enhancing Personalized Education through Dynamic Planning and Retrieval-Augmented Long-Chain Reasoning
Binrong Zhu, Guiran Liu, Nina Jiang
https://arxiv.org/abs/2510.07456
SDQM: Synthetic Data Quality Metric for Object Detection Dataset Evaluation
Ayush Zenith, Arnold Zumbrun, Neel Raut, Jing Lin
https://arxiv.org/abs/2510.06596 https://
Can We Reliably Rank Model Performance across Domains without Labeled Data?
Veronica Rammouz, Aaron Gonzalez, Carlos Cruzportillo, Adrian Tan, Nicole Beebe, Anthony Rios
https://arxiv.org/abs/2510.09519
Aggregate Modeling of Air-Conditioner Loads Under Packet-based Control with Both On and Off Grid Access Requests
Mohammad Hassan, Mads R. Almassalkhi
https://arxiv.org/abs/2510.10651
Towards Reliable LLM-based Robot Planning via Combined Uncertainty Estimation
Shiyuan Yin, Chenjia Bai, Zihao Zhang, Junwei Jin, Xinxin Zhang, Chi Zhang, Xuelong Li
https://arxiv.org/abs/2510.08044
Out-of-Distribution Detection from Small Training Sets using Bayesian Neural Network Classifiers
Kevin Raina, Tanya Schmah
https://arxiv.org/abs/2510.06025 https://
GenPilot: A Multi-Agent System for Test-Time Prompt Optimization in Image Generation
Wen Ye, Zhaocheng Liu, Yuwei Gui, Tingyu Yuan, Yunyue Su, Bowen Fang, Chaoyang Zhao, Qiang Liu, Liang Wang
https://arxiv.org/abs/2510.07217
Integrating Domain Knowledge into Process Discovery Using Large Language Models
Ali Norouzifar, Humam Kourani, Marcus Dees, Wil van der Aalst
https://arxiv.org/abs/2510.07161 ht…
Adaptive Reinforcement Learning for Dynamic Configuration Allocation in Pre-Production Testing
Yu Zhu
https://arxiv.org/abs/2510.05147 https://arxiv.org/pd…
UniFField: A Generalizable Unified Neural Feature Field for Visual, Semantic, and Spatial Uncertainties in Any Scene
Christian Maurer, Snehal Jauhri, Sophie Lueth, Georgia Chalvatzaki
https://arxiv.org/abs/2510.06754
Adaptive Tool Generation with Models as Tools and Reinforcement Learning
Chenpeng Wang, Xiaojie Cheng, Chunye Wang, Linfeng Yang, Lei Zhang
https://arxiv.org/abs/2510.06825 http…
Introspection in Learned Semantic Scene Graph Localisation
Manshika Charvi Bissessur, Efimia Panagiotaki, Daniele De Martini
https://arxiv.org/abs/2510.07053 https://
Towards Reliable Retrieval in RAG Systems for Large Legal Datasets
Markus Reuter, Tobias Lingenberg, Rūta Liepiņa, Francesca Lagioia, Marco Lippi, Giovanni Sartor, Andrea Passerini, Burcu Sayin
https://arxiv.org/abs/2510.06999
Constrained Natural Language Action Planning for Resilient Embodied Systems
Grayson Byrd, Corban Rivera, Bethany Kemp, Meghan Booker, Aurora Schmidt, Celso M de Melo, Lalithkumar Seenivasan, Mathias Unberath
https://arxiv.org/abs/2510.06357
lm-Meter: Unveiling Runtime Inference Latency for On-Device Language Models
Haoxin Wang, Xiaolong Tu, Hongyu Ke, Huirong Chai, Dawei Chen, Kyungtae Han
https://arxiv.org/abs/2510.06126
Uncovering Overconfident Failures in CXR Models via Augmentation-Sensitivity Risk Scoring
Han-Jay Shu, Wei-Ning Chiu, Shun-Ting Chang, Meng-Ping Huang, Takeshi Tohyama, Ahram Han, Po-Chih Kuo
https://arxiv.org/abs/2510.01683
UnitTenX: Generating Tests for Legacy Packages with AI Agents Powered by Formal Verification
Yiannis Charalambous, Claudionor N. Coelho Jr, Luis Lamb, Lucas C. Cordeiro
https://arxiv.org/abs/2510.05441
Mitigating Judgment Preference Bias in Large Language Models through Group-Based Polling
Shuliang Liu, Zhipeng Xu, Zhenghao Liu, Yukun Yan, Minghe Yu, Yu Gu, Chong Chen, Huiyuan Xie, Ge Yu
https://arxiv.org/abs/2510.08145
Learning Stability Certificate for Robotics in Real-World Environments
Zhe Shen
https://arxiv.org/abs/2510.03123 https://arxiv.org/pdf/2510.03123
Real Time Headway Predictions in Urban Rail Systems and Implications for Service Control: A Deep Learning Approach
Muhammad Usama, Haris Koutsopoulos
https://arxiv.org/abs/2510.03121
Evaluating Large Language Models for IUCN Red List Species Information
Shinya Uryu
https://arxiv.org/abs/2510.02830 https://arxiv.org/pdf/2510.02830…
Bayesian E(3)-Equivariant Interatomic Potential with Iterative Restratification of Many-body Message Passing
Soohaeng Yoo Willow, Tae Hyeon Park, Gi Beom Sim, Sung Wook Moon, Seung Kyu Min, D. ChangMo Yang, Hyun Woo Kim, Juho Lee, Chang Woo Myung
https://arxiv.org/abs/2510.03046
PRISM-Physics: Causal DAG-Based Process Evaluation for Physics Reasoning
Wanjia Zhao, Qinwei Ma, Jingzhe Shi, Shirley Wu, Jiaqi Han, Yijia Xiao, Si-Yuan Chen, Xiao Luo, Ludwig Schmidt, James Zou
https://arxiv.org/abs/2510.03185
Knowledge-Graph Based RAG System Evaluation Framework
Sicheng Dong, Vahid Zolfaghari, Nenad Petrovic, Alois Knoll
https://arxiv.org/abs/2510.02549 https://…
Addressing Pitfalls in the Evaluation of Uncertainty Estimation Methods for Natural Language Generation
Mykyta Ielanskyi, Kajetan Schweighofer, Lukas Aichberger, Sepp Hochreiter
https://arxiv.org/abs/2510.02279
Learning to Reason for Hallucination Span Detection
Hsuan Su, Ting-Yao Hu, Hema Swetha Koppula, Kundan Krishna, Hadi Pouransari, Cheng-Yu Hsieh, Cem Koc, Joseph Yitan Cheng, Oncel Tuzel, Raviteja Vemulapalli
https://arxiv.org/abs/2510.02173