Tootfinder

@arXiv_hepph_bot@mastoxiv.page
2025-09-29 10:00:27

Fair Universe Higgs Uncertainty Challenge
Ragansu Chakkappai (Université Paris-Saclay, CNRS/IN2P3, IJCLab, Orsay, France, ChaLearn, USA), Wahid Bhimji (Lawrence Berkeley National Laboratory, Berkeley, USA), Paolo Calafiura (Lawrence Berkeley National Laboratory, Berkeley, USA), Po-Wen Chang (Lawrence Berkeley National Laboratory, Berkeley, USA), Yuan-Tang Chou (University of Washington, Seattle, USA), Sascha Diefenbacher (Lawrence Berkeley National Laboratory, Berkeley, USA), Jor…

This competition in high-energy physics (HEP) and machine learning was the first to strongly emphasise uncertainties in $(H \rightarrow \tau^+ \tau^-)$ cross-section measurement. Participants were tasked with developing advanced analysis techniques capable of dealing with uncertainties in the input training data and providing credible confidence intervals. The accuracy of these intervals was evaluated using pseudo-experiments to assess correct coverage. The dataset is now published in Zenodo, and t…

@arXiv_csIR_bot@mastoxiv.page
2025-10-03 09:00:51

IoDResearch: Deep Research on Private Heterogeneous Data via the Internet of Data
Zhuofan Shi, Zijie Guo, Xinjian Ma, Gang Huang, Yun Ma, Xiang Jing
https://arxiv.org/abs/2510.01553

The rapid growth of multi-source, heterogeneous, and multimodal scientific data has increasingly exposed the limitations of traditional data management. Most existing DeepResearch (DR) efforts focus primarily on web search while overlooking local private data. Consequently, these frameworks exhibit low retrieval efficiency for private data and fail to comply with the FAIR principles, ultimately resulting in inefficiency and limited reusability. To this end, we propose IoDResearch (Internet of D…

@arXiv_csCY_bot@mastoxiv.page
2025-10-14 07:41:22

Responsible AI Adoption in the Public Sector: A Data-Centric Taxonomy of AI Adoption Challenges
Anastasija Nikiforova, Martin Lnenicka, Ulf Melin, David Valle-Cruz, Asif Gill, Cesar Casiano Flores, Emyana Sirait, Mariusz Luterek, Richard Michael Dreyling, Barbora Tesarova
https://arxiv.org/abs/2510.09634

Despite the transformative potential of Artificial Intelligence (AI) for public sector services, decision-making, and administrative efficiency, adoption remains uneven due to complex technical, organizational, and institutional challenges. Responsible AI frameworks emphasize fairness, accountability, and transparency, aligning with principles of trustworthy AI and fair AI, yet remain largely aspirational, overlooking technical and institutional realities, especially foundational data and governance. …

@arXiv_csDB_bot@mastoxiv.page
2025-10-01 08:55:57

The Grammar of FAIR: A Granular Architecture of Semantic Units for FAIR Semantics, Inspired by Biology and Linguistics
Lars Vogt, Barend Mons
https://arxiv.org/abs/2509.26434 ht…

The FAIR Principles aim to make data and knowledge Findable, Accessible, Interoperable, and Reusable, yet current digital infrastructures often lack a unifying semantic framework that bridges human cognition and machine-actionability. In this paper, we introduce the Grammar of FAIR: a granular and modular architecture for FAIR semantics built on the concept of semantic units. Semantic units, comprising atomic statement units and composite compound units, implement the principle of semantic modu…

@arXiv_csLG_bot@mastoxiv.page
2025-12-22 11:50:19

Crosslisted article(s) found for cs.LG. https://arxiv.org/list/cs.LG/new
[1/3]:
- Optimizing Text Search: A Novel Pattern Matching Algorithm Based on Ukkonen's Approach
Xinyu Guan, Shaohua Zhang
https://arxiv.org/abs/2512.16927 https://mastoxiv.page/@arXiv_csDS_bot/115762062326187898
- SpIDER: Spatially Informed Dense Embedding Retrieval for Software Issue Localization
Shravan Chaudhari, Rahul Thomas Jacob, Mononito Goswami, Jiajun Cao, Shihab Rashid, Christian Bock
https://arxiv.org/abs/2512.16956 https://mastoxiv.page/@arXiv_csSE_bot/115762248476963893
- MemoryGraft: Persistent Compromise of LLM Agents via Poisoned Experience Retrieval
Saksham Sahai Srivastava, Haoyu He
https://arxiv.org/abs/2512.16962 https://mastoxiv.page/@arXiv_csCR_bot/115762140339109012
- Colormap-Enhanced Vision Transformers for MRI-Based Multiclass (4-Class) Alzheimer's Disease Clas...
Faisal Ahmed
https://arxiv.org/abs/2512.16964 https://mastoxiv.page/@arXiv_eessIV_bot/115762196702065869
- Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows
Wanghan Xu, et al.
https://arxiv.org/abs/2512.16969 https://mastoxiv.page/@arXiv_csAI_bot/115762050529328276
- PAACE: A Plan-Aware Automated Agent Context Engineering Framework
Kamer Ali Yuksel
https://arxiv.org/abs/2512.16970 https://mastoxiv.page/@arXiv_csAI_bot/115762054461584205
- A Women's Health Benchmark for Large Language Models
Elisabeth Gruber, et al.
https://arxiv.org/abs/2512.17028 https://mastoxiv.page/@arXiv_csCL_bot/115762049873946945
- Perturb Your Data: Paraphrase-Guided Training Data Watermarking
Pranav Shetty, Mirazul Haque, Petr Babkin, Zhiqiang Ma, Xiaomo Liu, Manuela Veloso
https://arxiv.org/abs/2512.17075 https://mastoxiv.page/@arXiv_csCL_bot/115762077400293945
- Disentangled representations via score-based variational autoencoders
Benjamin S. H. Lyo, Eero P. Simoncelli, Cristina Savin
https://arxiv.org/abs/2512.17127 https://mastoxiv.page/@arXiv_statML_bot/115762251753966702
- Biosecurity-Aware AI: Agentic Risk Auditing of Soft Prompt Attacks on ESM-Based Variant Predictors
Huixin Zhan
https://arxiv.org/abs/2512.17146 https://mastoxiv.page/@arXiv_csCR_bot/115762318582013305
- Application of machine learning to predict food processing level using Open Food Facts
Arora, Chauhan, Rana, Aditya, Bhagat, Kumar, Kumar, Semar, Singh, Bagler
https://arxiv.org/abs/2512.17169 https://mastoxiv.page/@arXiv_qbioBM_bot/115762302873829397
- Systemic Risk Radar: A Multi-Layer Graph Framework for Early Market Crash Warning
Sandeep Neela
https://arxiv.org/abs/2512.17185 https://mastoxiv.page/@arXiv_qfinRM_bot/115762275982224870
- Do Foundational Audio Encoders Understand Music Structure?
Keisuke Toyama, Zhi Zhong, Akira Takahashi, Shusuke Takahashi, Yuki Mitsufuji
https://arxiv.org/abs/2512.17209 https://mastoxiv.page/@arXiv_csSD_bot/115762341541572505
- CheXPO-v2: Preference Optimization for Chest X-ray VLMs with Knowledge Graph Consistency
Xiao Liang, Yuxuan An, Di Wang, Jiawei Hu, Zhicheng Jiao, Bin Jing, Quan Wang
https://arxiv.org/abs/2512.17213 https://mastoxiv.page/@arXiv_csCV_bot/115762574180736975
- Machine Learning Assisted Parameter Tuning on Wavelet Transform Amorphous Radial Distribution Fun...
Deriyan Senjaya, Stephen Ekaputra Limantoro
https://arxiv.org/abs/2512.17245 https://mastoxiv.page/@arXiv_condmatmtrlsci_bot/115762447037143855
- AlignDP: Hybrid Differential Privacy with Rarity-Aware Protection for LLMs
Madhava Gaikwad
https://arxiv.org/abs/2512.17251 https://mastoxiv.page/@arXiv_csCR_bot/115762396593872943
- Practical Framework for Privacy-Preserving and Byzantine-robust Federated Learning
Baolei Zhang, Minghong Fang, Zhuqing Liu, Biao Yi, Peizhao Zhou, Yuan Wang, Tong Li, Zheli Liu
https://arxiv.org/abs/2512.17254 https://mastoxiv.page/@arXiv_csCR_bot/115762402470985707
- Verifiability-First Agents: Provable Observability and Lightweight Audit Agents for Controlling A...
Abhivansh Gupta
https://arxiv.org/abs/2512.17259 https://mastoxiv.page/@arXiv_csMA_bot/115762225538364939
- Warmer for Less: A Cost-Efficient Strategy for Cold-Start Recommendations at Pinterest
Saeed Ebrahimi, Weijie Jiang, Jaewon Yang, Olafur Gudmundsson, Yucheng Tu, Huizhong Duan
https://arxiv.org/abs/2512.17277 https://mastoxiv.page/@arXiv_csIR_bot/115762214396869930
- LibriVAD: A Scalable Open Dataset with Deep Learning Benchmarks for Voice Activity Detection
Ioannis Stylianou, Achintya kr. Sarkar, Nauman Dawalatabad, James Glass, Zheng-Hua Tan
https://arxiv.org/abs/2512.17281 https://mastoxiv.page/@arXiv_csSD_bot/115762361858560703
- Penalized Fair Regression for Multiple Groups in Chronic Kidney Disease
Carter H. Nakamoto, Lucia Lushi Chen, Agata Foryciarz, Sherri Rose
https://arxiv.org/abs/2512.17340 https://mastoxiv.page/@arXiv_statME_bot/115762446402738033

@arXiv_statML_bot@mastoxiv.page
2025-10-09 09:23:41

PyCFRL: A Python library for counterfactually fair offline reinforcement learning via sequential data preprocessing
Jianhan Zhang, Jitao Wang, Chengchun Shi, John D. Piette, Donglin Zeng, Zhenke Wu
https://arxiv.org/abs/2510.06935

Reinforcement learning (RL) aims to learn and evaluate a sequential decision rule, often referred to as a "policy", that maximizes the population-level benefit in an environment across possibly infinitely many time steps. However, the sequential decisions made by an RL algorithm, while optimized to maximize overall population benefits, may disadvantage certain individuals who are in minority or socioeconomically disadvantaged groups. To address this problem, we introduce PyCFRL, a Python librar…

@arXiv_csDS_bot@mastoxiv.page
2025-10-08 08:34:59

Improved Streaming Algorithm for Fair $k$-Center Clustering
Longkun Guo, Zeyu Lin, Chaoqi Jia, Chao Chen
https://arxiv.org/abs/2510.05937 https://arxiv.org…

Many real-world applications pose challenges in incorporating fairness constraints into the $k$-center clustering problem, where the dataset consists of $m$ demographic groups, each with a specified upper bound on the number of centers to ensure fairness. Focusing on big data scenarios, this paper addresses the problem in a streaming setting, where data points arrive one by one sequentially in a continuous stream. Leveraging a structure called the $λ$-independent center set, we propose a one-p…

@tiotasram@kolektiva.social
2025-11-09 12:09:40

Imagine ChatGPT, but instead of predicting text it just linked you to the top 3 documents most influential on the probabilities that would have been used to predict that text.
Could even generate some info about which parts of each would have been combined how.
There would still be issues with how training data is sourced and filtered, but these could be solved by crawling normally while respecting robots.txt and by paying filterers a fair wage with a more relaxed work schedule and mental health support.
The energy issues are mainly about wild future investment and wasteful query spam, not optimized present-day per-query usage.
Is this "just search?"
Yes, but it would have some advantages for a lot of use cases, mainly in synthesizing results across multiple documents and in leveraging a language model more fully to find relevant stuff.
When we talk about the harms of current corporate LLMs, the opportunity cost of NOT building things like this is part of that.
The equivalent for art would have been so amazing too! "Here are some artists that can do what you want, with examples pulled from their portfolios."
It would be a really cool coding assistant that I'd actually encourage my students to use (with some guidelines).
#AI #GenAI #LLMs

@andres4ny@social.ridetrans.it
2025-10-07 20:06:59

The propaganda is so wild. When I was younger, they'd lie about things 20-30 years prior, with the assumption that people will have forgotten. Now they are constantly attempting to rewrite history that happened a mere 5 years ago, thinking that they're slick.
To quote Orwell: "The Party told you to reject the evidence of your eyes and ears. It was their final, most essential command."

Renee DiResta
noupside.bsky.social@bsky.brid.gy
“Who was president in 2020?” is the question of our time

[which quotes Senator Jim Banks saying, "The 2020 Census was a fraud. The Biden admin used a shady 'privacy' formula that scrambled the data and miscounted 14 states. It included illegal immigrants and handed Democrats extra seats. Americans deserve a fair count and I'm fighting to fix it."]

@arXiv_eessIV_bot@mastoxiv.page
2025-09-30 08:51:31

Achieving Fair Skin Lesion Detection through Skin Tone Normalization and Channel Pruning
Zihan Wei, Tapabrata Chakraborti
https://arxiv.org/abs/2509.22712 https://

Recent works have shown that deep-learning-based skin lesion image classification models trained on unbalanced datasets can exhibit bias toward protected demographic attributes such as race, age, and gender. Current bias mitigation methods usually either achieve a high level of fairness at the cost of accuracy, or only improve the model fairness on a single attribute. Additionally, most bias mitigation strategies are either pre hoc through data processing or post hoc through fairnes…

@arXiv_csGT_bot@mastoxiv.page
2025-10-02 07:35:20

Dynamic Necklace Splitting
Rishi Advani, Abolfazl Asudeh, Mohsen Dehghankar, Stavros Sintos
https://arxiv.org/abs/2510.00162 https://arxiv.org/pdf/2510.001…

The necklace splitting problem is a classic problem in fair division with many applications, including data-informed fair hash maps. We extend necklace splitting to a dynamic setting, allowing for relocation, insertion, and deletion of beads. We present linear-time, optimal algorithms for the two-color case that support all dynamic updates. For more than two colors, we give linear-time, optimal algorithms for relocation subject to a restriction on the number of agents. Finally, we propose a ran…

@arXiv_eessSP_bot@mastoxiv.page
2025-10-10 12:14:50

Crosslisted article(s) found for eess.SP. https://arxiv.org/list/eess.SP/new
[1/1]:
- Estimating Fair Graphs from Graph-Stationary Data
Madeline Navarro, Andrei Buciulea, Samuel Rey, Antonio G. Marques, Santiago Segarra

@arXiv_csDL_bot@mastoxiv.page
2025-10-13 07:44:10

Generating CodeMeta using declarative mapping rules: An open-ended approach using ShExML
Herminio García-González
https://arxiv.org/abs/2510.09172 https://

Nowadays, software is one of the cornerstones when conducting research in several scientific fields which employ computer-based methodologies to answer new research questions. However, for these experiments to be completely reproducible, research software should comply with the FAIR principles, yet its metadata can be represented following different data models and spread across different locations. In order to bring some cohesion to the field, CodeMeta was proposed as a vocabulary to represent…

@arXiv_physicschemph_bot@mastoxiv.page
2025-09-30 08:33:01

WTMAD-4: A Fair Weighting Scheme for GMTKN55
Kyle R. Bryenton, Erin R. Johnson
https://arxiv.org/abs/2509.23498 https://arxiv.org/pdf/2509.23498

The GMTKN55 data set is a collection of standard benchmarks used in molecular quantum chemistry that spans small- and large-molecule thermochemistry, reaction barriers, and non-covalent interactions. Herein, we identify a flaw in the weighted mean absolute deviation (WTMAD) definitions commonly used to quantify performance of various electronic-structure methods for the GMTKN55 set, which under-weight some of its component benchmarks by orders of magnitude. A new WTMAD-4 metric is proposed, bas…

@arXiv_csDL_bot@mastoxiv.page
2025-10-07 09:48:42

LLM-Based Information Extraction to Support Scientific Literature Research and Publication Workflows
Samy Ateia, Udo Kruschwitz, Melanie Scholz, Agnes Koschmider, Moayad Almohaishi
https://arxiv.org/abs/2510.04749

The increasing volume of scholarly publications requires advanced tools for efficient knowledge discovery and management. This paper introduces ongoing work on a system using Large Language Models (LLMs) for the semantic extraction of key concepts from scientific documents. Our research, conducted within the German National Research Data Infrastructure for and with Computer Science (NFDIxCS) project, seeks to support FAIR (Findable, Accessible, Interoperable, and Reusable) principles in scienti…
