Tootfinder

Opt-in global Mastodon full-text search. Join the index!

@Techmeme@techhub.social
2026-01-06 03:45:35

Jensen Huang says Nvidia's Vera Rubin chips are in "full production"; Nvidia says Rubin can train some LLMs with roughly one-fourth the chips Blackwell needs (Lauren Goode/Wired)
wired.com/story/nvidias-rubin-

@Mediagazer@mstdn.social
2025-12-07 05:16:02

Some Reddit moderators say a surge of AI slop on the site is eroding its authenticity and could lead to a feedback loop of AI models training on AI content (Kat Tenbarge/Wired)
wired.com/story/ai-slop-is-rui

@shriramk@mastodon.social
2026-02-04 01:45:53

New talk abstract dropping. I just hope I can write a talk to live up to it by *checks watch* *gulp* Monday.

Generative AI and Computing Education for Novices

Generative AI (GenAI) has wreaked havoc in computing education, especially introductory computing. As AI models grow in sophistication, it is increasingly difficult to find problems that cannot be comprehensively solved by AI. Even "defeat devices" like obscure programming languages have limited viability, due to technical reasons like one-shot learning, temporal reasons such as training, and basic educational considerations like what we want stu…
@Techmeme@techhub.social
2026-02-26 16:40:46

Encord, whose software helps companies developing AI models manage training data for robots and other uses, raised $60M at a $500M pre-money valuation (Rocket Drew/The Information)
theinformation.com/articles/ro

@Mediagazer@mstdn.social
2026-01-26 22:05:45

A group of YouTubers with a combined 6.2M subscribers adds Snap to a class action lawsuit, alleging the company trained its AI systems on their video content (Sarah Perez/TechCrunch)
techcrunch.com/2026/01/26/yout

@Techmeme@techhub.social
2026-01-26 22:10:47

A group of YouTubers with a combined 6.2M subscribers adds Snap to a class action lawsuit, alleging the company trained its AI systems on their video content (Sarah Perez/TechCrunch)
techcrunch.com/2026/01/26/yout

@arXiv_csLG_bot@mastoxiv.page
2026-02-25 10:42:41

Scaling Vision Transformers: Evaluating DeepSpeed for Image-Centric Workloads
Huy Trinh, Rebecca Ma, Zeqi Yu, Tahsin Reza
arxiv.org/abs/2602.21081 arxiv.org/pdf/2602.21081 arxiv.org/html/2602.21081
arXiv:2602.21081v1 Announce Type: new
Abstract: Vision Transformers (ViTs) have demonstrated remarkable potential in image processing tasks by utilizing self-attention mechanisms to capture global relationships within data. However, their scalability is hindered by significant computational and memory demands, especially for large-scale models with many parameters. This study aims to leverage DeepSpeed, a highly efficient distributed training framework that is commonly used for language models, to enhance the scalability and performance of ViTs. We evaluate intra- and inter-node training efficiency across multiple GPU configurations on various datasets like CIFAR-10 and CIFAR-100, exploring the impact of distributed data parallelism on training speed, communication overhead, and overall scalability (strong and weak scaling). By systematically varying software parameters, such as batch size and gradient accumulation, we identify key factors influencing performance of distributed training. The experiments in this study provide a foundational basis for applying DeepSpeed to image-related tasks. Future work will extend these investigations to deepen our understanding of DeepSpeed's limitations and explore strategies for optimizing distributed training pipelines for Vision Transformers.
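
A minimal sketch of the kind of run the study describes, assuming a timm ViT, torchvision CIFAR-10, and a single-node launch via the deepspeed CLI; the global batch size, gradient-accumulation steps, and optimizer settings below are illustrative placeholders, not the paper's configuration:

```python
# Sketch: DeepSpeed distributed data parallelism for a ViT on CIFAR-10.
# Assumes `pip install deepspeed timm torchvision`; launch with `deepspeed train.py`.
import deepspeed
import timm
import torch
import torch.nn.functional as F
from torchvision import datasets, transforms

model = timm.create_model("vit_base_patch16_224", num_classes=10)

ds_config = {
    # global batch = micro batch (64, from the DataLoader) x accumulation (4) x world size
    "train_batch_size": 256,
    "gradient_accumulation_steps": 4,   # one of the parameters the study sweeps
    "fp16": {"enabled": True},
    "optimizer": {"type": "AdamW", "params": {"lr": 3e-4}},
}
engine, _, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

tfm = transforms.Compose([transforms.Resize(224), transforms.ToTensor()])
loader = torch.utils.data.DataLoader(
    datasets.CIFAR10("data", download=True, transform=tfm), batch_size=64
)

for images, labels in loader:
    images = images.to(engine.device).half()   # match the fp16 engine dtype
    labels = labels.to(engine.device)
    loss = F.cross_entropy(engine(images), labels)
    engine.backward(loss)   # DeepSpeed handles loss scaling and accumulation
    engine.step()
```
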
toXiv_bot_toot

@cyrevolt@mastodon.social
2026-01-24 16:05:02

Gemini: "Google does not train its foundational models on private data under NDA, such as non-public chip documentation or confidential enterprise files. My training relies on publicly available web content, licensed datasets, and code."
Well, that is unfortunate, because that's actually the interesting use case for me. 🤫

@johl@mastodon.xyz
2025-12-15 14:07:44

The very excellent “what happened last week” newsletter by @… focuses this week on “the A in AI stands for African”, namely on how Chinese AI companies are turning Kenya, with its chronic unemployment rate of 67 percent, into a hot spot for cheap AI labor.
More and more Kenyan students and recent graduates are hired to label thousands of videos a day through opaq…

@Techmeme@techhub.social
2026-02-24 07:20:49

A look at the challenges some AI developers face in building models to extract trillions of high-quality tokens from PDFs, which are hard to parse, for training (Josh Dzieza/The Verge)
theverge.com/ai-artificial-int

@arXiv_physicsfludyn_bot@mastoxiv.page
2026-02-27 08:32:10

From synthetic turbulence to true solutions: A deep diffusion model for discovering periodic orbits in the Navier-Stokes equations
Jeremy P Parker, Tobias M Schneider
arxiv.org/abs/2602.23181 arxiv.org/pdf/2602.23181 arxiv.org/html/2602.23181
arXiv:2602.23181v1 Announce Type: new
Abstract: Generative artificial intelligence has shown remarkable success in synthesizing data that mimic complex real-world systems, but its potential role in the discovery of mathematically meaningful structures in physical models remains underexplored. In this work, we demonstrate how a generative diffusion model can be used to uncover previously unknown solutions of a nonlinear partial differential equation: the two-dimensional Navier-Stokes equations in a turbulent regime. Trained on data from a direct numerical simulation of turbulence, the model learns to generate time series that resemble physically plausible trajectories. By carefully modifying the temporal structure of the model and enforcing the symmetries of the governing equations, we produce synthetic trajectories that are periodic in time, despite the fact that the training data did not contain periodic trajectories. These synthetic trajectories are then refined into true solutions using an iterative solver, yielding 111 new periodic orbits (POs) with very short periods. Our results reveal a previously unobserved richness in the PO structure of this system and suggest a broader role for generative AI: not as a replacement for simulation and existing solvers, but as a complementary tool for navigating the complex solution spaces of nonlinear dynamical systems.
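
The authors enforce periodicity through the temporal structure of the model; as a generic illustration of how a temporal network can be made periodic by construction (not the paper's architecture), circular padding on the time axis makes a convolutional block exactly equivariant to cyclic time shifts:

```python
# Illustration: a temporal conv block that treats time as a circle, so any
# trajectory it generates or refines is periodic by construction. This is a
# generic trick for demonstration, not the architecture from the paper.
import torch
import torch.nn as nn

class PeriodicTemporalBlock(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 5):
        super().__init__()
        # circular padding wraps the time axis: t=0 and t=T-1 are neighbors
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=kernel_size // 2, padding_mode="circular")
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, channels, time)
        return x + self.act(self.conv(x))

x = torch.randn(2, 16, 64)                   # a batch of length-64 trajectories
block = PeriodicTemporalBlock(16)
# Cyclically shifting the input cyclically shifts the output: period-T
# trajectories stay period-T under the network.
assert torch.allclose(block(torch.roll(x, 7, dims=-1)),
                      torch.roll(block(x), 7, dims=-1), atol=1e-5)
```
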
toXiv_bot_toot

@Techmeme@techhub.social
2026-02-24 10:50:49

Anthropic introduces "persona selection model", a theory to explain AI's human-like behavior, and details how AI personas form in pre-training and post-training (Anthropic)
anthropic.com/research/persona

@rasos@fairmove.net
2026-01-17 09:05:38

Could we become co-owners of #AI models that used training data which I published under a viral license?
We briefly discussed this question in the Austrian #CreativeCommons chapter and came to the conclusion that copyright can only be claimed by human beings. So the model itself and wha…

@arXiv_csLG_bot@mastoxiv.page
2026-02-25 10:39:11

Extending $\mu$P: Spectral Conditions for Feature Learning Across Optimizers
Akshita Gupta, Marieme Ngom, Sam Foreman, Venkatram Vishwanath
arxiv.org/abs/2602.20937 arxiv.org/pdf/2602.20937 arxiv.org/html/2602.20937
arXiv:2602.20937v1 Announce Type: new
Abstract: Several variations of adaptive first-order and second-order optimization methods have been proposed to accelerate and scale the training of large language models. The performance of these optimization routines is highly sensitive to the choice of hyperparameters (HPs), which are computationally expensive to tune for large-scale models. Maximal update parameterization ($\mu$P) is a set of scaling rules which aims to make the optimal HPs independent of the model size, thereby allowing the HPs tuned on a smaller (computationally cheaper) model to be transferred to train a larger, target model. Despite promising results for SGD and Adam, deriving $\mu$P for other optimizers is challenging because the underlying tensor programming approach is difficult to grasp. Building on recent work that introduced spectral conditions as an alternative to tensor programs, we propose a novel framework to derive $\mu$P for a broader class of optimizers, including AdamW, ADOPT, LAMB, Sophia, Shampoo and Muon. We implement our $\mu$P derivations on multiple benchmark models and demonstrate zero-shot learning rate transfer across increasing model width for the above optimizers. Further, we provide empirical insights into depth-scaling parameterization for these optimizers.
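
The spectral condition the framework builds on can be made concrete with a toy recipe: keep each weight update's spectral norm proportional to sqrt(fan_out/fan_in) as width grows. A minimal sketch, assuming Adam-style updates whose entries are O(lr); the per-optimizer corrections are exactly what the paper derives, not what this shows:

```python
# Sketch: width-aware per-layer learning rates in the spirit of muP's spectral
# condition, targeting ||dW||_2 ~ sqrt(fan_out / fan_in). Assumes Adam-style
# updates with O(lr) entries, whose spectral norm then scales roughly like
# lr * (sqrt(fan_in) + sqrt(fan_out)); all constants here are illustrative.
import math
import torch
import torch.nn as nn

def spectral_param_groups(model: nn.Module, base_lr: float):
    groups = []
    for _, p in model.named_parameters():
        if p.ndim == 2:                                 # dense weight: (fan_out, fan_in)
            fan_out, fan_in = p.shape
            target = math.sqrt(fan_out / fan_in)        # desired update spectral norm
            per_unit_lr = math.sqrt(fan_in) + math.sqrt(fan_out)
            lr = base_lr * target / per_unit_lr         # ~ base_lr / fan_in when square
        else:                                           # biases, norms: width-independent
            lr = base_lr
        groups.append({"params": [p], "lr": lr})
    return groups

model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))
opt = torch.optim.AdamW(spectral_param_groups(model, base_lr=1e-2))
```
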
toXiv_bot_toot

@Mediagazer@mstdn.social
2026-02-24 14:55:50

South Korea's three major terrestrial broadcasters, KBS, MBC, and SBS, sue OpenAI, alleging it used their news content to train AI models without authorization (Lee Yoon-seo/The Korea Herald)
koreaherald.com/article/106806

@Techmeme@techhub.social
2026-02-26 21:15:48

Sources: Meta last week scrapped the most advanced AI chip it was developing, after struggling with the design, and shifted its focus to a less complicated chip (The Information)
theinformation.com/articles/me

@arXiv_csLG_bot@mastoxiv.page
2026-02-25 10:44:31

The Diffusion Duality, Chapter II: $\Psi$-Samplers and Efficient Curriculum
Justin Deschenaux, Caglar Gulcehre, Subham Sekhar Sahoo
arxiv.org/abs/2602.21185 arxiv.org/pdf/2602.21185 arxiv.org/html/2602.21185
arXiv:2602.21185v1 Announce Type: new
Abstract: Uniform-state discrete diffusion models excel at few-step generation and guidance due to their ability to self-correct, making them preferred over autoregressive or Masked diffusion models in these settings. However, their sampling quality plateaus with ancestral samplers as the number of steps increases. We introduce a family of Predictor-Corrector (PC) samplers for discrete diffusion that generalize prior methods and apply to arbitrary noise processes. When paired with uniform-state diffusion, our samplers outperform ancestral sampling on both language and image modeling, achieving lower generative perplexity at matched unigram entropy on OpenWebText and better FID/IS scores on CIFAR10. Crucially, unlike conventional samplers, our PC methods continue to improve with more sampling steps. Taken together, these findings call into question the assumption that Masked diffusion is the inevitable future of diffusion-based language modeling. Beyond sampling, we develop a memory-efficient curriculum for the Gaussian relaxation training phase, reducing training time by 25% and memory by 33% compared to Duo while maintaining comparable perplexity on OpenWebText and LM1B and strong downstream performance. We release code, checkpoints, and a video-tutorial on: s-sahoo.com/duo-ch2
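
The predictor-corrector pattern itself is easy to sketch: a predictor step jumps toward the model's current estimate of the data, and a corrector step re-noises and re-predicts at the same noise level, which is where self-correction enters. Below is a schematic with a stand-in "denoiser" (a trained network in practice) and a generic uniform-state corrector; it is not the paper's $\Psi$-samplers:

```python
# Schematic predictor-corrector sampling loop for uniform-state discrete
# diffusion. `denoiser` is a stand-in for a trained network; the corrector is
# the generic "re-noise, then re-predict" pattern, not the paper's samplers.
import torch

V, L = 50, 16                        # vocab size, sequence length

def denoiser(x_t, t):                # stand-in: would be a trained model
    return torch.randn(x_t.shape[0], L, V)   # logits over tokens per position

def uniform_renoise(x, t, frac=0.1):
    # corrector half-step: resample a t-dependent fraction of positions
    # uniformly, mirroring the uniform-state forward process
    mask = torch.rand(x.shape) < frac * t
    return torch.where(mask, torch.randint(0, V, x.shape), x)

x = torch.randint(0, V, (4, L))      # start from uniform noise
steps = 32
for step in range(steps, 0, -1):
    t = step / steps
    x = denoiser(x, t).argmax(-1)    # predictor: move toward the data estimate
    if step > 1:
        x = uniform_renoise(x, t)      # corrector: perturb...
        x = denoiser(x, t).argmax(-1)  # ...and let the model fix its own errors
```
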
toXiv_bot_toot

@Techmeme@techhub.social
2025-12-19 21:06:41

AI robotics startup Physical Intelligence claims vision-language-action models learn to align human videos and robot data as pre-training is scaled up (Physical Intelligence)
physicalintelligence.company/r

@arXiv_csGR_bot@mastoxiv.page
2026-01-21 08:13:41

Copy-Transform-Paste: Zero-Shot Object-Object Alignment Guided by Vision-Language and Geometric Constraints
Rotem Gatenyo, Ohad Fried
arxiv.org/abs/2601.14207 arxiv.org/pdf/2601.14207 arxiv.org/html/2601.14207
arXiv:2601.14207v1 Announce Type: new
Abstract: We study zero-shot 3D alignment of two given meshes, using a text prompt describing their spatial relation -- an essential capability for content creation and scene assembly. Earlier approaches primarily rely on geometric alignment procedures, while recent work leverages pretrained 2D diffusion models to model language-conditioned object-object spatial relationships. In contrast, we directly optimize the relative pose at test time, updating translation, rotation, and isotropic scale with CLIP-driven gradients via a differentiable renderer, without training a new model. Our framework augments language supervision with geometry-aware objectives: a variant of soft-Iterative Closest Point (ICP) term to encourage surface attachment and a penetration loss to discourage interpenetration. A phased schedule strengthens contact constraints over time, and camera control concentrates the optimization on the interaction region. To enable evaluation, we curate a benchmark containing diverse categories and relations, and compare against baselines. Our method outperforms all alternatives, yielding semantically faithful and physically plausible alignments.
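
The test-time optimization loop is simple to outline. A skeleton under stated assumptions: render and clip_score below are toy placeholders for a differentiable renderer (e.g., PyTorch3D) and a CLIP image-text similarity, and the soft-ICP term, penetration loss, phased schedule, and camera control are omitted:

```python
# Skeleton: optimize relative pose (translation, axis-angle rotation, isotropic
# scale) at test time by gradient ascent on a CLIP-driven objective through a
# differentiable renderer. `render` and `clip_score` are toy placeholders.
import torch

translation = torch.zeros(3, requires_grad=True)
rot_vec = torch.zeros(3, requires_grad=True)     # axis-angle rotation
log_scale = torch.zeros(1, requires_grad=True)   # isotropic scale, in log space

def render(translation, rot_vec, log_scale):     # placeholder differentiable renderer
    return (translation.sum() + rot_vec.sum() + log_scale.sum()).reshape(1)

def clip_score(image):                           # placeholder CLIP(image, prompt) similarity
    return -(image - 1.0).pow(2).sum()

opt = torch.optim.Adam([translation, rot_vec, log_scale], lr=0.05)
for step in range(200):
    image = render(translation, rot_vec, log_scale)
    loss = -clip_score(image)                    # maximize semantic agreement
    opt.zero_grad()
    loss.backward()
    opt.step()
```
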
toXiv_bot_toot

@newsie@darktundra.xyz
2025-12-08 21:33:20

Trump plans executive order curbing state AI laws
therecord.media/trump-plans-ai

@arXiv_csLG_bot@mastoxiv.page
2026-02-25 10:45:11

Untied Ulysses: Memory-Efficient Context Parallelism via Headwise Chunking
Ravi Ghadia, Maksim Abraham, Sergei Vorobyov, Max Ryabinin
arxiv.org/abs/2602.21196 arxiv.org/pdf/2602.21196 arxiv.org/html/2602.21196
arXiv:2602.21196v1 Announce Type: new
Abstract: Efficiently processing long sequences with Transformer models usually requires splitting the computations across accelerators via context parallelism. The dominant approaches in this family of methods, such as Ring Attention or DeepSpeed Ulysses, enable scaling over the context dimension but do not focus on memory efficiency, which limits the sequence lengths they can support. More advanced techniques, such as Fully Pipelined Distributed Transformer or activation offloading, can further extend the possible context length at the cost of training throughput. In this paper, we present UPipe, a simple yet effective context parallelism technique that performs fine-grained chunking at the attention head level. This technique significantly reduces the activation memory usage of self-attention, breaking the activation memory barrier and unlocking much longer context lengths. Our approach reduces intermediate tensor memory usage in the attention layer by as much as 87.5% for 32B Transformers, while matching previous context parallelism techniques in terms of training speed. UPipe can support a context length of 5M tokens when training Llama3-8B on a single 8×H100 node, improving upon prior methods by over 25%.
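
The core memory idea is easy to demonstrate: attention is independent across heads, so heads can be processed in small chunks with only one chunk's intermediates alive at a time. A single-process sketch of headwise chunking follows; UPipe additionally combines this with context parallelism and pipelining across devices, which the sketch omits:

```python
# Sketch: attention computed over head chunks sequentially. Each chunk's
# intermediate tensors can be freed before the next chunk runs, which is the
# activation-memory lever; the distributed parts of UPipe are omitted.
import torch
import torch.nn.functional as F

def headwise_chunked_attention(q, k, v, chunk_heads: int):
    # q, k, v: (batch, heads, seq, head_dim)
    outs = []
    for h0 in range(0, q.shape[1], chunk_heads):
        sl = slice(h0, h0 + chunk_heads)
        outs.append(F.scaled_dot_product_attention(q[:, sl], k[:, sl], v[:, sl]))
    return torch.cat(outs, dim=1)

q = k = v = torch.randn(1, 16, 1024, 64)
full = F.scaled_dot_product_attention(q, k, v)
chunked = headwise_chunked_attention(q, k, v, chunk_heads=4)
assert torch.allclose(full, chunked, atol=1e-5)   # heads are independent, so exact
```
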
toXiv_bot_toot

@Techmeme@techhub.social
2026-01-25 21:05:38

A profile of Mercor, which pays about $2M daily to ~30K experts training AI models at $95/hour on average, with roles like radiologists earning up to $375/hour (Bethan Staton/Financial Times)
ft.com/content/0cab0fcd-e355-4

@arXiv_csLG_bot@mastoxiv.page
2026-02-25 10:36:21

On Electric Vehicle Energy Demand Forecasting and the Effect of Federated Learning
Andreas Tritsarolis, Gil Sampaio, Nikos Pelekis, Yannis Theodoridis
arxiv.org/abs/2602.20782 arxiv.org/pdf/2602.20782 arxiv.org/html/2602.20782
arXiv:2602.20782v1 Announce Type: new
Abstract: The widespread adoption of new energy resources, smart devices, and demand-side management strategies has motivated several analytics operations, from infrastructure load modeling to user behavior profiling. Energy Demand Forecasting (EDF) of Electric Vehicle Supply Equipments (EVSEs) is one of the most critical operations for ensuring efficient energy management and sustainability, since it enables utility providers to anticipate energy/power demand, optimize resource allocation, and implement proactive measures to improve grid reliability. However, accurate EDF is a challenging problem due to external factors, such as varying user routines, weather conditions, driving behaviors, unknown state of charge, etc. Furthermore, as concerns and restrictions about privacy and sustainability have grown, training data has become increasingly fragmented, resulting in distributed datasets scattered across different data silos and/or edge devices, calling for federated learning solutions. In this paper, we investigate different well-established time series forecasting methodologies to address the EDF problem, from statistical methods (the ARIMA family) to traditional machine learning models (such as XGBoost) and deep neural networks (GRU and LSTM). We provide an overview of these methods through a performance comparison over four real-world EVSE datasets, evaluated under both centralized and federated learning paradigms, focusing on the trade-offs between forecasting fidelity, privacy preservation, and energy overheads. Our experimental results demonstrate, on the one hand, the superiority of gradient boosted trees (XGBoost) over statistical and NN-based models in both prediction accuracy and energy efficiency and, on the other hand, the insight that Federated Learning-enabled models balance these factors, offering a promising direction for decentralized energy demand forecasting.
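
The winning centralized baseline is straightforward to reproduce in miniature: gradient-boosted trees on lagged demand features. A minimal sketch, with a synthetic daily-periodic series standing in for real EVSE data; the lag structure is illustrative, not the paper's:

```python
# Sketch: XGBoost energy-demand forecasting from lagged features. The synthetic
# sinusoid-plus-noise series is a stand-in for a real EVSE demand time series.
import numpy as np
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
n = 2000
demand = 10 + 5 * np.sin(np.arange(n) * 2 * np.pi / 96) + rng.normal(0, 1, n)

LAGS = 96   # e.g., one day of 15-minute intervals
X = np.stack([demand[i:i + LAGS] for i in range(n - LAGS)])
y = demand[LAGS:]

split = int(0.8 * len(X))   # chronological split: train on the past only
model = XGBRegressor(n_estimators=300, max_depth=6, learning_rate=0.05)
model.fit(X[:split], y[:split])
mae = np.abs(model.predict(X[split:]) - y[split:]).mean()
print(f"test MAE: {mae:.3f}")
```
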
toXiv_bot_toot

@Techmeme@techhub.social
2025-12-16 01:05:43

The Allen Institute for AI launches Bolmo 7B and Bolmo 1B, claiming they are "the first fully open byte-level language models", built on its Olmo 3 models (Emilia David/VentureBeat)
venturebeat.com/ai/bolmos-arch

@arXiv_csLG_bot@mastoxiv.page
2025-12-22 10:34:10

Exploiting ID-Text Complementarity via Ensembling for Sequential Recommendation
Liam Collins, Bhuvesh Kumar, Clark Mingxuan Ju, Tong Zhao, Donald Loveland, Leonardo Neves, Neil Shah
arxiv.org/abs/2512.17820 arxiv.org/pdf/2512.17820 arxiv.org/html/2512.17820
arXiv:2512.17820v1 Announce Type: new
Abstract: Modern Sequential Recommendation (SR) models commonly utilize modality features to represent items, motivated in large part by recent advancements in language and vision modeling. To do so, several works completely replace ID embeddings with modality embeddings, claiming that modality embeddings render ID embeddings unnecessary because they can match or even exceed ID embedding performance. On the other hand, many works jointly utilize ID and modality features, but posit that complex fusion strategies, such as multi-stage training and/or intricate alignment architectures, are necessary for this joint utilization. However, underlying both these lines of work is a lack of understanding of the complementarity of ID and modality features. In this work, we address this gap by studying the complementarity of ID- and text-based SR models. We show that these models do learn complementary signals, meaning that either should provide performance gain when used properly alongside the other. Motivated by this, we propose a new SR method that preserves ID-text complementarity through independent model training, then harnesses it through a simple ensembling strategy. Despite this method's simplicity, we show it outperforms several competitive SR baselines, implying that both ID and text features are necessary to achieve state-of-the-art SR performance but complex fusion architectures are not.
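
The proposed recipe (train the two models independently, ensemble at scoring time) admits a tiny sketch. The convex combination of per-user z-normalized scores below is one plausible reading of "simple ensembling strategy", not necessarily the paper's exact rule:

```python
# Toy sketch: combine independently trained ID-based and text-based sequential
# recommenders by blending their per-user item scores at inference time.
import torch

def ensemble_scores(id_scores: torch.Tensor,
                    text_scores: torch.Tensor,
                    alpha: float = 0.5) -> torch.Tensor:
    def z(s):   # z-normalize per user so the two score scales are comparable
        return (s - s.mean(-1, keepdim=True)) / (s.std(-1, keepdim=True) + 1e-6)
    return alpha * z(id_scores) + (1 - alpha) * z(text_scores)

id_scores = torch.randn(8, 10_000)     # (users, items) from the ID model
text_scores = torch.randn(8, 10_000)   # (users, items) from the text model
top20 = ensemble_scores(id_scores, text_scores).topk(20, dim=-1).indices
```
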
toXiv_bot_toot

@Techmeme@techhub.social
2026-02-18 01:56:04

Anthropic expects to pay Amazon, Google, and Microsoft $80B total to run its models on their servers through 2029, plus an additional $100B for training costs (The Information)
theinformation.com/articles/an

@arXiv_csLG_bot@mastoxiv.page
2026-02-25 10:37:41

Transcoder Adapters for Reasoning-Model Diffing
Nathan Hu, Jake Ward, Thomas Icard, Christopher Potts
arxiv.org/abs/2602.20904 arxiv.org/pdf/2602.20904 arxiv.org/html/2602.20904
arXiv:2602.20904v1 Announce Type: new
Abstract: While reasoning models are increasingly ubiquitous, the effects of reasoning training on a model's internal mechanisms remain poorly understood. In this work, we introduce transcoder adapters, a technique for learning an interpretable approximation of the difference in MLP computation before and after fine-tuning. We apply transcoder adapters to characterize the differences between Qwen2.5-Math-7B and its reasoning-distilled variant, DeepSeek-R1-Distill-Qwen-7B. Learned adapters are faithful to the target model's internal computation and next-token predictions. When evaluated on reasoning benchmarks, adapters match the reasoning model's response lengths and typically recover 50-90% of the accuracy gains from reasoning fine-tuning. Adapter features are sparsely activating and interpretable. When examining adapter features, we find that only ~8% have activating examples directly related to reasoning behaviors. We deeply study one such behavior -- the production of hesitation tokens (e.g., "wait"). Using attribution graphs, we trace hesitation to only ~2.4% of adapter features (5.6k total) performing one of two functions. These features are necessary and sufficient for producing hesitation tokens; removing them reduces response length, often without affecting accuracy. Overall, our results provide insight into reasoning training and suggest transcoder adapters may be useful for studying fine-tuning more broadly.
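
The adapter objective reduces to a compact regression problem: learn a sparse feature bottleneck that reconstructs the difference between the fine-tuned and base MLP outputs. A toy version, with random frozen MLPs and illustrative dimensions and sparsity penalty standing in for the paper's models:

```python
# Toy transcoder-adapter training loop: adapter(x) ~ mlp_ft(x) - mlp_base(x),
# with an L1 penalty encouraging sparse, hopefully interpretable features.
# The frozen MLPs, dimensions, and penalty weight are stand-ins.
import torch
import torch.nn as nn

d_model, d_feat = 256, 2048
mlp_base = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                         nn.Linear(4 * d_model, d_model))
mlp_ft = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                       nn.Linear(4 * d_model, d_model))
for m in (mlp_base, mlp_ft):
    m.requires_grad_(False)   # both models frozen; only the adapter trains

enc = nn.Linear(d_model, d_feat)   # feature encoder (the interpretable layer)
dec = nn.Linear(d_feat, d_model)
opt = torch.optim.Adam([*enc.parameters(), *dec.parameters()], lr=1e-3)

for _ in range(100):
    x = torch.randn(64, d_model)        # stand-in for residual-stream inputs
    target = mlp_ft(x) - mlp_base(x)    # the computation diff to explain
    feats = torch.relu(enc(x))          # sparse nonnegative feature activations
    loss = (dec(feats) - target).pow(2).mean() + 1e-3 * feats.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```
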
toXiv_bot_toot

@arXiv_csLG_bot@mastoxiv.page
2026-02-25 10:35:21

WeirNet: A Large-Scale 3D CFD Benchmark for Geometric Surrogate Modeling of Piano Key Weirs
Lisa Lüddecke, Michael Hohmann, Sebastian Eilermann, Jan Tillmann-Mumm, Pezhman Pourabdollah, Mario Oertel, Oliver Niggemann
arxiv.org/abs/2602.20714 arxiv.org/pdf/2602.20714 arxiv.org/html/2602.20714
arXiv:2602.20714v1 Announce Type: new
Abstract: Reliable prediction of hydraulic performance is challenging for Piano Key Weir (PKW) design because discharge capacity depends on three-dimensional geometry and operating conditions. Surrogate models can accelerate hydraulic-structure design, but progress is limited by scarce large, well-documented datasets that jointly capture geometric variation, operating conditions, and functional performance. This study presents WeirNet, a large 3D CFD benchmark dataset for geometric surrogate modeling of PKWs. WeirNet contains 3,794 parametric, feasibility-constrained rectangular and trapezoidal PKW geometries, each scheduled at 19 discharge conditions using a consistent free-surface OpenFOAM workflow, resulting in 71,387 completed simulations that form the benchmark, with complete discharge coefficient labels. The dataset is released in multiple modalities (compact parametric descriptors, watertight surface meshes, and high-resolution point clouds) together with standardized tasks and in-distribution and out-of-distribution splits. Representative surrogate families are benchmarked for discharge coefficient prediction. Tree-based regressors on parametric descriptors achieve the best overall accuracy, while point- and mesh-based models remain competitive and offer parameterization-agnostic inference. All surrogates evaluate in milliseconds per sample, providing orders-of-magnitude speedups over CFD runtimes. Out-of-distribution results identify geometry shift as the dominant failure mode compared to unseen discharge values, and data-efficiency experiments show diminishing returns beyond roughly 60% of the training data. By publicly releasing the dataset together with simulation setups and evaluation pipelines, WeirNet establishes a reproducible framework for data-driven hydraulic modeling and enables faster exploration of PKW designs during the early stages of hydraulic planning.
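
The descriptor-based surrogate task is the easiest entry point. A sketch with synthetic placeholders for the released descriptors and labels, including a crude geometry-shift holdout mimicking the out-of-distribution evaluation:

```python
# Sketch: tree-ensemble surrogate on parametric weir descriptors, evaluated on
# a crude "geometry shift" split. Features and target are synthetic stand-ins
# for the released WeirNet descriptors and discharge coefficients.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.uniform(size=(3794, 8))   # placeholder descriptors (key widths, heights, ...)
y = 0.6 + 0.3 * X[:, 0] - 0.2 * X[:, 1] ** 2 + rng.normal(0, 0.01, 3794)

ood = X[:, 0] > 0.8               # hold out the upper range of one descriptor
model = RandomForestRegressor(n_estimators=300).fit(X[~ood], y[~ood])
print("OOD MAE:", np.abs(model.predict(X[ood]) - y[ood]).mean())
```
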
toXiv_bot_toot

@Techmeme@techhub.social
2026-01-16 11:30:50

Cloudflare acquires AI data marketplace Human Native for an undisclosed sum, aiming to create a new system where AI developers pay creators for training content (Davis Giangiulio/CNBC)
cnbc.com/2026/01/15/cloudflare

@arXiv_csLG_bot@mastoxiv.page
2025-12-22 13:54:55

Replaced article(s) found for cs.LG. arxiv.org/list/cs.LG/new
[4/5]:
- Sample, Don't Search: Rethinking Test-Time Alignment for Language Models
Gonçalo Faria, Noah A. Smith
arxiv.org/abs/2504.03790 mastoxiv.page/@arXiv_csCL_bot/
- A Survey on Archetypal Analysis
Aleix Alcacer, Irene Epifanio, Sebastian Mair, Morten Mørup
arxiv.org/abs/2504.12392 mastoxiv.page/@arXiv_statME_bo
- The Stochastic Occupation Kernel (SOCK) Method for Learning Stochastic Differential Equations
Michael L. Wells, Kamel Lahouel, Bruno Jedynak
arxiv.org/abs/2505.11622 mastoxiv.page/@arXiv_statML_bo
- BOLT: Block-Orthonormal Lanczos for Trace estimation of matrix functions
Kingsley Yeon, Promit Ghosal, Mihai Anitescu
arxiv.org/abs/2505.12289 mastoxiv.page/@arXiv_mathNA_bo
- Clustering and Pruning in Causal Data Fusion
Otto Tabell, Santtu Tikka, Juha Karvanen
arxiv.org/abs/2505.15215 mastoxiv.page/@arXiv_statML_bo
- On the performance of multi-fidelity and reduced-dimensional neural emulators for inference of ph...
Chloe H. Choi, Andrea Zanoni, Daniele E. Schiavazzi, Alison L. Marsden
arxiv.org/abs/2506.11683 mastoxiv.page/@arXiv_statML_bo
- Beyond Force Metrics: Pre-Training MLFFs for Stable MD Simulations
Maheshwari, Tang, Ock, Kolluru, Farimani, Kitchin
arxiv.org/abs/2506.14850 mastoxiv.page/@arXiv_physicsch
- Quantifying Uncertainty in the Presence of Distribution Shifts
Yuli Slavutsky, David M. Blei
arxiv.org/abs/2506.18283 mastoxiv.page/@arXiv_statML_bo
- ZKPROV: A Zero-Knowledge Approach to Dataset Provenance for Large Language Models
Mina Namazi, Alexander Nemecek, Erman Ayday
arxiv.org/abs/2506.20915 mastoxiv.page/@arXiv_csCR_bot/
- SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars
Zhao, Huang, Xue, Kong, Liu, Tang, Beers, Ting, Luo
arxiv.org/abs/2507.01939 mastoxiv.page/@arXiv_astrophIM
- Towards Facilitated Fairness Assessment of AI-based Skin Lesion Classifiers Through GenAI-based I...
Ko Watanabe, Stanislav Frolov, Aya Hassan, David Dembinsky, Adriano Lucieri, Andreas Dengel
arxiv.org/abs/2507.17860 mastoxiv.page/@arXiv_csCV_bot/
- PASS: Probabilistic Agentic Supernet Sampling for Interpretable and Adaptive Chest X-Ray Reasoning
Yushi Feng, Junye Du, Yingying Hong, Qifan Wang, Lequan Yu
arxiv.org/abs/2508.10501 mastoxiv.page/@arXiv_csAI_bot/
- Unified Acoustic Representations for Screening Neurological and Respiratory Pathologies from Voice
Ran Piao, Yuan Lu, Hareld Kemps, Tong Xia, Aaqib Saeed
arxiv.org/abs/2508.20717 mastoxiv.page/@arXiv_csSD_bot/
- Machine Learning-Driven Predictive Resource Management in Complex Science Workflows
Tasnuva Chowdhury, et al.
arxiv.org/abs/2509.11512 mastoxiv.page/@arXiv_csDC_bot/
- MatchFixAgent: Language-Agnostic Autonomous Repository-Level Code Translation Validation and Repair
Ali Reza Ibrahimzada, Brandon Paulsen, Reyhaneh Jabbarvand, Joey Dodds, Daniel Kroening
arxiv.org/abs/2509.16187 mastoxiv.page/@arXiv_csSE_bot/
- Automated Machine Learning Pipeline: Large Language Models-Assisted Automated Dataset Generation ...
Adam Lahouari, Jutta Rogal, Mark E. Tuckerman
arxiv.org/abs/2509.21647 mastoxiv.page/@arXiv_condmatmt
- Quantifying the Impact of Structured Output Format on Large Language Models through Causal Inference
Han Yuan, Yue Zhao, Li Zhang, Wuqiong Luo, Zheng Ma
arxiv.org/abs/2509.21791 mastoxiv.page/@arXiv_csCL_bot/
- The Generation Phases of Flow Matching: a Denoising Perspective
Anne Gagneux, Ségolène Martin, Rémi Gribonval, Mathurin Massias
arxiv.org/abs/2510.24830 mastoxiv.page/@arXiv_csCV_bot/
- Data-driven uncertainty-aware seakeeping prediction of the Delft 372 catamaran using ensemble Han...
Giorgio Palma, Andrea Serani, Matteo Diez
arxiv.org/abs/2511.04461 mastoxiv.page/@arXiv_eessSY_bo
- Generalized infinite dimensional Alpha-Procrustes based geometries
Salvish Goomanee, Andi Han, Pratik Jawanpuria, Bamdev Mishra
arxiv.org/abs/2511.09801 mastoxiv.page/@arXiv_statML_bo
toXiv_bot_toot

@arXiv_csLG_bot@mastoxiv.page
2026-02-25 16:07:58

Replaced article(s) found for cs.LG. arxiv.org/list/cs.LG/new
[3/6]:
- Towards Scalable Oversight via Partitioned Human Supervision
Ren Yin, Takashi Ishida, Masashi Sugiyama
arxiv.org/abs/2510.22500 mastoxiv.page/@arXiv_csLG_bot/
- ContextPilot: Fast Long-Context Inference via Context Reuse
Yinsicheng Jiang, Yeqi Huang, Liang Cheng, Cheng Deng, Xuan Sun, Luo Mai
arxiv.org/abs/2511.03475 mastoxiv.page/@arXiv_csLG_bot/
- Metabolomic Biomarker Discovery for ADHD Diagnosis Using Interpretable Machine Learning
Nabil Belacel, Mohamed Rachid Boulassel
arxiv.org/abs/2601.11283 mastoxiv.page/@arXiv_csLG_bot/
- PhysE-Inv: A Physics-Encoded Inverse Modeling approach for Arctic Snow Depth Prediction
Akila Sampath, Vandana Janeja, Jianwu Wang
arxiv.org/abs/2601.17074
- SAGE-5GC: Security-Aware Guidelines for Evaluating Anomaly Detection in the 5G Core Network
Cristian Manca, Christian Scano, Giorgio Piras, Fabio Brau, Maura Pintor, Battista Biggio
arxiv.org/abs/2602.03596
- LORE: Jointly Learning the Intrinsic Dimensionality and Relative Similarity Structure From Ordina...
Anand, Helbling, Davenport, Berman, Alagapan, Rozell
arxiv.org/abs/2602.04192
- Towards Robust Scaling Laws for Optimizers
Alexandra Volkova, Mher Safaryan, Christoph H. Lampert, Dan Alistarh
arxiv.org/abs/2602.07712 mastoxiv.page/@arXiv_csLG_bot/
- Do We Need Adam? Surprisingly Strong and Sparse Reinforcement Learning with SGD in LLMs
Sagnik Mukherjee, Lifan Yuan, Pavan Jayasinha, Dilek Hakkani-Tür, Hao Peng
arxiv.org/abs/2602.07729 mastoxiv.page/@arXiv_csLG_bot/
- AceGRPO: Adaptive Curriculum Enhanced Group Relative Policy Optimization for Autonomous Machine L...
Yuzhu Cai, Zexi Liu, Xinyu Zhu, Cheng Wang, Siheng Chen
arxiv.org/abs/2602.07906 mastoxiv.page/@arXiv_csLG_bot/
- VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training
Guobin Shen, Chenxiao Zhao, Xiang Cheng, Lei Huang, Xing Yu
arxiv.org/abs/2602.10693 mastoxiv.page/@arXiv_csLG_bot/
- KBVQ-MoE: KLT-guided SVD with Bias-Corrected Vector Quantization for MoE Large Language Models
Zukang Xu, Zhixiong Zhao, Xing Hu, Zhixuan Chen, Dawei Yang
arxiv.org/abs/2602.11184 mastoxiv.page/@arXiv_csLG_bot/
- MUSE: Multi-Tenant Model Serving With Seamless Model Updates
Correia, Ferreira, Martins, Bento, Guerreiro, Pereira, Gomes, Bono, Ferreira, Bizarro
arxiv.org/abs/2602.11776 mastoxiv.page/@arXiv_csLG_bot/
- Pawsterior: Variational Flow Matching for Structured Simulation-Based Inference
Jorge Carrasco-Pollo, Floor Eijkelboom, Jan-Willem van de Meent
arxiv.org/abs/2602.13813 mastoxiv.page/@arXiv_csLG_bot/
- Silent Inconsistency in Data-Parallel Full Fine-Tuning: Diagnosing Worker-Level Optimization Misa...
Hong Li, Zhen Zhou, Honggang Zhang, Yuping Luo, Xinyue Wang, Han Gong, Zhiyuan Liu
arxiv.org/abs/2602.14462 mastoxiv.page/@arXiv_csLG_bot/
- Divine Benevolence is an $x^2$: GLUs scale asymptotically faster than MLPs
Alejandro Francisco Queiruga
arxiv.org/abs/2602.14495 mastoxiv.page/@arXiv_csLG_bot/
- \"UberWeb: Insights from Multilingual Curation for a 20-Trillion-Token Dataset
DatologyAI, et al.
arxiv.org/abs/2602.15210 mastoxiv.page/@arXiv_csLG_bot/
- GLM-5: from Vibe Coding to Agentic Engineering
GLM-5-Team, et al.
arxiv.org/abs/2602.15763 mastoxiv.page/@arXiv_csLG_bot/
- Anatomy of Capability Emergence: Scale-Invariant Representation Collapse and Top-Down Reorganizat...
Jayadev Billa
arxiv.org/abs/2602.15997 mastoxiv.page/@arXiv_csLG_bot/
- AI-CARE: Carbon-Aware Reporting Evaluation Metric for AI Models
KC Santosh, Srikanth Baride, Rodrigue Rizk
arxiv.org/abs/2602.16042 mastoxiv.page/@arXiv_csLG_bot/
- Beyond Message Passing: A Symbolic Alternative for Expressive and Interpretable Graph Learning
Chuqin Geng, Li Zhang, Haolin Ye, Ziyu Zhao, Yuhe Jiang, Tara Saba, Xinyu Wang, Xujie Si
arxiv.org/abs/2602.16947 mastoxiv.page/@arXiv_csLG_bot/
toXiv_bot_toot

@Techmeme@techhub.social
2025-12-08 09:15:44

An interview with 10 Kenyan AI annotators shows Chinese companies hire data labelers via opaque middleman networks and WhatsApp groups to avoid accountability (Rest of World)
restofworld.org/2025/kenya-chi

@arXiv_csLG_bot@mastoxiv.page
2026-02-25 10:39:51

Does Order Matter: Connecting the Law of Robustness to Robust Generalization
Himadri Mandal, Vishnu Varadarajan, Jaee Ponde, Aritra Das, Mihir More, Debayan Gupta
arxiv.org/abs/2602.20971 arxiv.org/pdf/2602.20971 arxiv.org/html/2602.20971
arXiv:2602.20971v1 Announce Type: new
Abstract: Bubeck and Sellke (2021) pose as an open problem the connection between the law of robustness and robust generalization. The law of robustness states that overparameterization is necessary for models to interpolate robustly; in particular, robust interpolation requires the learned function to be Lipschitz. Robust generalization asks whether small robust training loss implies small robust test loss. We resolve this problem by explicitly connecting the two for arbitrary data distributions. Specifically, we introduce a nontrivial notion of robust generalization error and convert it into a lower bound on the expected Rademacher complexity of the induced robust loss class. Our bounds recover the $\Omega(n^{1/d})$ regime of Wu et al. (2023) and show that, up to constants, robust generalization does not change the order of the Lipschitz constant required for smooth interpolation. We conduct experiments to probe the predicted scaling with dataset size and model capacity, testing whether empirical behavior aligns more closely with the predictions of Bubeck and Sellke (2021) or Wu et al. (2023). For MNIST, we find that the lower-bound Lipschitz constant scales on the order predicted by Wu et al. (2023). Informally, to obtain low robust generalization error, the Lipschitz constant must lie in a range that we bound, and the allowable perturbation radius is linked to the Lipschitz scale.
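
The empirical quantity being probed has a one-function sketch: since $\|\nabla_x f(x)\|$ lower-bounds the Lipschitz constant of $f$, taking the maximum input-gradient norm over the data gives the kind of lower bound the experiments track. The model and random inputs below are placeholders for the paper's MNIST setup:

```python
# Sketch: empirical Lipschitz lower bound via input-gradient norms. Any scalar
# projection of the network output gives a valid bound; we use the max logit.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(784, 512), nn.ReLU(),
                      nn.Linear(512, 10))

def lipschitz_lower_bound(model: nn.Module, inputs: torch.Tensor) -> float:
    inputs = inputs.clone().requires_grad_(True)
    top = model(inputs).max(dim=1).values        # per-sample max logit
    grads = torch.autograd.grad(top.sum(), inputs)[0]
    return grads.flatten(1).norm(dim=1).max().item()

x = torch.rand(256, 1, 28, 28)   # stand-in for MNIST images
print("Lipschitz constant >=", lipschitz_lower_bound(model, x))
```
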
toXiv_bot_toot

@arXiv_csLG_bot@mastoxiv.page
2026-02-25 16:08:29

Replaced article(s) found for cs.LG. arxiv.org/list/cs.LG/new
[6/6]:
- Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning
Chi-Pin Huang, Yunze Man, Zhiding Yu, Min-Hung Chen, Jan Kautz, Yu-Chiang Frank Wang, Fu-En Yang
arxiv.org/abs/2601.09708 mastoxiv.page/@arXiv_csCV_bot/
- Universality of Many-body Projected Ensemble for Learning Quantum Data Distribution
Quoc Hoan Tran, Koki Chinzei, Yasuhiro Endo, Hirotaka Oshima
arxiv.org/abs/2601.18637 mastoxiv.page/@arXiv_quantph_b
- FROST: Filtering Reasoning Outliers with Attention for Efficient Reasoning
Haozheng Luo, Zhuolin Jiang, Md Zahid Hasan, Yan Chen, Soumalya Sarkar
arxiv.org/abs/2601.19001 mastoxiv.page/@arXiv_csCL_bot/
- Analysis of Shuffling Beyond Pure Local Differential Privacy
Shun Takagi, Seng Pei Liew
arxiv.org/abs/2601.19154 mastoxiv.page/@arXiv_csDS_bot/
- CryoLVM: Self-supervised Learning from Cryo-EM Density Maps with Large Vision Models
Weining Fu, Kai Shu, Kui Xu, Qiangfeng Cliff Zhang
arxiv.org/abs/2602.02620
- XtraLight-MedMamba for Classification of Neoplastic Tubular Adenomas
Sultana, Afsar, Rahu, Singh, Shula, Combs, Forchetti, Asari
arxiv.org/abs/2602.04819
- Flow-Based Conformal Predictive Distributions
Trevor Harris
arxiv.org/abs/2602.07633 mastoxiv.page/@arXiv_statML_bo
- GOT-Edit: Geometry-Aware Generic Object Tracking via Online Model Editing
Shih-Fang Chen, Jun-Cheng Chen, I-Hong Jhuo, Yen-Yu Lin
arxiv.org/abs/2602.08550 mastoxiv.page/@arXiv_csCV_bot/
- UI-Venus-1.5 Technical Report
Venus Team, et al.
arxiv.org/abs/2602.09082 mastoxiv.page/@arXiv_csCV_bot/
- The Wisdom of Many Queries: Complexity-Diversity Principle for Dense Retriever Training
Xincan Feng, Noriki Nishida, Yusuke Sakai, Yuji Matsumoto
arxiv.org/abs/2602.09448 mastoxiv.page/@arXiv_csIR_bot/
- Intent Laundering: AI Safety Datasets Are Not What They Seem
Shahriar Golchin, Marc Wetter
arxiv.org/abs/2602.16729 mastoxiv.page/@arXiv_csCR_bot/
- The Metaphysics We Train: A Heideggerian Reading of Machine Learning
Heman Shakeri
arxiv.org/abs/2602.19028 mastoxiv.page/@arXiv_csCY_bot/
- Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks
David Schmotz, Luca Beurer-Kellner, Sahar Abdelnabi, Maksym Andriushchenko
arxiv.org/abs/2602.20156 mastoxiv.page/@arXiv_csCR_bot/
- A Very Big Video Reasoning Suite
Maijunxian Wang, et al.
arxiv.org/abs/2602.20159 mastoxiv.page/@arXiv_csCV_bot/
toXiv_bot_toot

@arXiv_csLG_bot@mastoxiv.page
2026-02-25 10:45:31

Learning from Trials and Errors: Reflective Test-Time Planning for Embodied LLMs
Yining Hong, Huang Huang, Manling Li, Li Fei-Fei, Jiajun Wu, Yejin Choi
arxiv.org/abs/2602.21198 arxiv.org/pdf/2602.21198 arxiv.org/html/2602.21198
arXiv:2602.21198v1 Announce Type: new
Abstract: Embodied LLMs endow robots with high-level task reasoning, but they cannot reflect on what went wrong or why, turning deployment into a sequence of independent trials where mistakes repeat rather than accumulate into experience. Drawing upon human reflective practitioners, we introduce Reflective Test-Time Planning, which integrates two modes of reflection: reflection-in-action, where the agent uses test-time scaling to generate and score multiple candidate actions using internal reflections before execution; and reflection-on-action, which uses test-time training to update both its internal reflection model and its action policy based on external reflections after execution. We also include retrospective reflection, allowing the agent to re-evaluate earlier decisions and perform model updates with hindsight for proper long-horizon credit assignment. Experiments on our newly-designed Long-Horizon Household benchmark and MuJoCo Cupboard Fitting benchmark show significant gains over baseline models, with ablative studies validating the complementary roles of reflection-in-action and reflection-on-action. Qualitative analyses, including real-robot trials, highlight behavioral correction through reflection.
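
The two reflection modes fit a small control loop. A purely schematic sketch, in which a lookup table and random rewards stand in for the paper's LLM-based reflection model, action policy, and environment:

```python
# Schematic of reflection-in-action (score candidates before acting) and
# reflection-on-action (update the internal critic from external outcomes).
# All components are toy stand-ins for the paper's LLM-based modules.
import random

def propose_actions(state, k=4):          # stand-in for LLM action sampling
    return [f"action_{i}" for i in range(k)]

reflection_scores = {}                    # toy internal reflection model

def reflect_in_action(state, actions):    # pick the best-scored candidate
    return max(actions, key=lambda a: reflection_scores.get((state, a), 0.0))

def reflect_on_action(state, action, reward, lr=0.5):
    key = (state, action)                 # move the critic toward the outcome
    old = reflection_scores.get(key, 0.0)
    reflection_scores[key] = old + lr * (reward - old)

state = "kitchen"
for trial in range(10):
    action = reflect_in_action(state, propose_actions(state))
    reward = random.random()              # stand-in for environment feedback
    reflect_on_action(state, action, reward)
```
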
toXiv_bot_toot

@arXiv_csLG_bot@mastoxiv.page
2025-12-22 13:55:06

Replaced article(s) found for cs.LG. arxiv.org/list/cs.LG/new
[5/5]:
- CLAReSNet: When Convolution Meets Latent Attention for Hyperspectral Image Classification
Asmit Bandyopadhyay, Anindita Das Bhattacharjee, Rakesh Das
arxiv.org/abs/2511.12346 mastoxiv.page/@arXiv_csCV_bot/
- Safeguarded Stochastic Polyak Step Sizes for Non-smooth Optimization: Robust Performance Without ...
Dimitris Oikonomou, Nicolas Loizou
arxiv.org/abs/2512.02342 mastoxiv.page/@arXiv_mathOC_bo
- Predictive Modeling of I/O Performance for Machine Learning Training Pipelines: A Data-Driven App...
Karthik Prabhakar, Durgamadhab Mishra
arxiv.org/abs/2512.06699 mastoxiv.page/@arXiv_csPF_bot/
- Minimum Bayes Risk Decoding for Error Span Detection in Reference-Free Automatic Machine Translat...
Lyu, Song, Kamigaito, Ding, Tanaka, Utiyama, Funakoshi, Okumura
arxiv.org/abs/2512.07540 mastoxiv.page/@arXiv_csCL_bot/
- In-Context Learning for Seismic Data Processing
Fabian Fuchs, Mario Ruben Fernandez, Norman Ettrich, Janis Keuper
arxiv.org/abs/2512.11575 mastoxiv.page/@arXiv_csCV_bot/
- Journey Before Destination: On the importance of Visual Faithfulness in Slow Thinking
Rheeya Uppaal, Phu Mon Htut, Min Bai, Nikolaos Pappas, Zheng Qi, Sandesh Swamy
arxiv.org/abs/2512.12218 mastoxiv.page/@arXiv_csCV_bot/
- Non-Resolution Reasoning (NRR): A Computational Framework for Contextual Identity and Ambiguity P...
Kei Saito
arxiv.org/abs/2512.13478 mastoxiv.page/@arXiv_csCL_bot/
- Stylized Synthetic Augmentation further improves Corruption Robustness
Georg Siedel, Rojan Regmi, Abhirami Anand, Weijia Shao, Silvia Vock, Andrey Morozov
arxiv.org/abs/2512.15675 mastoxiv.page/@arXiv_csCV_bot/
- mimic-video: Video-Action Models for Generalizable Robot Control Beyond VLAs
Jonas Pai, Liam Achenbach, Victoriano Montesinos, Benedek Forrai, Oier Mees, Elvis Nava
arxiv.org/abs/2512.15692 mastoxiv.page/@arXiv_csRO_bot/
toXiv_bot_toot

@arXiv_csLG_bot@mastoxiv.page
2025-12-22 11:50:19

Crosslisted article(s) found for cs.LG. arxiv.org/list/cs.LG/new
[1/3]:
- Optimizing Text Search: A Novel Pattern Matching Algorithm Based on Ukkonen's Approach
Xinyu Guan, Shaohua Zhang
arxiv.org/abs/2512.16927 mastoxiv.page/@arXiv_csDS_bot/
- SpIDER: Spatially Informed Dense Embedding Retrieval for Software Issue Localization
Shravan Chaudhari, Rahul Thomas Jacob, Mononito Goswami, Jiajun Cao, Shihab Rashid, Christian Bock
arxiv.org/abs/2512.16956 mastoxiv.page/@arXiv_csSE_bot/
- MemoryGraft: Persistent Compromise of LLM Agents via Poisoned Experience Retrieval
Saksham Sahai Srivastava, Haoyu He
arxiv.org/abs/2512.16962 mastoxiv.page/@arXiv_csCR_bot/
- Colormap-Enhanced Vision Transformers for MRI-Based Multiclass (4-Class) Alzheimer's Disease Clas...
Faisal Ahmed
arxiv.org/abs/2512.16964 mastoxiv.page/@arXiv_eessIV_bo
- Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows
Wanghan Xu, et al.
arxiv.org/abs/2512.16969 mastoxiv.page/@arXiv_csAI_bot/
- PAACE: A Plan-Aware Automated Agent Context Engineering Framework
Kamer Ali Yuksel
arxiv.org/abs/2512.16970 mastoxiv.page/@arXiv_csAI_bot/
- A Women's Health Benchmark for Large Language Models
Elisabeth Gruber, et al.
arxiv.org/abs/2512.17028 mastoxiv.page/@arXiv_csCL_bot/
- Perturb Your Data: Paraphrase-Guided Training Data Watermarking
Pranav Shetty, Mirazul Haque, Petr Babkin, Zhiqiang Ma, Xiaomo Liu, Manuela Veloso
arxiv.org/abs/2512.17075 mastoxiv.page/@arXiv_csCL_bot/
- Disentangled representations via score-based variational autoencoders
Benjamin S. H. Lyo, Eero P. Simoncelli, Cristina Savin
arxiv.org/abs/2512.17127 mastoxiv.page/@arXiv_statML_bo
- Biosecurity-Aware AI: Agentic Risk Auditing of Soft Prompt Attacks on ESM-Based Variant Predictors
Huixin Zhan
arxiv.org/abs/2512.17146 mastoxiv.page/@arXiv_csCR_bot/
- Application of machine learning to predict food processing level using Open Food Facts
Arora, Chauhan, Rana, Aditya, Bhagat, Kumar, Kumar, Semar, Singh, Bagler
arxiv.org/abs/2512.17169 mastoxiv.page/@arXiv_qbioBM_bo
- Systemic Risk Radar: A Multi-Layer Graph Framework for Early Market Crash Warning
Sandeep Neela
arxiv.org/abs/2512.17185 mastoxiv.page/@arXiv_qfinRM_bo
- Do Foundational Audio Encoders Understand Music Structure?
Keisuke Toyama, Zhi Zhong, Akira Takahashi, Shusuke Takahashi, Yuki Mitsufuji
arxiv.org/abs/2512.17209 mastoxiv.page/@arXiv_csSD_bot/
- CheXPO-v2: Preference Optimization for Chest X-ray VLMs with Knowledge Graph Consistency
Xiao Liang, Yuxuan An, Di Wang, Jiawei Hu, Zhicheng Jiao, Bin Jing, Quan Wang
arxiv.org/abs/2512.17213 mastoxiv.page/@arXiv_csCV_bot/
- Machine Learning Assisted Parameter Tuning on Wavelet Transform Amorphous Radial Distribution Fun...
Deriyan Senjaya, Stephen Ekaputra Limantoro
arxiv.org/abs/2512.17245 mastoxiv.page/@arXiv_condmatmt
- AlignDP: Hybrid Differential Privacy with Rarity-Aware Protection for LLMs
Madhava Gaikwad
arxiv.org/abs/2512.17251 mastoxiv.page/@arXiv_csCR_bot/
- Practical Framework for Privacy-Preserving and Byzantine-robust Federated Learning
Baolei Zhang, Minghong Fang, Zhuqing Liu, Biao Yi, Peizhao Zhou, Yuan Wang, Tong Li, Zheli Liu
arxiv.org/abs/2512.17254 mastoxiv.page/@arXiv_csCR_bot/
- Verifiability-First Agents: Provable Observability and Lightweight Audit Agents for Controlling A...
Abhivansh Gupta
arxiv.org/abs/2512.17259 mastoxiv.page/@arXiv_csMA_bot/
- Warmer for Less: A Cost-Efficient Strategy for Cold-Start Recommendations at Pinterest
Saeed Ebrahimi, Weijie Jiang, Jaewon Yang, Olafur Gudmundsson, Yucheng Tu, Huizhong Duan
arxiv.org/abs/2512.17277 mastoxiv.page/@arXiv_csIR_bot/
- LibriVAD: A Scalable Open Dataset with Deep Learning Benchmarks for Voice Activity Detection
Ioannis Stylianou, Achintya kr. Sarkar, Nauman Dawalatabad, James Glass, Zheng-Hua Tan
arxiv.org/abs/2512.17281 mastoxiv.page/@arXiv_csSD_bot/
- Penalized Fair Regression for Multiple Groups in Chronic Kidney Disease
Carter H. Nakamoto, Lucia Lushi Chen, Agata Foryciarz, Sherri Rose
arxiv.org/abs/2512.17340 mastoxiv.page/@arXiv_statME_bo
toXiv_bot_toot

@arXiv_csLG_bot@mastoxiv.page
2025-12-22 13:54:35

Replaced article(s) found for cs.LG. arxiv.org/list/cs.LG/new
[2/5]:
- The Diffusion Duality
Sahoo, Deschenaux, Gokaslan, Wang, Chiu, Kuleshov
arxiv.org/abs/2506.10892 mastoxiv.page/@arXiv_csLG_bot/
- Multimodal Representation Learning and Fusion
Jin, Ge, Xie, Luo, Song, Bi, Liang, Guan, Yeong, Song, Hao
arxiv.org/abs/2506.20494 mastoxiv.page/@arXiv_csLG_bot/
- The kernel of graph indices for vector search
Mariano Tepper, Ted Willke
arxiv.org/abs/2506.20584 mastoxiv.page/@arXiv_csLG_bot/
- OptScale: Probabilistic Optimality for Inference-time Scaling
Youkang Wang, Jian Wang, Rubing Chen, Xiao-Yong Wei
arxiv.org/abs/2506.22376 mastoxiv.page/@arXiv_csLG_bot/
- Boosting Revisited: Benchmarking and Advancing LP-Based Ensemble Methods
Fabian Akkerman, Julien Ferry, Christian Artigues, Emmanuel Hebrard, Thibaut Vidal
arxiv.org/abs/2507.18242 mastoxiv.page/@arXiv_csLG_bot/
- MolMark: Safeguarding Molecular Structures through Learnable Atom-Level Watermarking
Runwen Hu, Peilin Chen, Keyan Ding, Shiqi Wang
arxiv.org/abs/2508.17702 mastoxiv.page/@arXiv_csLG_bot/
- Dual-Distilled Heterogeneous Federated Learning with Adaptive Margins for Trainable Global Protot...
Fatema Siddika, Md Anwar Hossen, Wensheng Zhang, Anuj Sharma, Juan Pablo Muñoz, Ali Jannesari
arxiv.org/abs/2508.19009 mastoxiv.page/@arXiv_csLG_bot/
- STDiff: A State Transition Diffusion Framework for Time Series Imputation in Industrial Systems
Gary Simethy, Daniel Ortiz-Arroyo, Petar Durdevic
arxiv.org/abs/2508.19011 mastoxiv.page/@arXiv_csLG_bot/
- EEGDM: Learning EEG Representation with Latent Diffusion Model
Shaocong Wang, Tong Liu, Yihan Li, Ming Li, Kairui Wen, Pei Yang, Wenqi Ji, Minjing Yu, Yong-Jin Liu
arxiv.org/abs/2508.20705 mastoxiv.page/@arXiv_csLG_bot/
- Data-Free Continual Learning of Server Models in Model-Heterogeneous Cloud-Device Collaboration
Xiao Zhang, Zengzhe Chen, Yuan Yuan, Yifei Zou, Fuzhen Zhuang, Wenyu Jiao, Yuke Wang, Dongxiao Yu
arxiv.org/abs/2509.25977 mastoxiv.page/@arXiv_csLG_bot/
- Fine-Tuning Masked Diffusion for Provable Self-Correction
Jaeyeon Kim, Seunggeun Kim, Taekyun Lee, David Z. Pan, Hyeji Kim, Sham Kakade, Sitan Chen
arxiv.org/abs/2510.01384 mastoxiv.page/@arXiv_csLG_bot/
- A Generic Machine Learning Framework for Radio Frequency Fingerprinting
Alex Hiles, Bashar I. Ahmad
arxiv.org/abs/2510.09775 mastoxiv.page/@arXiv_csLG_bot/
- A Second-Order SpikingSSM for Wearables
Kartikay Agrawal, Abhijeet Vikram, Vedant Sharma, Vaishnavi Nagabhushana, Ayon Borthakur
arxiv.org/abs/2510.14386 mastoxiv.page/@arXiv_csLG_bot/
- Utility-Diversity Aware Online Batch Selection for LLM Supervised Fine-tuning
Heming Zou, Yixiu Mao, Yun Qu, Qi Wang, Xiangyang Ji
arxiv.org/abs/2510.16882 mastoxiv.page/@arXiv_csLG_bot/
- Seeing Structural Failure Before it Happens: An Image-Based Physics-Informed Neural Network (PINN...
Omer Jauhar Khan, Sudais Khan, Hafeez Anwar, Shahzeb Khan, Shams Ul Arifeen
arxiv.org/abs/2510.23117 mastoxiv.page/@arXiv_csLG_bot/
- Training Deep Physics-Informed Kolmogorov-Arnold Networks
Spyros Rigas, Fotios Anagnostopoulos, Michalis Papachristou, Georgios Alexandridis
arxiv.org/abs/2510.23501 mastoxiv.page/@arXiv_csLG_bot/
- Semi-Supervised Preference Optimization with Limited Feedback
Seonggyun Lee, Sungjun Lim, Seojin Park, Soeun Cheon, Kyungwoo Song
arxiv.org/abs/2511.00040 mastoxiv.page/@arXiv_csLG_bot/
- Towards Causal Market Simulators
Dennis Thumm, Luis Ontaneda Mijares
arxiv.org/abs/2511.04469 mastoxiv.page/@arXiv_csLG_bot/
- Incremental Generation is Necessary and Sufficient for Universality in Flow-Based Modelling
Hossein Rouhvarzi, Anastasis Kratsios
arxiv.org/abs/2511.09902 mastoxiv.page/@arXiv_csLG_bot/
- Optimizing Mixture of Block Attention
Guangxuan Xiao, Junxian Guo, Kasra Mazaheri, Song Han
arxiv.org/abs/2511.11571 mastoxiv.page/@arXiv_csLG_bot/
- Assessing Automated Fact-Checking for Medical LLM Responses with Knowledge Graphs
Shasha Zhou, Mingyu Huang, Jack Cole, Charles Britton, Ming Yin, Jan Wolber, Ke Li
arxiv.org/abs/2511.12817 mastoxiv.page/@arXiv_csLG_bot/
toXiv_bot_toot