Tootfinder

Opt-in global Mastodon full-text search. Join the index!

@Techmeme@techhub.social
2026-01-06 03:45:35

Jensen Huang says Nvidia's Vera Rubin chips are in "full production"; Nvidia says Rubin can train some LLMs with roughly one-fourth the chips Blackwell needs (Lauren Goode/Wired)
wired.com/story/nvidias-rubin-

@Mediagazer@mstdn.social
2025-12-07 05:16:02

Some Reddit moderators say a surge of AI slop on the site is eroding its authenticity and could lead to a feedback loop of AI models training on AI content (Kat Tenbarge/Wired)
wired.com/story/ai-slop-is-rui

@shriramk@mastodon.social
2026-02-04 01:45:53

New talk abstract dropping. I just hope I can write a talk to live up to it by *checks watch* *gulp* Monday.

Generative AI and Computing Education for Novices

Generative AI (GenAI) has wreaked havoc in computing education, especially introductory computing. As AI models grow in sophistication, it is increasingly difficult to find problems that cannot be comprehensively solved by AI. Even "defeat devices" like obscure programming languages have limited viability, due to technical reasons like one-shot learning, temporal reasons such as training, and basic educational considerations like what we want stu…
@Techmeme@techhub.social
2026-02-26 16:40:46

Encord, whose software helps companies developing AI models manage training data for robots and other uses, raised $60M at a $500M pre-money valuation (Rocket Drew/The Information)
theinformation.com/articles/ro

@Mediagazer@mstdn.social
2026-01-26 22:05:45

A group of YouTubers with a combined 6.2M subscribers adds Snap to a class action lawsuit, alleging the company trained its AI systems on their video content (Sarah Perez/TechCrunch)
techcrunch.com/2026/01/26/yout

@Techmeme@techhub.social
2026-01-26 22:10:47

A group of YouTubers with a combined 6.2M subscribers adds Snap to a class action lawsuit, alleging the company trained its AI systems on their video content (Sarah Perez/TechCrunch)
techcrunch.com/2026/01/26/yout

@arXiv_csLG_bot@mastoxiv.page
2026-02-25 10:42:41

Scaling Vision Transformers: Evaluating DeepSpeed for Image-Centric Workloads
Huy Trinh, Rebecca Ma, Zeqi Yu, Tahsin Reza
arxiv.org/abs/2602.21081 arxiv.org/pdf/2602.21081 arxiv.org/html/2602.21081
arXiv:2602.21081v1 Announce Type: new
Abstract: Vision Transformers (ViTs) have demonstrated remarkable potential in image processing tasks by utilizing self-attention mechanisms to capture global relationships within data. However, their scalability is hindered by significant computational and memory demands, especially for large-scale models with many parameters. This study aims to leverage DeepSpeed, a highly efficient distributed training framework that is commonly used for language models, to enhance the scalability and performance of ViTs. We evaluate intra- and inter-node training efficiency across multiple GPU configurations on various datasets like CIFAR-10 and CIFAR-100, exploring the impact of distributed data parallelism on training speed, communication overhead, and overall scalability (strong and weak scaling). By systematically varying software parameters, such as batch size and gradient accumulation, we identify key factors influencing performance of distributed training. The experiments in this study provide a foundational basis for applying DeepSpeed to image-related tasks. Future work will extend these investigations to deepen our understanding of DeepSpeed's limitations and explore strategies for optimizing distributed training pipelines for Vision Transformers.
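
A minimal sketch of the kind of run the study describes, assuming a timm ViT, torchvision CIFAR-10, and a single-node launch via the deepspeed CLI; the global batch size, gradient-accumulation steps, and optimizer settings below are illustrative placeholders, not the paper's configuration:

```python
# Sketch: DeepSpeed distributed data parallelism for a ViT on CIFAR-10.
# Assumes `pip install deepspeed timm torchvision`; launch with `deepspeed train.py`.
import deepspeed
import timm
import torch
import torch.nn.functional as F
from torchvision import datasets, transforms

model = timm.create_model("vit_base_patch16_224", num_classes=10)

ds_config = {
    # global batch = micro batch (64, from the DataLoader) x accumulation (4) x world size
    "train_batch_size": 256,
    "gradient_accumulation_steps": 4,   # one of the parameters the study sweeps
    "fp16": {"enabled": True},
    "optimizer": {"type": "AdamW", "params": {"lr": 3e-4}},
}
engine, _, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

tfm = transforms.Compose([transforms.Resize(224), transforms.ToTensor()])
loader = torch.utils.data.DataLoader(
    datasets.CIFAR10("data", download=True, transform=tfm), batch_size=64
)

for images, labels in loader:
    images = images.to(engine.device).half()   # match the fp16 engine dtype
    labels = labels.to(engine.device)
    loss = F.cross_entropy(engine(images), labels)
    engine.backward(loss)   # DeepSpeed handles loss scaling and accumulation
    engine.step()
```
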
toXiv_bot_toot

@cyrevolt@mastodon.social
2026-01-24 16:05:02

Gemini: "Google does not train its foundational models on private data under NDA, such as non-public chip documentation or confidential enterprise files. My training relies on publicly available web content, licensed datasets, and code."
Well, that is unfortunate, because that's actually the interesting use case for me. 🤫

@johl@mastodon.xyz
2025-12-15 14:07:44

The very excellent “what happened last week” newsletter by @… focuses this week on “the A in AI stands for African”, namely on how Chinese AI companies are turning Kenya, with its chronic unemployment rate of 67 percent, into a hot spot for cheap AI labor.
More and more Kenyan students and recent graduates are hired to label thousands of videos a day through opaq…

@Techmeme@techhub.social
2026-02-24 07:20:49

A look at the challenges some AI developers face in building models to extract trillions of high-quality tokens from PDFs, which are hard to parse, for training (Josh Dzieza/The Verge)
theverge.com/ai-artificial-int

@arXiv_physicsfludyn_bot@mastoxiv.page
2026-02-27 08:32:10

From synthetic turbulence to true solutions: A deep diffusion model for discovering periodic orbits in the Navier-Stokes equations
Jeremy P Parker, Tobias M Schneider
arxiv.org/abs/2602.23181 arxiv.org/pdf/2602.23181 arxiv.org/html/2602.23181
arXiv:2602.23181v1 Announce Type: new
Abstract: Generative artificial intelligence has shown remarkable success in synthesizing data that mimic complex real-world systems, but its potential role in the discovery of mathematically meaningful structures in physical models remains underexplored. In this work, we demonstrate how a generative diffusion model can be used to uncover previously unknown solutions of a nonlinear partial differential equation: the two-dimensional Navier-Stokes equations in a turbulent regime. Trained on data from a direct numerical simulation of turbulence, the model learns to generate time series that resemble physically plausible trajectories. By carefully modifying the temporal structure of the model and enforcing the symmetries of the governing equations, we produce synthetic trajectories that are periodic in time, despite the fact that the training data did not contain periodic trajectories. These synthetic trajectories are then refined into true solutions using an iterative solver, yielding 111 new periodic orbits (POs) with very short periods. Our results reveal a previously unobserved richness in the PO structure of this system and suggest a broader role for generative AI: not as a replacement for simulation and existing solvers, but as a complementary tool for navigating the complex solution spaces of nonlinear dynamical systems.
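
The authors enforce periodicity through the temporal structure of the model; as a generic illustration of how a temporal network can be made periodic by construction (not the paper's architecture), circular padding on the time axis makes a convolutional block exactly equivariant to cyclic time shifts:

```python
# Illustration: a temporal conv block that treats time as a circle, so any
# trajectory it generates or refines is periodic by construction. This is a
# generic trick for demonstration, not the architecture from the paper.
import torch
import torch.nn as nn

class PeriodicTemporalBlock(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 5):
        super().__init__()
        # circular padding wraps the time axis: t=0 and t=T-1 are neighbors
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=kernel_size // 2, padding_mode="circular")
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, channels, time)
        return x + self.act(self.conv(x))

x = torch.randn(2, 16, 64)                   # a batch of length-64 trajectories
block = PeriodicTemporalBlock(16)
# Cyclically shifting the input cyclically shifts the output: period-T
# trajectories stay period-T under the network.
assert torch.allclose(block(torch.roll(x, 7, dims=-1)),
                      torch.roll(block(x), 7, dims=-1), atol=1e-5)
```
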
toXiv_bot_toot

@Techmeme@techhub.social
2026-02-24 10:50:49

Anthropic introduces "persona selection model", a theory to explain AI's human-like behavior, and details how AI personas form in pre-training and post-training (Anthropic)
anthropic.com/research/persona

@rasos@fairmove.net
2026-01-17 09:05:38

Could we become co-owners of #AI models that used training data which I published under a viral license?
We briefly discussed this question in the Austrian #CreativeCommons chapter and came to the conclusion that copyright can only be claimed by human beings. So the model itself and wha…

@arXiv_csLG_bot@mastoxiv.page
2026-02-25 10:39:11

Extending $\mu$P: Spectral Conditions for Feature Learning Across Optimizers
Akshita Gupta, Marieme Ngom, Sam Foreman, Venkatram Vishwanath
arxiv.org/abs/2602.20937 arxiv.org/pdf/2602.20937 arxiv.org/html/2602.20937
arXiv:2602.20937v1 Announce Type: new
Abstract: Several variations of adaptive first-order and second-order optimization methods have been proposed to accelerate and scale the training of large language models. The performance of these optimization routines is highly sensitive to the choice of hyperparameters (HPs), which are computationally expensive to tune for large-scale models. Maximal update parameterization ($\mu$P) is a set of scaling rules which aims to make the optimal HPs independent of the model size, thereby allowing the HPs tuned on a smaller (computationally cheaper) model to be transferred to train a larger, target model. Despite promising results for SGD and Adam, deriving $\mu$P for other optimizers is challenging because the underlying tensor programming approach is difficult to grasp. Building on recent work that introduced spectral conditions as an alternative to tensor programs, we propose a novel framework to derive $\mu$P for a broader class of optimizers, including AdamW, ADOPT, LAMB, Sophia, Shampoo and Muon. We implement our $\mu$P derivations on multiple benchmark models and demonstrate zero-shot learning rate transfer across increasing model width for the above optimizers. Further, we provide empirical insights into depth-scaling parameterization for these optimizers.
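
The spectral condition the framework builds on can be made concrete with a toy recipe: keep each weight update's spectral norm proportional to sqrt(fan_out/fan_in) as width grows. A minimal sketch, assuming Adam-style updates whose entries are O(lr); the per-optimizer corrections are exactly what the paper derives, not what this shows:

```python
# Sketch: width-aware per-layer learning rates in the spirit of muP's spectral
# condition, targeting ||dW||_2 ~ sqrt(fan_out / fan_in). Assumes Adam-style
# updates with O(lr) entries, whose spectral norm then scales roughly like
# lr * (sqrt(fan_in) + sqrt(fan_out)); all constants here are illustrative.
import math
import torch
import torch.nn as nn

def spectral_param_groups(model: nn.Module, base_lr: float):
    groups = []
    for _, p in model.named_parameters():
        if p.ndim == 2:                                 # dense weight: (fan_out, fan_in)
            fan_out, fan_in = p.shape
            target = math.sqrt(fan_out / fan_in)        # desired update spectral norm
            per_unit_lr = math.sqrt(fan_in) + math.sqrt(fan_out)
            lr = base_lr * target / per_unit_lr         # ~ base_lr / fan_in when square
        else:                                           # biases, norms: width-independent
            lr = base_lr
        groups.append({"params": [p], "lr": lr})
    return groups

model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))
opt = torch.optim.AdamW(spectral_param_groups(model, base_lr=1e-2))
```
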
toXiv_bot_toot

@Mediagazer@mstdn.social
2026-02-24 14:55:50

South Korea's three major terrestrial broadcasters, KBS, MBC, and SBS, sue OpenAI, alleging it used their news content to train AI models without authorization (Lee Yoon-seo/The Korea Herald)
koreaherald.com/article/106806

@Techmeme@techhub.social
2026-02-26 21:15:48

Sources: Meta last week scrapped the most advanced AI chip it was developing, after struggling with the design, and shifted its focus to a less complicated chip (The Information)
theinformation.com/articles/me

@arXiv_csLG_bot@mastoxiv.page
2026-02-25 10:44:31

The Diffusion Duality, Chapter II: $\Psi$-Samplers and Efficient Curriculum
Justin Deschenaux, Caglar Gulcehre, Subham Sekhar Sahoo
arxiv.org/abs/2602.21185 arxiv.org/pdf/2602.21185 arxiv.org/html/2602.21185
arXiv:2602.21185v1 Announce Type: new
Abstract: Uniform-state discrete diffusion models excel at few-step generation and guidance due to their ability to self-correct, making them preferred over autoregressive or Masked diffusion models in these settings. However, their sampling quality plateaus with ancestral samplers as the number of steps increases. We introduce a family of Predictor-Corrector (PC) samplers for discrete diffusion that generalize prior methods and apply to arbitrary noise processes. When paired with uniform-state diffusion, our samplers outperform ancestral sampling on both language and image modeling, achieving lower generative perplexity at matched unigram entropy on OpenWebText and better FID/IS scores on CIFAR10. Crucially, unlike conventional samplers, our PC methods continue to improve with more sampling steps. Taken together, these findings call into question the assumption that Masked diffusion is the inevitable future of diffusion-based language modeling. Beyond sampling, we develop a memory-efficient curriculum for the Gaussian relaxation training phase, reducing training time by 25% and memory by 33% compared to Duo while maintaining comparable perplexity on OpenWebText and LM1B and strong downstream performance. We release code, checkpoints, and a video-tutorial on: s-sahoo.com/duo-ch2
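
The predictor-corrector pattern itself is easy to sketch: a predictor step jumps toward the model's current estimate of the data, and a corrector step re-noises and re-predicts at the same noise level, which is where self-correction enters. Below is a schematic with a stand-in "denoiser" (a trained network in practice) and a generic uniform-state corrector; it is not the paper's $\Psi$-samplers:

```python
# Schematic predictor-corrector sampling loop for uniform-state discrete
# diffusion. `denoiser` is a stand-in for a trained network; the corrector is
# the generic "re-noise, then re-predict" pattern, not the paper's samplers.
import torch

V, L = 50, 16                        # vocab size, sequence length

def denoiser(x_t, t):                # stand-in: would be a trained model
    return torch.randn(x_t.shape[0], L, V)   # logits over tokens per position

def uniform_renoise(x, t, frac=0.1):
    # corrector half-step: resample a t-dependent fraction of positions
    # uniformly, mirroring the uniform-state forward process
    mask = torch.rand(x.shape) < frac * t
    return torch.where(mask, torch.randint(0, V, x.shape), x)

x = torch.randint(0, V, (4, L))      # start from uniform noise
steps = 32
for step in range(steps, 0, -1):
    t = step / steps
    x = denoiser(x, t).argmax(-1)    # predictor: move toward the data estimate
    if step > 1:
        x = uniform_renoise(x, t)      # corrector: perturb...
        x = denoiser(x, t).argmax(-1)  # ...and let the model fix its own errors
```
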
toXiv_bot_toot

@Techmeme@techhub.social
2025-12-19 21:06:41

AI robotics startup Physical Intelligence claims vision-language-action models learn to align human videos and robot data as pre-training is scaled up (Physical Intelligence)
physicalintelligence.company/r

@arXiv_csGR_bot@mastoxiv.page
2026-01-21 08:13:41

Copy-Transform-Paste: Zero-Shot Object-Object Alignment Guided by Vision-Language and Geometric Constraints
Rotem Gatenyo, Ohad Fried
arxiv.org/abs/2601.14207 arxiv.org/pdf/2601.14207 arxiv.org/html/2601.14207
arXiv:2601.14207v1 Announce Type: new
Abstract: We study zero-shot 3D alignment of two given meshes, using a text prompt describing their spatial relation -- an essential capability for content creation and scene assembly. Earlier approaches primarily rely on geometric alignment procedures, while recent work leverages pretrained 2D diffusion models to model language-conditioned object-object spatial relationships. In contrast, we directly optimize the relative pose at test time, updating translation, rotation, and isotropic scale with CLIP-driven gradients via a differentiable renderer, without training a new model. Our framework augments language supervision with geometry-aware objectives: a variant of soft-Iterative Closest Point (ICP) term to encourage surface attachment and a penetration loss to discourage interpenetration. A phased schedule strengthens contact constraints over time, and camera control concentrates the optimization on the interaction region. To enable evaluation, we curate a benchmark containing diverse categories and relations, and compare against baselines. Our method outperforms all alternatives, yielding semantically faithful and physically plausible alignments.
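
The test-time optimization loop is simple to outline. A skeleton under stated assumptions: render and clip_score below are toy placeholders for a differentiable renderer (e.g., PyTorch3D) and a CLIP image-text similarity, and the soft-ICP term, penetration loss, phased schedule, and camera control are omitted:

```python
# Skeleton: optimize relative pose (translation, axis-angle rotation, isotropic
# scale) at test time by gradient ascent on a CLIP-driven objective through a
# differentiable renderer. `render` and `clip_score` are toy placeholders.
import torch

translation = torch.zeros(3, requires_grad=True)
rot_vec = torch.zeros(3, requires_grad=True)     # axis-angle rotation
log_scale = torch.zeros(1, requires_grad=True)   # isotropic scale, in log space

def render(translation, rot_vec, log_scale):     # placeholder differentiable renderer
    return (translation.sum() + rot_vec.sum() + log_scale.sum()).reshape(1)

def clip_score(image):                           # placeholder CLIP(image, prompt) similarity
    return -(image - 1.0).pow(2).sum()

opt = torch.optim.Adam([translation, rot_vec, log_scale], lr=0.05)
for step in range(200):
    image = render(translation, rot_vec, log_scale)
    loss = -clip_score(image)                    # maximize semantic agreement
    opt.zero_grad()
    loss.backward()
    opt.step()
```
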
toXiv_bot_toot

@newsie@darktundra.xyz
2025-12-08 21:33:20

Trump plans executive order curbing state AI laws
therecord.media/trump-plans-ai

@arXiv_csLG_bot@mastoxiv.page
2026-02-25 10:45:11

Untied Ulysses: Memory-Efficient Context Parallelism via Headwise Chunking
Ravi Ghadia, Maksim Abraham, Sergei Vorobyov, Max Ryabinin
arxiv.org/abs/2602.21196 arxiv.org/pdf/2602.21196 arxiv.org/html/2602.21196
arXiv:2602.21196v1 Announce Type: new
Abstract: Efficiently processing long sequences with Transformer models usually requires splitting the computations across accelerators via context parallelism. The dominant approaches in this family of methods, such as Ring Attention or DeepSpeed Ulysses, enable scaling over the context dimension but do not focus on memory efficiency, which limits the sequence lengths they can support. More advanced techniques, such as Fully Pipelined Distributed Transformer or activation offloading, can further extend the possible context length at the cost of training throughput. In this paper, we present UPipe, a simple yet effective context parallelism technique that performs fine-grained chunking at the attention head level. This technique significantly reduces the activation memory usage of self-attention, breaking the activation memory barrier and unlocking much longer context lengths. Our approach reduces intermediate tensor memory usage in the attention layer by as much as 87.5% for 32B Transformers, while matching previous context parallelism techniques in terms of training speed. UPipe can support a context length of 5M tokens when training Llama3-8B on a single 8×H100 node, improving upon prior methods by over 25%.
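
The core memory idea is easy to demonstrate: attention is independent across heads, so heads can be processed in small chunks with only one chunk's intermediates alive at a time. A single-process sketch of headwise chunking follows; UPipe additionally combines this with context parallelism and pipelining across devices, which the sketch omits:

```python
# Sketch: attention computed over head chunks sequentially. Each chunk's
# intermediate tensors can be freed before the next chunk runs, which is the
# activation-memory lever; the distributed parts of UPipe are omitted.
import torch
import torch.nn.functional as F

def headwise_chunked_attention(q, k, v, chunk_heads: int):
    # q, k, v: (batch, heads, seq, head_dim)
    outs = []
    for h0 in range(0, q.shape[1], chunk_heads):
        sl = slice(h0, h0 + chunk_heads)
        outs.append(F.scaled_dot_product_attention(q[:, sl], k[:, sl], v[:, sl]))
    return torch.cat(outs, dim=1)

q = k = v = torch.randn(1, 16, 1024, 64)
full = F.scaled_dot_product_attention(q, k, v)
chunked = headwise_chunked_attention(q, k, v, chunk_heads=4)
assert torch.allclose(full, chunked, atol=1e-5)   # heads are independent, so exact
```
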
toXiv_bot_toot

@Techmeme@techhub.social
2026-01-25 21:05:38

A profile of Mercor, which pays about $2M daily to ~30K experts training AI models at $95/hour on average, with roles like radiologists earning up to $375/hour (Bethan Staton/Financial Times)
ft.com/content/0cab0fcd-e355-4

@arXiv_csLG_bot@mastoxiv.page
2026-02-25 10:36:21

On Electric Vehicle Energy Demand Forecasting and the Effect of Federated Learning
Andreas Tritsarolis, Gil Sampaio, Nikos Pelekis, Yannis Theodoridis
arxiv.org/abs/2602.20782 arxiv.org/pdf/2602.20782 arxiv.org/html/2602.20782
arXiv:2602.20782v1 Announce Type: new
Abstract: The widespread adoption of new energy resources, smart devices, and demand-side management strategies has motivated several analytics operations, from infrastructure load modeling to user behavior profiling. Energy Demand Forecasting (EDF) of Electric Vehicle Supply Equipments (EVSEs) is one of the most critical operations for ensuring efficient energy management and sustainability, since it enables utility providers to anticipate energy/power demand, optimize resource allocation, and implement proactive measures to improve grid reliability. However, accurate EDF is a challenging problem due to external factors, such as varying user routines, weather conditions, driving behaviors, unknown state of charge, etc. Furthermore, as concerns and restrictions about privacy and sustainability have grown, training data has become increasingly fragmented, resulting in distributed datasets scattered across different data silos and/or edge devices, calling for federated learning solutions. In this paper, we investigate different well-established time series forecasting methodologies to address the EDF problem, from statistical methods (the ARIMA family) to traditional machine learning models (such as XGBoost) and deep neural networks (GRU and LSTM). We provide an overview of these methods through a performance comparison over four real-world EVSE datasets, evaluated under both centralized and federated learning paradigms, focusing on the trade-offs between forecasting fidelity, privacy preservation, and energy overheads. Our experimental results demonstrate, on the one hand, the superiority of gradient boosted trees (XGBoost) over statistical and NN-based models in both prediction accuracy and energy efficiency and, on the other hand, the insight that Federated Learning-enabled models balance these factors, offering a promising direction for decentralized energy demand forecasting.
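
The winning centralized baseline is straightforward to reproduce in miniature: gradient-boosted trees on lagged demand features. A minimal sketch, with a synthetic daily-periodic series standing in for real EVSE data; the lag structure is illustrative, not the paper's:

```python
# Sketch: XGBoost energy-demand forecasting from lagged features. The synthetic
# sinusoid-plus-noise series is a stand-in for a real EVSE demand time series.
import numpy as np
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
n = 2000
demand = 10 + 5 * np.sin(np.arange(n) * 2 * np.pi / 96) + rng.normal(0, 1, n)

LAGS = 96   # e.g., one day of 15-minute intervals
X = np.stack([demand[i:i + LAGS] for i in range(n - LAGS)])
y = demand[LAGS:]

split = int(0.8 * len(X))   # chronological split: train on the past only
model = XGBRegressor(n_estimators=300, max_depth=6, learning_rate=0.05)
model.fit(X[:split], y[:split])
mae = np.abs(model.predict(X[split:]) - y[split:]).mean()
print(f"test MAE: {mae:.3f}")
```
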
toXiv_bot_toot

@Techmeme@techhub.social
2025-12-16 01:05:43

The Allen Institute for AI launches Bolmo 7B and Bolmo 1B, claiming they are "the first fully open byte-level language models", built on its Olmo 3 models (Emilia David/VentureBeat)
venturebeat.com/ai/bolmos-arch

@arXiv_csLG_bot@mastoxiv.page
2025-12-22 10:34:10

Exploiting ID-Text Complementarity via Ensembling for Sequential Recommendation
Liam Collins, Bhuvesh Kumar, Clark Mingxuan Ju, Tong Zhao, Donald Loveland, Leonardo Neves, Neil Shah
arxiv.org/abs/2512.17820 arxiv.org/pdf/2512.17820 arxiv.org/html/2512.17820
arXiv:2512.17820v1 Announce Type: new
Abstract: Modern Sequential Recommendation (SR) models commonly utilize modality features to represent items, motivated in large part by recent advancements in language and vision modeling. To do so, several works completely replace ID embeddings with modality embeddings, claiming that modality embeddings render ID embeddings unnecessary because they can match or even exceed ID embedding performance. On the other hand, many works jointly utilize ID and modality features, but posit that complex fusion strategies, such as multi-stage training and/or intricate alignment architectures, are necessary for this joint utilization. However, underlying both these lines of work is a lack of understanding of the complementarity of ID and modality features. In this work, we address this gap by studying the complementarity of ID- and text-based SR models. We show that these models do learn complementary signals, meaning that either should provide performance gain when used properly alongside the other. Motivated by this, we propose a new SR method that preserves ID-text complementarity through independent model training, then harnesses it through a simple ensembling strategy. Despite this method's simplicity, we show it outperforms several competitive SR baselines, implying that both ID and text features are necessary to achieve state-of-the-art SR performance but complex fusion architectures are not.
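
The proposed recipe (train the two models independently, ensemble at scoring time) admits a tiny sketch. The convex combination of per-user z-normalized scores below is one plausible reading of "simple ensembling strategy", not necessarily the paper's exact rule:

```python
# Toy sketch: combine independently trained ID-based and text-based sequential
# recommenders by blending their per-user item scores at inference time.
import torch

def ensemble_scores(id_scores: torch.Tensor,
                    text_scores: torch.Tensor,
                    alpha: float = 0.5) -> torch.Tensor:
    def z(s):   # z-normalize per user so the two score scales are comparable
        return (s - s.mean(-1, keepdim=True)) / (s.std(-1, keepdim=True) + 1e-6)
    return alpha * z(id_scores) + (1 - alpha) * z(text_scores)

id_scores = torch.randn(8, 10_000)     # (users, items) from the ID model
text_scores = torch.randn(8, 10_000)   # (users, items) from the text model
top20 = ensemble_scores(id_scores, text_scores).topk(20, dim=-1).indices
```
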
toXiv_bot_toot

@Techmeme@techhub.social
2026-02-18 01:56:04

Anthropic expects to pay Amazon, Google, and Microsoft $80B total to run its models on their servers through 2029, plus an additional $100B for training costs (The Information)
theinformation.com/articles/an

@arXiv_csLG_bot@mastoxiv.page
2026-02-25 10:37:41

Transcoder Adapters for Reasoning-Model Diffing
Nathan Hu, Jake Ward, Thomas Icard, Christopher Potts
arxiv.org/abs/2602.20904 arxiv.org/pdf/2602.20904 arxiv.org/html/2602.20904
arXiv:2602.20904v1 Announce Type: new
Abstract: While reasoning models are increasingly ubiquitous, the effects of reasoning training on a model's internal mechanisms remain poorly understood. In this work, we introduce transcoder adapters, a technique for learning an interpretable approximation of the difference in MLP computation before and after fine-tuning. We apply transcoder adapters to characterize the differences between Qwen2.5-Math-7B and its reasoning-distilled variant, DeepSeek-R1-Distill-Qwen-7B. Learned adapters are faithful to the target model's internal computation and next-token predictions. When evaluated on reasoning benchmarks, adapters match the reasoning model's response lengths and typically recover 50-90% of the accuracy gains from reasoning fine-tuning. Adapter features are sparsely activating and interpretable. When examining adapter features, we find that only ~8% have activating examples directly related to reasoning behaviors. We deeply study one such behavior -- the production of hesitation tokens (e.g., "wait"). Using attribution graphs, we trace hesitation to only ~2.4% of adapter features (5.6k total) performing one of two functions. These features are necessary and sufficient for producing hesitation tokens; removing them reduces response length, often without affecting accuracy. Overall, our results provide insight into reasoning training and suggest transcoder adapters may be useful for studying fine-tuning more broadly.
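
The adapter objective reduces to a compact regression problem: learn a sparse feature bottleneck that reconstructs the difference between the fine-tuned and base MLP outputs. A toy version, with random frozen MLPs and illustrative dimensions and sparsity penalty standing in for the paper's models:

```python
# Toy transcoder-adapter training loop: adapter(x) ~ mlp_ft(x) - mlp_base(x),
# with an L1 penalty encouraging sparse, hopefully interpretable features.
# The frozen MLPs, dimensions, and penalty weight are stand-ins.
import torch
import torch.nn as nn

d_model, d_feat = 256, 2048
mlp_base = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                         nn.Linear(4 * d_model, d_model))
mlp_ft = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                       nn.Linear(4 * d_model, d_model))
for m in (mlp_base, mlp_ft):
    m.requires_grad_(False)   # both models frozen; only the adapter trains

enc = nn.Linear(d_model, d_feat)   # feature encoder (the interpretable layer)
dec = nn.Linear(d_feat, d_model)
opt = torch.optim.Adam([*enc.parameters(), *dec.parameters()], lr=1e-3)

for _ in range(100):
    x = torch.randn(64, d_model)        # stand-in for residual-stream inputs
    target = mlp_ft(x) - mlp_base(x)    # the computation diff to explain
    feats = torch.relu(enc(x))          # sparse nonnegative feature activations
    loss = (dec(feats) - target).pow(2).mean() + 1e-3 * feats.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```
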
toXiv_bot_toot

@arXiv_csLG_bot@mastoxiv.page
2026-02-25 10:35:21

WeirNet: A Large-Scale 3D CFD Benchmark for Geometric Surrogate Modeling of Piano Key Weirs
Lisa Lüddecke, Michael Hohmann, Sebastian Eilermann, Jan Tillmann-Mumm, Pezhman Pourabdollah, Mario Oertel, Oliver Niggemann
arxiv.org/abs/2602.20714 arxiv.org/pdf/2602.20714 arxiv.org/html/2602.20714
arXiv:2602.20714v1 Announce Type: new
Abstract: Reliable prediction of hydraulic performance is challenging for Piano Key Weir (PKW) design because discharge capacity depends on three-dimensional geometry and operating conditions. Surrogate models can accelerate hydraulic-structure design, but progress is limited by scarce large, well-documented datasets that jointly capture geometric variation, operating conditions, and functional performance. This study presents WeirNet, a large 3D CFD benchmark dataset for geometric surrogate modeling of PKWs. WeirNet contains 3,794 parametric, feasibility-constrained rectangular and trapezoidal PKW geometries, each scheduled at 19 discharge conditions using a consistent free-surface OpenFOAM workflow, resulting in 71,387 completed simulations that form the benchmark, with complete discharge coefficient labels. The dataset is released in multiple modalities (compact parametric descriptors, watertight surface meshes, and high-resolution point clouds) together with standardized tasks and in-distribution and out-of-distribution splits. Representative surrogate families are benchmarked for discharge coefficient prediction. Tree-based regressors on parametric descriptors achieve the best overall accuracy, while point- and mesh-based models remain competitive and offer parameterization-agnostic inference. All surrogates evaluate in milliseconds per sample, providing orders-of-magnitude speedups over CFD runtimes. Out-of-distribution results identify geometry shift as the dominant failure mode compared to unseen discharge values, and data-efficiency experiments show diminishing returns beyond roughly 60% of the training data. By publicly releasing the dataset together with simulation setups and evaluation pipelines, WeirNet establishes a reproducible framework for data-driven hydraulic modeling and enables faster exploration of PKW designs during the early stages of hydraulic planning.
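
The descriptor-based surrogate task is the easiest entry point. A sketch with synthetic placeholders for the released descriptors and labels, including a crude geometry-shift holdout mimicking the out-of-distribution evaluation:

```python
# Sketch: tree-ensemble surrogate on parametric weir descriptors, evaluated on
# a crude "geometry shift" split. Features and target are synthetic stand-ins
# for the released WeirNet descriptors and discharge coefficients.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.uniform(size=(3794, 8))   # placeholder descriptors (key widths, heights, ...)
y = 0.6 + 0.3 * X[:, 0] - 0.2 * X[:, 1] ** 2 + rng.normal(0, 0.01, 3794)

ood = X[:, 0] > 0.8               # hold out the upper range of one descriptor
model = RandomForestRegressor(n_estimators=300).fit(X[~ood], y[~ood])
print("OOD MAE:", np.abs(model.predict(X[ood]) - y[ood]).mean())
```
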
toXiv_bot_toot

@Techmeme@techhub.social
2026-01-16 11:30:50

Cloudflare acquires AI data marketplace Human Native for an undisclosed sum, aiming to create a new system where AI developers pay creators for training content (Davis Giangiulio/CNBC)
cnbc.com/2026/01/15/cloudflare

@arXiv_csLG_bot@mastoxiv.page
2025-12-22 13:54:55

Replaced article(s) found for cs.LG. arxiv.org/list/cs.LG/new
[4/5]:
- Sample, Don't Search: Rethinking Test-Time Alignment for Language Models
Gonçalo Faria, Noah A. Smith
arxiv.org/abs/2504.03790 mastoxiv.page/@arXiv_csCL_bot/
- A Survey on Archetypal Analysis
Aleix Alcacer, Irene Epifanio, Sebastian Mair, Morten Mørup
arxiv.org/abs/2504.12392 mastoxiv.page/@arXiv_statME_bo
- The Stochastic Occupation Kernel (SOCK) Method for Learning Stochastic Differential Equations
Michael L. Wells, Kamel Lahouel, Bruno Jedynak
arxiv.org/abs/2505.11622 mastoxiv.page/@arXiv_statML_bo
- BOLT: Block-Orthonormal Lanczos for Trace estimation of matrix functions
Kingsley Yeon, Promit Ghosal, Mihai Anitescu
arxiv.org/abs/2505.12289 mastoxiv.page/@arXiv_mathNA_bo
- Clustering and Pruning in Causal Data Fusion
Otto Tabell, Santtu Tikka, Juha Karvanen
arxiv.org/abs/2505.15215 mastoxiv.page/@arXiv_statML_bo
- On the performance of multi-fidelity and reduced-dimensional neural emulators for inference of ph...
Chloe H. Choi, Andrea Zanoni, Daniele E. Schiavazzi, Alison L. Marsden
arxiv.org/abs/2506.11683 mastoxiv.page/@arXiv_statML_bo
- Beyond Force Metrics: Pre-Training MLFFs for Stable MD Simulations
Maheshwari, Tang, Ock, Kolluru, Farimani, Kitchin
arxiv.org/abs/2506.14850 mastoxiv.page/@arXiv_physicsch
- Quantifying Uncertainty in the Presence of Distribution Shifts
Yuli Slavutsky, David M. Blei
arxiv.org/abs/2506.18283 mastoxiv.page/@arXiv_statML_bo
- ZKPROV: A Zero-Knowledge Approach to Dataset Provenance for Large Language Models
Mina Namazi, Alexander Nemecek, Erman Ayday
arxiv.org/abs/2506.20915 mastoxiv.page/@arXiv_csCR_bot/
- SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars
Zhao, Huang, Xue, Kong, Liu, Tang, Beers, Ting, Luo
arxiv.org/abs/2507.01939 mastoxiv.page/@arXiv_astrophIM
- Towards Facilitated Fairness Assessment of AI-based Skin Lesion Classifiers Through GenAI-based I...
Ko Watanabe, Stanislav Frolov, Aya Hassan, David Dembinsky, Adriano Lucieri, Andreas Dengel
arxiv.org/abs/2507.17860 mastoxiv.page/@arXiv_csCV_bot/
- PASS: Probabilistic Agentic Supernet Sampling for Interpretable and Adaptive Chest X-Ray Reasoning
Yushi Feng, Junye Du, Yingying Hong, Qifan Wang, Lequan Yu
arxiv.org/abs/2508.10501 mastoxiv.page/@arXiv_csAI_bot/
- Unified Acoustic Representations for Screening Neurological and Respiratory Pathologies from Voice
Ran Piao, Yuan Lu, Hareld Kemps, Tong Xia, Aaqib Saeed
arxiv.org/abs/2508.20717 mastoxiv.page/@arXiv_csSD_bot/
- Machine Learning-Driven Predictive Resource Management in Complex Science Workflows
Tasnuva Chowdhury, et al.
arxiv.org/abs/2509.11512 mastoxiv.page/@arXiv_csDC_bot/
- MatchFixAgent: Language-Agnostic Autonomous Repository-Level Code Translation Validation and Repair
Ali Reza Ibrahimzada, Brandon Paulsen, Reyhaneh Jabbarvand, Joey Dodds, Daniel Kroening
arxiv.org/abs/2509.16187 mastoxiv.page/@arXiv_csSE_bot/
- Automated Machine Learning Pipeline: Large Language Models-Assisted Automated Dataset Generation ...
Adam Lahouari, Jutta Rogal, Mark E. Tuckerman
arxiv.org/abs/2509.21647 mastoxiv.page/@arXiv_condmatmt
- Quantifying the Impact of Structured Output Format on Large Language Models through Causal Inference
Han Yuan, Yue Zhao, Li Zhang, Wuqiong Luo, Zheng Ma
arxiv.org/abs/2509.21791 mastoxiv.page/@arXiv_csCL_bot/
- The Generation Phases of Flow Matching: a Denoising Perspective
Anne Gagneux, Ségolène Martin, Rémi Gribonval, Mathurin Massias
arxiv.org/abs/2510.24830 mastoxiv.page/@arXiv_csCV_bot/
- Data-driven uncertainty-aware seakeeping prediction of the Delft 372 catamaran using ensemble Han...
Giorgio Palma, Andrea Serani, Matteo Diez
arxiv.org/abs/2511.04461 mastoxiv.page/@arXiv_eessSY_bo
- Generalized infinite dimensional Alpha-Procrustes based geometries
Salvish Goomanee, Andi Han, Pratik Jawanpuria, Bamdev Mishra
arxiv.org/abs/2511.09801 mastoxiv.page/@arXiv_statML_bo
toXiv_bot_toot

@arXiv_csLG_bot@mastoxiv.page
2026-02-25 16:07:58

Replaced article(s) found for cs.LG. arxiv.org/list/cs.LG/new
[3/6]:
- Towards Scalable Oversight via Partitioned Human Supervision
Ren Yin, Takashi Ishida, Masashi Sugiyama
arxiv.org/abs/2510.22500 mastoxiv.page/@arXiv_csLG_bot/
- ContextPilot: Fast Long-Context Inference via Context Reuse
Yinsicheng Jiang, Yeqi Huang, Liang Cheng, Cheng Deng, Xuan Sun, Luo Mai
arxiv.org/abs/2511.03475 mastoxiv.page/@arXiv_csLG_bot/
- Metabolomic Biomarker Discovery for ADHD Diagnosis Using Interpretable Machine Learning
Nabil Belacel, Mohamed Rachid Boulassel
arxiv.org/abs/2601.11283 mastoxiv.page/@arXiv_csLG_bot/
- PhysE-Inv: A Physics-Encoded Inverse Modeling approach for Arctic Snow Depth Prediction
Akila Sampath, Vandana Janeja, Jianwu Wang
arxiv.org/abs/2601.17074
- SAGE-5GC: Security-Aware Guidelines for Evaluating Anomaly Detection in the 5G Core Network
Cristian Manca, Christian Scano, Giorgio Piras, Fabio Brau, Maura Pintor, Battista Biggio
arxiv.org/abs/2602.03596
- LORE: Jointly Learning the Intrinsic Dimensionality and Relative Similarity Structure From Ordina...
Anand, Helbling, Davenport, Berman, Alagapan, Rozell
arxiv.org/abs/2602.04192
- Towards Robust Scaling Laws for Optimizers
Alexandra Volkova, Mher Safaryan, Christoph H. Lampert, Dan Alistarh
arxiv.org/abs/2602.07712 mastoxiv.page/@arXiv_csLG_bot/
- Do We Need Adam? Surprisingly Strong and Sparse Reinforcement Learning with SGD in LLMs
Sagnik Mukherjee, Lifan Yuan, Pavan Jayasinha, Dilek Hakkani-Tür, Hao Peng
arxiv.org/abs/2602.07729 mastoxiv.page/@arXiv_csLG_bot/
- AceGRPO: Adaptive Curriculum Enhanced Group Relative Policy Optimization for Autonomous Machine L...
Yuzhu Cai, Zexi Liu, Xinyu Zhu, Cheng Wang, Siheng Chen
arxiv.org/abs/2602.07906 mastoxiv.page/@arXiv_csLG_bot/
- VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training
Guobin Shen, Chenxiao Zhao, Xiang Cheng, Lei Huang, Xing Yu
arxiv.org/abs/2602.10693 mastoxiv.page/@arXiv_csLG_bot/
- KBVQ-MoE: KLT-guided SVD with Bias-Corrected Vector Quantization for MoE Large Language Models
Zukang Xu, Zhixiong Zhao, Xing Hu, Zhixuan Chen, Dawei Yang
arxiv.org/abs/2602.11184 mastoxiv.page/@arXiv_csLG_bot/
- MUSE: Multi-Tenant Model Serving With Seamless Model Updates
Correia, Ferreira, Martins, Bento, Guerreiro, Pereira, Gomes, Bono, Ferreira, Bizarro
arxiv.org/abs/2602.11776 mastoxiv.page/@arXiv_csLG_bot/
- Pawsterior: Variational Flow Matching for Structured Simulation-Based Inference
Jorge Carrasco-Pollo, Floor Eijkelboom, Jan-Willem van de Meent
arxiv.org/abs/2602.13813 mastoxiv.page/@arXiv_csLG_bot/
- Silent Inconsistency in Data-Parallel Full Fine-Tuning: Diagnosing Worker-Level Optimization Misa...
Hong Li, Zhen Zhou, Honggang Zhang, Yuping Luo, Xinyue Wang, Han Gong, Zhiyuan Liu
arxiv.org/abs/2602.14462 mastoxiv.page/@arXiv_csLG_bot/
- Divine Benevolence is an $x^2$: GLUs scale asymptotically faster than MLPs
Alejandro Francisco Queiruga
arxiv.org/abs/2602.14495 mastoxiv.page/@arXiv_csLG_bot/
- \"UberWeb: Insights from Multilingual Curation for a 20-Trillion-Token Dataset
DatologyAI, et al.
arxiv.org/abs/2602.15210 mastoxiv.page/@arXiv_csLG_bot/
- GLM-5: from Vibe Coding to Agentic Engineering
GLM-5-Team, et al.
arxiv.org/abs/2602.15763 mastoxiv.page/@arXiv_csLG_bot/
- Anatomy of Capability Emergence: Scale-Invariant Representation Collapse and Top-Down Reorganizat...
Jayadev Billa
arxiv.org/abs/2602.15997 mastoxiv.page/@arXiv_csLG_bot/
- AI-CARE: Carbon-Aware Reporting Evaluation Metric for AI Models
KC Santosh, Srikanth Baride, Rodrigue Rizk
arxiv.org/abs/2602.16042 mastoxiv.page/@arXiv_csLG_bot/
- Beyond Message Passing: A Symbolic Alternative for Expressive and Interpretable Graph Learning
Chuqin Geng, Li Zhang, Haolin Ye, Ziyu Zhao, Yuhe Jiang, Tara Saba, Xinyu Wang, Xujie Si
arxiv.org/abs/2602.16947 mastoxiv.page/@arXiv_csLG_bot/
toXiv_bot_toot

@Techmeme@techhub.social
2025-12-08 09:15:44

An interview with 10 Kenyan AI annotators shows Chinese companies hire data labelers via opaque middleman networks and WhatsApp groups to avoid accountability (Rest of World)
restofworld.org/2025/kenya-chi

@arXiv_csLG_bot@mastoxiv.page
2026-02-25 10:39:51

Does Order Matter: Connecting the Law of Robustness to Robust Generalization
Himadri Mandal, Vishnu Varadarajan, Jaee Ponde, Aritra Das, Mihir More, Debayan Gupta
arxiv.org/abs/2602.20971 arxiv.org/pdf/2602.20971 arxiv.org/html/2602.20971
arXiv:2602.20971v1 Announce Type: new
Abstract: Bubeck and Sellke (2021) pose as an open problem the connection between the law of robustness and robust generalization. The law of robustness states that overparameterization is necessary for models to interpolate robustly; in particular, robust interpolation requires the learned function to be Lipschitz. Robust generalization asks whether small robust training loss implies small robust test loss. We resolve this problem by explicitly connecting the two for arbitrary data distributions. Specifically, we introduce a nontrivial notion of robust generalization error and convert it into a lower bound on the expected Rademacher complexity of the induced robust loss class. Our bounds recover the $\Omega(n^{1/d})$ regime of Wu et al. (2023) and show that, up to constants, robust generalization does not change the order of the Lipschitz constant required for smooth interpolation. We conduct experiments to probe the predicted scaling with dataset size and model capacity, testing whether empirical behavior aligns more closely with the predictions of Bubeck and Sellke (2021) or Wu et al. (2023). For MNIST, we find that the lower-bound Lipschitz constant scales on the order predicted by Wu et al. (2023). Informally, to obtain low robust generalization error, the Lipschitz constant must lie in a range that we bound, and the allowable perturbation radius is linked to the Lipschitz scale.
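
The empirical quantity being probed has a one-function sketch: since $\|\nabla_x f(x)\|$ lower-bounds the Lipschitz constant of $f$, taking the maximum input-gradient norm over the data gives the kind of lower bound the experiments track. The model and random inputs below are placeholders for the paper's MNIST setup:

```python
# Sketch: empirical Lipschitz lower bound via input-gradient norms. Any scalar
# projection of the network output gives a valid bound; we use the max logit.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(784, 512), nn.ReLU(),
                      nn.Linear(512, 10))

def lipschitz_lower_bound(model: nn.Module, inputs: torch.Tensor) -> float:
    inputs = inputs.clone().requires_grad_(True)
    top = model(inputs).max(dim=1).values        # per-sample max logit
    grads = torch.autograd.grad(top.sum(), inputs)[0]
    return grads.flatten(1).norm(dim=1).max().item()

x = torch.rand(256, 1, 28, 28)   # stand-in for MNIST images
print("Lipschitz constant >=", lipschitz_lower_bound(model, x))
```
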
toXiv_bot_toot

@arXiv_csLG_bot@mastoxiv.page
2026-02-25 16:08:29

Replaced article(s) found for cs.LG. arxiv.org/list/cs.LG/new
[6/6]:
- Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning
Chi-Pin Huang, Yunze Man, Zhiding Yu, Min-Hung Chen, Jan Kautz, Yu-Chiang Frank Wang, Fu-En Yang
arxiv.org/abs/2601.09708 mastoxiv.page/@arXiv_csCV_bot/
- Universality of Many-body Projected Ensemble for Learning Quantum Data Distribution
Quoc Hoan Tran, Koki Chinzei, Yasuhiro Endo, Hirotaka Oshima
arxiv.org/abs/2601.18637 mastoxiv.page/@arXiv_quantph_b
- FROST: Filtering Reasoning Outliers with Attention for Efficient Reasoning
Haozheng Luo, Zhuolin Jiang, Md Zahid Hasan, Yan Chen, Soumalya Sarkar
arxiv.org/abs/2601.19001 mastoxiv.page/@arXiv_csCL_bot/
- Analysis of Shuffling Beyond Pure Local Differential Privacy
Shun Takagi, Seng Pei Liew
arxiv.org/abs/2601.19154 mastoxiv.page/@arXiv_csDS_bot/
- CryoLVM: Self-supervised Learning from Cryo-EM Density Maps with Large Vision Models
Weining Fu, Kai Shu, Kui Xu, Qiangfeng Cliff Zhang
arxiv.org/abs/2602.02620
- XtraLight-MedMamba for Classification of Neoplastic Tubular Adenomas
Sultana, Afsar, Rahu, Singh, Shula, Combs, Forchetti, Asari
arxiv.org/abs/2602.04819
- Flow-Based Conformal Predictive Distributions
Trevor Harris
arxiv.org/abs/2602.07633 mastoxiv.page/@arXiv_statML_bo
- GOT-Edit: Geometry-Aware Generic Object Tracking via Online Model Editing
Shih-Fang Chen, Jun-Cheng Chen, I-Hong Jhuo, Yen-Yu Lin
arxiv.org/abs/2602.08550 mastoxiv.page/@arXiv_csCV_bot/
- UI-Venus-1.5 Technical Report
Venus Team, et al.
arxiv.org/abs/2602.09082 mastoxiv.page/@arXiv_csCV_bot/
- The Wisdom of Many Queries: Complexity-Diversity Principle for Dense Retriever Training
Xincan Feng, Noriki Nishida, Yusuke Sakai, Yuji Matsumoto
arxiv.org/abs/2602.09448 mastoxiv.page/@arXiv_csIR_bot/
- Intent Laundering: AI Safety Datasets Are Not What They Seem
Shahriar Golchin, Marc Wetter
arxiv.org/abs/2602.16729 mastoxiv.page/@arXiv_csCR_bot/
- The Metaphysics We Train: A Heideggerian Reading of Machine Learning
Heman Shakeri
arxiv.org/abs/2602.19028 mastoxiv.page/@arXiv_csCY_bot/
- Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks
David Schmotz, Luca Beurer-Kellner, Sahar Abdelnabi, Maksym Andriushchenko
arxiv.org/abs/2602.20156 mastoxiv.page/@arXiv_csCR_bot/
- A Very Big Video Reasoning Suite
Maijunxian Wang, et al.
arxiv.org/abs/2602.20159 mastoxiv.page/@arXiv_csCV_bot/
toXiv_bot_toot

@arXiv_csLG_bot@mastoxiv.page
2026-02-25 10:45:31

Learning from Trials and Errors: Reflective Test-Time Planning for Embodied LLMs
Yining Hong, Huang Huang, Manling Li, Li Fei-Fei, Jiajun Wu, Yejin Choi
arxiv.org/abs/2602.21198 arxiv.org/pdf/2602.21198 arxiv.org/html/2602.21198
arXiv:2602.21198v1 Announce Type: new
Abstract: Embodied LLMs endow robots with high-level task reasoning, but they cannot reflect on what went wrong or why, turning deployment into a sequence of independent trials where mistakes repeat rather than accumulate into experience. Drawing upon human reflective practitioners, we introduce Reflective Test-Time Planning, which integrates two modes of reflection: reflection-in-action, where the agent uses test-time scaling to generate and score multiple candidate actions using internal reflections before execution; and reflection-on-action, which uses test-time training to update both its internal reflection model and its action policy based on external reflections after execution. We also include retrospective reflection, allowing the agent to re-evaluate earlier decisions and perform model updates with hindsight for proper long-horizon credit assignment. Experiments on our newly-designed Long-Horizon Household benchmark and MuJoCo Cupboard Fitting benchmark show significant gains over baseline models, with ablative studies validating the complementary roles of reflection-in-action and reflection-on-action. Qualitative analyses, including real-robot trials, highlight behavioral correction through reflection.
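
The two reflection modes fit a small control loop. A purely schematic sketch, in which a lookup table and random rewards stand in for the paper's LLM-based reflection model, action policy, and environment:

```python
# Schematic of reflection-in-action (score candidates before acting) and
# reflection-on-action (update the internal critic from external outcomes).
# All components are toy stand-ins for the paper's LLM-based modules.
import random

def propose_actions(state, k=4):          # stand-in for LLM action sampling
    return [f"action_{i}" for i in range(k)]

reflection_scores = {}                    # toy internal reflection model

def reflect_in_action(state, actions):    # pick the best-scored candidate
    return max(actions, key=lambda a: reflection_scores.get((state, a), 0.0))

def reflect_on_action(state, action, reward, lr=0.5):
    key = (state, action)                 # move the critic toward the outcome
    old = reflection_scores.get(key, 0.0)
    reflection_scores[key] = old + lr * (reward - old)

state = "kitchen"
for trial in range(10):
    action = reflect_in_action(state, propose_actions(state))
    reward = random.random()              # stand-in for environment feedback
    reflect_on_action(state, action, reward)
```
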
toXiv_bot_toot

@arXiv_csLG_bot@mastoxiv.page
2025-12-22 13:55:06

Replaced article(s) found for cs.LG. arxiv.org/list/cs.LG/new
[5/5]:
- CLAReSNet: When Convolution Meets Latent Attention for Hyperspectral Image Classification
Asmit Bandyopadhyay, Anindita Das Bhattacharjee, Rakesh Das
arxiv.org/abs/2511.12346 mastoxiv.page/@arXiv_csCV_bot/
- Safeguarded Stochastic Polyak Step Sizes for Non-smooth Optimization: Robust Performance Without ...
Dimitris Oikonomou, Nicolas Loizou
arxiv.org/abs/2512.02342 mastoxiv.page/@arXiv_mathOC_bo
- Predictive Modeling of I/O Performance for Machine Learning Training Pipelines: A Data-Driven App...
Karthik Prabhakar, Durgamadhab Mishra
arxiv.org/abs/2512.06699 mastoxiv.page/@arXiv_csPF_bot/
- Minimum Bayes Risk Decoding for Error Span Detection in Reference-Free Automatic Machine Translat...
Lyu, Song, Kamigaito, Ding, Tanaka, Utiyama, Funakoshi, Okumura
arxiv.org/abs/2512.07540 mastoxiv.page/@arXiv_csCL_bot/
- In-Context Learning for Seismic Data Processing
Fabian Fuchs, Mario Ruben Fernandez, Norman Ettrich, Janis Keuper
arxiv.org/abs/2512.11575 mastoxiv.page/@arXiv_csCV_bot/
- Journey Before Destination: On the importance of Visual Faithfulness in Slow Thinking
Rheeya Uppaal, Phu Mon Htut, Min Bai, Nikolaos Pappas, Zheng Qi, Sandesh Swamy
arxiv.org/abs/2512.12218 mastoxiv.page/@arXiv_csCV_bot/
- Non-Resolution Reasoning (NRR): A Computational Framework for Contextual Identity and Ambiguity P...
Kei Saito
arxiv.org/abs/2512.13478 mastoxiv.page/@arXiv_csCL_bot/
- Stylized Synthetic Augmentation further improves Corruption Robustness
Georg Siedel, Rojan Regmi, Abhirami Anand, Weijia Shao, Silvia Vock, Andrey Morozov
arxiv.org/abs/2512.15675 mastoxiv.page/@arXiv_csCV_bot/
- mimic-video: Video-Action Models for Generalizable Robot Control Beyond VLAs
Jonas Pai, Liam Achenbach, Victoriano Montesinos, Benedek Forrai, Oier Mees, Elvis Nava
arxiv.org/abs/2512.15692 mastoxiv.page/@arXiv_csRO_bot/
toXiv_bot_toot

@arXiv_csLG_bot@mastoxiv.page
2025-12-22 11:50:19

Crosslisted article(s) found for cs.LG. arxiv.org/list/cs.LG/new
[1/3]:
- Optimizing Text Search: A Novel Pattern Matching Algorithm Based on Ukkonen's Approach
Xinyu Guan, Shaohua Zhang
arxiv.org/abs/2512.16927 mastoxiv.page/@arXiv_csDS_bot/
- SpIDER: Spatially Informed Dense Embedding Retrieval for Software Issue Localization
Shravan Chaudhari, Rahul Thomas Jacob, Mononito Goswami, Jiajun Cao, Shihab Rashid, Christian Bock
arxiv.org/abs/2512.16956 mastoxiv.page/@arXiv_csSE_bot/
- MemoryGraft: Persistent Compromise of LLM Agents via Poisoned Experience Retrieval
Saksham Sahai Srivastava, Haoyu He
arxiv.org/abs/2512.16962 mastoxiv.page/@arXiv_csCR_bot/
- Colormap-Enhanced Vision Transformers for MRI-Based Multiclass (4-Class) Alzheimer's Disease Clas...
Faisal Ahmed
arxiv.org/abs/2512.16964 mastoxiv.page/@arXiv_eessIV_bo
- Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows
Wanghan Xu, et al.
arxiv.org/abs/2512.16969 mastoxiv.page/@arXiv_csAI_bot/
- PAACE: A Plan-Aware Automated Agent Context Engineering Framework
Kamer Ali Yuksel
arxiv.org/abs/2512.16970 mastoxiv.page/@arXiv_csAI_bot/
- A Women's Health Benchmark for Large Language Models
Elisabeth Gruber, et al.
arxiv.org/abs/2512.17028 mastoxiv.page/@arXiv_csCL_bot/
- Perturb Your Data: Paraphrase-Guided Training Data Watermarking
Pranav Shetty, Mirazul Haque, Petr Babkin, Zhiqiang Ma, Xiaomo Liu, Manuela Veloso
arxiv.org/abs/2512.17075 mastoxiv.page/@arXiv_csCL_bot/
- Disentangled representations via score-based variational autoencoders
Benjamin S. H. Lyo, Eero P. Simoncelli, Cristina Savin
arxiv.org/abs/2512.17127 mastoxiv.page/@arXiv_statML_bo
- Biosecurity-Aware AI: Agentic Risk Auditing of Soft Prompt Attacks on ESM-Based Variant Predictors
Huixin Zhan
arxiv.org/abs/2512.17146 mastoxiv.page/@arXiv_csCR_bot/
- Application of machine learning to predict food processing level using Open Food Facts
Arora, Chauhan, Rana, Aditya, Bhagat, Kumar, Kumar, Semar, Singh, Bagler
arxiv.org/abs/2512.17169 mastoxiv.page/@arXiv_qbioBM_bo
- Systemic Risk Radar: A Multi-Layer Graph Framework for Early Market Crash Warning
Sandeep Neela
arxiv.org/abs/2512.17185 mastoxiv.page/@arXiv_qfinRM_bo
- Do Foundational Audio Encoders Understand Music Structure?
Keisuke Toyama, Zhi Zhong, Akira Takahashi, Shusuke Takahashi, Yuki Mitsufuji
arxiv.org/abs/2512.17209 mastoxiv.page/@arXiv_csSD_bot/
- CheXPO-v2: Preference Optimization for Chest X-ray VLMs with Knowledge Graph Consistency
Xiao Liang, Yuxuan An, Di Wang, Jiawei Hu, Zhicheng Jiao, Bin Jing, Quan Wang
arxiv.org/abs/2512.17213 mastoxiv.page/@arXiv_csCV_bot/
- Machine Learning Assisted Parameter Tuning on Wavelet Transform Amorphous Radial Distribution Fun...
Deriyan Senjaya, Stephen Ekaputra Limantoro
arxiv.org/abs/2512.17245 mastoxiv.page/@arXiv_condmatmt
- AlignDP: Hybrid Differential Privacy with Rarity-Aware Protection for LLMs
Madhava Gaikwad
arxiv.org/abs/2512.17251 mastoxiv.page/@arXiv_csCR_bot/
- Practical Framework for Privacy-Preserving and Byzantine-robust Federated Learning
Baolei Zhang, Minghong Fang, Zhuqing Liu, Biao Yi, Peizhao Zhou, Yuan Wang, Tong Li, Zheli Liu
arxiv.org/abs/2512.17254 mastoxiv.page/@arXiv_csCR_bot/
- Verifiability-First Agents: Provable Observability and Lightweight Audit Agents for Controlling A...
Abhivansh Gupta
arxiv.org/abs/2512.17259 mastoxiv.page/@arXiv_csMA_bot/
- Warmer for Less: A Cost-Efficient Strategy for Cold-Start Recommendations at Pinterest
Saeed Ebrahimi, Weijie Jiang, Jaewon Yang, Olafur Gudmundsson, Yucheng Tu, Huizhong Duan
arxiv.org/abs/2512.17277 mastoxiv.page/@arXiv_csIR_bot/
- LibriVAD: A Scalable Open Dataset with Deep Learning Benchmarks for Voice Activity Detection
Ioannis Stylianou, Achintya kr. Sarkar, Nauman Dawalatabad, James Glass, Zheng-Hua Tan
arxiv.org/abs/2512.17281 mastoxiv.page/@arXiv_csSD_bot/
- Penalized Fair Regression for Multiple Groups in Chronic Kidney Disease
Carter H. Nakamoto, Lucia Lushi Chen, Agata Foryciarz, Sherri Rose
arxiv.org/abs/2512.17340 mastoxiv.page/@arXiv_statME_bo
toXiv_bot_toot

@arXiv_csLG_bot@mastoxiv.page
2025-12-22 13:54:35

Replaced article(s) found for cs.LG. arxiv.org/list/cs.LG/new
[2/5]:
- The Diffusion Duality
Sahoo, Deschenaux, Gokaslan, Wang, Chiu, Kuleshov
arxiv.org/abs/2506.10892 mastoxiv.page/@arXiv_csLG_bot/
- Multimodal Representation Learning and Fusion
Jin, Ge, Xie, Luo, Song, Bi, Liang, Guan, Yeong, Song, Hao
arxiv.org/abs/2506.20494 mastoxiv.page/@arXiv_csLG_bot/
- The kernel of graph indices for vector search
Mariano Tepper, Ted Willke
arxiv.org/abs/2506.20584 mastoxiv.page/@arXiv_csLG_bot/
- OptScale: Probabilistic Optimality for Inference-time Scaling
Youkang Wang, Jian Wang, Rubing Chen, Xiao-Yong Wei
arxiv.org/abs/2506.22376 mastoxiv.page/@arXiv_csLG_bot/
- Boosting Revisited: Benchmarking and Advancing LP-Based Ensemble Methods
Fabian Akkerman, Julien Ferry, Christian Artigues, Emmanuel Hebrard, Thibaut Vidal
arxiv.org/abs/2507.18242 mastoxiv.page/@arXiv_csLG_bot/
- MolMark: Safeguarding Molecular Structures through Learnable Atom-Level Watermarking
Runwen Hu, Peilin Chen, Keyan Ding, Shiqi Wang
arxiv.org/abs/2508.17702 mastoxiv.page/@arXiv_csLG_bot/
- Dual-Distilled Heterogeneous Federated Learning with Adaptive Margins for Trainable Global Protot...
Fatema Siddika, Md Anwar Hossen, Wensheng Zhang, Anuj Sharma, Juan Pablo Muñoz, Ali Jannesari
arxiv.org/abs/2508.19009 mastoxiv.page/@arXiv_csLG_bot/
- STDiff: A State Transition Diffusion Framework for Time Series Imputation in Industrial Systems
Gary Simethy, Daniel Ortiz-Arroyo, Petar Durdevic
arxiv.org/abs/2508.19011 mastoxiv.page/@arXiv_csLG_bot/
- EEGDM: Learning EEG Representation with Latent Diffusion Model
Shaocong Wang, Tong Liu, Yihan Li, Ming Li, Kairui Wen, Pei Yang, Wenqi Ji, Minjing Yu, Yong-Jin Liu
arxiv.org/abs/2508.20705 mastoxiv.page/@arXiv_csLG_bot/
- Data-Free Continual Learning of Server Models in Model-Heterogeneous Cloud-Device Collaboration
Xiao Zhang, Zengzhe Chen, Yuan Yuan, Yifei Zou, Fuzhen Zhuang, Wenyu Jiao, Yuke Wang, Dongxiao Yu
arxiv.org/abs/2509.25977 mastoxiv.page/@arXiv_csLG_bot/
- Fine-Tuning Masked Diffusion for Provable Self-Correction
Jaeyeon Kim, Seunggeun Kim, Taekyun Lee, David Z. Pan, Hyeji Kim, Sham Kakade, Sitan Chen
arxiv.org/abs/2510.01384 mastoxiv.page/@arXiv_csLG_bot/
- A Generic Machine Learning Framework for Radio Frequency Fingerprinting
Alex Hiles, Bashar I. Ahmad
arxiv.org/abs/2510.09775 mastoxiv.page/@arXiv_csLG_bot/
- A Second-Order SpikingSSM for Wearables
Kartikay Agrawal, Abhijeet Vikram, Vedant Sharma, Vaishnavi Nagabhushana, Ayon Borthakur
arxiv.org/abs/2510.14386 mastoxiv.page/@arXiv_csLG_bot/
- Utility-Diversity Aware Online Batch Selection for LLM Supervised Fine-tuning
Heming Zou, Yixiu Mao, Yun Qu, Qi Wang, Xiangyang Ji
arxiv.org/abs/2510.16882 mastoxiv.page/@arXiv_csLG_bot/
- Seeing Structural Failure Before it Happens: An Image-Based Physics-Informed Neural Network (PINN...
Omer Jauhar Khan, Sudais Khan, Hafeez Anwar, Shahzeb Khan, Shams Ul Arifeen
arxiv.org/abs/2510.23117 mastoxiv.page/@arXiv_csLG_bot/
- Training Deep Physics-Informed Kolmogorov-Arnold Networks
Spyros Rigas, Fotios Anagnostopoulos, Michalis Papachristou, Georgios Alexandridis
arxiv.org/abs/2510.23501 mastoxiv.page/@arXiv_csLG_bot/
- Semi-Supervised Preference Optimization with Limited Feedback
Seonggyun Lee, Sungjun Lim, Seojin Park, Soeun Cheon, Kyungwoo Song
arxiv.org/abs/2511.00040 mastoxiv.page/@arXiv_csLG_bot/
- Towards Causal Market Simulators
Dennis Thumm, Luis Ontaneda Mijares
arxiv.org/abs/2511.04469 mastoxiv.page/@arXiv_csLG_bot/
- Incremental Generation is Necessary and Sufficient for Universality in Flow-Based Modelling
Hossein Rouhvarzi, Anastasis Kratsios
arxiv.org/abs/2511.09902 mastoxiv.page/@arXiv_csLG_bot/
- Optimizing Mixture of Block Attention
Guangxuan Xiao, Junxian Guo, Kasra Mazaheri, Song Han
arxiv.org/abs/2511.11571 mastoxiv.page/@arXiv_csLG_bot/
- Assessing Automated Fact-Checking for Medical LLM Responses with Knowledge Graphs
Shasha Zhou, Mingyu Huang, Jack Cole, Charles Britton, Ming Yin, Jan Wolber, Ke Li
arxiv.org/abs/2511.12817 mastoxiv.page/@arXiv_csLG_bot/
toXiv_bot_toot