2025-12-03 18:30:42
OpenAI agrees to buy Poland-based Neptune, which makes tools for analyzing progress during AI model training; the transaction will be in stock (Dina Bass/Bloomberg)
https://www.bloomberg.com/news/articles/2025-12-03/openai-agre…
AWS launches Nova Forge, a $100,000/year service allowing clients to customize Amazon's AI models at various stages of training and refine open-weight models (Jordan Novet/CNBC)
https://www.cnbc.com/2025/12/02/amazon-nova-forge-le…
I am an AI model made for everything in general.
I've memorized the wiki page of every Minecraft mineral.
I know the Queen rules England. My training set's historical.
Hallucinations are my Waterloo—That isn't allegorical.
I'm built from matrix operations simple and mathematical,
My neurons are a metaphor, not actually synaptical.
The data centers built today are ninety-nine percent for me.
Spare no expense; you'll live forever soon in …
The Grinch did nothing wrong. He wasn't *stealing* #Christmas, he was just gathering a corpus for training his #AI model. Investors are already lining up with their billions to fund the construction of the Whoville Data Center, ignoring concerns from residents.
i am assigned to two exam committees for january prelims, one on energy-efficient model training and the other on i/o-compute balance in GPU-based data analytics. yummy.
Regularized Random Fourier Features and Finite Element Reconstruction for Operator Learning in Sobolev Space
Xinyue Yu, Hayden Schaeffer
https://arxiv.org/abs/2512.17884 https://arxiv.org/pdf/2512.17884 https://arxiv.org/html/2512.17884
arXiv:2512.17884v1 Announce Type: new
Abstract: Operator learning is a data-driven approximation of mappings between infinite-dimensional function spaces, such as the solution operators of partial differential equations. Kernel-based operator learning can offer accurate, theoretically justified approximations that require less training than standard methods. However, they can become computationally prohibitive for large training sets and can be sensitive to noise. We propose a regularized random Fourier feature (RRFF) approach, coupled with a finite element reconstruction map (RRFF-FEM), for learning operators from noisy data. The method uses random features drawn from multivariate Student's $t$ distributions, together with frequency-weighted Tikhonov regularization that suppresses high-frequency noise. We establish high-probability bounds on the extreme singular values of the associated random feature matrix and show that when the number of features $N$ scales like $m \log m$ with the number of training samples $m$, the system is well-conditioned, which yields estimation and generalization guarantees. Detailed numerical experiments on benchmark PDE problems, including advection, Burgers', Darcy flow, Helmholtz, Navier-Stokes, and structural mechanics, demonstrate that RRFF and RRFF-FEM are robust to noise and achieve improved performance with reduced training time compared to the unregularized random feature model, while maintaining competitive accuracy relative to kernel and neural operator tests.
toXiv_bot_toot
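A minimal numpy sketch of the RRFF idea, scaled down to 1-D regression: random cosine features with Student's-t frequencies, fit by a frequency-weighted ridge solve. The target function, degrees of freedom, λ, and the exact weighting form are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression: noisy samples of a smooth target.
m = 200                        # number of training samples
N = int(m * np.log(m))         # feature count scaling ~ m log m, as in the abstract
x = rng.uniform(-1, 1, size=(m, 1))
y = np.sin(3 * x[:, 0]) + 0.1 * rng.standard_normal(m)

# Random frequencies from a heavy-tailed Student's t distribution (df=3 assumed).
omega = rng.standard_t(df=3, size=(N, 1))
b = rng.uniform(0, 2 * np.pi, size=N)
A = np.cos(x @ omega.T + b)    # m x N random feature matrix

# Frequency-weighted Tikhonov regularization: penalize high-|omega| features
# more strongly, which suppresses high-frequency noise (weighting form assumed).
lam = 1e-3
w = 1.0 + omega[:, 0] ** 2
c = np.linalg.solve(A.T @ A + lam * np.diag(w), A.T @ y)

# Evaluate on a held-out grid.
xt = np.linspace(-1, 1, 100)[:, None]
pred = np.cos(xt @ omega.T + b) @ c
err = np.sqrt(np.mean((pred - np.sin(3 * xt[:, 0])) ** 2))
```

The solve replaces iterative training entirely; only the regularizer keeps the overparameterized (N > m) system well behaved.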
RE: https://mastodon.social/@nixCraft/115554484108496189
I still cannot comprehend how anyone could honestly consider a statistical model created from training data to be “intelligent”. Don't you remember the times when people knowingly smiled at anyo…
Q&A with Z.ai Director of Product Zixuan Li on Chinese AI models embracing open source, attracting global users for its GLM model, training on memes, and more (ChinaTalk)
https://www.chinatalk.media/p/the-zai-playbook
Could we become co-owners of #AI models that used training data which I published under a viral license?
We briefly discussed this question in the Austrian #CreativeCommons chapter and came to the conclusion that copyright can only be claimed by human beings. So the model itself and wha…
Copy-Transform-Paste: Zero-Shot Object-Object Alignment Guided by Vision-Language and Geometric Constraints
Rotem Gatenyo, Ohad Fried
https://arxiv.org/abs/2601.14207 https://arxiv.org/pdf/2601.14207 https://arxiv.org/html/2601.14207
arXiv:2601.14207v1 Announce Type: new
Abstract: We study zero-shot 3D alignment of two given meshes, using a text prompt describing their spatial relation -- an essential capability for content creation and scene assembly. Earlier approaches primarily rely on geometric alignment procedures, while recent work leverages pretrained 2D diffusion models to model language-conditioned object-object spatial relationships. In contrast, we directly optimize the relative pose at test time, updating translation, rotation, and isotropic scale with CLIP-driven gradients via a differentiable renderer, without training a new model. Our framework augments language supervision with geometry-aware objectives: a variant of soft-Iterative Closest Point (ICP) term to encourage surface attachment and a penetration loss to discourage interpenetration. A phased schedule strengthens contact constraints over time, and camera control concentrates the optimization on the interaction region. To enable evaluation, we curate a benchmark containing diverse categories and relations, and compare against baselines. Our method outperforms all alternatives, yielding semantically faithful and physically plausible alignments.
toXiv_bot_toot
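A soft-ICP-style surface-attachment term can be illustrated as a softmin-weighted distance between point clouds: differentiable, and small only when each source point lies near the target surface. The point sets and temperature τ here are invented, and this is a generic stand-in, not the paper's exact objective.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.uniform(-1, 1, size=(50, 3))       # source surface samples (toy data)
B = A + np.array([0.05, 0.0, 0.0])         # target surface, slightly offset

def soft_icp_loss(P, Q, tau=0.05):
    """Softmin-weighted distance from each point in P to the cloud Q.
    Unlike hard nearest-neighbor ICP, the softmin keeps gradients smooth."""
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)  # pairwise dists
    w = np.exp(-d / tau)
    w /= w.sum(axis=1, keepdims=True)      # per-point softmin weights
    return float((w * d).sum(axis=1).mean())

near = soft_icp_loss(A, B)
far = soft_icp_loss(A, A + np.array([0.5, 0.0, 0.0]))
```

Because the loss shrinks as the clouds approach, it can be dropped into a gradient-based pose optimization alongside rendering-based terms.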
Imagine ChatGPT but instead of predicting text it just linked you to the top 3 documents most influential on the probabilities that would have been used to predict that text.
Could even generate some info about which parts of each would have been combined how.
There would still be issues with how training data is sourced and filtered, but these could be solved by crawling normally respecting robots.txt and by paying filterers a fair wage with a more relaxed work schedule and mental health support.
The energy issues are mainly about wild future investment and wasteful query spam, not optimized present-day per-query usage.
Is this "just search?"
Yes, but it would have some advantages for a lot of use cases, mainly in synthesizing results across multiple documents and in leveraging a language model more fully to find relevant stuff.
When we talk about the harms of current corporate LLMs, the opportunity cost of NOT building things like this is part of that.
The equivalent for art would have been so amazing too! "Here are some artists that can do what you want, with examples pulled from their portfolios."
It would be a really cool coding assistant that I'd actually encourage my students to use (with some guidelines).
#AI #GenAI #LLMs
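The "link the most influential documents" idea is, at its core, retrieval plus attribution. A bare-bones TF-IDF scorer over a toy corpus (document names and contents entirely invented) shows the mechanics of the "top 3" step:

```python
from collections import Counter
import math

# Tiny corpus standing in for a training set (all documents hypothetical).
docs = {
    "intro_to_sorting.md": "sorting algorithms compare elements quicksort merge sort",
    "cooking_basics.md": "recipes for pasta sauce and bread baking",
    "python_lists.md": "python list methods append sort and slicing examples",
    "gpu_training.md": "training neural models on gpu clusters efficiently",
}

def rank_documents(query, docs):
    """Score each document against the query with bare-bones TF-IDF."""
    n = len(docs)
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    df = Counter()                          # document frequency per term
    for toks in tokenized.values():
        df.update(set(toks))
    scores = {}
    for name, toks in tokenized.items():
        tf = Counter(toks)
        scores[name] = sum(
            tf[t] * math.log(n / df[t])     # rare terms weigh more
            for t in query.lower().split() if t in tf
        )
    return sorted(scores.items(), key=lambda kv: -kv[1])

top3 = rank_documents("how do I sort a list in python", docs)[:3]
```

A real system would score influence against the model's internals rather than raw term overlap, but the user-facing output — ranked sources instead of synthesized text — is the same shape.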
You Only Train Once: Differentiable Subset Selection for Omics Data
Daphné Chopard, Jorge da Silva Gonçalves, Irene Cannistraci, Thomas M. Sutter, Julia E. Vogt
https://arxiv.org/abs/2512.17678 https://arxiv.org/pdf/2512.17678 https://arxiv.org/html/2512.17678
arXiv:2512.17678v1 Announce Type: new
Abstract: Selecting compact and informative gene subsets from single-cell transcriptomic data is essential for biomarker discovery, improving interpretability, and cost-effective profiling. However, most existing feature selection approaches either operate as multi-stage pipelines or rely on post hoc feature attribution, making selection and prediction weakly coupled. In this work, we present YOTO (you only train once), an end-to-end framework that jointly identifies discrete gene subsets and performs prediction within a single differentiable architecture. In our model, the prediction task directly guides which genes are selected, while the learned subsets, in turn, shape the predictive representation. This closed feedback loop enables the model to iteratively refine both what it selects and how it predicts during training. Unlike existing approaches, YOTO enforces sparsity so that only the selected genes contribute to inference, eliminating the need to train additional downstream classifiers. Through a multi-task learning design, the model learns shared representations across related objectives, allowing partially labeled datasets to inform one another, and discovering gene subsets that generalize across tasks without additional training steps. We evaluate YOTO on two representative single-cell RNA-seq datasets, showing that it consistently outperforms state-of-the-art baselines. These results demonstrate that sparse, end-to-end, multi-task gene subset selection improves predictive performance and yields compact and meaningful gene subsets, advancing biomarker discovery and single-cell analysis.
toXiv_bot_toot
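The joint select-and-predict loop can be caricatured in a few lines: a sigmoid gate over features trained together with a logistic classifier, plus a constant shrinkage on the gate logits as a crude sparsity pressure. This is a stand-in for the paper's differentiable selection mechanism, with toy data and all hyperparameters invented.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: 20 "genes", only the first 2 carry signal.
n, d = 400, 20
X = rng.standard_normal((n, d))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

g = np.full(d, 0.5)     # gate logits; gate = sigmoid(g)
w = np.zeros(d)         # classifier weights
lr, lam = 0.1, 0.01
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(500):
    gate = sigmoid(g)
    p = sigmoid(X @ (gate * w))
    grad_z = (p - y) / n                         # logistic-loss gradient
    grad_w = (X.T @ grad_z) * gate               # loss gradient w.r.t. weights
    grad_g = (X.T @ grad_z) * w * gate * (1 - gate)  # chained through sigmoid
    g -= lr * (grad_g + lam)   # + lam: uniform downward pressure on gate logits
    w -= lr * grad_w

# Prediction directly determines which features survive the gate.
selected = np.argsort(-(sigmoid(g) * np.abs(w)))[:2]
```

The key property mirrored here is the closed loop: the classification gradient flows into the gates, so selection and prediction are trained once, together.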
> if you think about it in the context of the training models—it has a rough sense that you’re like a 37 year old guy on Reddit. That’s the kind of person that it’s doing the continuation for, because that’s a big chunk of the training corpus.
> I often tell people whenever they send me a message like, “a large language model said I should do x, y, z.” what you’re really saying is, “a 37 year old guy on Reddit said it,” and you’ve got roughly the same amount of information
i’m reviewing a paper on reducing energy costs in large model training and it keeps slinging words like optimize and optimization around and calling other approaches suboptimal and i feel like i would be kind of an old crank if i were to ask if optimality is on the table here (it is not)
EDIT: hold on, maybe it is
The project ‘Ausbildung digitalisieren – Betriebe stärken’ by @… promotes #digitalisation of in-house training and further education in #Saxony—against #SkillsShortages and for a more inclusive #VocationalTrainingSystem.
Goals: Introduction of #LunaLMS in model companies, further development through on-site customisation, documentation of successes for other companies, and improved integration of trainees with special needs for #accessibility or #multilingualism.
My controversial take on "AI" ray tracing helpers is that they're a really good idea.
First, some background: keep in mind that machine learning technologies excel at tasks with a high reward for success and a small cost for failure. In this case, getting most of the rays right improves performance, at the cost of a few rays being shot into nothing.
Secondly, light rays are far too numerous in real life to be simulated in their entirety, so using some statistics to approximate the lighting model makes a lot of sense here. Plus, down at the quantum scale even physicists use statistics to explain this stuff, so it's not that unrealistic either.
Finally, the source data for this stuff is entirely other games, so ethically sourcing the training data set should not be a concern here.
Here, technology can be good or bad. It's not the tech, it's the use of the tech by the people (by that I mean oligarchic corporations) that makes it good or bad.
Exploiting ID-Text Complementarity via Ensembling for Sequential Recommendation
Liam Collins, Bhuvesh Kumar, Clark Mingxuan Ju, Tong Zhao, Donald Loveland, Leonardo Neves, Neil Shah
https://arxiv.org/abs/2512.17820 https://arxiv.org/pdf/2512.17820 https://arxiv.org/html/2512.17820
arXiv:2512.17820v1 Announce Type: new
Abstract: Modern Sequential Recommendation (SR) models commonly utilize modality features to represent items, motivated in large part by recent advancements in language and vision modeling. To do so, several works completely replace ID embeddings with modality embeddings, claiming that modality embeddings render ID embeddings unnecessary because they can match or even exceed ID embedding performance. On the other hand, many works jointly utilize ID and modality features, but posit that complex fusion strategies, such as multi-stage training and/or intricate alignment architectures, are necessary for this joint utilization. However, underlying both these lines of work is a lack of understanding of the complementarity of ID and modality features. In this work, we address this gap by studying the complementarity of ID- and text-based SR models. We show that these models do learn complementary signals, meaning that either should provide performance gain when used properly alongside the other. Motivated by this, we propose a new SR method that preserves ID-text complementarity through independent model training, then harnesses it through a simple ensembling strategy. Despite this method's simplicity, we show it outperforms several competitive SR baselines, implying that both ID and text features are necessary to achieve state-of-the-art SR performance but complex fusion architectures are not.
toXiv_bot_toot
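The ensembling strategy the abstract calls "simple" can be as little as normalizing each model's item scores and blending them linearly. The scores below are made up, and min-max normalization with a fixed α is an assumption, not the paper's recipe:

```python
import numpy as np

# Hypothetical per-item scores from two independently trained SR models:
# one using ID embeddings, one using text embeddings.
id_scores = np.array([2.1, 0.3, 1.7, -0.5, 0.9])
text_scores = np.array([0.2, 1.9, 1.5, 0.1, -0.3])

def ensemble(a, b, alpha=0.5):
    """Min-max normalize each model's scores, then blend them linearly."""
    norm = lambda s: (s - s.min()) / (s.max() - s.min())
    return alpha * norm(a) + (1 - alpha) * norm(b)

# Item 2 is ranked first: it scores well under BOTH models, which is the
# complementarity the paper argues for.
ranking = np.argsort(-ensemble(id_scores, text_scores))
```

Because the two models are trained independently, nothing about this blend requires alignment losses or multi-stage training — which is exactly the abstract's point.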
Replaced article(s) found for cs.LG. https://arxiv.org/list/cs.LG/new
[2/5]:
- The Diffusion Duality
Sahoo, Deschenaux, Gokaslan, Wang, Chiu, Kuleshov
https://arxiv.org/abs/2506.10892 https://mastoxiv.page/@arXiv_csLG_bot/114675526577078472
- Multimodal Representation Learning and Fusion
Jin, Ge, Xie, Luo, Song, Bi, Liang, Guan, Yeong, Song, Hao
https://arxiv.org/abs/2506.20494 https://mastoxiv.page/@arXiv_csLG_bot/114749113025183688
- The kernel of graph indices for vector search
Mariano Tepper, Ted Willke
https://arxiv.org/abs/2506.20584 https://mastoxiv.page/@arXiv_csLG_bot/114749118923266356
- OptScale: Probabilistic Optimality for Inference-time Scaling
Youkang Wang, Jian Wang, Rubing Chen, Xiao-Yong Wei
https://arxiv.org/abs/2506.22376 https://mastoxiv.page/@arXiv_csLG_bot/114771735361664528
- Boosting Revisited: Benchmarking and Advancing LP-Based Ensemble Methods
Fabian Akkerman, Julien Ferry, Christian Artigues, Emmanuel Hebrard, Thibaut Vidal
https://arxiv.org/abs/2507.18242 https://mastoxiv.page/@arXiv_csLG_bot/114913322736512937
- MolMark: Safeguarding Molecular Structures through Learnable Atom-Level Watermarking
Runwen Hu, Peilin Chen, Keyan Ding, Shiqi Wang
https://arxiv.org/abs/2508.17702 https://mastoxiv.page/@arXiv_csLG_bot/115095014405732247
- Dual-Distilled Heterogeneous Federated Learning with Adaptive Margins for Trainable Global Protot...
Fatema Siddika, Md Anwar Hossen, Wensheng Zhang, Anuj Sharma, Juan Pablo Muñoz, Ali Jannesari
https://arxiv.org/abs/2508.19009 https://mastoxiv.page/@arXiv_csLG_bot/115100269482762688
- STDiff: A State Transition Diffusion Framework for Time Series Imputation in Industrial Systems
Gary Simethy, Daniel Ortiz-Arroyo, Petar Durdevic
https://arxiv.org/abs/2508.19011 https://mastoxiv.page/@arXiv_csLG_bot/115100270137397046
- EEGDM: Learning EEG Representation with Latent Diffusion Model
Shaocong Wang, Tong Liu, Yihan Li, Ming Li, Kairui Wen, Pei Yang, Wenqi Ji, Minjing Yu, Yong-Jin Liu
https://arxiv.org/abs/2508.20705 https://mastoxiv.page/@arXiv_csLG_bot/115111565155687451
- Data-Free Continual Learning of Server Models in Model-Heterogeneous Cloud-Device Collaboration
Xiao Zhang, Zengzhe Chen, Yuan Yuan, Yifei Zou, Fuzhen Zhuang, Wenyu Jiao, Yuke Wang, Dongxiao Yu
https://arxiv.org/abs/2509.25977 https://mastoxiv.page/@arXiv_csLG_bot/115298721327100391
- Fine-Tuning Masked Diffusion for Provable Self-Correction
Jaeyeon Kim, Seunggeun Kim, Taekyun Lee, David Z. Pan, Hyeji Kim, Sham Kakade, Sitan Chen
https://arxiv.org/abs/2510.01384 https://mastoxiv.page/@arXiv_csLG_bot/115309690976554356
- A Generic Machine Learning Framework for Radio Frequency Fingerprinting
Alex Hiles, Bashar I. Ahmad
https://arxiv.org/abs/2510.09775 https://mastoxiv.page/@arXiv_csLG_bot/115372387779061015
- A Second-Order SpikingSSM for Wearables
Kartikay Agrawal, Abhijeet Vikram, Vedant Sharma, Vaishnavi Nagabhushana, Ayon Borthakur
https://arxiv.org/abs/2510.14386 https://mastoxiv.page/@arXiv_csLG_bot/115389079527543821
- Utility-Diversity Aware Online Batch Selection for LLM Supervised Fine-tuning
Heming Zou, Yixiu Mao, Yun Qu, Qi Wang, Xiangyang Ji
https://arxiv.org/abs/2510.16882 https://mastoxiv.page/@arXiv_csLG_bot/115412243355962887
- Seeing Structural Failure Before it Happens: An Image-Based Physics-Informed Neural Network (PINN...
Omer Jauhar Khan, Sudais Khan, Hafeez Anwar, Shahzeb Khan, Shams Ul Arifeen
https://arxiv.org/abs/2510.23117 https://mastoxiv.page/@arXiv_csLG_bot/115451891042176876
- Training Deep Physics-Informed Kolmogorov-Arnold Networks
Spyros Rigas, Fotios Anagnostopoulos, Michalis Papachristou, Georgios Alexandridis
https://arxiv.org/abs/2510.23501 https://mastoxiv.page/@arXiv_csLG_bot/115451942159737549
- Semi-Supervised Preference Optimization with Limited Feedback
Seonggyun Lee, Sungjun Lim, Seojin Park, Soeun Cheon, Kyungwoo Song
https://arxiv.org/abs/2511.00040 https://mastoxiv.page/@arXiv_csLG_bot/115490555013124989
- Towards Causal Market Simulators
Dennis Thumm, Luis Ontaneda Mijares
https://arxiv.org/abs/2511.04469 https://mastoxiv.page/@arXiv_csLG_bot/115507943827841017
- Incremental Generation is Necessary and Sufficient for Universality in Flow-Based Modelling
Hossein Rouhvarzi, Anastasis Kratsios
https://arxiv.org/abs/2511.09902 https://mastoxiv.page/@arXiv_csLG_bot/115547587245365920
- Optimizing Mixture of Block Attention
Guangxuan Xiao, Junxian Guo, Kasra Mazaheri, Song Han
https://arxiv.org/abs/2511.11571 https://mastoxiv.page/@arXiv_csLG_bot/115564541392410174
- Assessing Automated Fact-Checking for Medical LLM Responses with Knowledge Graphs
Shasha Zhou, Mingyu Huang, Jack Cole, Charles Britton, Ming Yin, Jan Wolber, Ke Li
https://arxiv.org/abs/2511.12817 https://mastoxiv.page/@arXiv_csLG_bot/115570877730326947
toXiv_bot_toot
Polyharmonic Cascade
Yuriy N. Bakhvalov
https://arxiv.org/abs/2512.17671 https://arxiv.org/pdf/2512.17671 https://arxiv.org/html/2512.17671
arXiv:2512.17671v1 Announce Type: new
Abstract: This paper presents a deep machine learning architecture, the "polyharmonic cascade" -- a sequence of packages of polyharmonic splines, where each layer is rigorously derived from the theory of random functions and the principles of indifference. This makes it possible to approximate nonlinear functions of arbitrary complexity while preserving global smoothness and a probabilistic interpretation. For the polyharmonic cascade, a training method alternative to gradient descent is proposed: instead of directly optimizing the coefficients, one solves a single global linear system on each batch with respect to the function values at fixed "constellations" of nodes. This yields synchronized updates of all layers, preserves the probabilistic interpretation of individual layers and theoretical consistency with the original model, and scales well: all computations reduce to 2D matrix operations efficiently executed on a GPU. Fast learning without overfitting on MNIST is demonstrated.
toXiv_bot_toot
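The "solve a linear system instead of gradient descent" idea is easiest to see for a single layer of polyharmonic splines: the coefficients come from one linear solve at the node values. The r³ basis, ridge term, and toy target below are illustrative choices, not the paper's cascade.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fit f(x) = sum_i c_i * phi(||x - x_i||) with phi(r) = r^3 by solving
# a single linear system for c -- no iterative optimization.
n = 100
X = rng.uniform(-1, 1, size=(n, 2))          # interpolation nodes
y = np.sin(np.pi * X[:, 0]) * X[:, 1]        # target values at the nodes

def kernel(A, B):
    """Pairwise polyharmonic basis evaluations phi(||a - b||), phi(r) = r^3."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return d ** 3

K = kernel(X, X)
c = np.linalg.solve(K + 1e-8 * np.eye(n), y)  # tiny ridge for conditioning

# Evaluate at held-out points; the spline should track the smooth target.
Xt = rng.uniform(-0.9, 0.9, size=(50, 2))
pred = kernel(Xt, X) @ c
err = np.max(np.abs(pred - np.sin(np.pi * Xt[:, 0]) * Xt[:, 1]))
```

Everything reduces to dense matrix operations, which is the property the abstract leans on for GPU execution; the cascade then stacks such layers and solves them jointly per batch.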
Crosslisted article(s) found for cs.LG. https://arxiv.org/list/cs.LG/new
[2/3]:
- Sharp Structure-Agnostic Lower Bounds for General Functional Estimation
Jikai Jin, Vasilis Syrgkanis
https://arxiv.org/abs/2512.17341 https://mastoxiv.page/@arXiv_statML_bot/115762312049963700
- Timely Information Updating for Mobile Devices Without and With ML Advice
Yu-Pin Hsu, Yi-Hsuan Tseng
https://arxiv.org/abs/2512.17381 https://mastoxiv.page/@arXiv_csNI_bot/115762180316858485
- SWE-Bench : A Framework for the Scalable Generation of Software Engineering Benchmarks from Open...
Wang, Ramalho, Celestino, Pham, Liu, Sinha, Portillo, Osunwa, Maduekwe
https://arxiv.org/abs/2512.17419 https://mastoxiv.page/@arXiv_csSE_bot/115762487015279852
- Perfect reconstruction of sparse signals using nonconvexity control and one-step RSB message passing
Xiaosi Gu, Ayaka Sakata, Tomoyuki Obuchi
https://arxiv.org/abs/2512.17426 https://mastoxiv.page/@arXiv_statML_bot/115762346108219997
- MULTIAQUA: A multimodal maritime dataset and robust training strategies for multimodal semantic s...
Jon Muhovič, Janez Perš
https://arxiv.org/abs/2512.17450 https://mastoxiv.page/@arXiv_csCV_bot/115762717053353674
- When Data Quality Issues Collide: A Large-Scale Empirical Study of Co-Occurring Data Quality Issu...
Emmanuel Charleson Dapaah, Jens Grabowski
https://arxiv.org/abs/2512.17460 https://mastoxiv.page/@arXiv_csSE_bot/115762500123147574
- Behavioural Effects of Agentic Messaging: A Case Study on a Financial Service Application
Olivier Jeunen, Schaun Wheeler
https://arxiv.org/abs/2512.17462 https://mastoxiv.page/@arXiv_csIR_bot/115762430673347625
- Linear Attention for Joint Power Optimization and User-Centric Clustering in Cell-Free Networks
Irched Chafaa, Giacomo Bacci, Luca Sanguinetti
https://arxiv.org/abs/2512.17466 https://mastoxiv.page/@arXiv_eessSY_bot/115762336277179643
- Translating the Rashomon Effect to Sequential Decision-Making Tasks
Dennis Gross, Jørn Eirik Betten, Helge Spieker
https://arxiv.org/abs/2512.17470 https://mastoxiv.page/@arXiv_csAI_bot/115762556506696539
- Alternating Direction Method of Multipliers for Nonlinear Matrix Decompositions
Atharva Awari, Nicolas Gillis, Arnaud Vandaele
https://arxiv.org/abs/2512.17473 https://mastoxiv.page/@arXiv_eessSP_bot/115762580078964235
- TwinSegNet: A Digital Twin-Enabled Federated Learning Framework for Brain Tumor Analysis
Almustapha A. Wakili, Adamu Hussaini, Abubakar A. Musa, Woosub Jung, Wei Yu
https://arxiv.org/abs/2512.17488 https://mastoxiv.page/@arXiv_csCV_bot/115762726884307901
- Resource-efficient medical image classification for edge devices
Mahsa Lavaei, Zahra Abadi, Salar Beigzad, Alireza Maleki
https://arxiv.org/abs/2512.17515 https://mastoxiv.page/@arXiv_eessIV_bot/115762459510336799
- PathBench-MIL: A Comprehensive AutoML and Benchmarking Framework for Multiple Instance Learning i...
Brussee, Valkema, Weijer, Doeleman, Schrader, Kers
https://arxiv.org/abs/2512.17517 https://mastoxiv.page/@arXiv_csCV_bot/115762741957639051
- HydroGym: A Reinforcement Learning Platform for Fluid Dynamics
Christian Lagemann, et al.
https://arxiv.org/abs/2512.17534 https://mastoxiv.page/@arXiv_physicsfludyn_bot/115762391350754768
- When De-noising Hurts: A Systematic Study of Speech Enhancement Effects on Modern Medical ASR Sys...
Chondhekar, Murukuri, Vasani, Goyal, Badami, Rana, SN, Pandia, Katiyar, Jagadeesh, Gulati
https://arxiv.org/abs/2512.17562 https://mastoxiv.page/@arXiv_csSD_bot/115762423443170715
- Enabling Disaggregated Multi-Stage MLLM Inference via GPU-Internal Scheduling and Resource Sharing
Lingxiao Zhao, Haoran Zhou, Yuezhi Che, Dazhao Cheng
https://arxiv.org/abs/2512.17574 https://mastoxiv.page/@arXiv_csDC_bot/115762425409322293
- SkinGenBench: Generative Model and Preprocessing Effects for Synthetic Dermoscopic Augmentation i...
N. A. Adarsh Pritam, Jeba Shiney O, Sanyam Jain
https://arxiv.org/abs/2512.17585 https://mastoxiv.page/@arXiv_eessIV_bot/115762479150695610
- MAD-OOD: A Deep Learning Cluster-Driven Framework for an Out-of-Distribution Malware Detection an...
Tosin Ige, Christopher Kiekintveld, Aritran Piplai, Asif Rahman, Olukunle Kolade, Sasidhar Kunapuli
https://arxiv.org/abs/2512.17594 https://mastoxiv.page/@arXiv_csCR_bot/115762509298207765
- Confidence-Credibility Aware Weighted Ensembles of Small LLMs Outperform Large LLMs in Emotion De...
Menna Elgabry, Ali Hamdi
https://arxiv.org/abs/2512.17630 https://mastoxiv.page/@arXiv_csCL_bot/115762575512981257
- Generative Multi-Objective Bayesian Optimization with Scalable Batch Evaluations for Sample-Effic...
Madhav R. Muthyala, Farshud Sorourifar, Tianhong Tan, You Peng, Joel A. Paulson
https://arxiv.org/abs/2512.17659 https://mastoxiv.page/@arXiv_statML_bot/115762554519447500
toXiv_bot_toot