Basis, which builds AI agents to help accounting firms with tasks like tax returns, raised $100M led by Accel at a $1.15B valuation, for $138M in total funding (Rebecca Torrence/Bloomberg)
https://www.bloomberg.com/news/articles/20
🥳 New Kitten¹ release
• Added `initialise()` hook to `kitten.Component` instances.
This gets called at the end of the constructor and is handy if you’d rather not override the constructor, handle the `data` parameter, and remember to call `super(data)` yourself. You can still access the passed data via `this.data`.
Note that the component is not part of the view hierarchy on the client at this point. If you have tasks you need to perform only once per page – for example, ins…
windsurfers: Windsurfers network (1986)
A network of interpersonal contacts among windsurfers in southern California during the Fall of 1986. The edge weights indicate perceived social affiliation, measured by a task in which each individual was asked to sort cards bearing the other surfers’ names in order of closeness.
This network has 43 nodes and 336 edges.
Tags: Social, Offline, Weighted
heise | To-do apps compared: Google Tasks vs. Zenkit, Todoist, and Tasks.org
Finally stop forgetting things: Google Tasks reminds you of tasks, obligations, and birthdays right on time. We show which apps offer more convenience and features.
Been using a number of AI models over the past week or so as work has slowed down, giving me time to explore things more deeply.
Been using Claude Code with musistudio/claude-code-router, which is great as I can switch between different models on similar tasks.
Experience so far has been that Gemini 3 Flash is very good at thinking and coding tasks, but the code tends to be fragile, so rewrites are needed. For tough problems where the errors are not straightforward it falls d…
From Isolation to Integration: Building an Adaptive Expert Forest for Pre-Trained Model-based Class-Incremental Learning
Ruiqi Liu, Boyu Diao, Hangda Liu, Zhulin An, Fei Wang, Yongjun Xu
https://arxiv.org/abs/2602.20911 https://arxiv.org/pdf/2602.20911 https://arxiv.org/html/2602.20911
arXiv:2602.20911v1 Announce Type: new
Abstract: Class-Incremental Learning (CIL) requires models to learn new classes without forgetting old ones. A common method is to freeze a pre-trained model and train a new, lightweight adapter for each task. While this prevents forgetting, it treats the learned knowledge as a simple, unstructured collection and fails to use the relationships between tasks. To this end, we propose the Semantic-guided Adaptive Expert Forest (SAEF), a new method that organizes adapters into a structured hierarchy for better knowledge sharing. SAEF first groups tasks into conceptual clusters based on their semantic relationships. Then, within each cluster, it builds a balanced expert tree by creating new adapters that merge the adapters of similar tasks. At inference time, SAEF finds and activates a set of relevant experts from the forest for any given input. The final prediction is made by combining the outputs of these activated experts, weighted by how confident each expert is. Experiments on several benchmark datasets show that SAEF achieves SOTA performance.
toXiv_bot_toot
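Two of the moving parts in the abstract above are easy to picture in code: merging similar adapters into a parent expert, and combining activated experts' outputs weighted by confidence. A minimal sketch under assumed interfaces; the plain averaging merge and max-softmax confidence are illustrative stand-ins, not SAEF's actual formulas:

```python
import torch

def merge_adapters(adapter_a: dict, adapter_b: dict) -> dict:
    """Form a parent expert from two similar tasks' adapters.
    Plain parameter averaging is an assumption, not the paper's rule."""
    return {name: 0.5 * (adapter_a[name] + adapter_b[name]) for name in adapter_a}

def combine_experts(expert_logits: list[torch.Tensor]) -> torch.Tensor:
    """Weight each activated expert's logits by its confidence
    (here: its max softmax probability) and sum."""
    conf = torch.stack([torch.softmax(l, dim=-1).max() for l in expert_logits])
    weights = conf / conf.sum()
    return sum(w * l for w, l in zip(weights, expert_logits))
```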
I yearn for C++26 and not having to do repetitive tasks ever again to have a modicum of reflection. I want my static reflections yesterday, and my `template for` one day before that.
Dynamic reversal of IT-PFC information flow orchestrates visual categorization under perceptual uncertainty https://www.biorxiv.org/content/10.64898/2025.12.17.695044v1 Quite a mouthful to say that "the brain actually reverses its information flow when things get blurr…
Oh, this is cool: A mind mapper for the terminal #cli
This is the level of prep my wife brings to Christmas lunch. Can you tell she's a scientist? As you can see, it's going well. I've been assigned a few tasks, but she mostly wants to do this herself, it seems.
#Christmas
Beijing-based DP Technology, which develops AI tools used by researchers for tasks like computer-aided drug design and battery design, raised a ~$114M Series C (Eunice Xu/South China Morning Post)
https://www.scmp.com/business/companies/ar
it annoys me when programs have a plugin interface but no way to use it to extend all of the program's functionality
e.g. you apparently can't add extra tools to the toolbox in gimp, so you always have to click through menus
so it's not really worth developing a plugin for recurring tasks
Experimental insights into data augmentation techniques for deep learning-based multimode fiber imaging: limitations and success
Jawaria Maqbool, M. Imran Cheema
https://arxiv.org/abs/2511.19072 https://arxiv.org/pdf/2511.19072 https://arxiv.org/html/2511.19072
arXiv:2511.19072v1 Announce Type: new
Abstract: Multimode fiber (MMF) imaging using deep learning has high potential to produce compact, minimally invasive endoscopic systems. Nevertheless, it relies on large, diverse real-world medical data, whose availability is limited by privacy concerns and practical challenges. Although data augmentation has been extensively studied in various other deep learning tasks, it has not been systematically explored for MMF imaging. This work provides the first in-depth experimental and computational study on the efficacy and limitations of augmentation techniques in this field. We demonstrate that standard image transformations and conditional generative adversarial network-based synthetic speckle generation fail to improve, or even deteriorate, reconstruction quality, as they neglect the complex modal interference and dispersion that result in speckle formation. To address this, we introduce a physical data augmentation method in which only organ images are digitally transformed, while their corresponding speckles are experimentally acquired via fiber. This approach preserves the physics of light-fiber interaction and enhances the reconstruction structural similarity index measure (SSIM) by up to 17%, forming a viable system for reliable MMF imaging under limited data conditions.
toXiv_bot_toot
Scaling Vision Transformers: Evaluating DeepSpeed for Image-Centric Workloads
Huy Trinh, Rebecca Ma, Zeqi Yu, Tahsin Reza
https://arxiv.org/abs/2602.21081 https://arxiv.org/pdf/2602.21081 https://arxiv.org/html/2602.21081
arXiv:2602.21081v1 Announce Type: new
Abstract: Vision Transformers (ViTs) have demonstrated remarkable potential in image processing tasks by utilizing self-attention mechanisms to capture global relationships within data. However, their scalability is hindered by significant computational and memory demands, especially for large-scale models with many parameters. This study aims to leverage DeepSpeed, a highly efficient distributed training framework that is commonly used for language models, to enhance the scalability and performance of ViTs. We evaluate intra- and inter-node training efficiency across multiple GPU configurations on various datasets like CIFAR-10 and CIFAR-100, exploring the impact of distributed data parallelism on training speed, communication overhead, and overall scalability (strong and weak scaling). By systematically varying software parameters, such as batch size and gradient accumulation, we identify key factors influencing performance of distributed training. The experiments in this study provide a foundational basis for applying DeepSpeed to image-related tasks. Future work will extend these investigations to deepen our understanding of DeepSpeed's limitations and explore strategies for optimizing distributed training pipelines for Vision Transformers.
toXiv_bot_toot
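For readers who haven't used it, the core of a DeepSpeed data-parallel setup is small. A minimal sketch of wrapping a standard ViT, where the config values stand in for the batch-size and gradient-accumulation knobs the study varies (run under the `deepspeed` CLI launcher across GPUs/nodes):

```python
import deepspeed
from torchvision.models import vit_b_16

# Placeholder config: these are the kinds of parameters the study sweeps.
ds_config = {
    "train_micro_batch_size_per_gpu": 64,
    "gradient_accumulation_steps": 4,
    "optimizer": {"type": "Adam", "params": {"lr": 3e-4}},
}

model = vit_b_16(num_classes=100)  # e.g. CIFAR-100, inputs resized to 224x224
engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)
# The training loop then uses engine(inputs), engine.backward(loss), engine.step().
```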
Interesting methodology for tracking AI progress. Categorize tasks by how long they take for an expert human. This is a decent proxy for task complexity.
Then ask what's the most complex (i.e. longest-taking) task that a model can complete with a 50% success rate.
Measure this over time and log-plot to estimate the doubling time.
https://met…
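A back-of-the-envelope version of that estimate, with made-up numbers (the real data is behind the truncated link): fit a line to log2(horizon) over time; the reciprocal of the slope is the doubling time.

```python
import numpy as np

# Hypothetical (year, task horizon in human-expert minutes at 50% success) pairs.
years = np.array([2023.0, 2024.0, 2025.0, 2026.0])
horizon_min = np.array([8.0, 30.0, 110.0, 400.0])

# Fit log2(horizon) = a * year + b; the horizon doubles every 1/a years.
a, b = np.polyfit(years, np.log2(horizon_min), 1)
print(f"doubling time ~ {1 / a:.2f} years")
```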
China's MiniMax releases M2.1, an upgrade to its open-source M2 model that it says has "significantly enhanced" coding capabilities in Rust, Java, and others (MiniMax)
https://www.minimax.io/news/minimax-m21
Why Pass@k Optimization Can Degrade Pass@1: Prompt Interference in LLM Post-training
Anas Barakat, Souradip Chakraborty, Khushbu Pahwa, Amrit Singh Bedi
https://arxiv.org/abs/2602.21189 https://arxiv.org/pdf/2602.21189 https://arxiv.org/html/2602.21189
arXiv:2602.21189v1 Announce Type: new
Abstract: Pass@k is a widely used performance metric for verifiable large language model tasks, including mathematical reasoning, code generation, and short-answer reasoning. It defines success if any of k independently sampled solutions passes a verifier. This multi-sample inference metric has motivated inference-aware fine-tuning methods that directly optimize pass@k. However, prior work reports a recurring trade-off: pass@k improves while pass@1 degrades under such methods. This trade-off is practically important because pass@1 often remains a hard operational constraint due to latency and cost budgets, imperfect verifier coverage, and the need for a reliable single-shot fallback. We study the origin of this trade-off and provide a theoretical characterization of when pass@k policy optimization can reduce pass@1 through gradient conflict induced by prompt interference. We show that pass@k policy gradients can conflict with pass@1 gradients because pass@k optimization implicitly reweights prompts toward low-success prompts; when these prompts are what we term negatively interfering, their upweighting can rotate the pass@k update direction away from the pass@1 direction. We illustrate our theoretical findings with large language model experiments on verifiable mathematical reasoning tasks.
toXiv_bot_toot
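For concreteness, pass@k is usually computed with the standard unbiased estimator popularized by the Codex evaluation: draw n >= k samples per prompt, count the c that pass the verifier, and compute the chance that a size-k subset contains at least one pass.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k solutions,
    drawn without replacement from n samples with c passing, passes."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(200, 12, 1), pass_at_k(200, 12, 10))  # pass@1 vs pass@10
```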
It’s important to distinguish two different hypothetical ways in which gen AI can constitute a massive wealth transfer:
Scenario 1, “LLMs are the new petrochemicals:” Gen AI is actually effective for all sorts of tasks as advertised. It becomes a necessity for economic participation / useful work / whatever, and ownership of the data model and/or data centers thus means control of high-value resources.
2/
Google’s vibe-coding tool, Opal, is making its way to Gemini. The company on Wednesday said it is integrating the tool, which lets you build AI-powered mini apps, inside the Gemini web app, allowing users to create their own custom apps, which Google calls Gems. Introduced in 2024, Gems are customized versions of Gemini designed for specific tasks or scenarios. For instance, some of Google’s pre-made Gems include a learning coach,…
UrbanFM: Scaling Urban Spatio-Temporal Foundation Models
Wei Chen, Yuqian Wu, Junle Chen, Xiaofang Zhou, Yuxuan Liang
https://arxiv.org/abs/2602.20677 https://arxiv.org/pdf/2602.20677 https://arxiv.org/html/2602.20677
arXiv:2602.20677v1 Announce Type: new
Abstract: Urban systems, as dynamic complex systems, continuously generate spatio-temporal data streams that encode the fundamental laws of human mobility and city evolution. While AI for Science has witnessed the transformative power of foundation models in disciplines like genomics and meteorology, urban computing remains fragmented due to "scenario-specific" models, which are overfitted to specific regions or tasks, hindering their generalizability. To bridge this gap and advance spatio-temporal foundation models for urban systems, we adopt scaling as the central perspective and systematically investigate two key questions: what to scale and how to scale. Grounded in first-principles analysis, we identify three critical dimensions: heterogeneity, correlation, and dynamics, aligning these principles with the fundamental scientific properties of urban spatio-temporal data. Specifically, to address heterogeneity through data scaling, we construct WorldST. This billion-scale corpus standardizes diverse physical signals, such as traffic flow and speed, from over 100 global cities into a unified data format. To enable computation scaling for modeling correlations, we introduce the MiniST unit, a novel split mechanism that discretizes continuous spatio-temporal fields into learnable computational units to unify representations of grid-based and sensor-based observations. Finally, addressing dynamics via architecture scaling, we propose UrbanFM, a minimalist self-attention architecture designed with limited inductive biases to autonomously learn dynamic spatio-temporal dependencies from massive data. Furthermore, we establish EvalST, the largest-scale urban spatio-temporal benchmark to date. Extensive experiments demonstrate that UrbanFM achieves remarkable zero-shot generalization across unseen cities and tasks, marking a pivotal first step toward large-scale urban spatio-temporal foundation models.
toXiv_bot_toot
Do you use LLMs to generate regular expressions? We do, too! Do you *review* your regexes? Is that frustrating? How can we put humans in the loop better, doing relatively few, meaningful tasks? Please try out our new tool PICK:regex, available for VSCode!
https://blog.brownplt.org/2025/12/11/p
Anyone know of an Android to-do list application that is
* Completely device local, no network connectivity required or used
* No ads or spyware
* Doesn't time-out tasks even if they sit around for a year uncompleted (looking at you, google calendar)
* Supports recurring maintenance tasks for weekly, monthly, etc. cleaning or something
Open source preferred, but willing to pay a reasonable price if it's out there as a commercial tool
Good website #uber guys...
I needed to update my password. Then I couldn't go back (yes yes, outside of my browser's back button)
Hierarchic-EEG2Text: Assessing EEG-To-Text Decoding across Hierarchical Abstraction Levels
Anupam Sharma, Harish Katti, Prajwal Singh, Shanmuganathan Raman, Krishna Miyapuram
https://arxiv.org/abs/2602.20932 https://arxiv.org/pdf/2602.20932 https://arxiv.org/html/2602.20932
arXiv:2602.20932v1 Announce Type: new
Abstract: An electroencephalogram (EEG) records the spatially averaged electrical activity of neurons in the brain, measured from the human scalp. Prior studies have explored EEG-based classification of objects or concepts, often for passive viewing of briefly presented image or video stimuli, with limited classes. Because EEG exhibits a low signal-to-noise ratio, recognizing fine-grained representations across a large number of classes remains challenging; however, abstract-level object representations may exist. In this work, we investigate whether EEG captures object representations across multiple hierarchical levels, and propose episodic analysis, in which a Machine Learning (ML) model is evaluated across various, yet related, classification tasks (episodes). Unlike prior episodic EEG studies that rely on fixed or randomly sampled classes of equal cardinality, we adopt hierarchy-aware episode sampling using WordNet to generate episodes with variable classes of diverse hierarchy. We also present the largest episodic framework in the EEG domain for detecting observed text from EEG signals in the PEERS dataset, comprising 931,538 EEG samples under 1,610 object labels, acquired from 264 human participants (subjects) performing controlled cognitive tasks, enabling the study of neural dynamics underlying perception, decision-making, and performance monitoring.
We examine how the semantic abstraction level affects classification performance across multiple learning techniques and architectures, providing a comprehensive analysis. The models tend to improve performance when the classification categories are drawn from higher levels of the hierarchy, suggesting sensitivity to abstraction. Our work highlights abstraction depth as an underexplored dimension of EEG decoding and motivates future research in this direction.
toXiv_bot_toot
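A guess at what hierarchy-aware episode sampling looks like mechanically (not the paper's code): pick a WordNet noun node and treat its immediate hyponym subtrees as the episode's classes, so deeper roots yield finer-grained, less abstract episodes.

```python
from nltk.corpus import wordnet as wn  # requires nltk.download("wordnet")

def sample_episode(root_name: str = "animal.n.01") -> dict[str, set[str]]:
    """One episode: classes are the immediate hyponym subtrees of `root_name`.
    The class count varies with the node, matching variable-cardinality episodes."""
    root = wn.synset(root_name)
    episode = {}
    for cls in root.hyponyms():
        # Every lemma in the subtree rooted at this hyponym belongs to the class.
        subtree = {cls} | set(cls.closure(lambda s: s.hyponyms()))
        episode[cls.name()] = {lemma.name() for syn in subtree for lemma in syn.lemmas()}
    return episode

print(list(sample_episode().keys())[:5])
```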
Tom's Hardware has a headline this week summarizing a Financial Times interview with a Microsoft #AI exec that begins thusly:
"Microsoft’s AI boss says AI can replace every white-collar job in 18 months".
If you watch the interview, that is not what was said. The statement is a more nuanced claim that AI can fully automate the tasks of some white-collar work.
But I…
In the old days you could solicit for remote sysadmin jobs by including the pitch in your distro's README
Ansible playbook done "enough", everything is now matching. Evidently I did forget to transition one server over to Debian, but got that sorted and now everything is running pretty well.
#ansible #homelab
“Bees can learn a surprising amount of information from observing peers, including which flowers to visit, but also how to solve complex object-manipulation tasks. Accordingly, many complex social behaviors are much more driven by individual problem solving than by a diffuse swarm intelligence, as was traditionally thought.”
- Lars Chittka, ‘The Mind of a Bee’
Someone built an indexed & vectorized database & a conversational AI for the Epstein Files.
Never mind Qs like "How many times was Trump mentioned?" Try asking it complex tasks & questions that require intelligent reasoning like, "Is there any evidence..."
https://epstein.trynia.ai/
TIL that #Immich hard-codes all its paths into its postgresql database. What a nightmare for migrations. None of the tasks in the UI helped. Tried replacing it in the db, no chance. Had to resort to bind mounting shenanigans.
Word and Excel vs LLMs.
Secretaries became executive assistants, their role evolved to higher-level coordination, communication, and decision support. Accountants gained the ability to do far more analysis, strategic planning, and advisory work. The tools eliminated tedious manual tasks, but the roles themselves weren't eliminated. They were elevated.
The same pattern applies to programmers. LLMs can handle boilerplate, generate first drafts, automate simple tasks.
It's funny how "AI" tools are simultaneously marketed as "agents" that can run fully in the background and do stuff, but whenever they do something bad, it's the user's fault for not supervising software that doesn't work.
Even when it’s used directly and the user has the chance to review everything, it’s extremely dangerous, especially at tasks it handles fine 95% of the time and/or when the bad things are only subtly wrong.
Imagine other tools being like this: a steering wheel that turns the car correctly 95 times out of 100. 2% of the time it steers in the other direction. 3% of the time it steers 5x as hard as normal.
NeuroSketch: An Effective Framework for Neural Decoding via Systematic Architectural Optimization
Gaorui Zhang, Zhizhang Yuan, Jialan Yang, Junru Chen, Li Meng, Yang Yang
https://arxiv.org/abs/2512.09524 https://arxiv.org/pdf/2512.09524 https://arxiv.org/html/2512.09524
arXiv:2512.09524v1 Announce Type: new
Abstract: Neural decoding, a critical component of Brain-Computer Interface (BCI) systems, has recently attracted increasing research interest. Previous research has focused on leveraging signal processing and deep learning methods to enhance neural decoding performance. However, model architecture itself remains underexplored, despite its proven impact in other tasks such as energy forecasting and image classification. In this study, we propose NeuroSketch, an effective framework for neural decoding via systematic architecture optimization. Starting with a basic architecture study, we find that CNN-2D outperforms other architectures in neural decoding tasks and explore its effectiveness from temporal and spatial perspectives. Building on this, we optimize the architecture from macro- to micro-level, achieving improvements in performance at each step. The exploration process and model validations span over 5,000 experiments across three distinct modalities (visual, auditory, and speech), three types of brain signals (EEG, SEEG, and ECoG), and eight diverse decoding tasks. Experimental results indicate that NeuroSketch achieves state-of-the-art (SOTA) performance across all evaluated datasets, positioning it as a powerful tool for neural decoding. Our code and scripts are available at https://github.com/Galaxy-Dawn/NeuroSketch.
toXiv_bot_toot
@… Thanks for the feedback!
Great idea to use Jinja! I’ve considered using a macro processor (e.g., m4) for similar tasks, but who wants to write m4 macros!? A template engine is a much better idea.
Estimation of Confidence Bounds in Binary Classification using Wilson Score Kernel Density Estimation
Thorbjørn Mosekjær Iversen, Zebin Duan, Frederik Hagelskjær
https://arxiv.org/abs/2602.20947 https://arxiv.org/pdf/2602.20947 https://arxiv.org/html/2602.20947
arXiv:2602.20947v1 Announce Type: new
Abstract: The performance and ease of use of deep learning-based binary classifiers have improved significantly in recent years. This has opened up the potential for automating critical inspection tasks, which have traditionally only been trusted to be done manually. However, the application of binary classifiers in critical operations depends on the estimation of reliable confidence bounds such that system performance can be ensured up to a given statistical significance. We present Wilson Score Kernel Density Classification, which is a novel kernel-based method for estimating confidence bounds in binary classification. The core of our method is the Wilson Score Kernel Density Estimator, which is a function estimator for estimating confidence bounds in Binomial experiments with conditionally varying success probabilities. Our method is evaluated in the context of selective classification on four different datasets, illustrating its use as a classification head of any feature extractor, including vision foundation models. Our proposed method shows similar performance to Gaussian Process Classification, but at a lower computational complexity.
toXiv_bot_toot
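For context, here is the classic Wilson score interval that the estimator generalizes (the kernel-density extension handling conditionally varying success probabilities is the paper's contribution and beyond this sketch):

```python
from math import sqrt

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score confidence interval for a Binomial proportion;
    z = 1.96 gives roughly 95% confidence."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

print(wilson_interval(90, 100))  # ~ (0.826, 0.944)
```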
Anthropic launches Cowork for Claude, built on Claude Code to automate complex tasks with minimal prompting, as a research preview for Claude Max subscribers (Webb Wright/ZDNET)
https://www.zdnet.com/article/anthropic-cowork-for-claude-complex-action…
Probing Dec-POMDP Reasoning in Cooperative MARL
Kale-ab Tessera, Leonard Hinckeldey, Riccardo Zamboni, David Abel, Amos Storkey
https://arxiv.org/abs/2602.20804 https://arxiv.org/pdf/2602.20804 https://arxiv.org/html/2602.20804
arXiv:2602.20804v1 Announce Type: new
Abstract: Cooperative multi-agent reinforcement learning (MARL) is typically framed as a decentralised partially observable Markov decision process (Dec-POMDP), a setting whose hardness stems from two key challenges: partial observability and decentralised coordination. Genuinely solving such tasks requires Dec-POMDP reasoning, where agents use history to infer hidden states and coordinate based on local information. Yet it remains unclear whether popular benchmarks actually demand this reasoning or permit success via simpler strategies. We introduce a diagnostic suite combining statistically grounded performance comparisons and information-theoretic probes to audit the behavioural complexity of baseline policies (IPPO and MAPPO) across 37 scenarios spanning MPE, SMAX, Overcooked, Hanabi, and MaBrax. Our diagnostics reveal that success on these benchmarks rarely requires genuine Dec-POMDP reasoning. Reactive policies match the performance of memory-based agents in over half the scenarios, and emergent coordination frequently relies on brittle, synchronous action coupling rather than robust temporal influence. These findings suggest that some widely used benchmarks may not adequately test core Dec-POMDP assumptions under current training paradigms, potentially leading to over-optimistic assessments of progress. We release our diagnostic tooling to support more rigorous environment design and evaluation in cooperative MARL.
toXiv_bot_toot
Easy Adaptation: An Efficient Task-Specific Knowledge Injection Method for Large Models in Resource-Constrained Environments
Dong Chen, Zhengqing Hu, Shixing Zhao, Yibo Guo
https://arxiv.org/abs/2512.17771 https://arxiv.org/pdf/2512.17771 https://arxiv.org/html/2512.17771
arXiv:2512.17771v1 Announce Type: new
Abstract: While the enormous parameter scale endows Large Models (LMs) with unparalleled performance, it also limits their adaptability across specific tasks. Parameter-Efficient Fine-Tuning (PEFT) has emerged as a critical approach for effectively adapting LMs to a diverse range of downstream tasks. However, existing PEFT methods face two primary challenges: (1) High resource cost. Although PEFT methods significantly reduce resource demands compared to full fine-tuning, they still require substantial time and memory, making them impractical in resource-constrained environments. (2) Parameter dependency. PEFT methods heavily rely on updating a subset of parameters associated with LMs to incorporate task-specific knowledge. Yet, due to increasing competition in the LM landscape, many companies have adopted closed-source policies for their leading models, offering access only via Application Programming Interfaces (APIs), where the expense is often cost-prohibitive and difficult to sustain, as the fine-tuning process of LMs is extremely slow. Even though small models perform far worse than LMs in general, they can achieve superior results on particular distributions while requiring only minimal resources. Motivated by this insight, we propose Easy Adaptation (EA), which designs Specific Small Models (SSMs) to complement the underfitted data distribution for LMs. Extensive experiments show that EA matches the performance of PEFT on diverse tasks without accessing LM parameters, and requires only minimal resources.
toXiv_bot_toot
Legal AI startup Ivo, which aims to reduce hallucinations by breaking legal reviews into 400 tasks, raised a $55M Series B, a source says at a $355M valuation (Aditya Soni/Reuters)
https://www.reuters.com/technology/legal-ai-startup…
Started the official rewrite of the Sisyphus client in #golang, working on getting the FFmpeg command-line tasks parsed and validated against the schema. This should make the client easier to distribute, since I can just ship static binaries.
#programming
Exploring the Impact of Parameter Update Magnitude on Forgetting and Generalization of Continual Learning
JinLi He, Liang Bai, Xian Yang
https://arxiv.org/abs/2602.20796 https://arxiv.org/pdf/2602.20796 https://arxiv.org/html/2602.20796
arXiv:2602.20796v1 Announce Type: new
Abstract: The magnitude of parameter updates is considered a key factor in continual learning. However, most existing studies focus on designing diverse update strategies, while a theoretical understanding of the underlying mechanisms remains limited. Therefore, we characterize a model's forgetting from the perspective of parameter update magnitude and formalize it as knowledge degradation induced by task-specific drift in the parameter space, which has not been fully captured in previous studies due to their assumption of a unified parameter space. By deriving the optimal parameter update magnitude that minimizes forgetting, we unify two representative update paradigms, frozen training and initialized training, within an optimization framework for constrained parameter updates. Our theoretical results further reveal that task sequences with small parameter distances exhibit better generalization and less forgetting under frozen training rather than initialized training. These theoretical insights inspire a novel hybrid parameter update strategy that adaptively adjusts update magnitude based on gradient directions. Experiments on deep neural networks demonstrate that this hybrid approach outperforms standard training strategies, providing new theoretical perspectives and practical inspiration for designing efficient and scalable continual learning algorithms.
toXiv_bot_toot
AI robotics startup Physical Intelligence claims vision-language-action models learn to align human videos and robot data as pre-training is scaled up (Physical Intelligence)
https://www.physicalintelligence.company/research/human_to_robot
Understanding the Role of Rehearsal Scale in Continual Learning under Varying Model Capacities
JinLi He, Liang Bai, Xian Yang
https://arxiv.org/abs/2602.20791 https://arxiv.org/pdf/2602.20791 https://arxiv.org/html/2602.20791
arXiv:2602.20791v1 Announce Type: new
Abstract: Rehearsal is one of the key techniques for mitigating catastrophic forgetting and has been widely adopted in continual learning algorithms due to its simplicity and practicality. However, the theoretical understanding of how rehearsal scale influences learning dynamics remains limited. To address this gap, we formulate rehearsal-based continual learning as a multidimensional effectiveness-driven iterative optimization problem, providing a unified characterization across diverse performance metrics. Within this framework, we derive a closed-form analysis of adaptability, memorability, and generalization from the perspective of rehearsal scale. Our results uncover several intriguing and counterintuitive findings. First, rehearsal can impair a model's adaptability, in sharp contrast to its traditionally recognized benefits. Second, increasing the rehearsal scale does not necessarily improve memory retention. When tasks are similar and noise levels are low, the memory error exhibits a diminishing lower bound. Finally, we validate these insights through numerical simulations and extended analyses on deep neural networks across multiple real-world datasets, revealing statistical patterns of rehearsal mechanisms in continual learning.
toXiv_bot_toot
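For readers new to the setup: rehearsal in its simplest form is a bounded exemplar buffer replayed alongside each new task, and the "rehearsal scale" analyzed above corresponds to the buffer capacity. A minimal reservoir-sampling buffer (illustrative, not the paper's code):

```python
import random

class RehearsalBuffer:
    """Fixed-capacity exemplar memory; reservoir sampling keeps the buffer
    a uniform sample over everything seen across tasks."""
    def __init__(self, capacity: int):
        self.capacity = capacity  # the rehearsal scale
        self.items: list = []
        self.seen = 0

    def add(self, example) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = example  # replace with decreasing probability

    def replay(self, k: int) -> list:
        return random.sample(self.items, min(k, len(self.items)))
```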
WeirNet: A Large-Scale 3D CFD Benchmark for Geometric Surrogate Modeling of Piano Key Weirs
Lisa L\"uddecke, Michael Hohmann, Sebastian Eilermann, Jan Tillmann-Mumm, Pezhman Pourabdollah, Mario Oertel, Oliver Niggemann
https://arxiv.org/abs/2602.20714 https://arxiv.org/pdf/2602.20714 https://arxiv.org/html/2602.20714
arXiv:2602.20714v1 Announce Type: new
Abstract: Reliable prediction of hydraulic performance is challenging for Piano Key Weir (PKW) design because discharge capacity depends on three-dimensional geometry and operating conditions. Surrogate models can accelerate hydraulic-structure design, but progress is limited by scarce large, well-documented datasets that jointly capture geometric variation, operating conditions, and functional performance. This study presents WeirNet, a large 3D CFD benchmark dataset for geometric surrogate modeling of PKWs. WeirNet contains 3,794 parametric, feasibility-constrained rectangular and trapezoidal PKW geometries, each scheduled at 19 discharge conditions using a consistent free-surface OpenFOAM workflow, resulting in 71,387 completed simulations, each with complete discharge coefficient labels, that together form the benchmark. The dataset is released in multiple modalities: compact parametric descriptors, watertight surface meshes, and high-resolution point clouds, together with standardized tasks and in-distribution and out-of-distribution splits. Representative surrogate families are benchmarked for discharge coefficient prediction. Tree-based regressors on parametric descriptors achieve the best overall accuracy, while point- and mesh-based models remain competitive and offer parameterization-agnostic inference. All surrogates evaluate in milliseconds per sample, providing orders-of-magnitude speedups over CFD runtimes. Out-of-distribution results identify geometry shift as the dominant failure mode compared to unseen discharge values, and data-efficiency experiments show diminishing returns beyond roughly 60% of the training data. By publicly releasing the dataset together with simulation setups and evaluation pipelines, WeirNet establishes a reproducible framework for data-driven hydraulic modeling and enables faster exploration of PKW designs during the early stages of hydraulic planning.
toXiv_bot_toot
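A sketch of the best-performing surrogate family named above, tree-based regression on the parametric descriptors, with random placeholder data standing in for the released descriptors and discharge-coefficient labels:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Placeholders: rows are (geometry, discharge condition) pairs, columns are
# parametric descriptors (key widths, heights, ...); y is the discharge coefficient.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 12))
y = rng.normal(size=1000)

model = RandomForestRegressor(n_estimators=300, random_state=0)
model.fit(X[:800], y[:800])
print("held-out MAE:", mean_absolute_error(y[800:], model.predict(X[800:])))
```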
Google rolls out Gemini 3.1 Pro, which it says is "a step forward in core reasoning", for AI Pro and Ultra subscribers; the .1 increment is a first for Google (Abner Li/9to5Google)
https://9to5google.com/2026/02/19/google-announces-gem…
Crosslisted article(s) found for cs.LG. https://arxiv.org/list/cs.LG/new
[2/3]:
- Diffusion Modulation via Environment Mechanism Modeling for Planning
Hanping Zhang, Yuhong Guo
https://arxiv.org/abs/2602.20422 https://mastoxiv.page/@arXiv_csAI_bot/116130110576555049
- Heterogeneity-Aware Client Selection Methodology For Efficient Federated Learning
Nihal Balivada, Shrey Gupta, Shashank Shreedhar Bhatt, Suyash Gupta
https://arxiv.org/abs/2602.20450 https://mastoxiv.page/@arXiv_csDC_bot/116130191233002036
- Prior-Agnostic Incentive-Compatible Exploration
Ramya Ramalingam, Osbert Bastani, Aaron Roth
https://arxiv.org/abs/2602.20465 https://mastoxiv.page/@arXiv_csGT_bot/116130245628406144
- PhyGHT: Physics-Guided HyperGraph Transformer for Signal Purification at the HL-LHC
Mohammed Rakib, Luke Vaughan, Shivang Patel, Flera Rizatdinova, Alexander Khanov, Atriya Sen
https://arxiv.org/abs/2602.20475 https://mastoxiv.page/@arXiv_hepex_bot/116130242350426528
- ActionEngine: From Reactive to Programmatic GUI Agents via State Machine Memory
Zhong, Faisal, França, Leesatapornwongsa, Szekeres, Rong, Nath
https://arxiv.org/abs/2602.20502 https://mastoxiv.page/@arXiv_csAI_bot/116130180718734838
- Inner Speech as Behavior Guides: Steerable Imitation of Diverse Behaviors for Human-AI coordination
Rakshit Trivedi, Kartik Sharma, David C Parkes
https://arxiv.org/abs/2602.20517 https://mastoxiv.page/@arXiv_csAI_bot/116130223344095649
- Stop-Think-AutoRegress: Language Modeling with Latent Diffusion Planning
Lovelace, Belardi, Zalouk, Polavaram, Kundurthy, Weinberger
https://arxiv.org/abs/2602.20528 https://mastoxiv.page/@arXiv_csCL_bot/116130628998822849
- Standard Transformers Achieve the Minimax Rate in Nonparametric Regression with $C^{s,\lambda}$ T...
Yanming Lai, Defeng Sun
https://arxiv.org/abs/2602.20555 https://mastoxiv.page/@arXiv_statML_bot/116130512372759166
- Personal Information Parroting in Language Models
Nishant Subramani, Kshitish Ghate, Mona Diab
https://arxiv.org/abs/2602.20580 https://mastoxiv.page/@arXiv_csCL_bot/116130630309564204
- Characterizing Online and Private Learnability under Distributional Constraints via Generalized S...
Mo\"ise Blanchard, Abhishek Shetty, Alexander Rakhlin
https://arxiv.org/abs/2602.20585 https://mastoxiv.page/@arXiv_statML_bot/116130525452248337
- Amortized Bayesian inference for actigraph time sheet data from mobile devices
Daniel Zhou, Sudipto Banerjee
https://arxiv.org/abs/2602.20611 https://mastoxiv.page/@arXiv_statML_bot/116130543144314661
- Knowing the Unknown: Interpretable Open-World Object Detection via Concept Decomposition Model
Xueqiang Lv, Shizhou Zhang, Yinghui Xing, Di Xu, Peng Wang, Yanning Zhang
https://arxiv.org/abs/2602.20616 https://mastoxiv.page/@arXiv_csCV_bot/116130795466851481
- On the Convergence of Stochastic Gradient Descent with Perturbed Forward-Backward Passes
Boao Kong, Hengrui Zhang, Kun Yuan
https://arxiv.org/abs/2602.20646 https://mastoxiv.page/@arXiv_mathOC_bot/116130476952419594
- DANCE: Doubly Adaptive Neighborhood Conformal Estimation
Feng, Reich, Beaglehole, Luo, Park, Yoo, Huang, Mao, Boz, Kim
https://arxiv.org/abs/2602.20652 https://mastoxiv.page/@arXiv_statML_bot/116130551664144143
- Vision-Language Models for Ergonomic Assessment of Manual Lifting Tasks: Estimating Horizontal an...
Mohammad Sadra Rajabi, Aanuoluwapo Ojelade, Sunwook Kim, Maury A. Nussbaum
https://arxiv.org/abs/2602.20658 https://mastoxiv.page/@arXiv_csCV_bot/116130809228818544
- F10.7 Index Prediction: A Multiscale Decomposition Strategy with Wavelet Transform for Performanc...
Xuran Ma, et al.
https://arxiv.org/abs/2602.20712 https://mastoxiv.page/@arXiv_astrophIM_bot/116130530693731576
- Communication-Inspired Tokenization for Structured Image Representations
Davtyan, Sahin, Haghighi, Stapf, Acuaviva, Alahi, Favaro
https://arxiv.org/abs/2602.20731 https://mastoxiv.page/@arXiv_csCV_bot/116130824303022936
- SibylSense: Adaptive Rubric Learning via Memory Tuning and Adversarial Probing
Yifei Xu, et al.
https://arxiv.org/abs/2602.20751 https://mastoxiv.page/@arXiv_csCL_bot/116130739757479992
- Assessing the Impact of Speaker Identity in Speech Spoofing Detection
Anh-Tuan Dao, Driss Matrouf, Nicholas Evans
https://arxiv.org/abs/2602.20805 https://mastoxiv.page/@arXiv_csSD_bot/116130218074059060
- Don't Ignore the Tail: Decoupling top-K Probabilities for Efficient Language Model Distillation
Sayantan Dasgupta, Trevor Cohn, Timothy Baldwin
https://arxiv.org/abs/2602.20816 https://mastoxiv.page/@arXiv_csCL_bot/116130753521420972
- DRESS: A Continuous Framework for Structural Graph Refinement
Eduar Castrillo Velilla
https://arxiv.org/abs/2602.20833 https://mastoxiv.page/@arXiv_csDS_bot/116130545112457981
toXiv_bot_toot
Anthropic launches Agent Skills, which let AI assistants perform specialized tasks using modular instructions, and says Microsoft, Cursor, and others use them (Michael Nuñez/VentureBeat)
https://venturebeat.com/ai/anthropic-launches-enterprise-age…
Can You Hear Me Now? A Benchmark for Long-Range Graph Propagation
Luca Miglior, Matteo Tolloso, Alessio Gravina, Davide Bacciu
https://arxiv.org/abs/2512.17762 https://arxiv.org/pdf/2512.17762 https://arxiv.org/html/2512.17762
arXiv:2512.17762v1 Announce Type: new
Abstract: Effectively capturing long-range interactions remains a fundamental yet unresolved challenge in graph neural network (GNN) research, critical for applications across diverse fields of science. To systematically address this, we introduce ECHO (Evaluating Communication over long HOps), a novel benchmark specifically designed to rigorously assess the capabilities of GNNs in handling very long-range graph propagation. ECHO includes three synthetic graph tasks, namely single-source shortest paths, node eccentricity, and graph diameter, each constructed over diverse and structurally challenging topologies intentionally designed to introduce significant information bottlenecks. ECHO also includes two real-world datasets, ECHO-Charge and ECHO-Energy, which define chemically grounded benchmarks for predicting atomic partial charges and molecular total energies, respectively, with reference computations obtained at the density functional theory (DFT) level. Both tasks inherently depend on capturing complex long-range molecular interactions. Our extensive benchmarking of popular GNN architectures reveals clear performance gaps, emphasizing the difficulty of true long-range propagation and highlighting design choices capable of overcoming inherent limitations. ECHO thereby sets a new standard for evaluating long-range information propagation, also providing a compelling example for its need in AI for science.
toXiv_bot_toot
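All three synthetic targets are standard graph quantities, so target generation is easy to picture with networkx (the benchmark's actual topologies and data format are defined in the paper):

```python
import networkx as nx

G = nx.barbell_graph(10, 20)  # a toy topology with a severe bottleneck

# Node-level target: single-source shortest-path distance from a source node.
sssp = nx.single_source_shortest_path_length(G, source=0)

# Node-level target: eccentricity, the max distance from a node to any other.
ecc = nx.eccentricity(G)

# Graph-level target: diameter, the maximum eccentricity over all nodes.
print(nx.diameter(G))
```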
UK AI Security Institute report: AI models are rapidly improving at potentially dangerous biological and chemical tasks, and show fast jumps in self-replication (Shakeel Hashim/Transformer)
https://www.transformernews.ai/p/aisi-ai-s
Alibaba debuts Qwen 3.5, adding "visual agentic capabilities" to independently execute tasks, and says it is 60% cheaper to use and 8x better at large workloads (Eduardo Baptista/Reuters)
https://www.reuters.com/world/china/alibaba-un…
Weighted Stochastic Differential Equation to Implement Wasserstein-Fisher-Rao Gradient Flow
Herlock Rahimi
https://arxiv.org/abs/2512.17878 https://arxiv.org/pdf/2512.17878 https://arxiv.org/html/2512.17878
arXiv:2512.17878v1 Announce Type: new
Abstract: Score-based diffusion models currently constitute the state of the art in continuous generative modeling. These methods are typically formulated via overdamped or underdamped Ornstein–Uhlenbeck-type stochastic differential equations, in which sampling is driven by a combination of deterministic drift and Brownian diffusion, resulting in continuous particle trajectories in the ambient space. While such dynamics enjoy exponential convergence guarantees for strongly log-concave target distributions, it is well known that their mixing rates deteriorate exponentially in the presence of nonconvex or multimodal landscapes, such as double-well potentials. Since many practical generative modeling tasks involve highly non-log-concave target distributions, considerable recent effort has been devoted to developing sampling schemes that improve exploration beyond classical diffusion dynamics.
A promising line of work leverages tools from information geometry to augment diffusion-based samplers with controlled mass reweighting mechanisms. This perspective leads naturally to Wasserstein–Fisher–Rao (WFR) geometries, which couple transport in the sample space with vertical (reaction) dynamics on the space of probability measures. In this work, we formulate such reweighting mechanisms through the introduction of explicit correction terms and show how they can be implemented via weighted stochastic differential equations using the Feynman–Kac representation. Our study provides a preliminary but rigorous investigation of WFR-based sampling dynamics, and aims to clarify their geometric and operator-theoretic structure as a foundation for future theoretical and algorithmic developments.
toXiv_bot_toot
Hands-on with Google's Auto Browse for Chrome: its ability to perform multistep tasks is noticeably better than similar tools but struggles with complex tasks (Reece Rogers/Wired)
https://www.wired.com/story/google-chrome-auto-browse-hands-on/
You Only Train Once: Differentiable Subset Selection for Omics Data
Daphné Chopard, Jorge da Silva Gonçalves, Irene Cannistraci, Thomas M. Sutter, Julia E. Vogt
https://arxiv.org/abs/2512.17678 https://arxiv.org/pdf/2512.17678 https://arxiv.org/html/2512.17678
arXiv:2512.17678v1 Announce Type: new
Abstract: Selecting compact and informative gene subsets from single-cell transcriptomic data is essential for biomarker discovery, improving interpretability, and cost-effective profiling. However, most existing feature selection approaches either operate as multi-stage pipelines or rely on post hoc feature attribution, making selection and prediction weakly coupled. In this work, we present YOTO (you only train once), an end-to-end framework that jointly identifies discrete gene subsets and performs prediction within a single differentiable architecture. In our model, the prediction task directly guides which genes are selected, while the learned subsets, in turn, shape the predictive representation. This closed feedback loop enables the model to iteratively refine both what it selects and how it predicts during training. Unlike existing approaches, YOTO enforces sparsity so that only the selected genes contribute to inference, eliminating the need to train additional downstream classifiers. Through a multi-task learning design, the model learns shared representations across related objectives, allowing partially labeled datasets to inform one another, and discovering gene subsets that generalize across tasks without additional training steps. We evaluate YOTO on two representative single-cell RNA-seq datasets, showing that it consistently outperforms state-of-the-art baselines. These results demonstrate that sparse, end-to-end, multi-task gene subset selection improves predictive performance and yields compact and meaningful gene subsets, advancing biomarker discovery and single-cell analysis.
toXiv_bot_toot
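The abstract doesn't spell out the selection mechanism, but one standard way to make a discrete gene subset differentiable end-to-end is a straight-through Gumbel-Softmax gate per gene; the sketch below shows that generic device, not necessarily YOTO's exact construction:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeneGate(nn.Module):
    """Learnable binary mask over genes: hard 0/1 selection in the forward
    pass, soft gradients in the backward pass (straight-through)."""
    def __init__(self, n_genes: int, temperature: float = 0.5):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_genes, 2))  # (off, on) per gene
        self.temperature = temperature

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mask = F.gumbel_softmax(self.logits, tau=self.temperature, hard=True)[:, 1]
        return x * mask  # unselected genes are exactly zeroed, enforcing sparsity
```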
OpenAI says GPT‑5.2 Thinking beats or ties industry professionals on 70.9% of GDPval knowledge work tasks, delivering outputs at >11x the speed and <1% the cost (OpenAI)
https://openai.com/index/introducing-gpt-5-2
Crosslisted article(s) found for cs.LG. https://arxiv.org/list/cs.LG/new
[2/3]:
- Sharp Structure-Agnostic Lower Bounds for General Functional Estimation
Jikai Jin, Vasilis Syrgkanis
https://arxiv.org/abs/2512.17341 https://mastoxiv.page/@arXiv_statML_bot/115762312049963700
- Timely Information Updating for Mobile Devices Without and With ML Advice
Yu-Pin Hsu, Yi-Hsuan Tseng
https://arxiv.org/abs/2512.17381 https://mastoxiv.page/@arXiv_csNI_bot/115762180316858485
- SWE-Bench : A Framework for the Scalable Generation of Software Engineering Benchmarks from Open...
Wang, Ramalho, Celestino, Pham, Liu, Sinha, Portillo, Osunwa, Maduekwe
https://arxiv.org/abs/2512.17419 https://mastoxiv.page/@arXiv_csSE_bot/115762487015279852
- Perfect reconstruction of sparse signals using nonconvexity control and one-step RSB message passing
Xiaosi Gu, Ayaka Sakata, Tomoyuki Obuchi
https://arxiv.org/abs/2512.17426 https://mastoxiv.page/@arXiv_statML_bot/115762346108219997
- MULTIAQUA: A multimodal maritime dataset and robust training strategies for multimodal semantic s...
Jon Muhovič, Janez Perš
https://arxiv.org/abs/2512.17450 https://mastoxiv.page/@arXiv_csCV_bot/115762717053353674
- When Data Quality Issues Collide: A Large-Scale Empirical Study of Co-Occurring Data Quality Issu...
Emmanuel Charleson Dapaah, Jens Grabowski
https://arxiv.org/abs/2512.17460 https://mastoxiv.page/@arXiv_csSE_bot/115762500123147574
- Behavioural Effects of Agentic Messaging: A Case Study on a Financial Service Application
Olivier Jeunen, Schaun Wheeler
https://arxiv.org/abs/2512.17462 https://mastoxiv.page/@arXiv_csIR_bot/115762430673347625
- Linear Attention for Joint Power Optimization and User-Centric Clustering in Cell-Free Networks
Irched Chafaa, Giacomo Bacci, Luca Sanguinetti
https://arxiv.org/abs/2512.17466 https://mastoxiv.page/@arXiv_eessSY_bot/115762336277179643
- Translating the Rashomon Effect to Sequential Decision-Making Tasks
Dennis Gross, Jørn Eirik Betten, Helge Spieker
https://arxiv.org/abs/2512.17470 https://mastoxiv.page/@arXiv_csAI_bot/115762556506696539
- Alternating Direction Method of Multipliers for Nonlinear Matrix Decompositions
Atharva Awari, Nicolas Gillis, Arnaud Vandaele
https://arxiv.org/abs/2512.17473 https://mastoxiv.page/@arXiv_eessSP_bot/115762580078964235
- TwinSegNet: A Digital Twin-Enabled Federated Learning Framework for Brain Tumor Analysis
Almustapha A. Wakili, Adamu Hussaini, Abubakar A. Musa, Woosub Jung, Wei Yu
https://arxiv.org/abs/2512.17488 https://mastoxiv.page/@arXiv_csCV_bot/115762726884307901
- Resource-efficient medical image classification for edge devices
Mahsa Lavaei, Zahra Abadi, Salar Beigzad, Alireza Maleki
https://arxiv.org/abs/2512.17515 https://mastoxiv.page/@arXiv_eessIV_bot/115762459510336799
- PathBench-MIL: A Comprehensive AutoML and Benchmarking Framework for Multiple Instance Learning i...
Brussee, Valkema, Weijer, Doeleman, Schrader, Kers
https://arxiv.org/abs/2512.17517 https://mastoxiv.page/@arXiv_csCV_bot/115762741957639051
- HydroGym: A Reinforcement Learning Platform for Fluid Dynamics
Christian Lagemann, et al.
https://arxiv.org/abs/2512.17534 https://mastoxiv.page/@arXiv_physicsfludyn_bot/115762391350754768
- When De-noising Hurts: A Systematic Study of Speech Enhancement Effects on Modern Medical ASR Sys...
Chondhekar, Murukuri, Vasani, Goyal, Badami, Rana, SN, Pandia, Katiyar, Jagadeesh, Gulati
https://arxiv.org/abs/2512.17562 https://mastoxiv.page/@arXiv_csSD_bot/115762423443170715
- Enabling Disaggregated Multi-Stage MLLM Inference via GPU-Internal Scheduling and Resource Sharing
Lingxiao Zhao, Haoran Zhou, Yuezhi Che, Dazhao Cheng
https://arxiv.org/abs/2512.17574 https://mastoxiv.page/@arXiv_csDC_bot/115762425409322293
- SkinGenBench: Generative Model and Preprocessing Effects for Synthetic Dermoscopic Augmentation i...
N. A. Adarsh Pritam, Jeba Shiney O, Sanyam Jain
https://arxiv.org/abs/2512.17585 https://mastoxiv.page/@arXiv_eessIV_bot/115762479150695610
- MAD-OOD: A Deep Learning Cluster-Driven Framework for an Out-of-Distribution Malware Detection an...
Tosin Ige, Christopher Kiekintveld, Aritran Piplai, Asif Rahman, Olukunle Kolade, Sasidhar Kunapuli
https://arxiv.org/abs/2512.17594 https://mastoxiv.page/@arXiv_csCR_bot/115762509298207765
- Confidence-Credibility Aware Weighted Ensembles of Small LLMs Outperform Large LLMs in Emotion De...
Menna Elgabry, Ali Hamdi
https://arxiv.org/abs/2512.17630 https://mastoxiv.page/@arXiv_csCL_bot/115762575512981257
- Generative Multi-Objective Bayesian Optimization with Scalable Batch Evaluations for Sample-Effic...
Madhav R. Muthyala, Farshud Sorourifar, Tianhong Tan, You Peng, Joel A. Paulson
https://arxiv.org/abs/2512.17659 https://mastoxiv.page/@arXiv_statML_bot/115762554519447500
toXiv_bot_toot
Z.ai launches GLM-5, its flagship open-weight model, saying it has best-in-class performance among open-source models in reasoning, coding, and agentic tasks (Z.ai)
https://z.ai/blog/glm-5
Alibaba's DAMO Academy releases RynnBrain, an open-source foundation model to help robots perform real-world tasks like navigating rooms, trained on Qwen3-VL (Saritha Rai/Bloomberg)
https://www.bloomberg.com/news/articles/2026…
Cloud computing provider Nebius agrees to buy Tavily, which helps AI agents search for up-to-date information for tasks like coding, a source says for $275M (Dina Bass/Bloomberg)
https://www.bloomberg.com/news/articles/20…
OpenAI is rolling out a HIPAA-compliant version of ChatGPT for clinicians to assist with medical reasoning and administrative tasks, at Cedars-Sinai and others (Shirin Ghaffary/Bloomberg)
https://www.bloomberg.com/news/newsletters
An OpenAI survey of 9,000 workers at 100 companies: it saves workers ~40 to 60 minutes per day on average for professional tasks; OpenAI has 1M business clients (Shirin Ghaffary/Bloomberg)
https://www.bloomberg.com/news/articles/20
OpenAI launches GPT-5.3-Codex, which it says runs 25% faster, enabling longer-running tasks, and "is our first model that was instrumental in creating itself" (David Gewirtz/ZDNET)
https://www.zdnet.com/article/openai-gpt-5-3-codex-faster-goes-beyond-c…
An analysis of 100T tokens from the past year shows reasoning models now represent over half of all usage, open-weight model use has grown steadily, and more (OpenRouter)
https://openrouter.ai/state-of-ai
Pine, which offers an AI agent to automate digital chores, like making calls, handling emails, and operating software to complete tasks, raised a $25M Series A (FinSMEs)
https://www.finsmes.com/2025/12/pine-raises-25m-in-series-a-funding.html