Tootfinder

Opt-in global Mastodon full text search.

Despite its rise as an economic superpower, China remains reliant on a global financial system anchored by the dollar. Turning the renminbi into a globally accepted currency would let Beijing conduct more trade on its own terms and blunt a longstanding source of American leverage. That push has gained momentum from the wars in Ukraine and Iran, as sanctions drive American adversaries toward the renminbi to bypass the Western financial system. In effect, China’…

@arXiv_physicschemph_bot@mastoxiv.page
2026-03-27 08:19:37

Autotuning T-PaiNN: Enabling Data-Efficient GNN Interatomic Potential Development via Classical-to-Quantum Transfer Learning
Vivienne Pelletier, Vedant Bhat, Daniel J. Rivera, Steven A. Wilson, Christopher L. Muhich
arxiv.org/abs/2603.24752 arxiv.org/pdf/2603.24752 arxiv.org/html/2603.24752
arXiv:2603.24752v1 Announce Type: new
Abstract: Machine-learned interatomic potentials (MLIPs), particularly graph neural network (GNN)-based models, offer a promising route to achieving near-density functional theory (DFT) accuracy at significantly reduced computational cost. However, their practical deployment is often limited by the large volumes of expensive quantum mechanical training data required. In this work, we introduce a transfer learning framework, Transfer-PaiNN (T-PaiNN), that substantially improves the data efficiency of GNN-MLIPs by leveraging inexpensive classical force field data. The approach consists of pretraining a PaiNN MLIP architecture on large-scale datasets generated from classical molecular simulations, followed by fine-tuning (dubbed autotuning) using a comparatively small DFT dataset. We demonstrate the effectiveness of autotuning T-PaiNN on both gas-phase molecular systems (QM9 dataset) and condensed-phase liquid water. Across all cases, T-PaiNN significantly outperforms models trained solely on DFT data, achieving order-of-magnitude reductions in mean absolute error while accelerating training convergence. For example, using the QM9 dataset, error reductions of up to 25 times are observed in low-data regimes, while liquid water simulations show improved predictions of energies, forces, and experimentally relevant properties such as density and diffusion. These gains arise from the model's ability to learn general features of the potential energy surface from extensive classical sampling, which are subsequently refined to quantum accuracy. Overall, this work establishes transfer learning from classical force fields as a practical and computationally efficient strategy for developing high-accuracy, data-efficient GNN interatomic potentials, enabling broader application of MLIPs to complex chemical systems.
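
The training recipe itself is ordinary two-stage transfer learning; below is a minimal PyTorch sketch, with a small MLP standing in for the PaiNN GNN and random tensors standing in for the classical and DFT datasets. All hyperparameters are illustrative assumptions, not the authors' settings.

```python
# Minimal sketch of the classical-to-quantum transfer learning recipe:
# pretrain on abundant, inexpensive classical force-field labels, then
# fine-tune ("autotune") on a small DFT set. The MLP is a stand-in for
# the PaiNN GNN; data, epochs, and learning rates are illustrative.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 128), nn.SiLU(), nn.Linear(128, 1))

def fit(model, features, energies, epochs, lr):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.L1Loss()  # MAE, the error metric quoted in the abstract
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(features), energies)
        loss.backward()
        opt.step()
    return loss.item()

# Stage 1: pretrain on a large dataset from classical molecular simulations.
x_cls, y_cls = torch.randn(10_000, 64), torch.randn(10_000, 1)
fit(model, x_cls, y_cls, epochs=50, lr=1e-3)

# Stage 2: autotune on a comparatively small DFT dataset, at a lower
# learning rate, refining the pretrained surface toward quantum accuracy.
x_dft, y_dft = torch.randn(200, 64), torch.randn(200, 1)
print("DFT MAE:", fit(model, x_dft, y_dft, epochs=100, lr=1e-4))
```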

@arXiv_csPF_bot@mastoxiv.page
2026-03-24 07:38:32

Democratizing AI: A Comparative Study in Deep Learning Efficiency and Future Trends in Computational Processing
Lisan Al Amin, Md Ismail Hossain, Rupak Kumar Das, Mahbubul Islam, Saddam Mukta, Abdulaziz Tabbakh
arxiv.org/abs/2603.20920 arxiv.org/pdf/2603.20920 arxiv.org/html/2603.20920
arXiv:2603.20920v1 Announce Type: new
Abstract: The exponential growth in data has intensified the demand for computational power to train large-scale deep learning models. However, the rapid growth in model size and complexity raises concerns about equal and fair access to computational resources, particularly under increasing energy and infrastructure constraints. GPUs have emerged as essential for accelerating such workloads. This study benchmarks four deep learning models (Conv6, VGG16, ResNet18, CycleGAN) using TensorFlow and PyTorch on Intel Xeon CPUs and NVIDIA Tesla T4 GPUs. Our experiments demonstrate that, on average, GPU training achieves speedups ranging from 11x to 246x depending on model complexity, with lightweight models (Conv6) showing the highest acceleration (246x), mid-sized models (VGG16, ResNet18) achieving 51-116x speedups, and complex generative models (CycleGAN) reaching 11x improvements compared to CPU training. Additionally, in our PyTorch vs. TensorFlow comparison, we observed that TensorFlow's kernel-fusion optimizations reduce inference latency by approximately 15%. We also analyze GPU memory usage trends and project requirements through 2025 using polynomial regression. Our findings highlight that while GPUs are essential for sustaining AI's growth, democratized and shared access to GPU resources is critical for enabling research innovation across institutions with limited computational budgets.
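
The projection step is plain polynomial extrapolation; a minimal numpy sketch, with made-up yearly memory figures standing in for the paper's measurements:

```python
# Sketch of the memory-projection step: fit a polynomial to observed
# GPU memory figures and extrapolate forward. The yearly values below
# are illustrative placeholders, not the paper's data.
import numpy as np

years = np.array([2019, 2020, 2021, 2022, 2023])
mem_gb = np.array([16.0, 24.0, 40.0, 80.0, 141.0])  # hypothetical top-end GPU memory

trend = np.poly1d(np.polyfit(years, mem_gb, deg=2))  # quadratic fit (an assumption)
print(f"projected 2025 requirement: {trend(2025):.0f} GB")
```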

@arXiv_csLG_bot@mastoxiv.page
2026-02-25 10:35:21

WeirNet: A Large-Scale 3D CFD Benchmark for Geometric Surrogate Modeling of Piano Key Weirs
Lisa L\"uddecke, Michael Hohmann, Sebastian Eilermann, Jan Tillmann-Mumm, Pezhman Pourabdollah, Mario Oertel, Oliver Niggemann
arxiv.org/abs/2602.20714 arxiv.org/pdf/2602.20714 arxiv.org/html/2602.20714
arXiv:2602.20714v1 Announce Type: new
Abstract: Reliable prediction of hydraulic performance is challenging for Piano Key Weir (PKW) design because discharge capacity depends on three-dimensional geometry and operating conditions. Surrogate models can accelerate hydraulic-structure design, but progress is limited by the scarcity of large, well-documented datasets that jointly capture geometric variation, operating conditions, and functional performance. This study presents WeirNet, a large 3D CFD benchmark dataset for geometric surrogate modeling of PKWs. WeirNet contains 3,794 parametric, feasibility-constrained rectangular and trapezoidal PKW geometries, each scheduled at 19 discharge conditions using a consistent free-surface OpenFOAM workflow, resulting in a benchmark of 71,387 completed simulations with complete discharge coefficient labels. The dataset is released in multiple modalities (compact parametric descriptors, watertight surface meshes, and high-resolution point clouds) together with standardized tasks and in-distribution and out-of-distribution splits. Representative surrogate families are benchmarked for discharge coefficient prediction. Tree-based regressors on parametric descriptors achieve the best overall accuracy, while point- and mesh-based models remain competitive and offer parameterization-agnostic inference. All surrogates evaluate in milliseconds per sample, providing orders-of-magnitude speedups over CFD runtimes. Out-of-distribution results identify geometry shift as the dominant failure mode compared to unseen discharge values, and data-efficiency experiments show diminishing returns beyond roughly 60% of the training data. By publicly releasing the dataset together with simulation setups and evaluation pipelines, WeirNet establishes a reproducible framework for data-driven hydraulic modeling and enables faster exploration of PKW designs during the early stages of hydraulic planning.
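
A minimal scikit-learn sketch of the winning surrogate family, tree-based regression on compact parametric descriptors, trained here on synthetic placeholder data rather than the released benchmark:

```python
# Sketch of the best-performing surrogate family: gradient-boosted trees
# mapping parametric PKW descriptors to a discharge coefficient. The
# synthetic data below stands in for the real benchmark and its splits.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
X = rng.uniform(size=(3794, 8))                             # 8 hypothetical descriptors
y = X @ rng.uniform(size=8) + 0.05 * rng.normal(size=3794)  # toy target coefficient

X_tr, X_te, y_tr, y_te = X[:3000], X[3000:], y[:3000], y[3000:]
surrogate = GradientBoostingRegressor().fit(X_tr, y_tr)
pred = surrogate.predict(X_te)                              # millisecond-scale inference
print("MAE:", mean_absolute_error(y_te, pred))
```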

@Techmeme@techhub.social
2026-04-14 22:46:52

Anthropic details using AI agents to accelerate alignment research on "weak-to-strong supervision", where a weak model supervises the training of a stronger one (Anthropic)
anthropic.com/research/automat
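
For readers unfamiliar with the setup, a toy scikit-learn sketch of weak-to-strong supervision with placeholder models and data, not Anthropic's method:

```python
# Toy illustration of weak-to-strong supervision: a small "weak" model
# labels a data pool, and a larger "strong" model trains on those noisy
# labels. Models and data are placeholders, not Anthropic's setup.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

weak = LogisticRegression().fit(X[:200], y[:200])  # weak supervisor, little data
weak_labels = weak.predict(X)                      # imperfect supervision signal

strong = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500,
                       random_state=0).fit(X, weak_labels)

# The question such research probes: does the strong student recover
# performance beyond its weak supervisor?
print("weak accuracy:  ", weak.score(X, y))
print("strong accuracy:", strong.score(X, y))
```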

@cyrevolt@mastodon.social
2026-04-07 09:36:26

Free speech is the power to criticize your government without fear of prosecution. It is a basic human right.
It is not about getting away with hurting your neighbor who never caused you any harm. Dignity is a human right.
That is also what "love your neighbor" is about.
It's just a sensible thing to do.
Do not tolerate hate speech.
There is no (need for a) "freedom to hate".
Be kind to one another. 🧡

@arXiv_csDS_bot@mastoxiv.page
2026-02-04 07:39:24

ZOR filters: fast and smaller than fuse filters
Antoine Limasset
arxiv.org/abs/2602.03525 arxiv.org/pdf/2602.03525 arxiv.org/html/2602.03525
arXiv:2602.03525v1 Announce Type: new
Abstract: Probabilistic membership filters support fast approximate membership queries with a controlled false-positive probability ε and are widely used across storage, analytics, networking, and bioinformatics [chang2008bigtable; dayan2018optimalbloom; broder2004network; harris2020improved; marchet2023scalable; chikhi2025logan; hernandez2025reindeer2]. In the static setting, state-of-the-art designs such as XOR and fuse filters achieve low overhead and very fast queries, but their peeling-based construction succeeds only with high probability, which complicates deterministic builds [graf2020xor; graf2022binary; ulrich2023taxor].
We introduce ZOR filters, a deterministic continuation of XOR/fuse filters that guarantees construction termination while preserving the same XOR-based query mechanism. ZOR replaces restart-on-failure with deterministic peeling that abandons a small fraction of keys, and restores false-positive-only semantics by storing the remainder in a compact auxiliary structure. In our experiments, the abandoned fraction drops below 1% for moderate arity (e.g., N ≥ 5), so the auxiliary handles a negligible fraction of keys. As a result, ZOR filters can achieve overhead within 1% of the information-theoretic lower bound log2(1/ε) while retaining fuse-like query performance; the additional cost is concentrated on negative queries due to the auxiliary check. Our current prototype builds several-fold slower than highly optimized fuse builders because it maintains explicit incidence information during deterministic peeling; closing this optimisation gap is an engineering target.
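
A heavily simplified Python sketch of the ZOR idea: peel an XOR-style table as usual, but when peeling stalls, deterministically abandon a key to an auxiliary set instead of restarting. Arity, table sizing, hashing, and the auxiliary structure are all toy choices here, not the paper's design.

```python
# Toy ZOR-style filter: XOR-filter construction whose peeling never
# restarts -- unpeelable keys are abandoned to an auxiliary structure
# (a plain set here; the paper uses a compact one).
import random

def hashes(key, size, seed=42):
    r = random.Random(hash((key, seed)))
    return [r.randrange(size) for _ in range(3)]   # arity 3 (the paper favors higher N)

def fingerprint(key):
    return hash((key, "fp")) & 0xFF                # 8-bit fingerprint

def build(keys, size):
    incidence = [set() for _ in range(size)]
    for k in keys:
        for h in set(hashes(k, size)):
            incidence[h].add(k)
    stack, alive, aux = [], set(keys), set()
    while alive:
        i = next((j for j, s in enumerate(incidence) if len(s) == 1), None)
        if i is None:
            victim = min(alive)                    # deterministic abandonment, no restart
            aux.add(victim)
        else:
            (victim,) = incidence[i]
            stack.append((victim, i))
        for h in set(hashes(victim, size)):
            incidence[h].discard(victim)
        alive.discard(victim)
    table = [0] * size
    for k, i in reversed(stack):                   # assign in reverse peel order
        table[i] = fingerprint(k)
        for h in set(hashes(k, size)):
            if h != i:
                table[i] ^= table[h]
    return table, aux

def query(key, table, aux):
    if key in aux:                                 # auxiliary check; negatives pay for this
        return True
    acc = 0
    for h in set(hashes(key, len(table))):
        acc ^= table[h]
    return acc == fingerprint(key)

keys = list(range(1000))
table, aux = build(keys, size=1300)
assert all(query(k, table, aux) for k in keys)     # no false negatives by construction
print(f"abandoned fraction: {len(aux) / len(keys):.3%}")
```

In this toy setting the abandoned fraction is typically near zero; the abstract reports it drops below 1% for arity N ≥ 5 in the real construction.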

@Techmeme@techhub.social
2026-02-01 06:51:00

Sources: Alibaba has delivered more than 100K units of the Zhenwu 810E, an ASIC for AI training and inference, surpassing those of its domestic rival Cambricon (Ann Cao/South China Morning Post)
scmp.com/tech/article/3341860/

@arXiv_csCL_bot@mastoxiv.page
2026-03-31 10:13:27

Adaptive Block-Scaled Data Types
Jack Cook, Hyemin S. Lee, Kathryn Le, Junxian Guo, Giovanni Traverso, Anantha P. Chandrakasan, Song Han
arxiv.org/abs/2603.28765 arxiv.org/pdf/2603.28765 arxiv.org/html/2603.28765
arXiv:2603.28765v1 Announce Type: new
Abstract: NVFP4 has grown increasingly popular as a 4-bit format for quantizing large language models due to its hardware support and its ability to retain useful information with relatively few bits per parameter. However, the format is not without limitations: recent work has shown that NVFP4 suffers from its error distribution, resulting in large amounts of quantization error on near-maximal values in each group of 16 values. In this work, we leverage this insight to design new Adaptive Block-Scaled Data Types that can adapt to the distribution of their input values. For four-bit quantization, our proposed IF4 (Int/Float 4) data type selects between FP4 and INT4 representations for each group of 16 values, which are then scaled by an E4M3 scale factor as is done with NVFP4. The selected data type is denoted using the scale factor's sign bit, which is currently unused in NVFP4, and we apply the same insight to design formats for other bit-widths, including IF3 and IF6. When used to quantize language models, we find that IF4 outperforms existing 4-bit block-scaled formats, achieving lower loss during quantized training and achieving higher accuracy on many tasks in post-training quantization. We additionally design and evaluate an IF4 Multiply-Accumulate (MAC) unit to demonstrate that IF4 can be implemented efficiently in next-generation hardware accelerators. Our code is available at github.com/mit-han-lab/fourove.
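
A toy numpy sketch of the IF4 selection rule as described: quantize each group of 16 values with both an FP4 (E2M1) grid and an INT4 grid, keep whichever reconstructs better, and record the choice in a one-bit flag. The plain float scale below is a simplification; the real format quantizes it to E4M3.

```python
# Toy IF4-style quantizer: per group of 16 values, pick the better of
# FP4 and INT4 under a shared per-group scale, returning a format flag
# that stands in for the scale factor's sign bit.
import numpy as np

FP4 = np.array([0, 0.5, 1, 1.5, 2, 3, 4, 6])
FP4_GRID = np.concatenate([-FP4[::-1], FP4])           # symmetric E2M1 value grid
INT4_GRID = np.arange(-8, 8).astype(float)             # two's-complement INT4 grid

def snap(x, grid):
    # Round each value to its nearest representable grid point.
    return grid[np.abs(x[:, None] - grid[None, :]).argmin(axis=1)]

def quantize_group(x):
    # x: one group of 16 values. Try both formats, keep the better one.
    best_err, best_q, flag = np.inf, None, None
    for name, grid in (("fp4", FP4_GRID), ("int4", INT4_GRID)):
        scale = np.abs(x).max() / np.abs(grid).max()   # float stand-in for the E4M3 scale
        q = snap(x / scale, grid) * scale
        err = np.square(q - x).sum()
        if err < best_err:
            best_err, best_q, flag = err, q, name
    return best_q, flag

rng = np.random.default_rng(0)
group = rng.normal(size=16)
dq, flag = quantize_group(group)
print(flag, "reconstruction MSE:", float(np.square(dq - group).mean()))
```

Per the abstract, the scale factor's sign bit is unused in NVFP4, so it can carry this per-group format choice at no extra storage cost.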