Adaptive Block-Scaled Data Types
Jack Cook, Hyemin S. Lee, Kathryn Le, Junxian Guo, Giovanni Traverso, Anantha P. Chandrakasan, Song Han
https://arxiv.org/abs/2603.28765 https://arxiv.org/pdf/2603.28765 https://arxiv.org/html/2603.28765
arXiv:2603.28765v1 Announce Type: new
Abstract: NVFP4 has grown increasingly popular as a 4-bit format for quantizing large language models due to its hardware support and its ability to retain useful information with relatively few bits per parameter. However, the format is not without limitations: recent work has shown that NVFP4 suffers from its error distribution, resulting in large amounts of quantization error on near-maximal values in each group of 16 values. In this work, we leverage this insight to design new Adaptive Block-Scaled Data Types that can adapt to the distribution of their input values. For four-bit quantization, our proposed IF4 (Int/Float 4) data type selects between FP4 and INT4 representations for each group of 16 values, which are then scaled by an E4M3 scale factor as is done with NVFP4. The selected data type is denoted using the scale factor's sign bit, which is currently unused in NVFP4, and we apply the same insight to design formats for other bit-widths, including IF3 and IF6. When used to quantize language models, we find that IF4 outperforms existing 4-bit block-scaled formats, achieving lower loss during quantized training and achieving higher accuracy on many tasks in post-training quantization. We additionally design and evaluate an IF4 Multiply-Accumulate (MAC) unit to demonstrate that IF4 can be implemented efficiently in next-generation hardware accelerators. Our code is available at https://github.com/mit-han-lab/fouroversix.
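The per-group selection described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation (that lives in the linked repository). It assumes the standard E2M1 value grid for FP4, a symmetric INT4 grid of -7..7, absmax scaling, a squared-error selection rule, and an unrounded floating-point scale in place of a true E4M3 scale. The function names are hypothetical.

```python
import math

# Representable magnitudes; the sign is handled separately.
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]   # E2M1 magnitudes
INT4_GRID = [float(i) for i in range(8)]              # assumption: symmetric -7..7


def quantize_to_grid(block, grid):
    """Absmax-scale a block onto `grid`; return (dequantized values, scale)."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return [0.0] * len(block), 0.0
    scale = amax / grid[-1]  # assumption: kept in full precision, not rounded to E4M3
    deq = []
    for v in block:
        mag = min(grid, key=lambda g: abs(abs(v) / scale - g))  # nearest grid point
        deq.append(math.copysign(mag * scale, v))
    return deq, scale


def sq_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize_block_if4_sketch(block):
    """Pick FP4 or INT4 for one group of values by lower squared error.

    In the format described above, this choice would be stored in the sign
    bit of the scale factor; here it is simply returned as a string.
    """
    fp_deq, fp_scale = quantize_to_grid(block, FP4_GRID)
    int_deq, int_scale = quantize_to_grid(block, INT4_GRID)
    if sq_error(block, int_deq) < sq_error(block, fp_deq):
        return "int4", int_deq, int_scale
    return "fp4", fp_deq, fp_scale


# A block spread evenly up to its maximum favors the uniform INT4 grid,
# while a block of small values with one outlier favors FP4.
ramp = [i / 7 for i in range(8)] * 2
outlier = [6.0, 0.5, -0.5, 1.0, 1.5, -1.5, 0.5, 1.0] * 2
print(quantize_block_if4_sketch(ramp)[0])
print(quantize_block_if4_sketch(outlier)[0])
```

Under these assumptions, the ramp block is dominated by near-maximal values, where the FP4 grid is coarse (nothing between 4 and 6), which is the error pattern the abstract attributes to NVFP4.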
Despite its rise as an economic superpower,
China remains reliant on a global financial system anchored by the dollar.
Turning the renminbi into a globally accepted currency would let Beijing conduct more trade on its own terms and blunt a longstanding source of American leverage.
That push has gained momentum from the wars in Ukraine and Iran,
as sanctions drive American adversaries toward the renminbi to bypass the Western financial system.
In effect, China’…
Cities and companies are making climate-friendly eating the new normal. NYC is leading by serving more plant-based meals in schools, while the EU's deforestation law ensures high-emission foods reflect their true environmental cost.
The secret? Training the next generation of chefs and making sustainable choices delicious and accessible.
Autotuning T-PaiNN: Enabling Data-Efficient GNN Interatomic Potential Development via Classical-to-Quantum Transfer Learning
Vivienne Pelletier, Vedant Bhat, Daniel J. Rivera, Steven A. Wilson, Christopher L. Muhich
https://arxiv.org/abs/2603.24752 https://arxiv.org/pdf/2603.24752 https://arxiv.org/html/2603.24752
arXiv:2603.24752v1 Announce Type: new
Abstract: Machine-learned interatomic potentials (MLIPs), particularly graph neural network (GNN)-based models, offer a promising route to achieving near-density functional theory (DFT) accuracy at significantly reduced computational cost. However, their practical deployment is often limited by the large volumes of expensive quantum mechanical training data required. In this work, we introduce a transfer learning framework, Transfer-PaiNN (T-PaiNN), that substantially improves the data efficiency of GNN-MLIPs by leveraging inexpensive classical force field data. The approach consists of pretraining a PaiNN MLIP architecture on large-scale datasets generated from classical molecular simulations, followed by fine-tuning (dubbed autotuning) using a comparatively small DFT dataset. We demonstrate the effectiveness of autotuning T-PaiNN on both gas-phase molecular systems (QM9 dataset) and condensed-phase liquid water. Across all cases, T-PaiNN significantly outperforms models trained solely on DFT data, achieving order-of-magnitude reductions in mean absolute error while accelerating training convergence. For example, using the QM9 data set, error reductions of up to 25 times are observed in low-data regimes, while liquid water simulations show improved predictions of energies, forces, and experimentally relevant properties such as density and diffusion. These gains arise from the model's ability to learn general features of the potential energy surface from extensive classical sampling, which are subsequently refined to quantum accuracy. Overall, this work establishes transfer learning from classical force fields as a practical and computationally efficient strategy for developing high-accuracy, data-efficient GNN interatomic potentials, enabling broader application of MLIPs to complex chemical systems.
Anthropic details using AI agents to accelerate alignment research on "weak-to-strong supervision", where a weak model supervises the training of a stronger one (Anthropic)
https://www.anthropic.com/research/automated-alignment-researchers
Democratizing AI: A Comparative Study in Deep Learning Efficiency and Future Trends in Computational Processing
Lisan Al Amin, Md Ismail Hossain, Rupak Kumar Das, Mahbubul Islam, Saddam Mukta, Abdulaziz Tabbakh
https://arxiv.org/abs/2603.20920 https://arxiv.org/pdf/2603.20920 https://arxiv.org/html/2603.20920
arXiv:2603.20920v1 Announce Type: new
Abstract: The exponential growth in data has intensified the demand for computational power to train large-scale deep learning models. However, the rapid growth in model size and complexity raises concerns about equal and fair access to computational resources, particularly under increasing energy and infrastructure constraints. GPUs have emerged as essential for accelerating such workloads. This study benchmarks four deep learning models (Conv6, VGG16, ResNet18, CycleGAN) using TensorFlow and PyTorch on Intel Xeon CPUs and NVIDIA Tesla T4 GPUs. Our experiments demonstrate that, on average, GPU training achieves speedups ranging from 11x to 246x depending on model complexity, with lightweight models (Conv6) showing the highest acceleration (246x), mid-sized models (VGG16, ResNet18) achieving 51-116x speedups, and complex generative models (CycleGAN) reaching 11x improvements compared to CPU training. Additionally, in our PyTorch vs. TensorFlow comparison, we observed that TensorFlow's kernel-fusion optimizations reduce inference latency by approximately 15%. We also analyze GPU memory usage trends and project requirements through 2025 using polynomial regression. Our findings highlight that while GPUs are essential for sustaining AI's growth, democratized and shared access to GPU resources is critical for enabling research innovation across institutions with limited computational budgets.
Mapping the Turn: An Eulerian Binormal-Axis Diagnostic for Recirculating 3D Flows
John Marshall Cooper, Wen Wu
https://arxiv.org/abs/2605.18439 https://arxiv.org/pdf/2605.18439 https://arxiv.org/html/2605.18439
arXiv:2605.18439v1 Announce Type: new
Abstract: Three-dimensional (3D) recirculating flows are often interpreted qualitatively from selected streamline visualizations. In separated flows, such recirculating motion is central to the drag modulation, but the local orientation of recirculation remains difficult to quantify in a field-based form. This work introduces an Eulerian binormal-axis diagnostic that locally evaluates the orientation of streamline turning at each point in the velocity field, yielding a spatially resolved field of the recirculating direction. Motivated by the Frenet-Serret binormal direction of a curved streamline, the diagnostic uses the velocity vector and its convective acceleration to extract the local streamline-turning axis without requiring explicit streamline integration. The resulting direction is encoded with barycentric RGB weights to visualize streamwise, spanwise, and wall-normal turning axis contributions. The diagnostic is first applied to Hill's spherical vortex, which provides a controlled analytic example of 3D recirculating motion for interpreting the binormal-axis direction and the associated barycentric RGB encoding. It is then applied to the mean field of a pressure-gradient-induced 3D separation bubble. The resulting visualizations show that the diagnostic reveals orientation changes that are not apparent from streamline visualization. The proposed diagnostic therefore converts qualitative streamline impressions into a spatially resolved measure of local streamline-turning orientation, providing a quantitative complement to conventional 3D flow visualization.
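The pointwise idea in the abstract can be sketched as follows. For a streamline with unit tangent T along the velocity u, the Frenet-Serret binormal B = T x N is parallel to u x a, where a = (u . grad) u is the convective acceleration. The code below assumes the diagnostic is this normalized cross product; the paper's exact definition, any thresholding, and the RGB encoding are not reproduced here, and the function names are hypothetical.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def convective_acceleration(u, grad_u):
    """a_i = u_j * d(u_i)/d(x_j), with grad_u[i][j] = d(u_i)/d(x_j)."""
    return tuple(sum(grad_u[i][j] * u[j] for j in range(3)) for i in range(3))


def binormal_axis(u, grad_u, eps=1e-12):
    """Unit vector along u x a, or None where the streamline does not turn."""
    a = convective_acceleration(u, grad_u)
    c = cross(u, a)
    norm = math.sqrt(sum(x * x for x in c))
    if norm < eps:  # straight streamline or stagnation point: axis undefined
        return None
    return tuple(x / norm for x in c)


# Solid-body rotation about z, u = (-y, x, 0), evaluated at the point (1, 0, 0).
u = (0.0, 1.0, 0.0)
grad_u = [[0.0, -1.0, 0.0],
          [1.0, 0.0, 0.0],
          [0.0, 0.0, 0.0]]
print(binormal_axis(u, grad_u))  # turning axis points along +z
```

Only the local velocity and its gradient are needed, so the axis can be evaluated at every grid point without integrating streamlines, in line with the Eulerian framing of the abstract.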
Free speech is the power to criticize your government without fear of prosecution. It is a basic human right.
It is not about getting away with hurting your neighbor who never caused you any harm. Dignity is a human right.
That is also what "love your neighbor" is about.
It's just a sensible thing to do.
Do not tolerate hate speech.
There is no (need for a) "freedom to hate".
Be kind to one another. 🧡