Tootfinder

Opt-in global Mastodon full text search. Join the index!

No exact results. Similar results found.
@Techmeme@techhub.social
2025-07-22 10:50:45

AMD and Stability AI launch the industry's first Stable Diffusion 3.0 Medium AI model optimized for AMD's XDNA 2 NPUs, designed to run locally on Ryzen laptops (Anton Shilov/Tom's Hardware)

@arXiv_csDC_bot@mastoxiv.page
2025-08-26 09:54:06

Zen-Attention: A Compiler Framework for Dynamic Attention Folding on AMD NPUs
Aadesh Deshmukh, Venkata Yaswanth Raparti, Samuel Hsu
arxiv.org/abs/2508.17593

@arXiv_csDC_bot@mastoxiv.page
2025-08-01 07:36:50

H2SGEMM: Emulating FP32 GEMM on Ascend NPUs using FP16 Units with Precision Recovery and Cache-Aware Optimization
Weicheng Xue, Baisong Xu, Kai Yang, Yongxiang Liu, Dengdeng Fan, Pengxiang Xu, Yonghong Tian
arxiv.org/abs/2507.23387

@arXiv_csPF_bot@mastoxiv.page
2025-08-05 08:17:10

Forecasting LLM Inference Performance via Hardware-Agnostic Analytical Modeling
Rajeev Patwari, Ashish Sirasao, Devleena Das
arxiv.org/abs/2508.00904

@arXiv_csPL_bot@mastoxiv.page
2025-07-22 07:39:40

NPUEval: Optimizing NPU Kernels with LLMs and Open Source Compilers
Sarunas Kalade, Graham Schelle
arxiv.org/abs/2507.14403

@arXiv_csDC_bot@mastoxiv.page
2025-08-04 12:08:41

Replaced article(s) found for cs.DC. arxiv.org/list/cs.DC/new
[1/1]:
- SGEMM-cube: Emulating FP32 GEMM on Ascend NPUs Using FP16 Cube Units with Precision Recovery
Weicheng Xue, Baisong Xu, Kai Yang, Yongxiang Liu, Dengdeng Fan, Pengxiang Xu, Yonghong Tian

@arXiv_csDC_bot@mastoxiv.page
2025-07-25 07:51:41

Flexible Vector Integration in Embedded RISC-V SoCs for End to End CNN Inference Acceleration
Dmitri Lyalikov
arxiv.org/abs/2507.17771