Flexible Operator Fusion for Fast Sparse Transformer with Diverse Masking on GPU
Wenhao Dai, Haodong Deng, Mengfei Rong, Xinyu Yang, Hongyu Liu, Fangxin Liu, Hailong Yang, Weifeng Liu, Qingxiao Sun
https://arxiv.org/abs/2506.06095
Unisoma: A Unified Transformer-based Solver for Multi-Solid Systems
Shilong Tao, Zhe Feng, Haonan Sun, Zhanxing Zhu, Yunhuai Liu
https://arxiv.org/abs/2506.06021
teaching a transformer net to approximate ray-tracing
input is a sequence of triangle data (limited to 4k triangles), and a sequence of 8x8 radiance pixels representing the camera (limited to 512x512 resolution)
output is a sequence of 8x8 pixels of the rendered scene. training took ~10 days; generation takes ~100ms (NVIDIA A100)
the results are pretty good. scaling will be a challenge, since attention is quadratic in sequence length
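a minimal back-of-envelope sketch of the setup described above (hypothetical shapes and helper names, not the paper's actual architecture): triangles and 8x8 pixel patches both become tokens, so the attention score matrix, and with it cost, grows quadratically in their combined count.

```python
# Hypothetical token accounting for a triangle-to-pixel-patch transformer.
# The 4k-triangle and 512x512 limits come from the post; everything else
# (function names, patch tokenization) is an illustrative assumption.

def num_tokens(n_triangles: int, width: int, height: int, patch: int = 8) -> int:
    """One token per triangle plus one token per 8x8 pixel patch."""
    return n_triangles + (width // patch) * (height // patch)

def attention_cost(seq_len: int) -> int:
    """Pairwise attention scores form a seq_len x seq_len matrix."""
    return seq_len * seq_len

# At the stated limits: 4096 triangles, 512x512 output.
tokens = num_tokens(4096, 512, 512)      # 4096 + 64*64 = 8192 tokens
cost = attention_cost(tokens)            # ~67M attention scores per layer

# Doubling resolution to 1024x1024 quadruples the patch tokens,
# which is why scaling resolution is the hard part.
tokens_2x = num_tokens(4096, 1024, 1024)  # 4096 + 128*128 = 20480
```

the quadratic blow-up (8192 -> 20480 tokens is a ~6x increase in attention cost) is what the scaling caveat above refers to.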
STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis
Jiatao Gu, Tianrong Chen, David Berthelot, Huangjie Zheng, Yuyang Wang, Ruixiang Zhang, Laurent Dinh, Miguel Angel Bautista, Josh Susskind, Shuangfei Zhai
https://arxiv.org/abs/2506.06276
Is BERTopic Better than PLSA for Extracting Key Topics in Aviation Safety Reports?
Aziida Nanyonga, Joiner Keith, Turhan Ugur, Wild Graham
https://arxiv.org/abs/2506.06328
Mitigating Catastrophic Forgetting with Adaptive Transformer Block Expansion in Federated Fine-Tuning
Yujia Huo, Jianchun Liu, Hongli Xu, Zhenguo Ma, Shilong Wang, Liusheng Huang
https://arxiv.org/abs/2506.05977
SatelliteFormula: Multi-Modal Symbolic Regression from Remote Sensing Imagery for Physics Discovery
Zhenyu Yu, Mohd. Yamani Idna Idris, Pei Wang, Yuelong Xia, Fei Ma, Rizwan Qureshi
https://arxiv.org/abs/2506.06176
Efficient Tactile Perception with Soft Electrical Impedance Tomography and Pre-trained Transformer
Huazhi Dong, Ronald B. Liu, Sihao Teng, Delin Hu, Peisan E (Sharel), Francesco Giorgio-Serchi, Yunjie Yang
https://arxiv.org/abs/2506.02824
External Attention Transformer: A Robust AI Model for Identifying Initial Eccentricity Signatures in Binary Black Hole Events in Simulated Advanced LIGO Data
Elahe Khalouei, Cristiano G. Sabiu, Hyung Mok Lee, A. Gopakumar
https://arxiv.org/abs/2506.03634
Optimizing Software Defined Battery Systems for Transformer Protection
Sonia Martin, Obidike Nnorom Jr., Philip Levis, Ram Rajagopal
https://arxiv.org/abs/2506.03439
A Transformer-Based Neural Network for Optimal Deterministic-Allocation and Anonymous Joint Auction Design
Zhen Zhang, Luowen Liu, Wanzhi Zhang, Zitian Guo, Kun Huang, Qi Qi, Qiang Liu, Xingxing Wang
https://arxiv.org/abs/2506.02435
TRiMM: Transformer-Based Rich Motion Matching for Real-Time multi-modal Interaction in Digital Humans
Yueqian Guo, Tianzhao Li, Xin Lyu, Jiehaolin Chen, Zhaohan Wang, Sirui Xiao, Yurun Chen, Yezi He, Helin Li, Fan Zhang
https://arxiv.org/abs/2506.01077
Hybrid SLC-MLC RRAM Mixed-Signal Processing-in-Memory Architecture for Transformer Acceleration via Gradient Redistribution
Chang Eun Song, Priyansh Bhatnagar, Zihan Xia, Nam Sung Kim, Tajana Rosing, Mingu Kang
https://arxiv.org/abs/2506.00020
Identifying interactions across brain areas while accounting for individual-neuron dynamics with a Transformer-based variational autoencoder
Qi Xin, Robert E. Kass
https://arxiv.org/abs/2506.02263
Uncertainty-Aware Genomic Classification of Alzheimer's Disease: A Transformer-Based Ensemble Approach with Monte Carlo Dropout
Taeho Jo, Eun Hye Lee, Alzheimer's Disease Sequencing Project
https://arxiv.org/abs/2506.00662
Diffusion Transformer-based Universal Dose Denoising for Pencil Beam Scanning Proton Therapy
Yuzhen Ding, Jason Holmes, Hongying Feng, Martin Bues, Lisa A. McGee, Jean-Claude M. Rwigema, Nathan Y. Yu, Terence S. Sio, Sameer R. Keole, William W. Wong, Steven E. Schild, Jonathan B. Ashman, Sujay A. Vora, Daniel J. Ma, Samir H. Patel, Wei Liu
https://…
Transformative or Conservative? Conservation laws for ResNets and Transformers
Sibylle Marcotte, Rémi Gribonval, Gabriel Peyré
https://arxiv.org/abs/2506.06194
SNIFR: Boosting Fine-Grained Child Harmful Content Detection Through Audio-Visual Alignment with Cascaded Cross-Transformer
Orchid Chetia Phukan, Mohd Mujtaba Akhtar, Girish, Swarup Ranjan Behera, Abu Osama Siddiqui, Sarthak Jain, Priyabrata Mallick, Jaya Sai Kiran Patibandla, Pailla Balakrishna Reddy, Arun Balaji Buduru, Rajesh Sharma
https://
Tug-of-war between idiom's figurative and literal meanings in LLMs
Soyoung Oh, Xinting Huang, Mathis Pink, Michael Hahn, Vera Demberg
https://arxiv.org/abs/2506.01723
Discharge dynamics in a cylindrical SDBD prototype reactor under ns-pulsed and sinusoidal AC operation
Konstantinos Giotis (HVL, ECE, LSPM), Dimitrios Stefas (LSPM), Yanis Agha (LSPM), Hans Höft (INP), Xavier Duten (LSPM), Panagiotis Svarnas (HVL, ECE), Guillaume Lombardi (LSPM), Kristaq Gazeli (LSPM)
https://arxiv.org/ab…
Controllable Text-to-Speech Synthesis with Masked-Autoencoded Style-Rich Representation
Yongqi Wang, Chunlei Zhang, Hangting Chen, Zhou Zhao, Dong Yu
https://arxiv.org/abs/2506.02997
Trading Under Uncertainty: A Distribution-Based Strategy for Futures Markets Using FutureQuant Transformer
Wenhao Guo, Yuda Wang, Zeqiao Huang, Changjiang Zhang, Shumin Ma
https://arxiv.org/abs/2505.05595
Generalizable, real-time neural decoding with hybrid state-space models
Avery Hee-Woon Ryoo, Nanda H. Krishna, Ximeng Mao, Mehdi Azabou, Eva L. Dyer, Matthew G. Perich, Guillaume Lajoie
https://arxiv.org/abs/2506.05320
KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction
Jang-Hyun Kim, Jinuk Kim, Sangwoo Kwon, Jae W. Lee, Sangdoo Yun, Hyun Oh Song
https://arxiv.org/abs/2505.23416
Sleep Brain and Cardiac Activity Predict Cognitive Flexibility and Conceptual Reasoning Using Deep Learning
Boshra Khajehpiri, Eric Granger, Massimiliano de Zambotti, Fiona C. Baker, Mohamad Forouzanfar
https://arxiv.org/abs/2506.00279
BD at BEA 2025 Shared Task: MPNet Ensembles for Pedagogical Mistake Identification and Localization in AI Tutor Responses
Shadman Rohan, Ishita Sur Apan, Muhtasim Ibteda Shochcho, Md Fahim, Mohammad Ashfaq Ur Rahman, AKM Mahbubur Rahman, Amin Ahsan Ali
https://arxiv.org/abs/2506.01817
I have released LLama2.c64 - an LLM running on a C64 with 2MB REU. It runs the Llama2 LLM architecture, using the tokenizer and weights from the Tinystories 260K model.
It's a storytelling model that tries its best to spin your prompt into a story, as if told by a kindergarten child. It will generate one output token about every 8 minutes.
…
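a quick worked example of what the stated rate above implies in practice (the 100-token story length is an assumption for illustration, not a figure from the post):

```python
# Back-of-envelope timing for an LLM generating roughly one token
# every 8 minutes, as stated for LLama2.c64 above.
minutes_per_token = 8
story_tokens = 100            # assumed story length, for illustration

total_minutes = minutes_per_token * story_tokens
hours = total_minutes / 60    # 800 minutes -> ~13.3 hours per story
```

so even a short story is an overnight job on the C64, which is part of the charm.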
CF-DETR: Coarse-to-Fine Transformer for Real-Time Object Detection
Woojin Shin, Donghwa Kang, Byeongyun Park, Brent Byunghoon Kang, Jinkyu Lee, Hyeongboo Baek
https://arxiv.org/abs/2505.23317
MagiCodec: Simple Masked Gaussian-Injected Codec for High-Fidelity Reconstruction and Generation
Yakun Song, Jiawei Chen, Xiaobin Zhuang, Chenpeng Du, Ziyang Ma, Jian Wu, Jian Cong, Dongya Jia, Zhuo Chen, Yuping Wang, Yuxuan Wang, Xie Chen
https://arxiv.org/abs/2506.00385
RenderFormer: Transformer-based Neural Rendering of Triangle Meshes with Global Illumination
Chong Zeng, Yue Dong, Pieter Peers, Hongzhi Wu, Xin Tong
https://arxiv.org/abs/2505.21925
Phi-Omni-ST: A multimodal language model for direct speech-to-speech translation
Yuxuan Hu, Haibin Wu, Ruchao Fan, Xiaofei Wang, Heng Lu, Yao Qian, Jinyu Li
https://arxiv.org/abs/2506.04392
DeepPlantCRE: A Transformer-CNN Hybrid Framework for Plant Gene Expression Modeling and Cross-Species Generalization
Yingjun Wu, Jingyun Huang, Liang Ming, Pengcheng Deng, Maojun Wang, Zeyu Zhang
https://arxiv.org/abs/2505.09883
Automatic detection of abnormal clinical EEG: comparison of a finetuned foundation model with two deep learning models
Aurore Bussalb, François Le Gac, Guillaume Jubien, Mohamed Rahmouni, Ruggero G. Bettinardi, Pedro Marinho R. de Oliveira, Phillipe Derambure, Nicolas Gaspard, Jacques Jonas, Louis Maillard, Laurent Vercueil, Hervé Vespignani, Philippe Laval, Laurent Koessler, Ulysse Gimenez
Leveraging AM and FM Rhythm Spectrograms for Dementia Classification and Assessment
Parismita Gogoi, Vishwanath Pratap Singh, Seema Khadirnaikar, Soma Siddhartha, Sishir Kalita, Jagabandhu Mishra, Md Sahidullah, Priyankoo Sarmah, S. R. M. Prasanna
https://arxiv.org/abs/2506.00861
Correlating instruction-tuning (in multimodal models) with vision-language processing (in the brain)
Subba Reddy Oota, Akshett Jindal, Ishani Mondal, Khushbu Pahwa, Satya Sai Srinath Namburi, Manish Shrivastava, Maneesh Singh, Bapi S. Raju, Manish Gupta
https://arxiv.org/abs/2505.20029