
2025-06-10 16:54:09
This https://arxiv.org/abs/2410.03483 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csRO_…
Dynamic Chunking for End-to-End Hierarchical Sequence Modeling
Sukjun Hwang, Brandon Wang, Albert Gu
https://arxiv.org/abs/2507.07955 https://arxiv.org/pdf/2507.07955 https://arxiv.org/html/2507.07955
arXiv:2507.07955v1 Announce Type: new
Abstract: Despite incredible progress in language models (LMs) in recent years, largely resulting from moving away from specialized models designed for specific tasks to general models based on powerful architectures (e.g., the Transformer) that learn everything from raw data, pre-processing steps such as tokenization remain a barrier to true end-to-end foundation models. We introduce a collection of new techniques that enable a dynamic chunking mechanism which automatically learns content- and context-dependent segmentation strategies learned jointly with the rest of the model. Incorporating this into an explicit hierarchical network (H-Net) allows replacing the (implicitly hierarchical) tokenization-LM-detokenization pipeline with a single model learned fully end-to-end. When compute- and data-matched, an H-Net with one stage of hierarchy operating at the byte level outperforms a strong Transformer language model operating over BPE tokens. Iterating the hierarchy to multiple stages further increases its performance by modeling multiple levels of abstraction, demonstrating significantly better scaling with data and matching a token-based Transformer of twice its size. H-Nets pretrained on English show significantly increased character-level robustness, and qualitatively learn meaningful data-dependent chunking strategies without any heuristics or explicit supervision. Finally, the H-Net's improvement over tokenized pipelines is further increased in languages and modalities with weaker tokenization heuristics, such as Chinese and code, or DNA sequences (nearly 4× improvement in data efficiency over baselines), showing the potential of true end-to-end models that learn and scale better from unprocessed data.
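The abstract does not spell out how chunk boundaries are chosen. As a minimal sketch, assuming a boundary opens wherever two consecutive hidden vectors point in dissimilar directions, a data-dependent chunker could look like the code below. The cosine rule p_t = 0.5 * (1 - cos(h_t, h_{t-1})), the `threshold` parameter and every function name here are illustrative assumptions, not details taken from the abstract; in a trained model the vectors would come from learned layers rather than being given by hand.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, clamped to [-1, 1]; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return max(-1.0, min(1.0, dot / (na * nb)))


def boundary_probs(hidden: Sequence[Vector]) -> List[float]:
    """Assumed rule: p_t = 0.5 * (1 - cos(h_t, h_{t-1})), with p_0 = 1.

    Similar neighbours give a low probability (stay in the same chunk);
    dissimilar neighbours give a high probability (start a new chunk).
    """
    if not hidden:
        return []
    probs = [1.0]  # the first position always opens a chunk
    for t in range(1, len(hidden)):
        probs.append(0.5 * (1.0 - cosine(hidden[t], hidden[t - 1])))
    return probs


def dynamic_chunks(hidden: Sequence[Vector], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Split positions into [start, end) spans wherever p_t >= threshold."""
    probs = boundary_probs(hidden)
    starts = [t for t, p in enumerate(probs) if p >= threshold]
    ends = starts[1:] + [len(hidden)]
    return list(zip(starts, ends))


if __name__ == "__main__":
    # Toy 2-d "byte-level" hidden states: direction flips at positions 2 and 4.
    hidden = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.2], [-0.9, 0.1], [1.0, 0.0]]
    print(dynamic_chunks(hidden))  # → [(0, 2), (2, 4), (4, 5)]
```

With the threshold at 0.5, a new chunk starts exactly when consecutive vectors have non-positive cosine similarity, so the number of chunks depends on the content rather than on a fixed vocabulary. In a hierarchical model the next stage would process one vector per span; how that vector is formed is left open in this sketch.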
toXiv_bot_toot
Structured State Space Model Dynamics and Parametrization for Spiking Neural Networks
Maxime Fabre, Lyubov Dudchenko, Emre Neftci
https://arxiv.org/abs/2506.06374
This https://arxiv.org/abs/2502.16626 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_…
This https://arxiv.org/abs/2412.07911 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csIT_…
WhisQ: Cross-Modal Representation Learning for Text-to-Music MOS Prediction
Jakaria Islam Emon, Kazi Tamanna Alam, Md. Abu Salek
https://arxiv.org/abs/2506.05899
This https://arxiv.org/abs/2409.10283 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csRO_…
HITSZ's End-To-End Speech Translation Systems Combining Sequence-to-Sequence Auto Speech Recognition Model and Indic Large Language Model for IWSLT 2025 in Indic Track
Xuchen Wei, Yangxin Wu, Yaoyin Zhang, Henglyu Liu, Kehai Chen, Xuefeng Bai, Min Zhang
https://arxiv.org/abs/2507.19616
Nonstationary Distribution Estimation via Wasserstein Probability Flows
Edward J. Anderson, Dominic S. T. Keehan
https://arxiv.org/abs/2507.05893 https://
Fibonacci Numbers and Model-Complete Axiomatization of Presburger Arithmetic Expanded with a Beatty Sequence
Mohsen Khani, Ali N. Valizadeh, Afshin Zarei
https://arxiv.org/abs/2508.02303
Greedy Dynamic Matching
Nick Arnosti, Felipe Simon
https://arxiv.org/abs/2507.04551 https://arxiv.org/pdf/2507.04551
Generating Long Semantic IDs in Parallel for Recommendation
Yupeng Hou, Jiacheng Li, Ashley Shin, Jinsung Jeon, Abhishek Santhanam, Wei Shao, Kaveh Hassani, Ning Yao, Julian McAuley
https://arxiv.org/abs/2506.05781
MCeT: Behavioral Model Correctness Evaluation using Large Language Models
Khaled Ahmed, Jialing Song, Boqi Chen, Ou Wei, Bingzhou Zheng
https://arxiv.org/abs/2508.00630 https://…
Omega-regular Verification and Control for Distributional Specifications in MDPs
S. Akshay (Dept of CSE, Indian Institute of Technology Bombay), Ouldouz Neysari (Singapore Management University, University of Tehran), Đorđe Žikelić (Singapore Management University)
https://arxiv.org/abs/2507.04286
Elementary Steps of Energy Conversion in Strongly Correlated Systems: Beyond Single Quasiparticles and Rigid Bands
V. Moshnyaga, Ch. Jooss, P. E. Blöchl, V. Bruchmann-Bamberg, A. Dehning, L. Allen-Rump, C. Hausmann, M. Krüger, A. Rathnakaran, S. Rajpurohit, D. Steil, C. Flathmann, J. Hoffmann, M. Seibt, C. Volkert
https://
Probing the statistics of sequence-dependent DNA conformations in solution using SAXS
Heidar J. Koning, Anuradha Pullakhandam, Andrew E. Whitten, Charles S. Bond, Michel Peyrard
https://arxiv.org/abs/2508.04358
First Contact: Data-driven Friction-Stir Process Control
James Koch, Ethan King, WoongJo Choi, Megan Ebers, David Garcia, Ken Ross, Keerti Kappagantula
https://arxiv.org/abs/2507.03177
Input-Sensitive Reconfiguration of Sliding Cubes
Hugo Akitaya, Matias Korman, Frederick Stock
https://arxiv.org/abs/2507.04170 https://
TrinityDNA: A Bio-Inspired Foundational Model for Efficient Long-Sequence DNA Modeling
Qirong Yang, Yucheng Guo, Zicheng Liu, Yujie Yang, Qijin Yin, Siyuan Li, Shaomin Ji, Linlin Chao, Xiaoming Zhang, Stan Z. Li
https://arxiv.org/abs/2507.19229
hqQUBO: A Hybrid-querying Quantum Optimization Model Validated with 16-qubits on an Ion Trap Quantum Computer for Life Science Applications
Rong Chen, Quan-Xin Mei, Wen-Ding Zhao, Lin Yao, Hao-Xiang Yang, Shun-Yao Zhang, Jiao Chen, Hong-Lin Li
https://arxiv.org/abs/2506.01559
This https://arxiv.org/abs/2409.03833 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_grqc_…
This https://arxiv.org/abs/2505.24293 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
This https://arxiv.org/abs/2501.18626 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCR_…
Testing parametric additive time-varying GARCH models
Niklas Ahlgren, Alexander Back, Timo Teräsvirta
https://arxiv.org/abs/2506.23821 https://
Performance of the image persistence model for Euclid infrared detectors
B. Kubik, R. Barbier, G. Smadja, S. Ferriol, Y. Conseil, Y. Copin, W. Gillard, S. Dusini, K. Jahnke, E. Prieto, N. Auricchio, E. Balbi, A. Balestra, P. Battaglia, V. Capobianco, R. Chary, L. Corcione, F. Cogato, G. Delucchi, E. Franceschi, L. Gabarra, F. Gianotti, F. Grupp, E. Lentini, S. Ligori, E. Medinaceli, G. Morgante, K. Paterson, E. Romelli, L. Sauniere, M. Schirmer, C. Sirignano, G. Testera, M. Trifoglio, A. Troja, L. Valenziano, M. Frailis, M. Scodeggio, J. -C. Barriere, M. Berthe, C. Bodendorf, A. Caillat, M. Carle, R. Casas, H. Cho, A. Costille, F. Ducret, B. Garilli, W. Holmes, F. Hormuth, A. Hornstrup, M. Jhabvala, R. Kohley, D. Le Mignant, P. B. Lilje, I. Lloro, C. Padilla, G. Polenta, J. -C. Salvignol, G. Seidel, B. Serra, A. Secroun, L. Stanco, R. Toledo-Moreo, S. Anselmi, E. Borsato, L. Caillat, C. Colodro-Conde, V. Conforti, J. E. Davies, A. Renzi, F. Dal Corso, S. Davini, A. Derosa, J. J. Diaz, S. Di Domizio, D. Di Ferdinando, R. Farinelli, A. G. Ferrari, F. Fornari, F. Giacomini, O. Krause, F. Laudisio, J. Macias-Perez, J. Marpaud, N. Mauri, R. da Silva, M. Niclas, F. Passalacqua, I. Risso, P. Lagier, A. N. Sorensen, P. Stassi, J. Steinwagner, M. Tenti, C. Thizy, S. Tosi, R. Travaglini, O. Tubio, C. Valieri, S. Ventura, C. Vescovi, J. Zoubian
AI Accelerators for Large Language Model Inference: Architecture Analysis and Scaling Strategies
Amit Sharma
https://arxiv.org/abs/2506.00008 https://
Guiding an Automatic Speech Recognition Decoder Using Large Language Models
Eyal Cohen (Technion - Israel Institute of Technology), Bhiksha Raj (Carnegie Mellon University), Joseph Keshet (Technion - Israel Institute of Technology)
https://arxiv.org/abs/2508.02228
Counting Distinct Square Substrings in Sublinear Time
Panagiotis Charalampopoulos, Manal Mohamed, Jakub Radoszewski, Wojciech Rytter, Tomasz Waleń, Wiktor Zuba
https://arxiv.org/abs/2508.03930
Monitoring Robustness and Individual Fairness
Ashutosh Gupta, Thomas A. Henzinger, Konstantin Kueffner, Kaushik Mallik, David Pape
https://arxiv.org/abs/2506.00496
Bridging Expressivity and Scalability with Adaptive Unitary SSMs
Arjun Karuvally, Franz Nowak, Anderson T. Keller, Carmen Amo Alonso, Terrence J. Sejnowski, Hava T. Siegelmann
https://arxiv.org/abs/2507.05238
Sequence-Only Prediction of Binding Affinity Changes: A Robust and Interpretable Model for Antibody Engineering
Chen Liu, Mingchen Li, Yang Tan, Wenrui Gou, Guisheng Fan, Bingxin Zhou
https://arxiv.org/abs/2505.20301
Error estimates of linear decoupled structure-preserving incremental viscosity splitting methods for the Cahn--Hilliard--Navier--Stokes system
Baolin Kuang, Hongfei Fu, Xiaoli Li
https://arxiv.org/abs/2508.01141
Effect of Matter Accretion on Lithium Enhancement of Giants
Xuefeng Li, Jianrong Shi, Yan Li, Hongliang Yan, Jinghua Zhang, Fei Guo
https://arxiv.org/abs/2508.00405 https://
Gaussian Sequence Model: Sample Complexities of Testing, Estimation and LFHT
Zeyu Jia, Yury Polyanskiy
https://arxiv.org/abs/2507.16734 https://
Dispersion models on a circle: universal properties and asymptotic results
Jean-François Marckert, Zoé Varin
https://arxiv.org/abs/2507.00737 htt…
Geometry Meets Incentives: Sample-Efficient Incentivized Exploration with Linear Contexts
Benjamin Schiffer, Mark Sellke
https://arxiv.org/abs/2506.01685 h…
Latent-X: An Atom-level Frontier Model for De Novo Protein Binder Design
Latent Labs Team, Alex Bridgland, Jonathan Crabbé, Henry Kenlay, Daniella Pretorius, Sebastian M. Schmon, Agrin Hilmkil, Rebecca Bartke-Croughan, Robin Rombach, Michael Flashman, Tomas Matteson, Simon Mathis, Alexander W. R. Nelson, David Yuan, Annette Obika, Simon A. A. Kohl
https:/…
DeSamba: Decoupled Spectral Adaptive Framework for 3D Multi-Sequence MRI Lesion Classification
Dezhen Wang, Sheng Miao, Rongxin Chai, Jiufa Cui
https://arxiv.org/abs/2507.15487
Persuasion in the Long Run: When history matters
Hyeonggyun Ko
https://arxiv.org/abs/2508.01662 https://arxiv.org/pdf/2508.01662
An nth-cousin mating model and the n-anacci numbers
Elisa Heinrich Mora, Noah A. Rosenberg
https://arxiv.org/abs/2506.16577 https://a…
StruMamba3D: Exploring Structural Mamba for Self-supervised Point Cloud Representation Learning
Chuxin Wang, Yixin Zha, Wenfei Yang, Tianzhu Zhang
https://arxiv.org/abs/2506.21541
SPACE: Your Genomic Profile Predictor is a Powerful DNA Foundation Model
Zhao Yang, Jiwei Zhu, Bing Su
https://arxiv.org/abs/2506.01833 https://
The Kuramoto model on the Sierpinski Gasket
Georgi S. Medvedev, Matthew S. Mizuhara
https://arxiv.org/abs/2506.12940 https://arxiv.or…
Reducing Quantum Circuit Synthesis to #SAT
Dekel Zak, Jingyi Mei, Jean-Marie Lagniez, Alfons Laarman
https://arxiv.org/abs/2508.00416
Dissociating model architectures from inference computations
Noor Sajid, Johan Medrano
https://arxiv.org/abs/2507.15776 https://arxiv…
This https://arxiv.org/abs/2503.14328 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_mat…
Hybrid Tokenization Strategy for DNA Language Model using Byte Pair Encoding and K-MER Methods
Ganesh Sapkota, Md Hasibur Rahman
https://arxiv.org/abs/2507.18570 https://…
Integrating LLM-Derived Multi-Semantic Intent into Graph Model for Session-based Recommendation
Shuo Zhang, Xiao Li, Jiayi Wu, Fan Yang, Xiang Li, Ming Gao
https://arxiv.org/abs/2507.20147
A Minimum Distance Estimator Approach for Misspecified Ergodic Processes
Jaroslav I. Borodavka, Sebastian Krumscheid, Grigorios A. Pavliotis
https://arxiv.org/abs/2506.12432
Beat and Downbeat Tracking in Performance MIDI Using an End-to-End Transformer Architecture
Sebastian Murgul, Michael Heizmann
https://arxiv.org/abs/2507.00466
eccDNAMamba: A Pre-Trained Model for Ultra-Long eccDNA Sequence Analysis
Zhenke Liu, Jien Li, Ziqi Zhang
https://arxiv.org/abs/2506.18940 https://
Pendulum Model of Spiking Neurons
Joy Bose
https://arxiv.org/abs/2507.22146 https://arxiv.org/pdf/2507.22146
Hybrid Approach for Electricity Price Forecasting using AlexNet and LSTM
Bosubabu Sambana, Kotamsetty Geethika Devi, Bandi Rajeswara Reddy, Galeti Mohammad Hussain, Gownivalla Siddartha
https://arxiv.org/abs/2506.23504
Transformers as Multi-task Learners: Decoupling Features in Hidden Markov Models
Yifan Hao, Chenlu Ye, Chi Han, Tong Zhang
https://arxiv.org/abs/2506.01919
FLUX.1 Kontext: Flow Matching for In-Context Image Generation and Editing in Latent Space
Black Forest Labs, Stephen Batifol, Andreas Blattmann, Frederic Boesel, Saksham Consul, Cyril Diagne, Tim Dockhorn, Jack English, Zion English, Patrick Esser, Sumith Kulal, Kyle Lacey, Yam Levi, Cheng Li, Dominik Lorenz, Jonas Müller, Dustin Podell, Robin Rombach, Harry Saini, Axel Sauer, Luke Smith
Critical Metallicity of Cool Supergiant Formation. II. Physical Origin
Po-Sheng Ou, Ke-Jung Chen
https://arxiv.org/abs/2506.01753 https://
Topological crystals and soliton lattices in a Gross-Neveu model with Hilbert-space fragmentation
Sergio Cerezo-Roquebrún, Simon Hands, Alejandro Bermudez
https://arxiv.org/abs/2506.18675
Inverse scattering transform via affine map: applications to high-speed nonlinear optical communications
Ilia Kuk, Ildar R. Gabitov
https://arxiv.org/abs/2507.20470 https://
This https://arxiv.org/abs/2501.00256 has been replaced.
initial toot: https://mastoxiv.page/@arX…
Aptamer-protein interaction prediction model based on transformer
Zhichao Yan, Yue Kang, Buyong Ma
https://arxiv.org/abs/2506.16084 https://
Alice and the Caterpillar: A more descriptive null model for assessing data mining results
Giulia Preti, Gianmarco De Francisci Morales, Matteo Riondato
https://arxiv.org/abs/2506.09764
Role of bubble positioning in force induced melting of DNA
Bidisha Mukherjee, Amit Raj Singh, Garima Mishra
https://arxiv.org/abs/2506.18821 https://
General Proximal Quasi-Newton Methods based on model functions for nonsmooth nonconvex problems
Xiaoxi Jia, Peter Ochs
https://arxiv.org/abs/2507.18363 https://
STEP Planner: Constructing cross-hierarchical subgoal tree as an embodied long-horizon task planner
Zhou Tianxing, Wang Zhirui, Ao Haojia, Chen Guangyan, Xing Boyang, Cheng Jingwen, Yang Yi, Yue Yufeng
https://arxiv.org/abs/2506.21030
Captain Cinema: Towards Short Movie Generation
Junfei Xiao, Ceyuan Yang, Lvmin Zhang, Shengqu Cai, Yang Zhao, Yuwei Guo, Gordon Wetzstein, Maneesh Agrawala, Alan Yuille, Lu Jiang
https://arxiv.org/abs/2507.18634
This https://arxiv.org/abs/2504.10545 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csIR_…
Dark energy era with a resolution of Hubble tension in generalized entropic cosmology
Priyanka Adhikary, Sudipta Das, Sergei D. Odintsov, Tanmoy Paul
https://arxiv.org/abs/2507.15273
Misspecified Bayesianism
Pooya Molavi
https://arxiv.org/abs/2507.22775 https://arxiv.org/pdf/2507.22775
Uncertainty-Aware Genomic Classification of Alzheimer's Disease: A Transformer-Based Ensemble Approach with Monte Carlo Dropout
Taeho Jo, Eun Hye Lee, Alzheimer's Disease Sequencing Project
https://arxiv.org/abs/2506.00662
Pimba: A Processing-in-Memory Acceleration for Post-Transformer Large Language Model Serving
Wonung Kim, Yubin Lee, Yoonsung Kim, Jinwoo Hwang, Seongryong Oh, Jiyong Jung, Aziz Huseynov, Woong Gyu Park, Chang Hyun Park, Divya Mahajan, Jongse Park
https://arxiv.org/abs/2507.10178
Speeding up thermalization and quantum state preparation through engineered quantum collisions
Sofia Sgroi, Salvatore Lorenzo, Luca Innocenti, Paolo A. Erdman, G. Massimo Palma, Mauro Paternostro
https://arxiv.org/abs/2506.20625
A Stackelberg Game of Demand Response from the Aggregator's Perspective
Seangleng Khe, Parin Chaipunya, Athikom Bangviwat
https://arxiv.org/abs/2507.12708
A MILP-Based Solution to Multi-Agent Motion Planning and Collision Avoidance in Constrained Environments
Akshay Jaitly, Jack Cline, Siavash Farzan
https://arxiv.org/abs/2506.21982
Dynamic Parameter Memory: Temporary LoRA-Enhanced LLM for Long-Sequence Emotion Recognition in Conversation
Jialong Mai, Xiaofen Xing, Yawei Li, Zhipeng Li, Jingyuan Xing, Xiangmin Xu
https://arxiv.org/abs/2507.09076
The BRS Cohomology of the Wess Zumino Chiral Scalar supersymmetric model with exotic pairs and exotic triplets (E2)
John A. Dixon
https://arxiv.org/abs/2507.14174 https://
Bounds of Shannon entropy and Extropy and their application in exploring the extreme value behavior of a large set of data
Konstantinos Zografos
https://arxiv.org/abs/2507.13656
AmpLyze: A Deep Learning Model for Predicting the Hemolytic Concentration
Peng Qiu, Hanqi Feng, Barnabas Poczos
https://arxiv.org/abs/2507.08162 https://…
Stereo sound event localization and detection based on PSELDnet pretraining and BiMamba sequence modeling
Wenmiao Gao, Yang Xiao
https://arxiv.org/abs/2506.13455
Unsupervised deep learning model for fast energy layer pre-selection of delivery-efficient proton arc therapy plan optimization of nasopharyngeal carcinoma
Bohan Yang, Gang Liu, Rirao Dao, Yujia Qian, Ke Shi, Anke Tang, Yong Luo, Jingnan Liu
https://arxiv.org/abs/2506.15803
Covariance Decomposition for Distance Based Species Tree Estimation
Georgios Aliatimis, Ruriko Yoshida, Burak Boyak, James Grant
https://arxiv.org/abs/2506.16425
A framework for modeling the evolution of young stellar objects
Theo Richardson, Adam Ginsburg, Erik Rosolowsky, Joshua Peltonen, Rémy Indebetouw
https://arxiv.org/abs/2507.16944
Sequence Modeling for Time-Optimal Quadrotor Trajectory Optimization with Sampling-based Robustness Analysis
Katherine Mao, Hongzhan Yu, Ruipeng Zhang, Igor Spasojevic, M Ani Hsieh, Sicun Gao, Vijay Kumar
https://arxiv.org/abs/2506.13915
Nonparametric predictive inference for discrete data via Metropolis-adjusted Dirichlet sequences
Davide Agnoletto, Tommaso Rigon, David B. Dunson
https://arxiv.org/abs/2507.08629 …
Predicting function of evolutionarily implausible DNA sequences
Shiyu Jiang, Xuyin Liu, Zitong Jerry Wang
https://arxiv.org/abs/2506.10271 https://
RNAMunin: A Deep Machine Learning Model for Non-coding RNA Discovery
Lauren Lui, Torben Nielsen
https://arxiv.org/abs/2507.11950 https://
Mixture of Raytraced Experts
Andrea Perin, Giacomo Lagomarsini, Claudio Gallicchio, Giuseppe Nuti
https://arxiv.org/abs/2507.12419 https://
Simulation-trained conditional normalizing flows for likelihood approximation: a case study in stress regulation kinetics in yeast
Pedro Pessoa, Juan Andres Martinez, Vincent Vandenbroucke, Frank Delvigne, Steve Pressé
https://arxiv.org/abs/2506.09374
Dictionary Learning Based Regularization in Quantitative MRI: A Nested Alternating Optimization Framework
Guozhi Dong, Michael Hintermüller, Clemens Sirotenko
https://arxiv.org/abs/2506.11977
EnerBridge-DPO: Energy-Guided Protein Inverse Folding with Markov Bridges and Direct Preference Optimization
Dingyi Rong, Haotian Lu, Wenzhuo Zheng, Fan Zhang, Shuangjia Zheng, Ning Liu
https://arxiv.org/abs/2506.09496
Enhancing Stereo Sound Event Detection with BiMamba and Pretrained PSELDnet
Wenmiao Gao, Han Yin
https://arxiv.org/abs/2507.09570 https://
Multimodal Modeling of CRISPR-Cas12 Activity Using Foundation Models and Chromatin Accessibility Data
Azim Dehghani Amirabad, Yanfei Zhang, Artem Moskalev, Sowmya Rajesh, Tommaso Mansi, Shuwei Li, Mangal Prakash, Rui Liao
https://arxiv.org/abs/2506.11182