
2025-09-10 10:29:21
General Demographic Foundation Models for Enhancing Predictive Performance Across Diseases
Li-Chin Chen, Ji-Tian Sheu, Yuh-Jue Chuang
https://arxiv.org/abs/2509.07330 https://…
BIR-Adapter: A Low-Complexity Diffusion Model Adapter for Blind Image Restoration
Cem Eteke, Alexander Griessel, Wolfgang Kellerer, Eckehard Steinbach
https://arxiv.org/abs/2509.06904
Nature of Data in Pre-Trained Large Language Models
https://fpf.org/blog/nature-of-data-in-pre-trained-large-language-models/
Classical Neural Networks on Quantum Devices via Tensor Network Disentanglers: A Case Study in Image Classification
Borja Aizpurua, Sukhbinder Singh, Román Orús
https://arxiv.org/abs/2509.06653
Improving BERT for Symbolic Music Understanding Using Token Denoising and Pianoroll Prediction
Jun-You Wang, Li Su
https://arxiv.org/abs/2507.04776 https:/…
Pronunciation-Lexicon Free Training for Phoneme-based Crosslingual ASR via Joint Stochastic Approximation
Saierdaer Yusuyin, Te Ma, Hao Huang, Zhijian Ou
https://arxiv.org/abs/2507.06249
SPENCER: Self-Adaptive Model Distillation for Efficient Code Retrieval
Wenchao Gu, Zongyi Lyu, Yanlin Wang, Hongyu Zhang, Cuiyun Gao, Michael R. Lyu
https://arxiv.org/abs/2508.00546
Convolutions are Competitive with Transformers for Encrypted Traffic Classification with Pre-training
Chungang Lin, Weiyao Zhang, Tianyu Zuo, Chao Zha, Yilong Jiang, Ruiqi Meng, Haitong Luo, Xuying Meng, Yujun Zhang
https://arxiv.org/abs/2508.02001
fact check AI at SemEval-2025 Task 7: Multilingual and Crosslingual Fact-checked Claim Retrieval
Pranshu Rastogi
https://arxiv.org/abs/2508.03475 https://a…
Efficient Video-to-Audio Generation via Multiple Foundation Models Mapper
Gehui Chen, Guan'an Wang, Xiaowen Huang, Jitao Sang
https://arxiv.org/abs/2509.04957 https://
New pre-print! #ai
**Universal pre-training by iterated random computation.**
⌨️🐒 A monkey behind a typewriter will produce the collected works of Shakespeare eventually.
💻🐒 But what if we put a monkey behind a computer?
⌨️🐒 needs to be lucky enough to type every character of Shakespeare correctly. 💻🐒 only needs to be lucky enough to type a program that produces Shakespeare.
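A minimal sketch of the idea behind the pre-print (my own illustration, not the paper's code): rather than sampling random characters, sample short random programs and use their outputs as synthetic pre-training data. The toy program space here (a few random substitution rules applied repeatedly) is an assumption chosen only to keep the example self-contained.

```python
import random
import string

def random_program(rng, n_rules=4):
    """Sample a tiny 'program': a few substitution rules applied repeatedly.
    This program space is a stand-in for the paper's random computations."""
    alphabet = string.ascii_lowercase
    rules = [(rng.choice(alphabet), rng.choice(alphabet) * rng.randint(1, 2))
             for _ in range(n_rules)]

    def run(seed_text, steps=3):
        out = seed_text
        for _ in range(steps):
            for src, dst in rules:
                out = out.replace(src, dst)
        return out[:256]  # truncate so sequences stay bounded

    return run

def synthetic_corpus(n_docs=5, seed=0):
    """Generate pre-training 'documents' by running random programs on random seeds."""
    rng = random.Random(seed)
    docs = []
    for _ in range(n_docs):
        program = random_program(rng)
        seed_text = "".join(rng.choice(string.ascii_lowercase) for _ in range(32))
        docs.append(program(seed_text))
    return docs

if __name__ == "__main__":
    for doc in synthetic_corpus():
        print(doc)
```

The point of the analogy: the outputs of random programs are not pure noise, so a model pre-trained on them can pick up structure that transfers to real data.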
Kronos: A Foundation Model for the Language of Financial Markets
Yu Shi, Zongliang Fu, Shuo Chen, Bohan Zhao, Wei Xu, Changshui Zhang, Jian Li
https://arxiv.org/abs/2508.02739 h…
LobRA: Multi-tenant Fine-tuning over Heterogeneous Data
Sheng Lin, Fangcheng Fu, Haoyang Li, Hao Ge, Xuanyu Wang, Jiawen Niu, Yaofeng Tu, Bin Cui
https://arxiv.org/abs/2509.01193
Fine-tuning physics-informed neural networks for cavity flows using coordinate transformation
Ryuta Takao, Satoshi Ii
https://arxiv.org/abs/2508.01122 https://
Finetuning AI Foundation Models to Develop Subgrid-Scale Parameterizations: A Case Study on Atmospheric Gravity Waves
Aman Gupta, Aditi Sheshadri, Sujit Roy, Johannes Schmude, Vishal Gaur, Wei Ji Leong, Manil Maskey, Rahul Ramachandran
https://arxiv.org/abs/2509.03816
Uncertainty Quantification for Large-Scale Deep Networks via Post-StoNet Modeling
Yan Sun, Faming Liang
https://arxiv.org/abs/2508.01217 https://arxiv.org/…
Improving Pre-Trained Vision-Language-Action Policies with Model-Based Search
Cyrus Neary, Omar G. Younis, Artur Kuramshin, Ozgur Aslan, Glen Berseth
https://arxiv.org/abs/2508.12211
Fluid Antenna Port Prediction based on Large Language Models
Yali Zhang, Haifan Yin, Weidong Li, Emil Bjornson, Merouane Debbah
https://arxiv.org/abs/2509.01121 https://
How Robust is Model Editing after Fine-Tuning? An Empirical Study on Text-to-Image Diffusion Models
Feng He, Zhenyang Liu, Marco Valentino, Zhixue Zhao
https://arxiv.org/abs/2506.18428
Using Scaling Laws for Data Source Utility Estimation in Domain-Specific Pre-Training
Oleksiy Ostapenko, Charles Guille-Escuret, Luke Kumar, Max Tian, Denis Kocetkov, Gopeshh Subbaraj, Raymond Li, Joel Lamy-Poirier, Sebastien Paquet, Torsten Scholak
https://arxiv.org/abs/2507.22250
HITSZ's End-To-End Speech Translation Systems Combining Sequence-to-Sequence Auto Speech Recognition Model and Indic Large Language Model for IWSLT 2025 in Indic Track
Xuchen Wei, Yangxin Wu, Yaoyin Zhang, Henglyu Liu, Kehai Chen, Xuefeng Bai, Min Zhang
https://arxiv.org/abs/2507.19616
Smart Contract Intent Detection with Pre-trained Programming Language Model
Youwei Huang, Jianwen Li, Sen Fang, Yao Li, Peng Yang, Bin Hu, Tao Zhang
https://arxiv.org/abs/2508.20086
Pre-trained Transformer-models using chronic invasive electrophysiology for symptom decoding without patient-individual training
Timon Merk, Saeed Salehi, Richard M. Koehler, Qiming Cui, Maria Olaru, Amelia Hahn, Nicole R. Provenza, Simon Little, Reza Abbasi-Asl, Phil A. Starr, Wolf-Julian Neumann
https://arxiv.org/abs/2508.10160
CoPS: Conditional Prompt Synthesis for Zero-Shot Anomaly Detection
Qiyu Chen, Zhen Qu, Wei Luo, Haiming Yao, Yunkang Cao, Yuxin Jiang, Yinan Duan, Huiyuan Luo, Chengkan Lv, Zhengtao Zhang
https://arxiv.org/abs/2508.03447
Migration as a Probe: A Generalizable Benchmark Framework for Specialist vs. Generalist Machine-Learned Force Fields in Doped Materials
Yi Cao, Paulette Clancy
https://arxiv.org/abs/2509.00090
QPART: Adaptive Model Quantization and Dynamic Workload Balancing for Accuracy-aware Edge Inference
Xiangchen Li, Saeid Ghafouri, Bo Ji, Hans Vandierendonck, Deepu John, Dimitrios S. Nikolopoulos
https://arxiv.org/abs/2506.23934
Speaker-Conditioned Phrase Break Prediction for Text-to-Speech with Phoneme-Level Pre-trained Language Model
Dong Yang, Yuki Saito, Takaaki Saeki, Tomoki Koriyama, Wataru Nakata, Detai Xin, Hiroshi Saruwatari
https://arxiv.org/abs/2509.00675
ELIXIR: Efficient and LIghtweight model for eXplaIning Recommendations
Ben Kabongo, Vincent Guigue, Pirmin Lemberger
https://arxiv.org/abs/2508.20312 https://
After pre-training, we finetune on real-world data. We observe that models pre-trained on noise converge much more quickly than a baseline trained from scratch.
Moreover, on the other datasets, the UP models retain their zero-shot performance during finetuning, which suggests a generalization benefit to using a UP model.
All of this comes at the cost of much longer training, but that cost can be amortized over many downstream tasks.
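A rough sketch of the comparison described above (illustrative only; the model, data, and the commented checkpoint path are placeholders, not the authors' setup):

```python
import torch
from torch import nn

def finetune(model, data, labels, steps=100, lr=1e-3):
    """Fine-tune a model on a small task and return its loss curve."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    losses = []
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(data), labels)
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return losses

# Toy "real-world" task: random features with 4 classes (placeholder data).
data, labels = torch.randn(256, 32), torch.randint(0, 4, (256,))

def make_model():
    return nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 4))

scratch = make_model()   # baseline: trained from scratch
up_model = make_model()
# In the real setting, up_model would load universally pre-trained weights, e.g.:
# up_model.load_state_dict(torch.load("up_checkpoint.pt"))  # hypothetical path

scratch_curve = finetune(scratch, data, labels)
up_curve = finetune(up_model, data, labels)
print("final losses (scratch vs UP-init):", scratch_curve[-1], up_curve[-1])
```

Comparing the two loss curves is the kind of convergence-speed measurement the toot refers to.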
LoRA-Leak: Membership Inference Attacks Against LoRA Fine-tuned Language Models
Delong Ran, Xinlei He, Tianshuo Cong, Anyu Wang, Qi Li, Xiaoyun Wang
https://arxiv.org/abs/2507.18302
eccDNAMamba: A Pre-Trained Model for Ultra-Long eccDNA Sequence Analysis
Zhenke Liu, Jien Li, Ziqi Zhang
https://arxiv.org/abs/2506.18940 https://
chDzDT: Word-level morphology-aware language model for Algerian social media text
Abdelkrime Aries
https://arxiv.org/abs/2509.01772 https://arxiv.org/pdf/2…
Aptamer-protein interaction prediction model based on transformer
Zhichao Yan, Yue Kang, Buyong Ma
https://arxiv.org/abs/2506.16084 https://
Fast Training-free Perceptual Image Compression
Ziran Zhu, Tongda Xu, Minye Huang, Dailan He, Xingtong Ge, Xinjie Zhang, Ling Li, Yan Wang
https://arxiv.org/abs/2506.16102
MedCAL-Bench: A Comprehensive Benchmark on Cold-Start Active Learning with Foundation Models for Medical Image Analysis
Ning Zhu, Xiaochuan Ma, Shaoting Zhang, Guotai Wang
https://arxiv.org/abs/2508.03441
DiffractGPT: Atomic Structure Determination from X-ray Diffraction Patterns using Generative Pre-trained Transformer
Kamal Choudhary
https://arxiv.org/abs/2508.08349 https://
Hybrid-Sep: Language-queried audio source separation via pre-trained Model Fusion and Adversarial Diffusion Training
Jianyuan Feng, Guangzheng Li, Yangfei Xu
https://arxiv.org/abs/2506.16833
HyperCLOVA X THINK Technical Report
NAVER Cloud HyperCLOVA X Team
https://arxiv.org/abs/2506.22403 https://arxiv.org/pdf/2506.22403…
StyleMM: Stylized 3D Morphable Face Model via Text-Driven Aligned Image Translation
Seungmi Lee, Kwan Yun, Junyong Noh
https://arxiv.org/abs/2508.11203 https://
LaVA-Man: Learning Visual Action Representations for Robot Manipulation
Chaoran Zhu, Hengyi Wang, Yik Lung Pang, Changjae Oh
https://arxiv.org/abs/2508.19391 https://
Finetuning Stellar Spectra Foundation Models with LoRA
Xiaosheng Zhao, Yuan-Sen Ting, Alexander S. Szalay, Yang Huang
https://arxiv.org/abs/2507.20972 https://
LLMulator: Generalizable Cost Modeling for Dataflow Accelerators with Input-Adaptive Control Flow
Kaiyan Chang, Wenlong Zhu, Shengwen Liang, Huawei Li, Ying Wang
https://arxiv.org/abs/2508.17826
FreeMorph: Tuning-Free Generalized Image Morphing with Diffusion Model
Yukang Cao, Chenyang Si, Jinghao Wang, Ziwei Liu
https://arxiv.org/abs/2507.01953 ht…
BrainSymphony: A Transformer-Driven Fusion of fMRI Time Series and Structural Connectivity
Moein Khajehnejad, Forough Habibollahi, Adeel Razi
https://arxiv.org/abs/2506.18314
Arabic Hate Speech Identification and Masking in Social Media using Deep Learning Models and Pre-trained Models Fine-tuning
Salam Thabet Doghmash, Motaz Saad
https://arxiv.org/abs/2507.23661
MixedG2P-T5: G2P-free Speech Synthesis for Mixed-script texts using Speech Self-Supervised Learning and Language Model
Joonyong Park, Daisuke Saito, Nobuaki Minematsu
https://arxiv.org/abs/2509.01391
Speaker Disentanglement of Speech Pre-trained Model Based on Interpretability
Xiaoxu Zhu, Junhua Li
https://arxiv.org/abs/2507.17851 https://arxiv.org/pdf/…
Your AI, Not Your View: The Bias of LLMs in Investment Analysis
Hoyoung Lee, Junhyuk Seo, Suhwan Park, Junhyeong Lee, Wonbin Ahn, Chanyeol Choi, Alejandro Lopez-Lira, Yongjae Lee
https://arxiv.org/abs/2507.20957
Guided Unconditional and Conditional Generative Models for Super-Resolution and Inference of Quasi-Geostrophic Turbulence
Anantha Narayanan Suresh Babu, Akhil Sadam, Pierre F. J. Lermusiaux
https://arxiv.org/abs/2507.00719
Cross-Modality Controlled Molecule Generation with Diffusion Language Model
Yunzhe Zhang, Yifei Wang, Khanh Vinh Nguyen, Pengyu Hong
https://arxiv.org/abs/2508.14748 https://
Replaced article(s) found for physics.geo-ph. https://arxiv.org/list/physics.geo-ph/new
[1/1]:
- PRIME-DP: Pre-trained Integrated Model for Earthquake Data Processing
Ziye Yu, Yuqi Cai, Weitao Wang, Yanru An, Lu Li, Yueyang Xia, Yunpeng Zhang
GANet-Seg: Adversarial Learning for Brain Tumor Segmentation with Hybrid Generative Models
Qifei Cui, Xinyu Lu
https://arxiv.org/abs/2506.21245 https://
On the Effect of Token Merging on Pre-trained Models for Code
Mootez Saad, Hao Li, Tushar Sharma, Ahmed E. Hassan
https://arxiv.org/abs/2507.14423 https://…
Anatomy of a Machine Learning Ecosystem: 2 Million Models on Hugging Face
Benjamin Laufer, Hamidah Oderinwale, Jon Kleinberg
https://arxiv.org/abs/2508.06811 https://
FlowVLA: Thinking in Motion with a Visual Chain of Thought
Zhide Zhong, Haodong Yan, Junfeng Li, Xiangchen Liu, Xin Gong, Wenxuan Song, Jiayi Chen, Haoang Li
https://arxiv.org/abs/2508.18269
LMAR: Language Model Augmented Retriever for Domain-specific Knowledge Indexing
Yao Zhao, Yantian Ding, Zhiyue Zhang, Dapeng Yao, Yanxun Xu
https://arxiv.org/abs/2508.05672 http…
Adapting Language Models to Indonesian Local Languages: An Empirical Study of Language Transferability on Zero-Shot Settings
Rifki Afina Putri
https://arxiv.org/abs/2507.01645
Morse: Dual-Sampling for Lossless Acceleration of Diffusion Models
Chao Li, Jiawei Fan, Anbang Yao
https://arxiv.org/abs/2506.18251 https://
In-Context Learning as Nonparametric Conditional Probability Estimation: Risk Bounds and Optimality
Chenrui Liu, Falong Tan, Chuanlong Xie, Yicheng Zeng, Lixing Zhu
https://arxiv.org/abs/2508.08673
Staining and locking computer vision models without retraining
Oliver J. Sutton, Qinghua Zhou, George Leete, Alexander N. Gorban, Ivan Y. Tyukin
https://arxiv.org/abs/2507.22000
Computer Vision for Real-Time Monkeypox Diagnosis on Embedded Systems
Jacob M. Delgado-López, Ricardo A. Morell-Rodriguez, Sebastián O. Espinosa-Del Rosario, Wilfredo E. Lugo-Beauchamp
https://arxiv.org/abs/2507.17123
Effective Fine-Tuning of Vision Transformers with Low-Rank Adaptation for Privacy-Preserving Image Classification
Haiwei Lin, Shoko Imaizumi, Hitoshi Kiya
https://arxiv.org/abs/2507.11943
Adversarial Defence without Adversarial Defence: Enhancing Language Model Robustness via Instance-level Principal Component Removal
Yang Wang, Chenghao Xiao, Yizhi Li, Stuart E. Middleton, Noura Al Moubayed, Chenghua Lin
https://arxiv.org/abs/2507.21750
Efficient Vocal-Conditioned Music Generation via Soft Alignment Attention and Latent Diffusion
Hei Shing Cheung, Boya Zhang
https://arxiv.org/abs/2507.19991 https://
Foundation Model-Aided Deep Reinforcement Learning for RIS-Assisted Wireless Communication
Mohammad Ghassemi, Sara Farrag Mobarak, Han Zhang, Ali Afana, Akram Bin Sediq, Melike Erol-Kantarci
https://arxiv.org/abs/2506.09855
Time-Aware One Step Diffusion Network for Real-World Image Super-Resolution
Tainyi Zhang, Zheng-Peng Duan, Peng-Tao Jiang, Bo Li, Ming-Ming Cheng, Chun-Le Guo, Chongyi Li
https://arxiv.org/abs/2508.16557
Back to the Features: DINO as a Foundation for Video World Models
Federico Baldassarre, Marc Szafraniec, Basile Terver, Vasil Khalidov, Francisco Massa, Yann LeCun, Patrick Labatut, Maximilian Seitzer, Piotr Bojanowski
https://arxiv.org/abs/2507.19468
Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search
Yuxian Gu, Qinghao Hu, Shang Yang, Haocheng Xi, Junyu Chen, Song Han, Han Cai
https://arxiv.org/abs/2508.15884
Personalized Product Search Ranking: A Multi-Task Learning Approach with Tabular and Non-Tabular Data
Lalitesh Morishetti, Abhay Kumar, Jonathan Scott, Kaushiki Nag, Gunjan Sharma, Shanu Vashishtha, Rahul Sridhar, Rohit Chatter, Kannan Achan
https://arxiv.org/abs/2508.09636
ACME: Adaptive Customization of Large Models via Distributed Systems
Ziming Dai, Chao Qiu, Fei Gao, Yunfeng Zhao, Xiaofei Wang
https://arxiv.org/abs/2507.14802
Composition and Alignment of Diffusion Models using Constrained Learning
Shervin Khalafi, Ignacio Hounie, Dongsheng Ding, Alejandro Ribeiro
https://arxiv.org/abs/2508.19104 http…
Exploring Self-Supervised Audio Models for Generalized Anomalous Sound Detection
Bing Han, Anbai Jiang, Xinhu Zheng, Wei-Qiang Zhang, Jia Liu, Pingyi Fan, Yanmin Qian
https://arxiv.org/abs/2508.12230
Generalizable Detection of Audio Deepfakes
Jose A. Lopez, Georg Stemmer, Héctor Cordourier Maruri
https://arxiv.org/abs/2507.01750 https://
DocPolarBERT: A Pre-trained Model for Document Understanding with Relative Polar Coordinate Encoding of Layout Structures
Benno Uthayasooriyar, Antoine Ly, Franck Vermet, Caio Corro
https://arxiv.org/abs/2507.08606
Poutine: Vision-Language-Trajectory Pre-Training and Reinforcement Learning Post-Training Enable Robust End-to-End Autonomous Driving
Luke Rowe, Rodrigue de Schaetzen, Roger Girgis, Christopher Pal, Liam Paull
https://arxiv.org/abs/2506.11234
An Empirical Study of Knowledge Distillation for Code Understanding Tasks
Ruiqi Wang, Zezhou Yang, Cuiyun Gao, Xin Xia, Qing Liao
https://arxiv.org/abs/2508.15423 https://
EditP23: 3D Editing via Propagation of Image Prompts to Multi-View
Roi Bar-On, Dana Cohen-Bar, Daniel Cohen-Or
https://arxiv.org/abs/2506.20652 https://
Amortized In-Context Mixed Effect Transformer Models: A Zero-Shot Approach for Pharmacokinetics
César Ali Ojeda Marin, Wilhelm Huisinga, Purity Kavwele, Niklas Hartung
https://arxiv.org/abs/2508.15659
Exploiting Vocabulary Frequency Imbalance in Language Model Pre-training
Woojin Chung, Jeonghoon Kim
https://arxiv.org/abs/2508.15390 https://arxiv.org/pdf…
Efficient Code Embeddings from Code Generation Models
Daria Kryvosheieva, Saba Sturua, Michael Günther, Scott Martens, Han Xiao
https://arxiv.org/abs/2508.21290 https://
CLIP-HandID: Vision-Language Model for Hand-Based Person Identification
Nathanael L. Baisa, Babu Pallam, Amudhavel Jayavel
https://arxiv.org/abs/2506.12447
UNICON: UNIfied CONtinual Learning for Medical Foundational Models
Mohammad Areeb Qazi, Munachiso S Nwadike, Ibrahim Almakky, Mohammad Yaqub, Numan Saeed
https://arxiv.org/abs/2508.14024
Do Not Mimic My Voice: Speaker Identity Unlearning for Zero-Shot Text-to-Speech
Taesoo Kim, Jinju Kim, Dongchan Kim, Jong Hwan Ko, Gyeong-Moon Park
https://arxiv.org/abs/2507.20140
Multi-Lingual Implicit Discourse Relation Recognition with Multi-Label Hierarchical Learning
Nelson Filipe Costa, Leila Kosseim
https://arxiv.org/abs/2508.20712 https://
Replaced article(s) found for cs.SE. https://arxiv.org/list/cs.SE/new
[1/1]:
- "I see models being a whole other thing": An Empirical Study of Pre-Trained Model Naming Conventi...
Wenxin Jiang, Mingyu Kim, Chingwo Cheung, Heesoo Kim, George K. Thiruvathukal, James C. Davis
…
Implementing Adaptations for Vision AutoRegressive Model
Kaif Shaikh, Antoni Kowalczuk, Franziska Boenisch, Adam Dziedzic
https://arxiv.org/abs/2507.11441 …
Tiny is not small enough: High-quality, low-resource facial animation models through hybrid knowledge distillation
Zhen Han, Mattias Teye, Derek Yadgaroff, Judith Bütepage
https://arxiv.org/abs/2507.18352
Seeing is Believing: Emotion-Aware Audio-Visual Language Modeling for Expressive Speech Generation
Weiting Tan, Jiachen Lian, Hirofumi Inaguma, Paden Tomasello, Philipp Koehn, Xutai Ma
https://arxiv.org/abs/2508.16188
Bounding Distributional Shifts in World Modeling through Novelty Detection
Eric Jing, Abdeslam Boularias
https://arxiv.org/abs/2508.06096 https://arxiv.org…
Adaptively Robust LLM Inference Optimization under Prediction Uncertainty
Zixi Chen, Yinyu Ye, Zijie Zhou
https://arxiv.org/abs/2508.14544 https://arxiv.or…
Transfer learning optimization based on evolutionary selective fine tuning
Jacinto Colan, Ana Davila, Yasuhisa Hasegawa
https://arxiv.org/abs/2508.15367 https://
Label-Efficient Chest X-ray Diagnosis via Partial CLIP Adaptation
Heet Nitinkumar Dalsania
https://arxiv.org/abs/2507.07254 https://a…
BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing
Jiacheng Chen, Ramin Mehran, Xuhui Jia, Saining Xie, Sanghyun Woo
https://arxiv.org/abs/2506.17450
In-Context Decision Making for Optimizing Complex AutoML Pipelines
Amir Rezaei Balef, Katharina Eggensperger
https://arxiv.org/abs/2508.13657 https://arxiv…
ECHO: Frequency-aware Hierarchical Encoding for Variable-length Signal
Yucong Zhang, Juan Liu, Ming Li
https://arxiv.org/abs/2508.14689 https://arxiv.org/p…
Towards Unveiling Predictive Uncertainty Vulnerabilities in the Context of the Right to Be Forgotten
Wei Qian, Chenxu Zhao, Yangyi Li, Wenqian Ye, Mengdi Huai
https://arxiv.org/abs/2508.07458
Audio Inpainting using Discrete Diffusion Model
Tali Dror, Iftach Shoham, Moshe Buchris, Oren Gal, Haim Permuter, Gilad Katz, Eliya Nachmani
https://arxiv.org/abs/2507.08333
Projected Coupled Diffusion for Test-Time Constrained Joint Generation
Hao Luan, Yi Xian Goh, See-Kiong Ng, Chun Kai Ling
https://arxiv.org/abs/2508.10531 https://
Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs
Ziyue Li, Yang Li, Tianyi Zhou
https://arxiv.org/abs/2507.07996 https://arxiv.org/pdf/2507.07996 https://arxiv.org/html/2507.07996
arXiv:2507.07996v1 Announce Type: new
Abstract: Can a pretrained neural network adapt its architecture to different inputs without any finetuning? Do we need all layers for simple tasks, and are they adequate for challenging tasks? We found that the layers of a pretrained large language model (LLM) can be manipulated as separate modules to build a better and even shallower model customized for each test sample. In particular, each layer from the pretrained model can be skipped/pruned or repeated multiple times as recurrent neural networks (RNN), and stacked with others in arbitrary orders, yielding a chain-of-layers (CoLa) per sample. This compositional space greatly expands the scope of existing works on looped/recurrent pretrained modules, layer pruning, or early-exit networks. We develop a Monte Carlo Tree Search (MCTS) protocol to explore and identify the optimal CoLa for each sample from math and commonsense reasoning benchmarks. Compared to a static model of a fixed depth, CoLa allows shortcut paths (fast thinking), recurrence of the same layer(s) (slow thinking), and combining both, offering more flexible, dynamic architectures for different inputs. We conduct an extensive analysis of the MCTS-optimized CoLa, which leads to two key findings: (1) For >75% of samples with correct predictions by the original LLM, we can find shorter CoLa, suggesting a large space for improving inference efficiency; (2) For >60% of samples with originally incorrect predictions, we can identify CoLa achieving correct predictions, suggesting a large space of performance enhancement. Our results highlight the shortcomings of using a fixed architecture of pre-trained LLMs for inference on different samples and pave the way to unlock the generalization power of test-time depth adaptation.
toXiv_bot_toot
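A minimal sketch of the chain-of-layers (CoLa) idea from the abstract above, under stated assumptions: the toy layers and the hand-picked example chains are mine for illustration, and the paper searches for per-sample chains with MCTS rather than fixing them by hand.

```python
import torch
from torch import nn

# Toy stand-ins for a pretrained LLM's transformer blocks.
layers = nn.ModuleList([nn.Sequential(nn.Linear(16, 16), nn.ReLU()) for _ in range(6)])

def run_chain_of_layers(x, chain):
    """Apply layers in the order given by `chain`.
    Skipping a layer = omitting its index; 'slow thinking' = repeating an index."""
    for idx in chain:
        x = layers[idx](x)
    return x

x = torch.randn(2, 16)
# Example chains (in the paper these would be discovered per sample via MCTS):
shallow_chain = [0, 2, 5]          # shortcut path: skips layers 1, 3, 4
recurrent_chain = [0, 1, 1, 1, 5]  # repeats layer 1 like an RNN step
print(run_chain_of_layers(x, shallow_chain).shape)
print(run_chain_of_layers(x, recurrent_chain).shape)
```

The search space is simply the set of such index sequences, which is what lets the same pretrained weights act as a shallower or deeper model depending on the input.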
Cross-lingual Few-shot Learning for Persian Sentiment Analysis with Incremental Adaptation
Farideh Majidi, Ziaeddin Beheshtifard
https://arxiv.org/abs/2507.11634