
2025-06-17 09:56:33
Device-Cloud Collaborative Correction for On-Device Recommendation
Tianyu Zhan, Shengyu Zhang, Zheqi Lv, Jieming Zhu, Jiwei Li, Fan Wu, Fei Wu
https://arxiv.org/abs/2506.12687
NeuralOS: Towards Simulating Operating Systems via Neural Generative Models
Luke Rivard, Sun Sun, Hongyu Guo, Wenhu Chen, Yuntian Deng
https://arxiv.org/abs/2507.08800
Towards Bio-Inspired Robotic Trajectory Planning via Self-Supervised RNN
Miroslav Cibula, Kristína Malinovská, Matthias Kerzel
https://arxiv.org/abs/2507.02171
This https://arxiv.org/abs/2505.22083 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_qu…
Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs
Ziyue Li, Yang Li, Tianyi Zhou
https://arxiv.org/abs/2507.07996 https://arxiv.org/pdf/2507.07996 https://arxiv.org/html/2507.07996
arXiv:2507.07996v1 Announce Type: new
Abstract: Can a pretrained neural network adapt its architecture to different inputs without any finetuning? Do we need all layers for simple tasks, and are they adequate for challenging tasks? We find that the layers of a pretrained large language model (LLM) can be manipulated as separate modules to build a better, and even shallower, model customized for each test sample. In particular, each layer of the pretrained model can be skipped/pruned or repeated multiple times, as in a recurrent neural network (RNN), and stacked with others in arbitrary orders, yielding a chain-of-layers (CoLa) per sample. This compositional space greatly expands the scope of existing work on looped/recurrent pretrained modules, layer pruning, and early-exit networks. We develop a Monte Carlo Tree Search (MCTS) protocol to explore and identify the optimal CoLa for each sample from math and commonsense reasoning benchmarks. Compared to a static model of fixed depth, CoLa allows shortcut paths (fast thinking), recurrence of the same layer(s) (slow thinking), and combinations of both, offering more flexible, dynamic architectures for different inputs. Our extensive analysis of the MCTS-optimized CoLa yields two key findings: (1) for >75% of samples that the original LLM predicts correctly, we can find a shorter CoLa, suggesting a large space for improving inference efficiency; (2) for >60% of samples that it originally predicts incorrectly, we can identify a CoLa that yields correct predictions, suggesting a large space for performance enhancement. Our results highlight the shortcomings of using a fixed architecture of pretrained LLMs for inference on different samples and pave the way to unlocking the generalization power of test-time depth adaptation.
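To make the chain-of-layers idea concrete, here is a minimal sketch (not the paper's code) of running a pretrained decoder-only model through an arbitrary per-sample sequence of layer indices, with layers skipped or repeated. The GPT-2 model choice and the run_cola helper are illustrative assumptions, and the MCTS search over layer sequences is omitted.

```python
# Minimal sketch of the chain-of-layers (CoLa) idea: treat each decoder layer
# of a pretrained LLM as a reusable module and run a per-sample sequence of
# layer indices, where a layer may be skipped (pruned) or repeated (recurrence).
# Assumptions: GPT-2 as the backbone, no padding, eval mode; this is not the
# authors' implementation and ignores per-architecture details (e.g. KV caches).

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumed backbone: any model exposing transformer.h as a layer list
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

@torch.no_grad()
def run_cola(text, layer_sequence):
    """Run the model with a custom chain of layers.

    layer_sequence: e.g. [0, 1, 1, 5, 11] -- layer 1 repeated (slow thinking),
    layers 2-4 and 6-10 skipped (fast thinking). Layers are applied in the
    given order on a single unpadded sequence.
    """
    ids = tok(text, return_tensors="pt").input_ids
    positions = torch.arange(ids.size(1)).unsqueeze(0)
    h = model.transformer.wte(ids) + model.transformer.wpe(positions)
    for idx in layer_sequence:
        h = model.transformer.h[idx](h)[0]  # each block returns a tuple; keep hidden states
    h = model.transformer.ln_f(h)
    return model.lm_head(h)  # logits over the vocabulary

# Example: a shallower shortcut path vs. a path that loops one layer.
fast_logits = run_cola("2 + 2 =", [0, 3, 6, 9, 11])
slow_logits = run_cola("2 + 2 =", [0, 1, 2, 2, 2, 5, 8, 11])
```

In the paper's setting, the per-sample layer sequence would be selected by an MCTS-style search against task accuracy; the sketch above only shows how a chosen sequence could be executed.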
toXiv_bot_toot
Hybrid Approach for Electricity Price Forecasting using AlexNet and LSTM
Bosubabu Sambana, Kotamsetty Geethika Devi, Bandi Rajeswara Reddy, Galeti Mohammad Hussain, Gownivalla Siddartha
https://arxiv.org/abs/2506.23504
Time Resolution Independent Operator Learning
Diab W. Abueidda, Mbebo Nonna, Panos Pantidis, Mostafa E. Mobasher
https://arxiv.org/abs/2507.02524
Sequence-to-Sequence Models with Attention Mechanistically Map to the Architecture of Human Memory Search
Nikolaus Salvatore, Qiong Zhang
https://arxiv.org/abs/2506.17424