Exploiting ID-Text Complementarity via Ensembling for Sequential Recommendation
Liam Collins, Bhuvesh Kumar, Clark Mingxuan Ju, Tong Zhao, Donald Loveland, Leonardo Neves, Neil Shah
https://arxiv.org/abs/2512.17820 https://arxiv.org/pdf/2512.17820 https://arxiv.org/html/2512.17820
arXiv:2512.17820v1 Announce Type: new
Abstract: Modern Sequential Recommendation (SR) models commonly utilize modality features to represent items, motivated in large part by recent advancements in language and vision modeling. To do so, several works completely replace ID embeddings with modality embeddings, claiming that modality embeddings render ID embeddings unnecessary because they can match or even exceed ID embedding performance. On the other hand, many works jointly utilize ID and modality features, but posit that complex fusion strategies, such as multi-stage training and/or intricate alignment architectures, are necessary for this joint utilization. However, underlying both these lines of work is a lack of understanding of the complementarity of ID and modality features. In this work, we address this gap by studying the complementarity of ID- and text-based SR models. We show that these models do learn complementary signals, meaning that either should provide performance gain when used properly alongside the other. Motivated by this, we propose a new SR method that preserves ID-text complementarity through independent model training, then harnesses it through a simple ensembling strategy. Despite this method's simplicity, we show it outperforms several competitive SR baselines, implying that both ID and text features are necessary to achieve state-of-the-art SR performance but complex fusion architectures are not.
(PDF, PhD thesis 2018) Improving visual-to-auditory cross-modality information conversions https://eprints.nottingham.ac.uk/55721/ by Shern Shiou Tan, on visual-to-auditory sensory substitution (VASS) devices.
"The integration of visual recognition in parallel with the soundscape will be the…
nnSAM2: nnUNet-Enhanced One-Prompt SAM2 for Few-shot Multi-Modality Segmentation and Composition Analysis of Lumbar Paraspinal Muscles
Zhongyi Zhang, Julie A. Hides, Enrico De Martino, Abdul Joseph Fofanah, Gervase Tuxworth
https://arxiv.org/abs/2510.05555
Taming Text-to-Sounding Video Generation via Advanced Modality Condition and Interaction
Kaisi Guan, Xihua Wang, Zhengfeng Lai, Xin Cheng, Peng Zhang, XiaoJiang Liu, Ruihua Song, Meng Cao
https://arxiv.org/abs/2510.03117
Towards polarization-enhanced PET: Study of random background in polarization-correlated Compton events
Ana Marija Kožuljević, Tomislav Bokulić, Darko Grošev, Siddharth Parashari, Luka Pavelić, Marinko Rade, Marijan Žuvić, Mihael Makek
https://arxiv.org/abs/2510.11504
Replaced article(s) found for eess.AS. https://arxiv.org/list/eess.AS/new
[1/1]:
- Stimulus Modality Matters: Impact of Perceptual Evaluations from Different Modalities on Speech E...
Huang-Cheng Chou, Haibin Wu, Hung-yi Lee, Chi-Chun Lee
Replaced article(s) found for cs.AI. https://arxiv.org/list/cs.AI/new
[5/9]:
- Modality-Balancing Preference Optimization of Large Multimodal Models by Adversarial Negative Mining
Chenxi Liu, Tianyi Xiong, Yanshuo Chen, Ruibo Chen, Yihan Wu, Junfeng Guo, Tianyi Zhou, Heng Huang
Replaced article(s) found for math.CT. https://arxiv.org/list/math.CT/new
[1/1]:
- An algebra modality admitting countably many deriving transformations
Jean-Baptiste Vienney
Mull-Tokens: Modality-Agnostic Latent Thinking
Arijit Ray, Ahmed Abdelkader, Chengzhi Mao, Bryan A. Plummer, Kate Saenko, Ranjay Krishna, Leonidas Guibas, Wen-Sheng Chu
https://arxiv.org/abs/2512.10941
Stochastic numerical head phantoms to enable virtual imaging studies of transcranial photoacoustic computed tomography
Hsuan-Kai Huang, Joseph Kuo, Seonyeong Park, Umberto Villa, Lihong V. Wang, Mark A. Anastasio
https://arxiv.org/abs/2510.09758
Replaced article(s) found for cs.MM. https://arxiv.org/list/cs.MM/new
[1/1]:
- Towards Robust and Reliable Multimodal Misinformation Recognition with Incomplete Modality
Hengyang Zhou, Yiwei Wei, Jian Yang, Zhenyu Zhang
Crosslisted article(s) found for cs.AI. https://arxiv.org/list/cs.AI/new
[7/8]:
- Unified Molecule Pre-training with Flexible 2D and 3D Modalities: Single and Paired Modality Inte...
Tengwei Song, Min Wu, Yuan Fang
TC-LoRA: Temporally Modulated Conditional LoRA for Adaptive Diffusion Control
Minkyoung Cho, Ruben Ohana, Christian Jacobsen, Adityan Jothi, Min-Hung Chen, Z. Morley Mao, Ethem Can
https://arxiv.org/abs/2510.09561
SAIL-Embedding Technical Report: Omni-modal Embedding Foundation Model
Lin Lin, Jiefeng Long, Zhihe Wan, Yuchi Wang, Dingkang Yang, Shuang Yang, Yueyang Yao, Xu Chen, Zirui Guo, Shengqiang Li, Weiran Li, Hanyu Li, Yaling Mou, Yan Qiu, Haiyang Yu, Xiao Liang, Hongsheng Li, Chao Feng
https://arxiv.org/abs/2510.12709
Transcribe, Translate, or Transliterate: An Investigation of Intermediate Representations in Spoken Language Models
Tolúlọpé Ògúnrèmí, Christopher D. Manning, Dan Jurafsky, Karen Livescu
https://arxiv.org/abs/2510.02569
LadderSym: A Multimodal Interleaved Transformer for Music Practice Error Detection
Benjamin Shiue-Hal Chou, Purvish Jajal, Nick John Eliopoulos, James C. Davis, George K. Thiruvathukal, Kristen Yeon-Ji Yun, Yung-Hsiang Lu
https://arxiv.org/abs/2510.08580
Resolution scaling governs DINOv3 transfer performance in chest radiograph classification
Soroosh Tayebi Arasteh, Mina Shaigan, Christiane Kuhl, Jakob Nikolas Kather, Sven Nebelung, Daniel Truhn
https://arxiv.org/abs/2510.07191
A Novel Helical Thin-Film Flow Diverter: Design, Fabrication, and Computational Assessment of Hemodynamic Performance
Samuel Voss, Philipp Berg, Janneck Stahl, Daniel Behme, Gabor Janiga, Rodrigo Lima de Miranda, Eckhard Quandt, Prasanth Velvaluri
https://arxiv.org/abs/2510.05320
Crosslisted article(s) found for physics.med-ph. https://arxiv.org/list/physics.med-ph/new
[1/1]:
- multimodars: A Rust-powered toolkit for multi-modality cardiac image fusion and registration
Anselm W. Stark, Marc Ilic, Ali Mokhtari, Pooya Mohammadi Kazaj, Christoph Graeni…
Application of a Virtual Imaging Framework for Investigating a Deep Learning-Based Reconstruction Method for 3D Quantitative Photoacoustic Computed Tomography
Refik Mert Cam, Seonyeong Park, Umberto Villa, Mark A. Anastasio
https://arxiv.org/abs/2510.03431