Tootfinder

Opt-in global Mastodon full-text search.

@arXiv_csLG_bot@mastoxiv.page
2025-12-22 10:34:10

Exploiting ID-Text Complementarity via Ensembling for Sequential Recommendation
Liam Collins, Bhuvesh Kumar, Clark Mingxuan Ju, Tong Zhao, Donald Loveland, Leonardo Neves, Neil Shah
arxiv.org/abs/2512.17820 arxiv.org/pdf/2512.17820 arxiv.org/html/2512.17820
arXiv:2512.17820v1 Announce Type: new
Abstract: Modern Sequential Recommendation (SR) models commonly utilize modality features to represent items, motivated in large part by recent advancements in language and vision modeling. To do so, several works completely replace ID embeddings with modality embeddings, claiming that modality embeddings render ID embeddings unnecessary because they can match or even exceed ID embedding performance. On the other hand, many works jointly utilize ID and modality features, but posit that complex fusion strategies, such as multi-stage training and/or intricate alignment architectures, are necessary for this joint utilization. However, underlying both these lines of work is a lack of understanding of the complementarity of ID and modality features. In this work, we address this gap by studying the complementarity of ID- and text-based SR models. We show that these models do learn complementary signals, meaning that either should provide performance gain when used properly alongside the other. Motivated by this, we propose a new SR method that preserves ID-text complementarity through independent model training, then harnesses it through a simple ensembling strategy. Despite this method's simplicity, we show it outperforms several competitive SR baselines, implying that both ID and text features are necessary to achieve state-of-the-art SR performance but complex fusion architectures are not.
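The abstract above describes training an ID-based and a text-based model independently and then combining them with a simple ensembling strategy, but it does not state the exact combination rule. The following is only a minimal sketch of one common choice, a weighted average of softmax-normalized scores. The function names, the `weight` parameter, and the scores are all hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_top_k(id_scores, text_scores, k=2, weight=0.5):
    """Blend two models' per-item scores and return the top-k item indices.

    `weight` is the (assumed) share given to the ID-based model; the
    text-based model receives 1 - weight.
    """
    if len(id_scores) != len(text_scores):
        raise ValueError("both models must score the same candidate items")
    p_id = softmax(id_scores)
    p_text = softmax(text_scores)
    blended = [weight * a + (1 - weight) * b for a, b in zip(p_id, p_text)]
    ranked = sorted(range(len(blended)), key=lambda i: blended[i], reverse=True)
    return ranked[:k]


# Made-up scores for four candidate items from two independently trained models.
id_scores = [2.0, 0.5, 0.1, 1.0]
text_scores = [0.2, 0.4, 3.0, 1.0]
top_items = ensemble_top_k(id_scores, text_scores, k=2)
```

Normalizing each model's scores before averaging keeps one model from dominating only because its raw scores sit on a larger scale; whether the paper does this is not stated in the abstract.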

@arXiv_mathCT_bot@mastoxiv.page
2025-10-07 07:45:06

An algebra modality admitting countably many deriving transformations
Jean-Baptiste Vienney
arxiv.org/abs/2510.03953 arxiv.org/pdf/2510.039…

@arXiv_csAI_bot@mastoxiv.page
2025-10-15 10:15:31

Artificial Intelligence Virtual Cells: From Measurements to Decisions across Modality, Scale, Dynamics, and Evaluation
Chengpeng Hu, Calvin Yu-Chian Chen
arxiv.org/abs/2510.12498

@arXiv_csCL_bot@mastoxiv.page
2025-10-15 10:24:51

Understanding the Modality Gap: An Empirical Study on the Speech-Text Alignment Mechanism of Large Speech Language Models
Bajian Xiang, Shuaijiang Zhao, Tingwei Guo, Wei Zou
arxiv.org/abs/2510.12116

@arXiv_csSD_bot@mastoxiv.page
2025-10-14 11:34:48

Diffusion-Link: Diffusion Probabilistic Model for Bridging the Audio-Text Modality Gap
KiHyun Nam, Jongmin Choi, Hyeongkeun Lee, Jungwoo Heo, Joon Son Chung
arxiv.org/abs/2510.11330

@seeingwithsound@mas.to
2025-12-13 14:33:38

(PDF, PhD thesis 2018) Improving visual-to-auditory cross-modality information conversions eprints.nottingham.ac.uk/55721/ by Shern Shiou Tan, on visual-to-auditory sensory substitution (VASS) devices.
"The integration of visual recognition in parallel with the soundscape will be the…

@arXiv_csCV_bot@mastoxiv.page
2025-10-06 10:01:39

Med-K2N: Flexible K-to-N Modality Translation for Medical Image Synthesis
Feng Yuan, Yifan Gao, Yuehua Ye, Haoyue Li, Xin Gao
arxiv.org/abs/2510.02815

@arXiv_csIR_bot@mastoxiv.page
2025-10-14 10:00:58

Self-Supervised Representation Learning with ID-Content Modality Alignment for Sequential Recommendation
Donglin Zhou, Weike Pan, Zhong Ming
arxiv.org/abs/2510.10556

@arXiv_csGR_bot@mastoxiv.page
2025-10-13 07:52:00

A 3D Generation Framework from Cross Modality to Parameterized Primitive
Yiming Liang, Huan Yu, Zili Wang, Shuyou Zhang, Guodong Yi, Jin Wang, Jianrong Tan
arxiv.org/abs/2510.08656

@arXiv_csLG_bot@mastoxiv.page
2025-10-09 10:41:51

Unified Molecule Pre-training with Flexible 2D and 3D Modalities: Single and Paired Modality Integration
Tengwei Song, Min Wu, Yuan Fang
arxiv.org/abs/2510.07035

@arXiv_mathAG_bot@mastoxiv.page
2025-10-06 08:46:09

Tjurina Number Jumps and Unimodal Hypersurface Singularities in Positive Characteristic
Hongrui Ma, Aoyu Ying, Huaiqing Zuo
arxiv.org/abs/2510.02619

@arXiv_csMM_bot@mastoxiv.page
2025-10-08 07:41:59

Towards Robust and Realible Multimodal Fake News Detection with Incomplete Modality
Hengyang Zhou, Yiwei Wei, Jian Yang, Zhenyu Zhang
arxiv.org/abs/2510.05839

@arXiv_eessIV_bot@mastoxiv.page
2025-10-08 09:24:39

nnSAM2: nnUNet-Enhanced One-Prompt SAM2 for Few-shot Multi-Modality Segmentation and Composition Analysis of Lumbar Paraspinal Muscles
Zhongyi Zhang, Julie A. Hides, Enrico De Martino, Abdul Joseph Fofanah, Gervase Tuxworth
arxiv.org/abs/2510.05555

@arXiv_astrophGA_bot@mastoxiv.page
2025-10-10 10:02:09

Multi-modal Foundation Model for Cosmological Simulation Data
Bin Xia, Nesar Ramachandra, Azton I. Wells, Salman Habib, John Wise
arxiv.org/abs/2510.07684

@arXiv_csCV_bot@mastoxiv.page
2025-10-06 10:10:39

Taming Text-to-Sounding Video Generation via Advanced Modality Condition and Interaction
Kaisi Guan, Xihua Wang, Zhengfeng Lai, Xin Cheng, Peng Zhang, XiaoJiang Liu, Ruihua Song, Meng Cao
arxiv.org/abs/2510.03117

@arXiv_csCL_bot@mastoxiv.page
2025-10-07 12:18:22

Imperceptible Jailbreaking against Large Language Models
Kuofeng Gao, Yiming Li, Chao Du, Xin Wang, Xingjun Ma, Shu-Tao Xia, Tianyu Pang
arxiv.org/abs/2510.05025

@arXiv_csHC_bot@mastoxiv.page
2025-10-09 10:11:11

Exploring the Feasibility of Gaze-Based Navigation Across Path Types
Yichuan Zhang, Liangyuting Zhang, Xuning Hu, Yong Yue, Hai-Ning Liang
arxiv.org/abs/2510.07184

@arXiv_mathAG_bot@mastoxiv.page
2025-10-09 09:20:41

Classification of Lipschitz unimodal function germs
Nhan Nguyen, Maria Ruas, Saurabh Trivedi
arxiv.org/abs/2510.06792 arxiv.org/pdf/2510.06…

@arXiv_csLG_bot@mastoxiv.page
2025-10-10 11:16:09

Better Together: Leveraging Unpaired Multimodal Data for Stronger Unimodal Models
Sharut Gupta, Shobhita Sundaram, Chenyu Wang, Stefanie Jegelka, Phillip Isola
arxiv.org/abs/2510.08492

@arXiv_csSD_bot@mastoxiv.page
2025-10-10 08:33:28

Personality-Enhanced Multimodal Depression Detection in the Elderly
Honghong Wang, Jing Deng, Rong Zheng
arxiv.org/abs/2510.08004 arxiv.org…

@arXiv_physicsinsdet_bot@mastoxiv.page
2025-10-14 10:38:38

Towards polarization-enhanced PET: Study of random background in polarization-correlated Compton events
Ana Marija Kožuljević, Tomislav Bokulić, Darko Grošev, Siddharth Parashari, Luka Pavelić, Marinko Rade, Marijan Žuvić, Mihael Makek
arxiv.org/abs/2510.11504

@arXiv_eessSP_bot@mastoxiv.page
2025-10-07 10:35:12

Efficient Domain Generalization in Wireless Networks with Scarce Multi-Modal Data
Minsu Kim, Walid Saad, Dour Calin
arxiv.org/abs/2510.04359

@arXiv_eessAS_bot@mastoxiv.page
2025-10-15 12:46:19

Replaced article(s) found for eess.AS. arxiv.org/list/eess.AS/new
[1/1]:
- Stimulus Modality Matters: Impact of Perceptual Evaluations from Different Modalities on Speech E...
Huang-Cheng Chou, Haibin Wu, Hung-yi Lee, Chi-Chun Lee

@arXiv_csAI_bot@mastoxiv.page
2025-10-10 16:37:29

Replaced article(s) found for cs.AI. arxiv.org/list/cs.AI/new
[5/9]:
- Modality-Balancing Preference Optimization of Large Multimodal Models by Adversarial Negative Mining
Chenxi Liu, Tianyi Xiong, Yanshuo Chen, Ruibo Chen, Yihan Wu, Junfeng Guo, Tianyi Zhou, Heng Huang

@arXiv_mathCT_bot@mastoxiv.page
2025-10-08 12:59:36

Replaced article(s) found for math.CT. arxiv.org/list/math.CT/new
[1/1]:
- An algebra modality admitting countably many deriving transformations
Jean-Baptiste Vienney

@arXiv_csIR_bot@mastoxiv.page
2025-10-14 11:39:48

Characterizing Web Search in The Age of Generative AI
Elisabeth Kirsten, Jost Grosse Perdekamp, Mihir Upadhyay, Krishna P. Gummadi, Muhammad Bilal Zafar
arxiv.org/abs/2510.11560

@arXiv_csCV_bot@mastoxiv.page
2025-12-12 10:44:10

Mull-Tokens: Modality-Agnostic Latent Thinking
Arijit Ray, Ahmed Abdelkader, Chengzhi Mao, Bryan A. Plummer, Kate Saenko, Ranjay Krishna, Leonidas Guibas, Wen-Sheng Chu
arxiv.org/abs/2512.10941

@arXiv_physicsmedph_bot@mastoxiv.page
2025-10-14 09:18:48

Stochastic numerical head phantoms to enable virtual imaging studies of transcranial photoacoustic computed tomography
Hsuan-Kai Huang, Joseph Kuo, Seonyeong Park, Umberto Villa, Lihong V. Wang, Mark A. Anastasio
arxiv.org/abs/2510.09758

@arXiv_csMM_bot@mastoxiv.page
2025-10-15 12:37:23

Replaced article(s) found for cs.MM. arxiv.org/list/cs.MM/new
[1/1]:
- Towards Robust and Realible Multimodal Misinformation Recognition with Incomplete Modality
Hengyang Zhou, Yiwei Wei, Jian Yang, Zhenyu Zhang

@arXiv_csAI_bot@mastoxiv.page
2025-10-09 12:43:40

Crosslisted article(s) found for cs.AI. arxiv.org/list/cs.AI/new
[7/8]:
- Unified Molecule Pre-training with Flexible 2D and 3D Modalities: Single and Paired Modality Inte...
Tengwei Song, Min Wu, Yuan Fang

@arXiv_csCV_bot@mastoxiv.page
2025-10-13 10:38:20

TC-LoRA: Temporally Modulated Conditional LoRA for Adaptive Diffusion Control
Minkyoung Cho, Ruben Ohana, Christian Jacobsen, Adityan Jothi, Min-Hung Chen, Z. Morley Mao, Ethem Can
arxiv.org/abs/2510.09561

@arXiv_csIR_bot@mastoxiv.page
2025-10-15 09:56:41

SAIL-Embedding Technical Report: Omni-modal Embedding Foundation Model
Lin Lin, Jiefeng Long, Zhihe Wan, Yuchi Wang, Dingkang Yang, Shuang Yang, Yueyang Yao, Xu Chen, Zirui Guo, Shengqiang Li, Weiran Li, Hanyu Li, Yaling Mou, Yan Qiu, Haiyang Yu, Xiao Liang, Hongsheng Li, Chao Feng
arxiv.org/abs/2510.12709

@arXiv_csCL_bot@mastoxiv.page
2025-10-06 10:10:59

Transcribe, Translate, or Transliterate: An Investigation of Intermediate Representations in Spoken Language Models
Tolúlọpé Ògúnrèmí, Christopher D. Manning, Dan Jurafsky, Karen Livescu
arxiv.org/abs/2510.02569

@arXiv_csSD_bot@mastoxiv.page
2025-10-13 08:08:00

LadderSym: A Multimodal Interleaved Transformer for Music Practice Error Detection
Benjamin Shiue-Hal Chou, Purvish Jajal, Nick John Eliopoulos, James C. Davis, George K. Thiruvathukal, Kristen Yeon-Ji Yun, Yung-Hsiang Lu
arxiv.org/abs/2510.08580

@arXiv_csAI_bot@mastoxiv.page
2025-10-08 07:46:29

Representation Potentials of Foundation Models for Multimodal Alignment: A Survey
Jianglin Lu, Hailing Wang, Yi Xu, Yizhou Wang, Kuo Yang, Yun Fu
arxiv.org/abs/2510.05184

@arXiv_csCV_bot@mastoxiv.page
2025-10-13 10:35:30

D-TPT: Dimensional Entropy Maximization for Calibrating Test-Time Prompt Tuning in Vision-Language Models
Jisu Han, Wonjun Hwang
arxiv.org/abs/2510.09473

@arXiv_csIR_bot@mastoxiv.page
2025-10-14 11:05:58

Decoupled Multimodal Fusion for User Interest Modeling in Click-Through Rate Prediction
Alin Fan, Hanqing Li, Sihan Lu, Jingsong Yuan, Jiandong Zhang
arxiv.org/abs/2510.11066

@arXiv_csCL_bot@mastoxiv.page
2025-10-07 12:20:52

Guided Query Refinement: Multimodal Hybrid Retrieval with Test-Time Optimization
Omri Uzan, Asaf Yehudai, Roi Pony, Eyal Shnarch, Ariel Gera
arxiv.org/abs/2510.05038

@arXiv_csCV_bot@mastoxiv.page
2025-10-09 10:44:31

Resolution scaling governs DINOv3 transfer performance in chest radiograph classification
Soroosh Tayebi Arasteh, Mina Shaigan, Christiane Kuhl, Jakob Nikolas Kather, Sven Nebelung, Daniel Truhn
arxiv.org/abs/2510.07191

@arXiv_csAI_bot@mastoxiv.page
2025-10-06 09:46:49

Consolidating Reinforcement Learning for Multimodal Discrete Diffusion Models
Tianren Ma, Mu Zhang, Yibing Wang, Qixiang Ye
arxiv.org/abs/2510.02880

@arXiv_physicsmedph_bot@mastoxiv.page
2025-10-08 08:16:19

A Novel Helical Thin-Film Flow Diverter: Design, Fabrication, and Computational Assessment of Hemodynamic Performance
Samuel Voss, Philipp Berg, Janneck Stahl, Daniel Behme, Gabor Janiga, Rodrigo Lima de Miranda, Eckhard Quandt, Prasanth Velvaluri
arxiv.org/abs/2510.05320

@arXiv_csCV_bot@mastoxiv.page
2025-10-09 10:21:01

Semantic Segmentation Algorithm Based on Light Field and LiDAR Fusion
Jie Luo, Yuxuan Jiang, Xin Jin, Mingyu Liu, Yihui Fan
arxiv.org/abs/2510.06687

@arXiv_physicsmedph_bot@mastoxiv.page
2025-10-09 11:45:37

Crosslisted article(s) found for physics.med-ph. arxiv.org/list/physics.med-ph/
[1/1]:
- multimodars: A Rust-powered toolkit for multi-modality cardiac image fusion and registration
Anselm W. Stark, Marc Ilic, Ali Mokhtari, Pooya Mohammadi Kazaj, Christoph Graeni…

@arXiv_physicsmedph_bot@mastoxiv.page
2025-10-07 08:40:22

Application of a Virtual Imaging Framework for Investigating a Deep Learning-Based Reconstruction Method for 3D Quantitative Photoacoustic Computed Tomography
Refik Mert Cam, Seonyeong Park, Umberto Villa, Mark A. Anastasio
arxiv.org/abs/2510.03431