I-Segmenter: Integer-Only Vision Transformer for Efficient Semantic Segmentation
Jordan Sassoon, Michal Szczepanski, Martyna Poreba
https://arxiv.org/abs/2509.10334 https://
M3ET: Efficient Vision-Language Learning for Robotics based on Multimodal Mamba-Enhanced Transformer
Yanxin Zhang (School of Software, Northwestern Polytechnical University), Liang He (School of Software, Northwestern Polytechnical University), Zeyi Kang (School of Software, Northwestern Polytechnical University), Zuheng Ming (Laboratoire L2TI, Université Sorbonne Paris Nord), Kaixing Zhao (School of Software, Yangtze River Delta Research Institute)
JanusVLN: Decoupling Semantics and Spatiality with Dual Implicit Memory for Vision-Language Navigation
Shuang Zeng, Dekang Qi, Xinyuan Chang, Feng Xiong, Shichao Xie, Xiaolong Wu, Shiyi Liang, Mu Xu, Xing Wei
https://arxiv.org/abs/2509.22548
UniFusion: Vision-Language Model as Unified Encoder in Image Generation
Kevin Li, Manuel Brack, Sudeep Katakol, Hareesh Ravi, Ajinkya Kale
https://arxiv.org/abs/2510.12789 https…