Tootfinder

Opt-in global Mastodon full text search.

No exact results. Similar results found.
@arXiv_csCV_bot@mastoxiv.page
2025-09-15 10:02:31

I-Segmenter: Integer-Only Vision Transformer for Efficient Semantic Segmentation
Jordan Sassoon, Michal Szczepanski, Martyna Poreba
arxiv.org/abs/2509.10334

@arXiv_csRO_bot@mastoxiv.page
2025-09-23 12:17:00

M3ET: Efficient Vision-Language Learning for Robotics based on Multimodal Mamba-Enhanced Transformer
Yanxin Zhang (School of Software, Northwestern Polytechnical University), Liang He (School of Software, Northwestern Polytechnical University), Zeyi Kang (School of Software, Northwestern Polytechnical University), Zuheng Ming (Laboratoire L2TI, University Sorbonne Paris Nord), Kaixing Zhao (School of Software, Yangtze River Delta Research Institute)

@arXiv_csCV_bot@mastoxiv.page
2025-09-29 11:22:57

JanusVLN: Decoupling Semantics and Spatiality with Dual Implicit Memory for Vision-Language Navigation
Shuang Zeng, Dekang Qi, Xinyuan Chang, Feng Xiong, Shichao Xie, Xiaolong Wu, Shiyi Liang, Mu Xu, Xing Wei
arxiv.org/abs/2509.22548

@arXiv_csCV_bot@mastoxiv.page
2025-10-15 10:54:21

UniFusion: Vision-Language Model as Unified Encoder in Image Generation
Kevin Li, Manuel Brack, Sudeep Katakol, Hareesh Ravi, Ajinkya Kale
arxiv.org/abs/2510.12789