Tootfinder

Opt-in global Mastodon full text search.

@Techmeme@techhub.social
2025-10-15 13:12:18

Apple updates the Vision Pro with an M5 chip, rendering 10% more pixels, and a Dual Knit Band for a more comfortable fit, available on October 22 for $3,499 (Apple)
apple.com/newsroom/2025/10/app

@seeingwithsound@mas.to
2025-08-14 15:24:23

The Verge: These smart glasses use AI to help low-vision users #Envision Al…

@arXiv_csCV_bot@mastoxiv.page
2025-09-15 09:58:31

VARCO-VISION-2.0 Technical Report
Young-rok Cha, Jeongho Ju, SunYoung Park, Jong-Hyeon Lee, Younghyun Yu, Youngjune Kim
arxiv.org/abs/2509.10105

@Techmeme@techhub.social
2025-10-15 14:04:42

Apple to start selling Vision Pro-compatible PlayStation VR2 Sense controllers for $250 on November 11 and the $130 Logitech Muse digital pen on October 22 (Zac Hall/9to5Mac)
9to5mac.com/2025/10/15/apple-v

The U.S. Department of Education has pulled funding for programs in eight states
aimed at supporting students who have both hearing and vision loss,
a move that could affect some of the country’s most vulnerable students.
The programs are considered vital in those states but represent only a little over $1 million a year in federal money.
Nonetheless, they got caught in the Trump administration’s attacks on diversity, equity and inclusion,
with an Education Dep…

@heiseonline@social.heise.de
2025-10-15 03:11:00

Withings launches smart blood pressure monitor with HD display for easy operation
The color screen of the BPM Vision guides the user through blood pressure and ECG measurements. The smart device, with extra-long battery life, costs 180 euros.

@mia@hcommons.social
2025-08-13 16:04:08

I read UCL's Vision for Inclusive Innovation and Transformation: 'our vision for a ‘data empowered society’ is one in which these advances enrich our society and enable us to make informed, inclusive decisions about technological advances'

@cosmos4u@scicomm.xyz
2025-10-14 14:36:55

Image reconstruction with the #JWST Interferometer / AMIGO - a Data-Driven Calibration of the JWST Interferometer: arxiv.org/abs/2510.10924 / arxiv.org/abs/2510.09806 -> How we sharpened the James Webb telescope’s vision from a million kilometres away: theconversation.com/how-we-sha -> thread bsky.app/profile/benjaminpope.

@arXiv_csRO_bot@mastoxiv.page
2025-09-16 11:27:26

Enhancing Generalization in Vision-Language-Action Models by Preserving Pretrained Representations
Shresth Grover, Akshay Gopalkrishnan, Bo Ai, Henrik I. Christensen, Hao Su, Xuanlin Li
arxiv.org/abs/2509.11417

@arXiv_csHC_bot@mastoxiv.page
2025-07-16 08:56:51

Static or Temporal? Semantic Scene Simplification to Aid Wayfinding in Immersive Simulations of Bionic Vision
Justin M. Kasowski, Apurv Varshney, Michael Beyeler
arxiv.org/abs/2507.10813

@arXiv_csAI_bot@mastoxiv.page
2025-09-16 10:42:06

Cross-Platform Scaling of Vision-Language-Action Models from Edge to Cloud GPUs
Amir Taherin, Juyi Lin, Arash Akbari, Arman Akbari, Pu Zhao, Weiwei Chen, David Kaeli, Yanzhi Wang
arxiv.org/abs/2509.11480

@macandi@social.heise.de
2025-10-15 13:19:00

With Apple M5: MacBook Pro and iPad Pro 2025 say hello
The M5 makes its debut in Macs, iPads, and the Vision Pro. Apple promises more performance, primarily for AI tasks. However, the M5 Pro and M5 Max are missing.

@arXiv_csCL_bot@mastoxiv.page
2025-09-16 12:18:27

Spec-LLaVA: Accelerating Vision-Language Models with Dynamic Tree-Based Speculative Decoding
Mingxiao Huo, Jiayi Zhang, Hewei Wang, Jinfeng Xu, Zheyu Chen, Huilin Tai, Yijun Chen
arxiv.org/abs/2509.11961

@arXiv_qbioNC_bot@mastoxiv.page
2025-09-16 09:01:17

Residual Gaze Behavior During Navigation in Blindness and Low Vision
Junchi Feng, Fernanda Garcia-Pina, Mahya Beheshti, Todd E Hudson, William Seiple, John-Ross Rizzo
arxiv.org/abs/2509.11530

@arXiv_quantph_bot@mastoxiv.page
2025-10-15 10:19:41

Hybrid Vision Transformer and Quantum Convolutional Neural Network for Image Classification
Mingzhu Wang, Yun Shang
arxiv.org/abs/2510.12291

@arXiv_csCR_bot@mastoxiv.page
2025-07-16 09:59:51

Bridging the Gap in Vision Language Models in Identifying Unsafe Concepts Across Modalities
Yiting Qu, Michael Backes, Yang Zhang
arxiv.org/abs/2507.11155

@arXiv_eessIV_bot@mastoxiv.page
2025-09-16 10:04:07

Adapting Medical Vision Foundation Models for Volumetric Medical Image Segmentation via Active Learning and Selective Semi-supervised Fine-tuning
Jin Yang, Daniel S. Marcus, Aristeidis Sotiras
arxiv.org/abs/2509.10784

@arXiv_csCV_bot@mastoxiv.page
2025-09-16 12:39:37

Lost in Embeddings: Information Loss in Vision-Language Models
Wenyan Li, Raphael Tang, Chengzu Li, Caiqi Zhang, Ivan Vulić, Anders Søgaard
arxiv.org/abs/2509.11986

@arXiv_csRO_bot@mastoxiv.page
2025-07-16 10:00:41

Uncertainty Aware Mapping for Vision-Based Underwater Robots
Abhimanyu Bhowmik, Mohit Singh, Madhushree Sannigrahi, Martin Ludvigsen, Kostas Alexis
arxiv.org/abs/2507.10991

@arXiv_csLG_bot@mastoxiv.page
2025-10-14 13:37:38

ReLook: Vision-Grounded RL with a Multimodal LLM Critic for Agentic Web Coding
Yuhang Li, Chenchen Zhang, Ruilin Lv, Ao Liu, Ken Deng, Yuanxing Zhang, Jiaheng Liu, Wiggin Zhou, Bo Zhou
arxiv.org/abs/2510.11498

@arXiv_csIR_bot@mastoxiv.page
2025-09-15 09:26:31

A Research Vision for Web Search on Emerging Topics
Alisa Rieger, Stefan Dietze, Ran Yu
arxiv.org/abs/2509.10212 arxiv.org/pdf/2509.10212…

@ErikJonker@mastodon.social
2025-08-15 11:23:53

This robot is still quite bad at folding towels but think about the amount of computation, the vision, models, algorithms that are necessary to do this, it is very impressive.
#helix #laundry #ai

@Techmeme@techhub.social
2025-10-14 04:40:51

Nvidia says it is donating the Vera Rubin NVL144 server rack architecture to the Open Compute Project and outlines its vision for "gigawatt AI factories" (Mike Wheatley/SiliconANGLE)
siliconangle.com/2025/10/13/nv

@arXiv_csAI_bot@mastoxiv.page
2025-09-16 08:08:46

Understanding AI Evaluation Patterns: How Different GPT Models Assess Vision-Language Descriptions
Sajjad Abdoli, Rudi Cilibrasi, Rima Al-Shikh
arxiv.org/abs/2509.10707

@arXiv_csCL_bot@mastoxiv.page
2025-09-15 09:43:41

Benchmarking Vision-Language Models on Chinese Ancient Documents: From OCR to Knowledge Reasoning
Haiyang Yu, Yuchuan Wu, Fan Shi, Lei Liao, Jinghui Lu, Xiaodong Ge, Han Wang, Minghan Zhuo, Xuecheng Wu, Xiang Fei, Hao Feng, Guozhi Tang, An-Lan Wang, Hanshen Zhu, Yangfan He, Quanhuan Liang, Liyuan Meng, Chao Feng, Can Huang, Jingqun Tang, Bin Li

@arXiv_csCY_bot@mastoxiv.page
2025-09-16 07:33:06

The main factors in student satisfaction with a campus environment: A mixed approach vs. a quantitative approach
Mohammed Eddaou
arxiv.org/abs/2509.10571

@arXiv_csCV_bot@mastoxiv.page
2025-07-16 10:38:31

Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation
Zhen Xu, Hongyu Zhou, Sida Peng, Haotong Lin, Haoyu Guo, Jiahao Shao, Peishan Yang, Qinglin Yang, Sheng Miao, Xingyi He, Yifan Wang, Yue Wang, Ruizhen Hu, Yiyi Liao, Xiaowei Zhou, Hujun Bao
arxiv.org/abs/2507.11540

@arXiv_csHC_bot@mastoxiv.page
2025-09-15 09:09:11

Inclusive by design: Developing Barrier-Free Authentication for Blind and Low Vision Users through the ALIAS Project
Clara Toussaint (CeRCA), Benjamin Chateau (CeRCA), Pierre-Guillaume Gourio-Jewell (CeRCA), Emilie Bonnefoy (CeRCA), Nicolas Louveton (CeRCA)
arxiv.org/abs/2509.10043

@arXiv_csRO_bot@mastoxiv.page
2025-07-16 10:20:41

All Eyes, no IMU: Learning Flight Attitude from Vision Alone
Jesse J. Hagenaars, Stein Stroobants, Sander M. Bohte, Guido C. H. E. De Croon
arxiv.org/abs/2507.11302

@arXiv_eessIV_bot@mastoxiv.page
2025-09-16 10:23:47

The Microwave Rainbow: How Geometry Paints Colours in Microwave Vision
Huizhang Yang
arxiv.org/abs/2509.11099 arxiv.org/pdf/2509.11099

Following public outcry,
the U.S. Department of Education has restored funding for students who have both hearing and vision loss,
about a month after cutting it.
But rather than sending the money directly to the four programs that are part of a national network helping students who are deaf and blind, a condition known as deafblindness,
the department has instead rerouted the grants to a different organization.
The Trump administration targeted the programs in …

@macandi@social.heise.de
2025-10-13 11:35:00

Entire NBA basketball games coming soon in immersive form on the Vision Pro
Together with a US cable provider, Apple will film complete basketball games with the Blackmagic URSA for the first time. The problem is copyright restrictions.

@arXiv_csAI_bot@mastoxiv.page
2025-07-16 09:28:11

Tactical Decision for Multi-UGV Confrontation with a Vision-Language Model-Based Commander
Li Wang, Qizhen Wu, Lei Chen
arxiv.org/abs/2507.11079

@seeingwithsound@mas.to
2025-08-11 13:35:32

It is unclear to what extent Wicab's BrainPort Vision Pro tongue display (sensory substitution device) is still on the market wicab.com/brainport-vision-pro because there has been very little news about it since around 2019, judging also from their own website. Note that the BrainPor…

@Techmeme@techhub.social
2025-08-16 06:30:57

Sources say Meta's chaotic culture and lack of vision have led to AI brain drain; Meta strongly denies it has had issues with talent and retention (Rashi Shrivastava/Forbes)
forbes.com/sites/rashishrivast

@arXiv_csRO_bot@mastoxiv.page
2025-10-15 09:13:52

Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model
Fuhao Li, Wenxuan Song, Han Zhao, Jingbo Wang, Pengxiang Ding, Donglin Wang, Long Zeng, Haoang Li
arxiv.org/abs/2510.12276

@arXiv_csCV_bot@mastoxiv.page
2025-09-15 10:02:31

I-Segmenter: Integer-Only Vision Transformer for Efficient Semantic Segmentation
Jordan Sassoon, Michal Szczepanski, Martyna Poreba
arxiv.org/abs/2509.10334

@arXiv_csIR_bot@mastoxiv.page
2025-08-15 07:34:32

Bridging Modality Gaps in e-Commerce Products via Vision-Language Alignment
Yipeng Zhang, Hongju Yu, Aritra Mandal, Canran Xu, Qunzhi Zhou, Zhe Wu
arxiv.org/abs/2508.10116

@arXiv_csCR_bot@mastoxiv.page
2025-10-14 12:04:28

TabVLA: Targeted Backdoor Attacks on Vision-Language-Action Models
Zonghuan Xu, Xiang Zheng, Xingjun Ma, Yu-Gang Jiang
arxiv.org/abs/2510.10932

@arXiv_csRO_bot@mastoxiv.page
2025-07-16 09:47:31

EquiContact: A Hierarchical SE(3) Vision-to-Force Equivariant Policy for Spatially Generalizable Contact-rich Tasks
Joohwan Seo, Arvind Kruthiventy, Soomi Lee, Megan Teng, Xiang Zhang, Seoyeon Choi, Jongeun Choi, Roberto Horowitz
arxiv.org/abs/2507.10961

@arXiv_csCV_bot@mastoxiv.page
2025-07-16 10:34:41

Implementing Adaptations for Vision AutoRegressive Model
Kaif Shaikh, Antoni Kowalczuk, Franziska Boenisch, Adam Dziedzic
arxiv.org/abs/2507.11441

@seeingwithsound@mas.to
2025-09-13 14:32:14

(video) NOVA: Artificial vision using brain implants pbslearningmedia.org/resource/ on the I…

Screenshot from the video, showing blind brain implant recipient Brian Bussard and in the background Phil Troyk.

@arXiv_csRO_bot@mastoxiv.page
2025-08-15 09:05:52

Super LiDAR Reflectance for Robotic Perception
Wei Gao, Jie Zhang, Mingle Zhao, Zhiyuan Zhang, Shu Kong, Maani Ghaffari, Dezhen Song, Cheng-Zhong Xu, Hui Kong
arxiv.org/abs/2508.10398

@arXiv_eessIV_bot@mastoxiv.page
2025-07-16 07:43:31

Comparative Analysis of Vision Transformers and Traditional Deep Learning Approaches for Automated Pneumonia Detection in Chest X-Rays
Gaurav Singh
arxiv.org/abs/2507.10589

@arXiv_csCV_bot@mastoxiv.page
2025-09-16 12:44:47

LoRA-fine-tuned Large Vision Models for Automated Assessment of Post-SBRT Lung Injury
M. Bolhassani, B. Veasey, E. Daugherty, S. Keltner, N. Kumar, N. Dunlap, A. Amini
arxiv.org/abs/2509.12155

@macandi@social.heise.de
2025-08-14 07:41:00

Processor leaks: These are the SoCs and SiPs Apple is planning for its upcoming devices
Which chips are coming to new models of the HomePod mini, Apple TV, iPad mini, and Vision Pro? What about the Studio Display 2? A code expert finds clues.

@arXiv_csCL_bot@mastoxiv.page
2025-08-14 09:51:32

VisCodex: Unified Multimodal Code Generation via Merging Vision and Coding Models
Lingjie Jiang, Shaohan Huang, Xun Wu, Yixia Li, Dongdong Zhang, Furu Wei
arxiv.org/abs/2508.09945

@arXiv_csCV_bot@mastoxiv.page
2025-09-15 10:01:41

Detecting Text Manipulation in Images using Vision Language Models
Vidit Vidit, Pavel Korshunov, Amir Mohammadi, Christophe Ecabert, Ketan Kotwal, Sébastien Marcel
arxiv.org/abs/2509.10278

@arXiv_csRO_bot@mastoxiv.page
2025-07-16 07:46:41

Vision Language Action Models in Robotic Manipulation: A Systematic Review
Muhayy Ud Din, Waseem Akram, Lyes Saad Saoud, Jan Rosell, Irfan Hussain
arxiv.org/abs/2507.10672

@Techmeme@techhub.social
2025-10-15 03:20:53

Reducto, which uses OCR with vision language models to convert complex documents into inputs for LLMs, raised a $75M Series B led by a16z at a $600M valuation (Stephanie Palazzolo/The Information)
theinformation.com/articles/st

@arXiv_csCV_bot@mastoxiv.page
2025-09-16 12:41:27

A Computer Vision Pipeline for Individual-Level Behavior Analysis: Benchmarking on the Edinburgh Pig Dataset
Haiyu Yang, Enhong Liu, Jennifer Sun, Sumit Sharma, Meike van Leerdam, Sebastien Franceschini, Puchun Niu, Miel Hostens
arxiv.org/abs/2509.12047

@arXiv_csCV_bot@mastoxiv.page
2025-10-15 10:48:01

On the Use of Hierarchical Vision Foundation Models for Low-Cost Human Mesh Recovery and Pose Estimation
Shuhei Tarashima, Yushan Wang, Norio Tagawa
arxiv.org/abs/2510.12660

@arXiv_csRO_bot@mastoxiv.page
2025-08-15 08:43:32

ReconVLA: Reconstructive Vision-Language-Action Model as Effective Robot Perceiver
Wenxuan Song, Ziyang Zhou, Han Zhao, Jiayi Chen, Pengxiang Ding, Haodong Yan, Yuxin Huang, Feilong Tang, Donglin Wang, Haoang Li
arxiv.org/abs/2508.10333

@arXiv_csCV_bot@mastoxiv.page
2025-09-16 12:43:47

Look Again, Think Slowly: Enhancing Visual Reflection in Vision-Language Models
Pu Jian, Junhong Wu, Wei Sun, Chen Wang, Shuo Ren, Jiajun Zhang
arxiv.org/abs/2509.12132

@arXiv_csRO_bot@mastoxiv.page
2025-08-15 09:31:42

CorrectNav: Self-Correction Flywheel Empowers Vision-Language-Action Navigation Model
Zhuoyuan Yu, Yuxing Long, Zihan Yang, Chengyan Zeng, Hongwei Fan, Jiyao Zhang, Hao Dong
arxiv.org/abs/2508.10416

@Techmeme@techhub.social
2025-10-15 13:17:49

Apple debuts its M5 chip, with a 10-core GPU, a Neural Accelerator in each core, enabling 4x the performance of M4, and a 10-core CPU with six efficiency cores (Hartley Charlton/MacRumors)
macrumors.com/2025/10/15/apple

@arXiv_csCV_bot@mastoxiv.page
2025-10-15 10:47:01

Reasoning in the Dark: Interleaved Vision-Text Reasoning in Latent Space
Chao Chen, Zhixin Ma, Yongqi Li, Yupeng Hu, Yinwei Wei, Wenjie Li, Liqiang Nie
arxiv.org/abs/2510.12603

@arXiv_csRO_bot@mastoxiv.page
2025-10-15 09:52:31

Two-stream network-driven vision-based tactile sensor for object feature extraction and fusion perception
Muxing Huang, Zibin Chen, Weiliang Xu, Zilan Li, Yuanzhi Zhou, Guoyuan Zhou, Wenjing Chen, Xinming Li
arxiv.org/abs/2510.12528

@arXiv_csCV_bot@mastoxiv.page
2025-08-15 10:23:22

From Diagnosis to Improvement: Probing Spatio-Physical Reasoning in Vision Language Models
Tiancheng Han, Yunfei Gao, Yong Li, Wuzhou Yu, Qiaosheng Zhang, Wenqi Shao
arxiv.org/abs/2508.10770

@arXiv_csRO_bot@mastoxiv.page
2025-09-16 11:00:17

DreamNav: A Trajectory-Based Imaginative Framework for Zero-Shot Vision-and-Language Navigation
Yunheng Wang, Yuetong Fang, Taowen Wang, Yixiao Feng, Yawen Tan, Shuning Zhang, Peiran Liu, Yiding Ji, Renjing Xu
arxiv.org/abs/2509.11197

@arXiv_csRO_bot@mastoxiv.page
2025-09-15 09:40:41

GC-VLN: Instruction as Graph Constraints for Training-free Vision-and-Language Navigation
Hang Yin, Haoyu Wei, Xiuwei Xu, Wenxuan Guo, Jie Zhou, Jiwen Lu
arxiv.org/abs/2509.10454

@arXiv_csCV_bot@mastoxiv.page
2025-10-15 10:54:31

ViCO: A Training Strategy towards Semantic Aware Dynamic High-Resolution
Long Cui, Weiyun Wang, Jie Shao, Zichen Wen, Gen Luo, Linfeng Zhang, Yanting Zhang, Yu Qiao, Wenhai Wang
arxiv.org/abs/2510.12793

@Techmeme@techhub.social
2025-10-10 19:15:50

Some LA Lakers games will be live streamed in the Apple Immersive format in the NBA app and the Spectrum SportsNet app on Vision Pro during the 2025-26 season (Jacob Krol/TechRadar)
techradar.com/streaming/entert…

@arXiv_csRO_bot@mastoxiv.page
2025-07-16 10:23:01

From Production Logistics to Smart Manufacturing: The Vision for a New RoboCup Industrial League
Supun Dissanayaka, Alexander Ferrein, Till Hofmann, Kosuke Nakajima, Mario Sanz-Lopez, Jesus Savage, Daniel Swoboda, Matteo Tschesche, Wataru Uemura, Tarik Viehmann, Shohei Yasuda
arxiv.org/abs/2507.11402

@arXiv_csCV_bot@mastoxiv.page
2025-09-15 12:04:07

Replaced article(s) found for cs.CV. arxiv.org/list/cs.CV/new
[3/3]:
- Taccel: Scaling Up Vision-based Tactile Robotics via High-performance GPU Simulation
Li, Du, Yu, Li, Zhao, Liu, Jiang, Zhu, Huang

@arXiv_csCV_bot@mastoxiv.page
2025-10-15 10:54:21

UniFusion: Vision-Language Model as Unified Encoder in Image Generation
Kevin Li, Manuel Brack, Sudeep Katakol, Hareesh Ravi, Ajinkya Kale
arxiv.org/abs/2510.12789

@arXiv_csCV_bot@mastoxiv.page
2025-08-15 10:20:22

Beyond conventional vision: RGB-event fusion for robust object detection in dynamic traffic scenarios
Zhanwen Liu, Yujing Sun, Yang Wang, Nan Yang, Shengbo Eben Li, Xiangmo Zhao
arxiv.org/abs/2508.10704

@arXiv_csRO_bot@mastoxiv.page
2025-08-15 08:50:22

Few-shot Vision-based Human Activity Recognition with MLLM-based Visual Reinforcement Learning
Wenqi Zheng, Yutaka Arakawa
arxiv.org/abs/2508.10371

@arXiv_csCV_bot@mastoxiv.page
2025-09-16 12:44:07

3DViT-GAT: A Unified Atlas-Based 3D Vision Transformer and Graph Learning Framework for Major Depressive Disorder Detection Using Structural MRI Data
Nojod M. Alotaibi, Areej M. Alhothali, Manar S. Ali
arxiv.org/abs/2509.12143

@arXiv_csCV_bot@mastoxiv.page
2025-10-15 10:49:41

Personalized Federated Fine-Tuning of Vision Foundation Models for Healthcare
Adam Tupper, Christian Gagné
arxiv.org/abs/2510.12741 a…

@arXiv_csRO_bot@mastoxiv.page
2025-09-16 11:57:47

Igniting VLMs toward the Embodied Space
Andy Zhai, Brae Liu, Bruno Fang, Chalse Cai, Ellie Ma, Ethan Yin, Hao Wang, Hugo Zhou, James Wang, Lights Shi, Lucy Liang, Make Wang, Qian Wang, Roy Gan, Ryan Yu, Shalfun Li, Starrick Liu, Sylas Chen, Vincent Chen, Zach Xu
arxiv.org/abs/2509.11766

@arXiv_csCV_bot@mastoxiv.page
2025-09-16 12:44:17

Open-ended Hierarchical Streaming Video Understanding with Vision Language Models
Hyolim Kang, Yunsu Park, Youngbeom Yoo, Yeeun Choi, Seon Joo Kim
arxiv.org/abs/2509.12145

@arXiv_csCV_bot@mastoxiv.page
2025-10-15 07:53:01

Data or Language Supervision: What Makes CLIP Better than DINO?
Yiming Liu, Yuhui Zhang, Dhruba Ghosh, Ludwig Schmidt, Serena Yeung-Levy
arxiv.org/abs/2510.11835

@arXiv_csRO_bot@mastoxiv.page
2025-09-16 12:08:27

Embodied Navigation Foundation Model
Jiazhao Zhang, Anqi Li, Yunpeng Qi, Minghan Li, Jiahang Liu, Shaoan Wang, Haoran Liu, Gengze Zhou, Yuze Wu, Xingxing Li, Yuxin Fan, Wenjun Li, Zhibo Chen, Fei Gao, Qi Wu, Zhizheng Zhang, He Wang
arxiv.org/abs/2509.12129

@arXiv_csCV_bot@mastoxiv.page
2025-09-15 10:02:01

Adversarial robustness through Lipschitz-Guided Stochastic Depth in Neural Networks
Laith Nayal, Mahmoud Mousatat, Bader Rasheed
arxiv.org/abs/2509.10298

@arXiv_csCV_bot@mastoxiv.page
2025-10-15 10:51:01

E-MoFlow: Learning Egomotion and Optical Flow from Event Data via Implicit Regularization
Wenpu Li, Bangyan Liao, Yi Zhou, Qi Xu, Pian Wan, Peidong Liu
arxiv.org/abs/2510.12753

@arXiv_csRO_bot@mastoxiv.page
2025-08-14 09:35:42

Vision-driven River Following of UAV via Safe Reinforcement Learning using Semantic Dynamics Model
Zihan Wang, Nina Mahmoudian
arxiv.org/abs/2508.09971

@arXiv_csCV_bot@mastoxiv.page
2025-08-12 08:43:23

Large Language Models Facilitate Vision Reflection in Image Classification
Guoyuan An, JaeYoon Kim, SungEui Yoon
arxiv.org/abs/2508.06525 a…

@arXiv_csCV_bot@mastoxiv.page
2025-08-14 10:16:22

LLMC+: Benchmarking Vision-Language Model Compression with a Plug-and-play Toolkit
Chengtao Lv, Bilang Zhang, Yang Yong, Ruihao Gong, Yushi Huang, Shiqiao Gu, Jiajun Wu, Yumeng Shi, Jinyang Guo, Wenya Wang
arxiv.org/abs/2508.09981

@arXiv_csCV_bot@mastoxiv.page
2025-10-14 13:46:08

EvoCAD: Evolutionary CAD Code Generation with Vision Language Models
Tobias Preintner, Weixuan Yuan, Adrian König, Thomas Bäck, Elena Raponi, Niki van Stein
arxiv.org/abs/2510.11631

@arXiv_csCV_bot@mastoxiv.page
2025-08-14 09:38:22

Beyond Blanket Masking: Examining Granularity for Privacy Protection in Images Captured by Blind and Low Vision Users
Jeffri Murrugarra-LLerena, Haoran Niu, K. Suzanne Barber, Hal Daumé III, Yang Trista Cao, Paola Cascante-Bonilla
arxiv.org/abs/2508.09245

@arXiv_csCV_bot@mastoxiv.page
2025-10-13 10:41:30

VITA-VLA: Efficiently Teaching Vision-Language Models to Act via Action Expert Distillation
Shaoqi Dong, Chaoyou Fu, Haihan Gao, Yi-Fan Zhang, Chi Yan, Chu Wu, Xiaoyu Liu, Yunhang Shen, Jing Huo, Deqiang Jiang, Haoyu Cao, Yang Gao, Xing Sun, Ran He, Caifeng Shan
arxiv.org/abs/2510.09607

@arXiv_csCV_bot@mastoxiv.page
2025-10-14 22:05:25

Replaced article(s) found for cs.CV. arxiv.org/list/cs.CV/new
[8/8]:
- TC-GS: A Faster Gaussian Splatting Module Utilizing Tensor Cores
Liao, Ding, Cui, Gong, Hu, Wang, Li, Zhang, Wang, Fu

@arXiv_csCV_bot@mastoxiv.page
2025-10-14 22:05:05

Replaced article(s) found for cs.CV. arxiv.org/list/cs.CV/new
[7/8]:
- MultiCOIN: Multi-Modal COntrollable Video INbetweening
Tanveer, Zhou, Niklaus, Amiri, Zhang, Singh, Zhao

@arXiv_csCV_bot@mastoxiv.page
2025-10-14 22:04:45

Replaced article(s) found for cs.CV. arxiv.org/list/cs.CV/new
[6/8]:
- GeoVLM-R1: Reinforcement Fine-Tuning for Improved Remote Sensing Reasoning
Mustansar Fiaz, Hiyam Debary, Paolo Fraccaro, Danda Paudel, Luc Van Gool, Fahad Khan, Salman Khan

@arXiv_csCV_bot@mastoxiv.page
2025-10-14 22:04:25

Replaced article(s) found for cs.CV. arxiv.org/list/cs.CV/new
[5/8]:
- Context Guided Transformer Entropy Modeling for Video Compression
Junlong Tong, Wei Zhang, Yaohui Jin, Xiaoyu Shen

@arXiv_csCV_bot@mastoxiv.page
2025-10-14 22:04:05

Replaced article(s) found for cs.CV. arxiv.org/list/cs.CV/new
[4/8]:
- Boosting Adversarial Transferability via Commonality-Oriented Gradient Optimization
Yanting Gao, Yepeng Liu, Junming Liu, Qi Zhang, Hongyun Zhang, Duoqian Miao, Cairong Zhao

@arXiv_csCV_bot@mastoxiv.page
2025-10-14 22:03:45

Replaced article(s) found for cs.CV. arxiv.org/list/cs.CV/new
[3/8]:
- Learning to Instruct for Visual Instruction Tuning
Zhihan Zhou, Feng Hong, Jiaan Luo, Jiangchao Yao, Dongsheng Li, Bo Han, Ya Zhang, Yanfeng Wang

@arXiv_csCV_bot@mastoxiv.page
2025-10-14 22:03:25

Replaced article(s) found for cs.CV. arxiv.org/list/cs.CV/new
[2/8]:
- Multimodal Alignment and Fusion: A Survey
Songtao Li, Hao Tang

@arXiv_csCV_bot@mastoxiv.page
2025-10-14 22:03:06

Replaced article(s) found for cs.CV. arxiv.org/list/cs.CV/new
[1/8]:
- Invariant Feature Learning for Generalized Long-Tailed Classification
Kaihua Tang, Mingyuan Tao, Jiaxin Qi, Zhenguang Liu, Hanwang Zhang

@arXiv_csCV_bot@mastoxiv.page
2025-10-14 16:15:04

Crosslisted article(s) found for cs.CV. arxiv.org/list/cs.CV/new
[3/3]:
- Adversarial Attacks Leverage Interference Between Features in Superposition
Edward Stevinson, Lucas Prieto, Melih Barsbey, Tolga Birdal

@arXiv_csCV_bot@mastoxiv.page
2025-10-14 16:14:50

Crosslisted article(s) found for cs.CV. arxiv.org/list/cs.CV/new
[2/3]:
- ArtPerception: ASCII Art-based Jailbreak on LLMs with Recognition Pre-test
Guan-Yan Yang, Tzu-Yu Cheng, Ya-Wen Teng, Farn Wanga, Kuo-Hui Yeh

@arXiv_csCV_bot@mastoxiv.page
2025-10-14 16:14:34

Crosslisted article(s) found for cs.CV. arxiv.org/list/cs.CV/new
[1/3]:
- Gradient-Sign Masking for Task Vector Transport Across Pre-Trained Models
Rinaldi, Panariello, Salici, Liu, Ciccone, Porrello, Calderara

@arXiv_csCV_bot@mastoxiv.page
2025-10-15 14:34:41

Replaced article(s) found for cs.CV. arxiv.org/list/cs.CV/new
[5/5]:
- Modular Embedding Recomposition for Incremental Learning
Panariello, Frascaroli, Buzzega, Bonicelli, Porrello, Calderara

@arXiv_csCV_bot@mastoxiv.page
2025-10-15 14:34:29

Replaced article(s) found for cs.CV. arxiv.org/list/cs.CV/new
[4/5]:
- J-RAS: Enhancing Medical Image Segmentation via Retrieval-Augmented Joint Training
Salma J. Ahmed, Emad A. Mohammed, Azam Asilian Bidgoli

@arXiv_csCV_bot@mastoxiv.page
2025-10-15 14:34:17

Replaced article(s) found for cs.CV. arxiv.org/list/cs.CV/new
[3/5]:
- STRIDE-QA: Visual Question Answering Dataset for Spatiotemporal Reasoning in Urban Driving Scenes
Keishi Ishihara, Kento Sasaki, Tsubasa Takahashi, Daiki Shiono, Yu Yamaguchi

@arXiv_csCV_bot@mastoxiv.page
2025-10-15 14:34:05

Replaced article(s) found for cs.CV. arxiv.org/list/cs.CV/new
[2/5]:
- Calibration and Uncertainty for multiRater Volume Assessment in multiorgan Segmentation (CURVAS) ...
Marin, et al.

@arXiv_csCV_bot@mastoxiv.page
2025-10-15 14:33:53

Replaced article(s) found for cs.CV. arxiv.org/list/cs.CV/new
[1/5]:
- Enhancing Representations through Heterogeneous Self-Supervised Learning
Zhong-Yu Li, Bo-Wen Yin, Yongxiang Liu, Li Liu, Ming-Ming Cheng

@arXiv_csCV_bot@mastoxiv.page
2025-10-15 11:50:48

Crosslisted article(s) found for cs.CV. arxiv.org/list/cs.CV/new
[1/1]:
- SeeingSounds: Learning Audio-to-Visual Alignment via Text
Carnemolla, Pennisi, Russo, Palazzo, Giordano, Spampinato

@arXiv_csCV_bot@mastoxiv.page
2025-09-15 12:03:55

Replaced article(s) found for cs.CV. arxiv.org/list/cs.CV/new
[2/3]:
- Geometry and Perception Guided Gaussians for Multiview-consistent 3D Generation from a Single Image
Pufan Li, Bi'an Du, Wei Hu