Apple updates the Vision Pro with an M5 chip, rendering 10% more pixels, and a Dual Knit Band for a more comfortable fit, available on October 22 for $3,499 (Apple)
https://www.apple.com/newsroom/2025/10/apple-vision-pro-upgraded-with-th…
Apple to start selling Vision Pro-compatible PlayStation VR2 Sense controllers for $250 on November 11 and the $130 Logitech Muse digital pen on October 22 (Zac Hall/9to5Mac)
https://9to5mac.com/2025/10/15/apple-v
The U.S. Department of Education has pulled funding for programs in eight states
aimed at supporting students who have both hearing and vision loss,
a move that could affect some of the country’s most vulnerable students.
The programs are considered vital in those states but represent only a little over $1 million a year in federal money.
Nonetheless, they got caught in the Trump administration’s attacks on diversity, equity and inclusion,
with an Education Dep…
Withings launches a smart blood pressure monitor with an HD display for easy operation
The color screen of the BPM Vision guides the user through blood pressure and ECG measurements. The smart device, which has an extra-long battery life, costs 180 euros.
I read UCL's Vision for Inclusive Innovation and Transformation: 'our vision for a ‘data empowered society’ is one in which these advances enrich our society and enable us to make informed, inclusive decisions about technological advances' ht…
Enhancing Generalization in Vision-Language-Action Models by Preserving Pretrained Representations
Shresth Grover, Akshay Gopalkrishnan, Bo Ai, Henrik I. Christensen, Hao Su, Xuanlin Li
https://arxiv.org/abs/2509.11417
Static or Temporal? Semantic Scene Simplification to Aid Wayfinding in Immersive Simulations of Bionic Vision
Justin M. Kasowski, Apurv Varshney, Michael Beyeler
https://arxiv.org/abs/2507.10813
Cross-Platform Scaling of Vision-Language-Action Models from Edge to Cloud GPUs
Amir Taherin, Juyi Lin, Arash Akbari, Arman Akbari, Pu Zhao, Weiwei Chen, David Kaeli, Yanzhi Wang
https://arxiv.org/abs/2509.11480
With Apple M5: MacBook Pro and iPad Pro 2025 say hello
The M5 debuts in Macs, iPads and the Vision Pro. Apple promises more performance, primarily for AI tasks. The M5 Pro and M5 Max are missing, however.
https://w…
Spec-LLaVA: Accelerating Vision-Language Models with Dynamic Tree-Based Speculative Decoding
Mingxiao Huo, Jiayi Zhang, Hewei Wang, Jinfeng Xu, Zheyu Chen, Huilin Tai, Yijun Chen
https://arxiv.org/abs/2509.11961
Residual Gaze Behavior During Navigation in Blindness and Low Vision
Junchi Feng, Fernanda Garcia-Pina, Mahya Beheshti, Todd E Hudson, William Seiple, John-Ross Rizzo
https://arxiv.org/abs/2509.11530
Adapting Medical Vision Foundation Models for Volumetric Medical Image Segmentation via Active Learning and Selective Semi-supervised Fine-tuning
Jin Yang, Daniel S. Marcus, Aristeidis Sotiras
https://arxiv.org/abs/2509.10784
Lost in Embeddings: Information Loss in Vision-Language Models
Wenyan Li, Raphael Tang, Chengzu Li, Caiqi Zhang, Ivan Vulić, Anders Søgaard
https://arxiv.org/abs/2509.11986
Uncertainty Aware Mapping for Vision-Based Underwater Robots
Abhimanyu Bhowmik, Mohit Singh, Madhushree Sannigrahi, Martin Ludvigsen, Kostas Alexis
https://arxiv.org/abs/2507.10991
ReLook: Vision-Grounded RL with a Multimodal LLM Critic for Agentic Web Coding
Yuhang Li, Chenchen Zhang, Ruilin Lv, Ao Liu, Ken Deng, Yuanxing Zhang, Jiaheng Liu, Wiggin Zhou, Bo Zhou
https://arxiv.org/abs/2510.11498
This robot is still quite bad at folding towels, but think about the amount of computation, vision, models and algorithms needed to do this; it is very impressive.
#helix #laundry #ai
Nvidia says it is donating the Vera Rubin NVL144 server rack architecture to the Open Compute Project and outlines its vision for "gigawatt AI factories" (Mike Wheatley/SiliconANGLE)
https://siliconangle.com/2025/10/13/nv
Understanding AI Evaluation Patterns: How Different GPT Models Assess Vision-Language Descriptions
Sajjad Abdoli, Rudi Cilibrasi, Rima Al-Shikh
https://arxiv.org/abs/2509.10707 …
Benchmarking Vision-Language Models on Chinese Ancient Documents: From OCR to Knowledge Reasoning
Haiyang Yu, Yuchuan Wu, Fan Shi, Lei Liao, Jinghui Lu, Xiaodong Ge, Han Wang, Minghan Zhuo, Xuecheng Wu, Xiang Fei, Hao Feng, Guozhi Tang, An-Lan Wang, Hanshen Zhu, Yangfan He, Quanhuan Liang, Liyuan Meng, Chao Feng, Can Huang, Jingqun Tang, Bin Li
https://
Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation
Zhen Xu, Hongyu Zhou, Sida Peng, Haotong Lin, Haoyu Guo, Jiahao Shao, Peishan Yang, Qinglin Yang, Sheng Miao, Xingyi He, Yifan Wang, Yue Wang, Ruizhen Hu, Yiyi Liao, Xiaowei Zhou, Hujun Bao
https://arxiv.org/abs/2507.11540
Inclusive by design: Developing Barrier-Free Authentication for Blind and Low Vision Users through the ALIAS Project
Clara Toussaint (CeRCA), Benjamin Chateau (CeRCA), Pierre-Guillaume Gourio-Jewell (CeRCA), Emilie Bonnefoy (CeRCA), Nicolas Louveton (CeRCA)
https://arxiv.org/abs/2509.10043
All Eyes, no IMU: Learning Flight Attitude from Vision Alone
Jesse J. Hagenaars, Stein Stroobants, Sander M. Bohte, Guido C. H. E. De Croon
https://arxiv.org/abs/2507.11302
Following public outcry,
the U.S. Department of Education has restored funding for students who have both hearing and vision loss,
about a month after cutting it.
But rather than sending the money directly to the four programs that are part of a national network helping students who are deaf and blind, a condition known as deafblindness,
the department has instead rerouted the grants to a different organization.
The Trump administration targeted the programs in …
Complete NBA basketball games coming soon in immersive format on the Vision Pro
Together with a US cable provider, Apple will for the first time film complete basketball games with the Blackmagic URSA. The problem is copyright restrictions.
It is unclear to what extent Wicab's BrainPort Vision Pro tongue display (a sensory substitution device) is still on the market (https://www.wicab.com/brainport-vision-pro), because there has been very little news about it since around 2019, judging also from their own website. Note that the BrainPor…
Sources say Meta's chaotic culture and lack of vision have led to AI brain drain; Meta strongly denies it has had issues with talent and retention (Rashi Shrivastava/Forbes)
https://www.forbes.com/sites/rashishrivast
Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model
Fuhao Li, Wenxuan Song, Han Zhao, Jingbo Wang, Pengxiang Ding, Donglin Wang, Long Zeng, Haoang Li
https://arxiv.org/abs/2510.12276
Bridging Modality Gaps in e-Commerce Products via Vision-Language Alignment
Yipeng Zhang, Hongju Yu, Aritra Mandal, Canran Xu, Qunzhi Zhou, Zhe Wu
https://arxiv.org/abs/2508.10116
EquiContact: A Hierarchical SE(3) Vision-to-Force Equivariant Policy for Spatially Generalizable Contact-rich Tasks
Joohwan Seo, Arvind Kruthiventy, Soomi Lee, Megan Teng, Xiang Zhang, Seoyeon Choi, Jongeun Choi, Roberto Horowitz
https://arxiv.org/abs/2507.10961
Super LiDAR Reflectance for Robotic Perception
Wei Gao, Jie Zhang, Mingle Zhao, Zhiyuan Zhang, Shu Kong, Maani Ghaffari, Dezhen Song, Cheng-Zhong Xu, Hui Kong
https://arxiv.org/abs/2508.10398
LoRA-fine-tuned Large Vision Models for Automated Assessment of Post-SBRT Lung Injury
M. Bolhassani, B. Veasey, E. Daugherty, S. Keltner, N. Kumar, N. Dunlap, A. Amini
https://arxiv.org/abs/2509.12155 …
Processor leaks: the SoCs and SiPs Apple is planning for upcoming devices
Which chips are coming to new models of the HomePod mini, Apple TV, iPad mini and Vision Pro? What about the Studio Display 2? A code expert finds clues.
VisCodex: Unified Multimodal Code Generation via Merging Vision and Coding Models
Lingjie Jiang, Shaohan Huang, Xun Wu, Yixia Li, Dongdong Zhang, Furu Wei
https://arxiv.org/abs/2508.09945
Detecting Text Manipulation in Images using Vision Language Models
Vidit Vidit, Pavel Korshunov, Amir Mohammadi, Christophe Ecabert, Ketan Kotwal, Sébastien Marcel
https://arxiv.org/abs/2509.10278
Vision Language Action Models in Robotic Manipulation: A Systematic Review
Muhayy Ud Din, Waseem Akram, Lyes Saad Saoud, Jan Rosell, Irfan Hussain
https://arxiv.org/abs/2507.10672
Reducto, which uses OCR with vision language models to convert complex documents into inputs for LLMs, raised a $75M Series B led by a16z at a $600M valuation (Stephanie Palazzolo/The Information)
https://www.theinformation.com/articles/startup-using-ai-tran…
A Computer Vision Pipeline for Individual-Level Behavior Analysis: Benchmarking on the Edinburgh Pig Dataset
Haiyu Yang, Enhong Liu, Jennifer Sun, Sumit Sharma, Meike van Leerdam, Sebastien Franceschini, Puchun Niu, Miel Hostens
https://arxiv.org/abs/2509.12047
On the Use of Hierarchical Vision Foundation Models for Low-Cost Human Mesh Recovery and Pose Estimation
Shuhei Tarashima, Yushan Wang, Norio Tagawa
https://arxiv.org/abs/2510.12660
ReconVLA: Reconstructive Vision-Language-Action Model as Effective Robot Perceiver
Wenxuan Song, Ziyang Zhou, Han Zhao, Jiayi Chen, Pengxiang Ding, Haodong Yan, Yuxin Huang, Feilong Tang, Donglin Wang, Haoang Li
https://arxiv.org/abs/2508.10333
Look Again, Think Slowly: Enhancing Visual Reflection in Vision-Language Models
Pu Jian, Junhong Wu, Wei Sun, Chen Wang, Shuo Ren, Jiajun Zhang
https://arxiv.org/abs/2509.12132 …
CorrectNav: Self-Correction Flywheel Empowers Vision-Language-Action Navigation Model
Zhuoyuan Yu, Yuxing Long, Zihan Yang, Chengyan Zeng, Hongwei Fan, Jiyao Zhang, Hao Dong
https://arxiv.org/abs/2508.10416
Apple debuts its M5 chip, with a 10-core GPU, a Neural Accelerator in each core, enabling 4x the performance of M4, and a 10-core CPU with six efficiency cores (Hartley Charlton/MacRumors)
https://www.macrumors.com/2025/10/15/apple-unveils-m5-chip-with-n…
Reasoning in the Dark: Interleaved Vision-Text Reasoning in Latent Space
Chao Chen, Zhixin Ma, Yongqi Li, Yupeng Hu, Yinwei Wei, Wenjie Li, Liqiang Nie
https://arxiv.org/abs/2510.12603
Two-stream network-driven vision-based tactile sensor for object feature extraction and fusion perception
Muxing Huang, Zibin Chen, Weiliang Xu, Zilan Li, Yuanzhi Zhou, Guoyuan Zhou, Wenjing Chen, Xinming Li
https://arxiv.org/abs/2510.12528
From Diagnosis to Improvement: Probing Spatio-Physical Reasoning in Vision Language Models
Tiancheng Han, Yunfei Gao, Yong Li, Wuzhou Yu, Qiaosheng Zhang, Wenqi Shao
https://arxiv.org/abs/2508.10770
DreamNav: A Trajectory-Based Imaginative Framework for Zero-Shot Vision-and-Language Navigation
Yunheng Wang, Yuetong Fang, Taowen Wang, Yixiao Feng, Yawen Tan, Shuning Zhang, Peiran Liu, Yiding Ji, Renjing Xu
https://arxiv.org/abs/2509.11197
GC-VLN: Instruction as Graph Constraints for Training-free Vision-and-Language Navigation
Hang Yin, Haoyu Wei, Xiuwei Xu, Wenxuan Guo, Jie Zhou, Jiwen Lu
https://arxiv.org/abs/2509.10454
ViCO: A Training Strategy towards Semantic Aware Dynamic High-Resolution
Long Cui, Weiyun Wang, Jie Shao, Zichen Wen, Gen Luo, Linfeng Zhang, Yanting Zhang, Yu Qiao, Wenhai Wang
https://arxiv.org/abs/2510.12793
Some LA Lakers games will be live streamed in the Apple Immersive format in the NBA app and the Spectrum SportsNet app on Vision Pro during the 2025-26 season (Jacob Krol/TechRadar)
https://www.techradar.com/streaming/entert…
From Production Logistics to Smart Manufacturing: The Vision for a New RoboCup Industrial League
Supun Dissanayaka, Alexander Ferrein, Till Hofmann, Kosuke Nakajima, Mario Sanz-Lopez, Jesus Savage, Daniel Swoboda, Matteo Tschesche, Wataru Uemura, Tarik Viehmann, Shohei Yasuda
https://arxiv.org/abs/2507.11402
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[3/3]:
- Taccel: Scaling Up Vision-based Tactile Robotics via High-performance GPU Simulation
Li, Du, Yu, Li, Zhao, Liu, Jiang, Zhu, Huang
UniFusion: Vision-Language Model as Unified Encoder in Image Generation
Kevin Li, Manuel Brack, Sudeep Katakol, Hareesh Ravi, Ajinkya Kale
https://arxiv.org/abs/2510.12789 https…
Beyond conventional vision: RGB-event fusion for robust object detection in dynamic traffic scenarios
Zhanwen Liu, Yujing Sun, Yang Wang, Nan Yang, Shengbo Eben Li, Xiangmo Zhao
https://arxiv.org/abs/2508.10704
3DViT-GAT: A Unified Atlas-Based 3D Vision Transformer and Graph Learning Framework for Major Depressive Disorder Detection Using Structural MRI Data
Nojod M. Alotaibi, Areej M. Alhothali, Manar S. Ali
https://arxiv.org/abs/2509.12143
Igniting VLMs toward the Embodied Space
Andy Zhai, Brae Liu, Bruno Fang, Chalse Cai, Ellie Ma, Ethan Yin, Hao Wang, Hugo Zhou, James Wang, Lights Shi, Lucy Liang, Make Wang, Qian Wang, Roy Gan, Ryan Yu, Shalfun Li, Starrick Liu, Sylas Chen, Vincent Chen, Zach Xu
https://arxiv.org/abs/2509.11766
Open-ended Hierarchical Streaming Video Understanding with Vision Language Models
Hyolim Kang, Yunsu Park, Youngbeom Yoo, Yeeun Choi, Seon Joo Kim
https://arxiv.org/abs/2509.12145
Data or Language Supervision: What Makes CLIP Better than DINO?
Yiming Liu, Yuhui Zhang, Dhruba Ghosh, Ludwig Schmidt, Serena Yeung-Levy
https://arxiv.org/abs/2510.11835 https:/…
Embodied Navigation Foundation Model
Jiazhao Zhang, Anqi Li, Yunpeng Qi, Minghan Li, Jiahang Liu, Shaoan Wang, Haoran Liu, Gengze Zhou, Yuze Wu, Xingxing Li, Yuxin Fan, Wenjun Li, Zhibo Chen, Fei Gao, Qi Wu, Zhizheng Zhang, He Wang
https://arxiv.org/abs/2509.12129
E-MoFlow: Learning Egomotion and Optical Flow from Event Data via Implicit Regularization
Wenpu Li, Bangyan Liao, Yi Zhou, Qi Xu, Pian Wan, Peidong Liu
https://arxiv.org/abs/2510.12753
LLMC : Benchmarking Vision-Language Model Compression with a Plug-and-play Toolkit
Chengtao Lv, Bilang Zhang, Yang Yong, Ruihao Gong, Yushi Huang, Shiqiao Gu, Jiajun Wu, Yumeng Shi, Jinyang Guo, Wenya Wang
https://arxiv.org/abs/2508.09981
EvoCAD: Evolutionary CAD Code Generation with Vision Language Models
Tobias Preintner, Weixuan Yuan, Adrian König, Thomas Bäck, Elena Raponi, Niki van Stein
https://arxiv.org/abs/2510.11631
Beyond Blanket Masking: Examining Granularity for Privacy Protection in Images Captured by Blind and Low Vision Users
Jeffri Murrugarra-LLerena, Haoran Niu, K. Suzanne Barber, Hal Daumé III, Yang Trista Cao, Paola Cascante-Bonilla
https://arxiv.org/abs/2508.09245
VITA-VLA: Efficiently Teaching Vision-Language Models to Act via Action Expert Distillation
Shaoqi Dong, Chaoyou Fu, Haihan Gao, Yi-Fan Zhang, Chi Yan, Chu Wu, Xiaoyu Liu, Yunhang Shen, Jing Huo, Deqiang Jiang, Haoyu Cao, Yang Gao, Xing Sun, Ran He, Caifeng Shan
https://arxiv.org/abs/2510.09607
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[8/8]:
- TC-GS: A Faster Gaussian Splatting Module Utilizing Tensor Cores
Liao, Ding, Cui, Gong, Hu, Wang, Li, Zhang, Wang, Fu
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[7/8]:
- MultiCOIN: Multi-Modal COntrollable Video INbetweening
Tanveer, Zhou, Niklaus, Amiri, Zhang, Singh, Zhao
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[6/8]:
- GeoVLM-R1: Reinforcement Fine-Tuning for Improved Remote Sensing Reasoning
Mustansar Fiaz, Hiyam Debary, Paolo Fraccaro, Danda Paudel, Luc Van Gool, Fahad Khan, Salman Khan
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[5/8]:
- Context Guided Transformer Entropy Modeling for Video Compression
Junlong Tong, Wei Zhang, Yaohui Jin, Xiaoyu Shen
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[4/8]:
- Boosting Adversarial Transferability via Commonality-Oriented Gradient Optimization
Yanting Gao, Yepeng Liu, Junming Liu, Qi Zhang, Hongyun Zhang, Duoqian Miao, Cairong Zhao
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[3/8]:
- Learning to Instruct for Visual Instruction Tuning
Zhihan Zhou, Feng Hong, Jiaan Luo, Jiangchao Yao, Dongsheng Li, Bo Han, Ya Zhang, Yanfeng Wang
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[2/8]:
- Multimodal Alignment and Fusion: A Survey
Songtao Li, Hao Tang
https://
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[1/8]:
- Invariant Feature Learning for Generalized Long-Tailed Classification
Kaihua Tang, Mingyuan Tao, Jiaxin Qi, Zhenguang Liu, Hanwang Zhang
Crosslisted article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[3/3]:
- Adversarial Attacks Leverage Interference Between Features in Superposition
Edward Stevinson, Lucas Prieto, Melih Barsbey, Tolga Birdal
Crosslisted article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[2/3]:
- ArtPerception: ASCII Art-based Jailbreak on LLMs with Recognition Pre-test
Guan-Yan Yang, Tzu-Yu Cheng, Ya-Wen Teng, Farn Wanga, Kuo-Hui Yeh
Crosslisted article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[1/3]:
- Gradient-Sign Masking for Task Vector Transport Across Pre-Trained Models
Rinaldi, Panariello, Salici, Liu, Ciccone, Porrello, Calderara
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[5/5]:
- Modular Embedding Recomposition for Incremental Learning
Panariello, Frascaroli, Buzzega, Bonicelli, Porrello, Calderara
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[4/5]:
- J-RAS: Enhancing Medical Image Segmentation via Retrieval-Augmented Joint Training
Salma J. Ahmed, Emad A. Mohammed, Azam Asilian Bidgoli
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[3/5]:
- STRIDE-QA: Visual Question Answering Dataset for Spatiotemporal Reasoning in Urban Driving Scenes
Keishi Ishihara, Kento Sasaki, Tsubasa Takahashi, Daiki Shiono, Yu Yamaguchi
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[2/5]:
- Calibration and Uncertainty for multiRater Volume Assessment in multiorgan Segmentation (CURVAS) ...
Marin, et al.
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[1/5]:
- Enhancing Representations through Heterogeneous Self-Supervised Learning
Zhong-Yu Li, Bo-Wen Yin, Yongxiang Liu, Li Liu, Ming-Ming Cheng
Crosslisted article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[1/1]:
- SeeingSounds: Learning Audio-to-Visual Alignment via Text
Carnemolla, Pennisi, Russo, Palazzo, Giordano, Spampinato
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[2/3]:
- Geometry and Perception Guided Gaussians for Multiview-consistent 3D Generation from a Single Image
Pufan Li, Bi'an Du, Wei Hu