Tootfinder

Opt-in global Mastodon full text search. Join the index!

@arXiv_csCV_bot@mastoxiv.page
2025-08-19 12:08:10

Precise Action-to-Video Generation Through Visual Action Prompts
Yuang Wang, Chao Wen, Haoyu Guo, Sida Peng, Minghan Qin, Hujun Bao, Xiaowei Zhou, Ruizhen Hu
arxiv.org/abs/2508.13104

@arXiv_csHC_bot@mastoxiv.page
2025-09-19 09:31:41

UMind: A Unified Multitask Network for Zero-Shot M/EEG Visual Decoding
Chengjian Xu, Yonghao Song, Zelin Liao, Haochuan Zhang, Qiong Wang, Qingqing Zheng
arxiv.org/abs/2509.14772

@seeingwithsound@mas.to
2025-09-20 18:42:27

(2024) Visual neuroprostheses for impaired human nervous system: State-of-the-art and future outlook #BCI

Visual pathway of the human visual system.

@arXiv_csCL_bot@mastoxiv.page
2025-09-19 10:28:51

V-SEAM: Visual Semantic Editing and Attention Modulating for Causal Interpretability of Vision-Language Models
Qidong Wang, Junjie Hu, Ming Jiang
arxiv.org/abs/2509.14837

@UP8@mastodon.social
2025-08-18 18:12:24

⏱️ Researchers discover how the human brain organizes its visual memories through precise neural timing
medicalxpress.com/news/2025-07

@arXiv_csSD_bot@mastoxiv.page
2025-09-19 10:09:31

From Hype to Insight: Rethinking Large Language Model Integration in Visual Speech Recognition
Rishabh Jain, Naomi Harte
arxiv.org/abs/2509.14880

@arXiv_qbioNC_bot@mastoxiv.page
2025-09-19 09:00:51

Mouse vs. AI: A Neuroethological Benchmark for Visual Robustness and Neural Alignment
Marius Schneider, Joe Canzano, Jing Peng, Yuchen Hou, Spencer LaVere Smith, Michael Beyeler
arxiv.org/abs/2509.14446

@Techmeme@techhub.social
2025-09-16 11:10:59

Microsoft adds auto model selection to Visual Studio Code that primarily favors Claude Sonnet 4 over GPT-5 for paid GitHub Copilot users (Tom Warren/The Verge)
theverge.com/report/778641/mic

@ErikJonker@mastodon.social
2025-08-19 10:50:45

Qwen-Image-Edit: Image Editing with Higher Quality and Efficiency
qwenlm.github.io/blog/qwen-ima

@arXiv_csRO_bot@mastoxiv.page
2025-09-19 09:48:21

BEV-ODOM2: Enhanced BEV-based Monocular Visual Odometry with PV-BEV Fusion and Dense Flow Supervision for Ground Robots
Yufei Wei, Wangtao Lu, Sha Lu, Chenxiao Hu, Fuzhang Han, Rong Xiong, Yue Wang
arxiv.org/abs/2509.14636

@laurentperrinet@neuromatch.social
2025-08-18 15:51:27

🧠 Excited to share our latest research presented at #CNS2025 in beautiful Firenze, Italy!
"Population decoding of visual motion direction in V1 marmoset monkey: effects of uncertainty"
Our work explores how populations of neurons in the primary visual cortex (V1) of marmoset monkeys encode visual motion direction, with a particular focus on understanding how uncertainty influences…

@Migurski@mastodon.social
2025-08-18 17:03:41

Visual aid for Boeing Power-Sat presentation, 1975 Via 70sscifiart.tumblr.com/post/79

@arXiv_eessIV_bot@mastoxiv.page
2025-08-20 09:13:40

Automated Cervical Cancer Detection through Visual Inspection with Acetic Acid in Resource-Poor Settings with Lightweight Deep Learning Models Deployed on an Android Device
Leander Melroy Maben, Keerthana Prasad, Shyamala Guruvare, Vidya Kudva, P C Siddalingaswamy
arxiv.org/abs/2508.13253

@poppastring@dotnet.social
2025-09-17 19:35:27

A post from the archive 📫:
Using Visual Studio to search objects in a memory dump
poppastring.com/blog/using-vis

@arXiv_csCR_bot@mastoxiv.page
2025-08-19 11:38:10

Unlearning Comparator: A Visual Analytics System for Comparative Evaluation of Machine Unlearning Methods
Jaeung Lee, Suhyeon Yu, Yurim Jang, Simon S. Woo, Jaemin Jo
arxiv.org/abs/2508.12730

@seeingwithsound@mas.to
2025-08-19 07:49:42

(LinkedIn) Revision Implant is, despite its name, already looking for markets beyond visual prostheses linkedin.com/posts/revision-im

@arXiv_csHC_bot@mastoxiv.page
2025-09-19 08:31:51

Sensing the Shape of Data: Non-Visual Exploration of Statistical Concepts in Histograms with Blind and Low-Vision Learners
Sanchita S. Kamath, Omar Khan, Aziz N Zeidieh, JooYoung Seo
arxiv.org/abs/2509.14452

@arXiv_csCV_bot@mastoxiv.page
2025-09-19 10:29:21

Leveraging Geometric Visual Illusions as Perceptual Inductive Biases for Vision Models
Haobo Yang, Minghao Guo, Dequan Yang, Wenyu Wang
arxiv.org/abs/2509.15156

@arXiv_csCL_bot@mastoxiv.page
2025-08-20 09:34:40

AdaDocVQA: Adaptive Framework for Long Document Visual Question Answering in Low-Resource Settings
Haoxuan Li, Wei Song, Aofan Liu, Peiwu Qin
arxiv.org/abs/2508.13606

@arXiv_eessAS_bot@mastoxiv.page
2025-09-19 08:47:41

Diffusion-Based Unsupervised Audio-Visual Speech Separation in Noisy Environments with Noise Prior
Yochai Yemini, Rami Ben-Ari, Sharon Gannot, Ethan Fetaya
arxiv.org/abs/2509.14379

@arXiv_eessSY_bot@mastoxiv.page
2025-08-20 09:03:40

Model-based Multi-object Visual Tracking: Identification and Standard Model Limitations
Jan Krejčí, Oliver Kost, Yuxuan Xia, Lennart Svensson, Ondřej Straka
arxiv.org/abs/2508.13647

@fanf@mendeddrum.org
2025-08-20 20:42:03

from my link log —
Game math: precise control over numeric springing.
allenchou.net/2015/04/game-mat
saved 2025-05-21

@arXiv_csSD_bot@mastoxiv.page
2025-08-20 07:44:19

Leveraging Mamba with Full-Face Vision for Audio-Visual Speech Enhancement
Rong Chao, Wenze Ren, You-Jin Li, Kuo-Hsuan Hung, Sung-Feng Huang, Szu-Wei Fu, Wen-Huang Cheng, Yu Tsao
arxiv.org/abs/2508.13624

@arXiv_condmatsoft_bot@mastoxiv.page
2025-08-18 08:57:10

Large-scale dynamics in visual quorum sensing chiral suspensions
Yuxin Zhou, Qingqing Yin, Shubhadip Nayak, Poulami Bag, Pulak K. Ghosh, Yunyun Li, Fabio Marchesoni
arxiv.org/abs/2508.11254

@blakes7bot@mas.torpidity.net
2025-08-20 12:18:34

Series C, Episode 07 - Children of Auron
C.A. ONE: For what reason?
FRANTON: Conquest.
C.A. ONE: That's ridiculous.
FRANTON: Surely it would be safer to wait for the Liberator. At least we can trust them, and we know they're coming, Zelda heard.
blake.torpidity.net/m/307/223

Claude 3.7 describes the image as: "The image appears to be from a classic British television production, likely from the late 1970s or early 1980s based on the visual quality and aesthetic. 

The scene shows an elderly person with white hair wearing a dark jacket with teal/green elements and decorative trim. The individual has a serious, contemplative expression and appears to be in a conversation within what looks like an interior setting with light-colored walls or panels visible in the back…

@arXiv_csSE_bot@mastoxiv.page
2025-09-16 11:03:17

VisDocSketcher: Towards Scalable Visual Documentation with Agentic Systems
Luís F. Gomes, Xin Zhou, David Lo, Rui Abreu
arxiv.org/abs/2509.11942

@arXiv_csRO_bot@mastoxiv.page
2025-07-18 08:41:12

ASC-SW: Atrous strip convolution network with sliding windows for visual-assisted map navigation
Cheng Liu, Fan Zhu, Yaoyu Zhuang, Zhinan Chen, Jiefeng Tang
arxiv.org/abs/2507.12744

@arXiv_csCV_bot@mastoxiv.page
2025-08-19 12:06:00

Omni Survey for Multimodality Analysis in Visual Object Tracking
Zhangyong Tang, Tianyang Xu, Xuefeng Zhu, Hui Li, Shaochuan Zhao, Tao Zhou, Chunyang Cheng, Xiaojun Wu, Josef Kittler
arxiv.org/abs/2508.13000

@seeingwithsound@mas.to
2025-08-17 21:23:36

Neuralink for visual prosthesis #Neuralink

@arXiv_csIR_bot@mastoxiv.page
2025-07-17 08:36:10

Developing Visual Augmented Q&A System using Scalable Vision Embedding Retrieval & Late Interaction Re-ranker
Rachna Saxena, Abhijeet Kumar, Suresh Shanmugam
arxiv.org/abs/2507.12378

@arXiv_csAI_bot@mastoxiv.page
2025-08-19 10:46:20

EGOILLUSION: Benchmarking Hallucinations in Egocentric Video Understanding
Ashish Seth, Utkarsh Tyagi, Ramaneswaran Selvakumar, Nishit Anand, Sonal Kumar, Sreyan Ghosh, Ramani Duraiswami, Chirag Agarwal, Dinesh Manocha
arxiv.org/abs/2508.12687

@Techmeme@techhub.social
2025-08-20 16:51:05

Google's new AI features for Pixel include Magic Cue for contextual suggestions across apps, Voice Translate for calls, and Visual Overlays for the camera (Sarah Perez/TechCrunch)
techcrunch.com/2025/08/20/goog

@leftsidestory@mstdn.social
2025-09-19 00:30:01

On The Road - To Xi’An/ Geometry 📐
在路上 - 去西安/ 几何 📐
📷 Pentax MX
🎞️Fujifilm Neopan F, expired 1993
#filmphotography #Photography #blackandwhite

FUJIFILM NEOPAN F (FF)

English Alt Text: A black-and-white photograph taken from beneath a long bridge. The bridge stretches into the distance, supported by evenly spaced vertical columns that descend into a calm body of water. The water is so still that it mirrors the columns and underside of the bridge perfectly, creating a symmetrical visual effect. Along the top edge of the bridge are traditional lantern-style lamp posts, possibly East Asian in design, adding a cultural touch to the otherw…
FUJIFILM NEOPAN F (FF)

English Alt Text: A black-and-white image of a modern architectural structure beside a large reflective pool. The building features angular overhangs and vertical columns, casting dramatic shadows. The water below reflects the entire structure, doubling its visual impact. In the background, trees and another uniquely designed building—possibly a museum or cultural center—add context. A lone person walks near the water’s edge, providing scale and a human element. The comp…
FUJIFILM NEOPAN F (FF)

English Alt Text: A black-and-white photo showing a modern bridge with stylized support columns extending into the distance. The bridge spans a body of water that reflects the structure and sky, creating a symmetrical scene. In the background stands a large, tiered tower resembling traditional East Asian pagodas, adding cultural depth. The image uses strong perspective lines and mirrored reflections to emphasize geometry and balance. The contrast between the sleek bridge…

@arXiv_csCV_bot@mastoxiv.page
2025-08-20 10:16:30

VisionLaw: Inferring Interpretable Intrinsic Dynamics from Visual Observations via Bilevel Optimization
Jailing Lin, Shu Jiang, Qingyuan Zeng, Zhenzhong Wang, Min Jiang
arxiv.org/abs/2508.13792

@arXiv_csHC_bot@mastoxiv.page
2025-09-18 09:23:21

Py maidr: Bridging Visual and Non-Visual Data Experiences Through a Unified Python Framework
JooYoung Seo, Saairam Venkatesh, Daksh Pokar, Sanchita Kamath, Krishna Anandan Ganesan
arxiv.org/abs/2509.13532

@arXiv_csSD_bot@mastoxiv.page
2025-08-19 09:46:50

Cross-Modal Knowledge Distillation with Multi-Level Data Augmentation for Low-Resource Audio-Visual Sound Event Localization and Detection
Qing Wang, Ya Jiang, Hang Chen, Sabato Marco Siniscalchi, Jun Du, Jianqing Gao
arxiv.org/abs/2508.12334

@arXiv_csCL_bot@mastoxiv.page
2025-08-18 09:48:00

Dataset Creation for Visual Entailment using Generative AI
Rob Reijtenbach, Suzan Verberne, Gijs Wijnholds
arxiv.org/abs/2508.11605 arxiv.o…

@seeingwithsound@mas.to
2025-07-19 19:52:16

Optimizing electrical stimulation parameters to enhance visual cortex activation in retina degeneration rats #BCI

@arXiv_csRO_bot@mastoxiv.page
2025-09-19 09:09:21

Learning Discrete Abstractions for Visual Rearrangement Tasks Using Vision-Guided Graph Coloring
Abhiroop Ajith, Constantinos Chamzas
arxiv.org/abs/2509.14460

@arXiv_csCV_bot@mastoxiv.page
2025-08-20 10:17:10

Unsupervised Urban Tree Biodiversity Mapping from Street-Level Imagery Using Spatially-Aware Visual Clustering
Diaa Addeen Abuhani, Marco Seccaroni, Martina Mazzarello, Imran Zualkernan, Fabio Duarte, Carlo Ratti
arxiv.org/abs/2508.13814

@Techmeme@techhub.social
2025-08-20 16:18:02

Google unveils the $999 Pixel 10 Pro and $1,199 10 Pro XL, with 6.3" and 6.8" OLED displays, Tensor G5 chips, Zoned UFS storage, available on August 28 (Ben Schoon/9to5Google)
9to5google.com/2025/08/20/goog

@arXiv_qbioNC_bot@mastoxiv.page
2025-08-19 09:37:20

Synchronization and semantization in deep spiking networks
Jonas Oberste-Frielinghaus, Anno C. Kurth, Julian Göltz, Laura Kriener, Junji Ito, Mihai A. Petrovici, Sonja Grün
arxiv.org/abs/2508.12975

@arXiv_csSD_bot@mastoxiv.page
2025-09-19 10:09:51

Temporally Heterogeneous Graph Contrastive Learning for Multimodal Acoustic Event Classification
Yuanjian Chen, Yang Xiao, Jinjie Huang
arxiv.org/abs/2509.14893

@arXiv_csAI_bot@mastoxiv.page
2025-08-20 11:23:47

Crosslisted article(s) found for cs.AI. arxiv.org/list/cs.AI/new
[4/6]:
- End-to-End Audio-Visual Learning for Cochlear Implant Sound Coding in Noisy Environments
Meng-Ping Lin, Enoch Hsin-Ho Huang, Shao-Yi Chien, Yu Tsao

@poppastring@dotnet.social
2025-09-10 03:31:25

Just published 🚀: Visual Studio Insiders
#visualstudio

@arXiv_csRO_bot@mastoxiv.page
2025-08-19 11:29:30

Manipulate-to-Navigate: Reinforcement Learning with Visual Affordances and Manipulability Priors
Yuying Zhang, Joni Pajarinen
arxiv.org/abs/2508.13151

@arXiv_csHC_bot@mastoxiv.page
2025-08-19 09:48:30

fCrit: A Visual Explanation System for Furniture Design Creative Support
Vuong Nguyen, Gabriel Vigliensoni
arxiv.org/abs/2508.12416 arxiv.o…

@seeingwithsound@mas.to
2025-09-18 09:12:54

Visual image reconstruction from brain activity via latent representation annualreviews.org/content/jour by @…

Psychological measurement of subjective visual experiences through image reconstruction. (a) Mapping of brain, stimulus, and mind. Dots represent instances of visual experience (e.g., an image, perception, and corresponding brain activity). Veridical perception assumes that the mind accurately represents stimuli. The brain–mind mapping is considered fixed, while the brain–stimulus relationship is empirically identified. (b) Nonveridical perception (e.g., mental imagery, attentional modulation, …

@arXiv_csCL_bot@mastoxiv.page
2025-07-18 09:48:32

Multi-Agent Synergy-Driven Iterative Visual Narrative Synthesis
Wang Xi, Quan Shi, Tian Yu, Yujie Peng, Jiayi Sun, Mengxing Ren, Zenghui Ding, Ningguang Yao
arxiv.org/abs/2507.13285

@arXiv_csHC_bot@mastoxiv.page
2025-09-19 09:03:51

VisMoDAl: Visual Analytics for Evaluating and Improving Corruption Robustness of Vision-Language Models
Huanchen Wang, Wencheng Zhang, Zhiqiang Wang, Zhicong Lu, Yuxin Ma
arxiv.org/abs/2509.14571

@arXiv_csRO_bot@mastoxiv.page
2025-08-18 09:36:10

Visual Perception Engine: Fast and Flexible Multi-Head Inference for Robotic Vision Tasks
Jakub Łucki, Jonathan Becktor, Georgios Georgakis, Robert Royce, Shehryar Khattak
arxiv.org/abs/2508.11584

@arXiv_csCV_bot@mastoxiv.page
2025-07-18 10:20:02

Leveraging Pre-Trained Visual Models for AI-Generated Video Detection
Keerthi Veeramachaneni, Praveen Tirupattur, Amrit Singh Bedi, Mubarak Shah
arxiv.org/abs/2507.13224

@arXiv_csAI_bot@mastoxiv.page
2025-08-20 09:54:40

V2P: From Background Suppression to Center Peaking for Robust GUI Grounding Task
Jikai Chen, Long Chen, Dong Wang, Leilei Gan, Chenyi Zhuang, Jinjie Gu
arxiv.org/abs/2508.13634

@arXiv_csCV_bot@mastoxiv.page
2025-09-18 10:25:01

Teacher-Guided Pseudo Supervision and Cross-Modal Alignment for Audio-Visual Video Parsing
Yaru Chen, Ruohao Guo, Liting Gao, Yang Xiang, Qingyu Luo, Zhenbo Li, Wenwu Wang
arxiv.org/abs/2509.14097

@arXiv_csRO_bot@mastoxiv.page
2025-07-18 09:10:52

FFI-VTR: Lightweight and Robust Visual Teach and Repeat Navigation based on Feature Flow Indicator and Probabilistic Motion Planning
Jikai Wang, Yunqi Cheng, Zonghai Chen
arxiv.org/abs/2507.12800

@seeingwithsound@mas.to
2025-09-17 15:05:02

How tree shrews see the world - A compressed hierarchy for visual form processing in the tree shrew #neuroscience

@arXiv_csCV_bot@mastoxiv.page
2025-08-19 12:07:40

Checkmate: interpretable and explainable RSVQA is the endgame
Lucrezia Tosato, Christel Tartini Chappuis, Syrielle Montariol, Flora Weissgerber, Sylvain Lobry, Devis Tuia
arxiv.org/abs/2508.13086

@arXiv_csHC_bot@mastoxiv.page
2025-08-20 08:17:50

Visuo-Tactile Feedback with Hand Outline Styles for Modulating Affective Roughness Perception
Minju Baeck, Yoonseok Shin, Dooyoung Kim, Hyunjin Lee, Sang Ho Yoon, Woontack Woo
arxiv.org/abs/2508.13504

@Techmeme@techhub.social
2025-09-11 02:01:46

Microsoft releases its first preview of Visual Studio 2026, the first major update since November 2021, with a new look and deeper AI integration (Tim Anderson/The Register)
theregister.com/2025/09/10/vis

@arXiv_csCV_bot@mastoxiv.page
2025-08-18 09:53:10

OpenConstruction: A Systematic Synthesis of Open Visual Datasets for Data-Centric Artificial Intelligence in Construction Monitoring
Ruoxin Xiong, Yanyu Wang, Jiannan Cai, Kaijian Liu, Yuansheng Zhu, Pingbo Tang, Nora El-Gohary
arxiv.org/abs/2508.11482

@seeingwithsound@mas.to
2025-09-20 16:23:45

Neurons and Pixels neurotechreports.com/pages/pub by James Cavuoto on Neuralink Blindsight and the graveyard of commercial failures: Optobionics, Retina Implant, Second Sight, Pixium Vision and others.

@arXiv_csRO_bot@mastoxiv.page
2025-08-18 08:01:30

Robust Online Calibration for UWB-Aided Visual-Inertial Navigation with Bias Correction
Yizhi Zhou, Jie Xu, Jiawei Xia, Zechen Hu, Weizi Li, Xuan Wang
arxiv.org/abs/2508.10999

@arXiv_csCV_bot@mastoxiv.page
2025-09-19 10:31:41

Understand Before You Generate: Self-Guided Training for Autoregressive Image Generation
Xiaoyu Yue, Zidong Wang, Yuqing Wang, Wenlong Zhang, Xihui Liu, Wanli Ouyang, Lei Bai, Luping Zhou
arxiv.org/abs/2509.15185

@Techmeme@techhub.social
2025-08-18 20:01:33

Google says users created 100M videos using its AI filmmaking tool Flow since its May launch; Flow leverages Veo 3 and focuses on maintaining visual consistency (Katelyn Chedraoui/CNET)
cnet.com/tech/services-and-sof

@arXiv_csCL_bot@mastoxiv.page
2025-09-19 10:20:41

UnifiedVisual: A Framework for Constructing Unified Vision-Language Datasets
Pengyu Wang, Shaojun Zhou, Chenkun Tan, Xinghao Wang, Wei Huang, Zhen Ye, Zhaowei Li, Botian Jiang, Dong Zhang, Xipeng Qiu
arxiv.org/abs/2509.14738

@arXiv_csRO_bot@mastoxiv.page
2025-09-18 10:11:51

BIM Informed Visual SLAM for Construction Monitoring
Asier Bikandi, Miguel Fernandez-Cortizas, Muhammad Shaheer, Ali Tourani, Holger Voos, Jose Luis Sanchez-Lopez
arxiv.org/abs/2509.13972

@arXiv_csCV_bot@mastoxiv.page
2025-09-19 10:26:21

OmniSegmentor: A Flexible Multi-Modal Learning Framework for Semantic Segmentation
Bo-Wen Yin, Jiao-Long Cao, Xuying Zhang, Yuming Chen, Ming-Ming Cheng, Qibin Hou
arxiv.org/abs/2509.15096

@arXiv_csHC_bot@mastoxiv.page
2025-07-17 09:44:10

Deconstructing Implicit Beliefs in Visual Data Journalism: Unstable Meanings Behind Data as Truth & Design for Insight
Ke Er Amy Zhang, Jodie Jenkinson, Laura Garrison
arxiv.org/abs/2507.12377

@arXiv_csCV_bot@mastoxiv.page
2025-08-20 10:16:40

A Fully Transformer Based Multimodal Framework for Explainable Cancer Image Segmentation Using Radiology Reports
Enobong Adahada, Isabel Sassoon, Kate Hone, Yongmin Li
arxiv.org/abs/2508.13796

@arXiv_csRO_bot@mastoxiv.page
2025-07-16 10:16:51

Comparison of Localization Algorithms between Reduced-Scale and Real-Sized Vehicles Using Visual and Inertial Sensors
Tobias Kern, Leon Tolksdorf, Christian Birkner
arxiv.org/abs/2507.11241

@arXiv_csHC_bot@mastoxiv.page
2025-09-17 10:22:00

More than Meets the Eye: Understanding the Effect of Individual Objects on Perceived Visual Privacy
Mete Harun Akcay, Siddharth Prakash Rao, Alexandros Bakas, Buse Gul Atli
arxiv.org/abs/2509.13051

@arXiv_csCV_bot@mastoxiv.page
2025-09-15 10:03:01

Towards Understanding Visual Grounding in Visual Language Models
Georgios Pantazopoulos, Eda B. Özyiğit
arxiv.org/abs/2509.10345

@arXiv_csCV_bot@mastoxiv.page
2025-09-17 10:53:10

HERO: Rethinking Visual Token Early Dropping in High-Resolution Large Vision-Language Models
Xu Li, Yuxuan Liang, Xiaolei Chen, Yi Zheng, Haotian Chen, Bin Li, Xiangyang Xue
arxiv.org/abs/2509.13067

@arXiv_csRO_bot@mastoxiv.page
2025-07-17 09:56:40

Assessing the Value of Visual Input: A Benchmark of Multimodal Large Language Models for Robotic Path Planning
Jacinto Colan, Ana Davila, Yasuhisa Hasegawa
arxiv.org/abs/2507.12391

@seeingwithsound@mas.to
2025-09-16 12:22:42

Encoding visual stimuli by striatal neurons (in mice) biorxiv.org/content/10.1101/20 "Although visual object encoding is considered a cortical attribute, subcortical areas also contain visual processing circuits."

@arXiv_csRO_bot@mastoxiv.page
2025-09-17 10:35:50

DVDP: An End-to-End Policy for Mobile Robot Visual Docking with RGB-D Perception
Haohan Min, Zhoujian Li, Yu Yang, Jinyu Chen, Shenghai Yuan
arxiv.org/abs/2509.13024

@arXiv_csCV_bot@mastoxiv.page
2025-09-18 10:22:31

Distractor-Aware Memory-Based Visual Object Tracking
Jovana Videnovic, Matej Kristan, Alan Lukezic
arxiv.org/abs/2509.13864 arxiv.org/pdf/2…

@arXiv_csCV_bot@mastoxiv.page
2025-07-18 10:22:02

$\pi^3$: Scalable Permutation-Equivariant Visual Geometry Learning
Yifan Wang, Jianjun Zhou, Haoyi Zhu, Wenzheng Chang, Yang Zhou, Zizun Li, Junyi Chen, Jiangmiao Pang, Chunhua Shen, Tong He
arxiv.org/abs/2507.13347

@arXiv_csCV_bot@mastoxiv.page
2025-09-18 10:25:51

An Exploratory Study on Abstract Images and Visual Representations Learned from Them
Haotian Li, Jianbo Jiao
arxiv.org/abs/2509.14149 arxiv…

@arXiv_csCV_bot@mastoxiv.page
2025-08-18 09:53:50

Hierarchical Graph Feature Enhancement with Adaptive Frequency Modulation for Visual Recognition
Feiyue Zhao, Zhichao Zhang
arxiv.org/abs/2508.11497

@arXiv_csCV_bot@mastoxiv.page
2025-09-18 10:24:41

VSE-MOT: Multi-Object Tracking in Low-Quality Video Scenes Guided by Visual Semantic Enhancement
Jun Du, Weiwei Xing, Ming Li, Fei Richard Yu
arxiv.org/abs/2509.14060

@arXiv_csRO_bot@mastoxiv.page
2025-07-18 09:23:22

LaViPlan: Language-Guided Visual Path Planning with RLVR
Hayeon Oh
arxiv.org/abs/2507.12911 arxiv.org/pdf/2507.12911…

@arXiv_csCV_bot@mastoxiv.page
2025-08-18 09:52:30

Inside Knowledge: Graph-based Path Generation with Explainable Data Augmentation and Curriculum Learning for Visual Indoor Navigation
Daniel Airinei, Elena Burceanu, Marius Leordeanu
arxiv.org/abs/2508.11446

@arXiv_csRO_bot@mastoxiv.page
2025-08-18 08:35:20

GenFlowRL: Shaping Rewards with Generative Object-Centric Flow in Visual Reinforcement Learning
Kelin Yu, Sheng Zhang, Harshit Soora, Furong Huang, Heng Huang, Pratap Tokekar, Ruohan Gao
arxiv.org/abs/2508.11049

@arXiv_csCV_bot@mastoxiv.page
2025-09-16 12:43:47

Look Again, Think Slowly: Enhancing Visual Reflection in Vision-Language Models
Pu Jian, Junhong Wu, Wei Sun, Chen Wang, Shuo Ren, Jiajun Zhang
arxiv.org/abs/2509.12132

@arXiv_csCV_bot@mastoxiv.page
2025-09-18 10:21:31

Diving into Mitigating Hallucinations from a Vision Perspective for Large Vision-Language Models
Weihang Wang, Xinhao Li, Ziyue Wang, Yan Pang, Jielei Zhang, Peiyi Li, Qiang Zhang, Longwen Gao
arxiv.org/abs/2509.13836

@arXiv_csCV_bot@mastoxiv.page
2025-07-17 10:27:10

Describe Anything Model for Visual Question Answering on Text-rich Images
Yen-Linh Vu, Dinh-Thang Duong, Truong-Binh Duong, Anh-Khoi Nguyen, Thanh-Huy Nguyen, Le Thien Phuc Nguyen, Jianhua Xing, Xingjian Li, Tianyang Wang, Ulas Bagci, Min Xu
arxiv.org/abs/2507.12441

@arXiv_csCV_bot@mastoxiv.page
2025-09-17 10:52:50

Perception Before Reasoning: Two-Stage Reinforcement Learning for Visual Reasoning in Vision-Language Models
Yan Chen, Long Li, Teng Xi, Long Zeng, Jingdong Wang
arxiv.org/abs/2509.13031

@arXiv_csCV_bot@mastoxiv.page
2025-09-19 10:33:31

Depth AnyEvent: A Cross-Modal Distillation Paradigm for Event-Based Monocular Depth Estimation
Luca Bartolomei, Enrico Mannocci, Fabio Tosi, Matteo Poggi, Stefano Mattoccia
arxiv.org/abs/2509.15224

@arXiv_csCV_bot@mastoxiv.page
2025-09-19 10:22:31

PRISM: Product Retrieval In Shopping Carts using Hybrid Matching
Arda Kabadayi, Senem Velipasalar, Jiajing Chen
arxiv.org/abs/2509.14985 ar…

@arXiv_csCV_bot@mastoxiv.page
2025-09-18 10:23:41

Can Current AI Models Count What We Mean, Not What They See? A Benchmark and Systematic Evaluation
Gia Khanh Nguyen, Yifeng Huang, Minh Hoai
arxiv.org/abs/2509.13939

@arXiv_csCV_bot@mastoxiv.page
2025-07-18 10:20:22

VITA: Vision-to-Action Flow Matching Policy
Dechen Gao, Boqi Zhao, Andrew Lee, Ian Chuang, Hanchu Zhou, Hang Wang, Zhe Zhao, Junshan Zhang, Iman Soltani
arxiv.org/abs/2507.13231

@arXiv_csCV_bot@mastoxiv.page
2025-08-20 10:20:40

RotBench: Evaluating Multimodal Large Language Models on Identifying Image Rotation
Tianyi Niu, Jaemin Cho, Elias Stengel-Eskin, Mohit Bansal
arxiv.org/abs/2508.13968

@arXiv_csCV_bot@mastoxiv.page
2025-08-20 10:15:30

Mitigating Cross-Image Information Leakage in LVLMs for Multi-Image Tasks
Yeji Park, Minyoung Lee, Sanghyuk Chun, Junsuk Choe
arxiv.org/abs/2508.13744

@arXiv_csCV_bot@mastoxiv.page
2025-07-18 10:22:32

VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning
Senqiao Yang, Junyi Li, Xin Lai, Bei Yu, Hengshuang Zhao, Jiaya Jia
arxiv.org/abs/2507.13348

@arXiv_csCV_bot@mastoxiv.page
2025-08-20 10:14:10

HumanPCR: Probing MLLM Capabilities in Diverse Human-Centric Scenes
Keliang Li, Hongze Shen, Hao Shi, Ruibing Hou, Hong Chang, Jie Huang, Chenghao Jia, Wen Wang, Yiling Wu, Dongmei Jiang, Shiguang Shan, Xilin Chen
arxiv.org/abs/2508.13692

@arXiv_csCV_bot@mastoxiv.page
2025-09-19 10:22:11

SPATIALGEN: Layout-guided 3D Indoor Scene Generation
Chuan Fang, Heng Li, Yixun Liang, Jia Zheng, Yongsen Mao, Yuan Liu, Rui Tang, Zihan Zhou, Ping Tan
arxiv.org/abs/2509.14981

@arXiv_csCV_bot@mastoxiv.page
2025-08-19 12:07:20

Eyes on the Image: Gaze Supervised Multimodal Learning for Chest X-ray Diagnosis and Report Generation
Tanjim Islam Riju, Shuchismita Anwar, Saman Sarker Joy, Farig Sadeque, Swakkhar Shatabda
arxiv.org/abs/2508.13068

@arXiv_csCV_bot@mastoxiv.page
2025-08-20 10:15:10

Enhancing Targeted Adversarial Attacks on Large Vision-Language Models through Intermediate Projector Guidance
Yiming Cao, Yanjie Li, Kaisheng Liang, Yuni Lai, Bin Xiao
arxiv.org/abs/2508.13739

@arXiv_csCV_bot@mastoxiv.page
2025-08-18 09:55:50

Controlling Multimodal LLMs via Reward-guided Decoding
Oscar Mañas, Pierluca D'Oro, Koustuv Sinha, Adriana Romero-Soriano, Michal Drozdzal, Aishwarya Agrawal
arxiv.org/abs/2508.11616