Tootfinder

Opt-in global Mastodon full text search. Join the index!

@arXiv_csCR_bot@mastoxiv.page
2025-06-18 08:45:05

Screen Hijack: Visual Poisoning of VLM Agents in Mobile Environments
Xuan Wang, Siyuan Liang, Zhe Liu, Yi Yu, Yuliang Lu, Xiaochun Cao, Ee-Chien Chang
arxiv.org/abs/2506.13205

@arXiv_csHC_bot@mastoxiv.page
2025-06-18 08:22:13

FEWSim: A Visual Analytic Framework for Exploring the Nexus of Food-Energy-Water Simulations
Fan Lei, David A. Sampson, Jiayi Hong, Yuxin Ma, Giuseppe Mascaro, Dave White, Rimjhim Agarwal, Ross Maciejewski
arxiv.org/abs/2506.14056

@tezoatlipoca@mas.to
2025-06-18 17:03:11

#Windows11's new #StickyNotes app (a thinly disguised #OneNote) is fucking annoying. Since it is so-called "smart", and it attempts to provide context around the "source" of your note, its window is…

Windows 11 sticky notes app showing a note I just made. It captured the fact that before switching TO it, I was working in a browser window.
If I tab AWAY, unless I blow it open to its own little subwindow, the note itself closes (which I guess is reasonable). And for some reason the "+Note" and "Screenshot" buttons now appear. The addition of these buttons now shifts my list of notes down, visually disruptive.
If I tab to another application (or back to a browser as here), that "Recent notes" label changes to "Your notes from <Vivaldi icon> <name of webpage> - Vivaldi". 
Which is yet ANOTHER visual disruption out of the corner of my eye.
And there's a setting called "Remember the source: Automatically capture the active window information to help you remember better." with a checkbox. 

If you uncheck this box, the stupid label change remains; all that stops is the `Source: <what window you had open>` snippet at the top of your note. The visual disruption of that label changing, and of the +Note/Screenshot buttons appearing and disappearing all the time, is annoying AF.

@arXiv_csRO_bot@mastoxiv.page
2025-06-18 08:58:09

Narrate2Nav: Real-Time Visual Navigation with Implicit Language Reasoning in Human-Centric Environments
Amirreza Payandeh, Anuj Pokhrel, Daeun Song, Marcos Zampieri, Xuesu Xiao
arxiv.org/abs/2506.14233

@arXiv_csCV_bot@mastoxiv.page
2025-06-17 09:23:51

ViSTA: Visual Storytelling using Multi-modal Adapters for Text-to-Image Diffusion Models
Sibo Dong, Ismail Shaheen, Maggie Shen, Rupayan Mallick, Sarah Adel Bargal
arxiv.org/abs/2506.12198

@arXiv_csIR_bot@mastoxiv.page
2025-06-18 08:22:48

XGraphRAG: Interactive Visual Analysis for Graph-based Retrieval-Augmented Generation
Ke Wang, Bo Pan, Yingchaojie Feng, Yuwei Wu, Jieyi Chen, Minfeng Zhu, Wei Chen
arxiv.org/abs/2506.13782

@arXiv_csSE_bot@mastoxiv.page
2025-06-19 08:36:38

An Empirical Study of Bugs in Data Visualization Libraries
Weiqi Lu, Yongqiang Tian, Xiaohan Zhong, Haoyang Ma, Zhenyang Xu, Shing-Chi Cheung, Chengnian Sun
arxiv.org/abs/2506.15084

@poppastring@dotnet.social
2025-05-14 19:35:25

A post from the archive 📫:
Find the address of an object in Visual Studio
poppastring.com/blog/find-the-

@denmanrooke@social.coop
2025-05-17 22:30:13

Oh hey, it's #ScreenshotSaturday, here's the latest mockup from the video game adaptation of Battle of Tarot I'm working on with some lovely people.
Everything is still a work-in-progress and not final at all, but trying to get the visual style nailed down.
#GameDev

Screenshot mockup of a card game using tarot cards. The art style is starkly contrasted, with an ink and watercolour look. The view is top down, with tarot cards arranged on a battlefield.

@arXiv_eessIV_bot@mastoxiv.page
2025-06-17 10:04:00

Audio-Visual Driven Compression for Low-Bitrate Talking Head Videos
Riku Takahashi, Ryugo Morita, Jinjia Zhou
arxiv.org/abs/2506.13419

@arXiv_csHC_bot@mastoxiv.page
2025-06-19 08:19:54

Navigating High-Dimensional Backstage: A Guide for Exploring Literature for the Reliable Use of Dimensionality Reduction
Hyeon Jeon, Hyunwook Lee, Yun-Hsin Kuo, Taehyun Yang, Daniel Archambault, Sungahn Ko, Takanori Fujiwara, Kwan-Liu Ma, Jinwook Seo
arxiv.org/abs/2506.14820

@Techmeme@techhub.social
2025-06-09 17:50:50

Apple says Visual Intelligence will now be able to search on-screen content in addition to analyzing real world objects, expanding on camera search (Tim Hardwick/MacRumors)
macrumors.com/2025/06/09/ios-2

@macandi@social.heise.de
2025-06-10 07:11:00

AI at Apple: Sorry about Siri, better Visual Intelligence, more ChatGPT
At WWDC 2025, Apple shared details about the delayed Siri. There is also more Visual Intelligence and ChatGPT integration.

@arXiv_csGR_bot@mastoxiv.page
2025-06-18 08:22:27

GHAR: GeoPose-based Handheld Augmented Reality for Architectural Positioning, Manipulation and Visual Exploration
Sabahat Israr, Dawar Khan, Zhanglin Cheng, Mukhtaj Khan, Kiyoshi Kiyokawa
arxiv.org/abs/2506.14414

@blakes7bot@mas.torpidity.net
2025-06-18 09:17:39

Series B, Episode 09 - Countdown
VETNOR: All right, come with me, we're searching the next level.
PROVINE: Right, sir.
[Teleport section. Avon and Grant are suited up]
AVON: You adjust the temperature with this [Points to knob on suit] You all set?
blake.torpidity.net/m/209/356

Claude 3.7 describes the image as: "This image appears to be from a vintage science fiction television production, likely from the late 1970s or early 1980s based on the visual style and filming quality. 

The scene shows two individuals in a sparse, utilitarian setting with plain walls. Both are wearing similar olive-green high-necked uniforms or jumpsuits that have a military or institutional appearance. One person is visible in profile on the left, while another is facing forward on the righ…

@livia@sciences.social
2025-05-19 08:46:43

If you are working in #word with a reference manager like #zotero or #mendeley, you have probably been getting Error 4707 for the past five days.
Here's how to fix it:

Visual Basic for Applications
Run-time error '4707':
Undefined dialogue record field

@arXiv_csAI_bot@mastoxiv.page
2025-06-18 08:04:43

What's in the Box? Reasoning about Unseen Objects from Multimodal Cues
Lance Ying, Daniel Xu, Alicia Zhang, Katherine M. Collins, Max H. Siegel, Joshua B. Tenenbaum
arxiv.org/abs/2506.14212

@arXiv_csRO_bot@mastoxiv.page
2025-06-18 08:39:08

GRaD-Nav : Vision-Language Model Enabled Visual Drone Navigation with Gaussian Radiance Fields and Differentiable Dynamics
Qianzhong Chen, Naixiang Gao, Suning Huang, JunEn Low, Timothy Chen, Jiankai Sun, Mac Schwager
arxiv.org/abs/2506.14009

@arXiv_csCV_bot@mastoxiv.page
2025-06-17 09:48:40

Feature Complementation Architecture for Visual Place Recognition
Weiwei Wang, Meijia Wang, Haoyi Wang, Wenqiang Guo, Jiapan Guo, Changming Sun, Lingkun Ma, Weichuan Zhang
arxiv.org/abs/2506.12401

@tml@urbanists.social
2025-06-17 10:10:47

Sigh, Visual Studio Installer has the feature to "rollback" to a previous version with just one click. But only for one step back. (For me, it was from 17.14.5 to 17.14.2.) When you have done that, it doesn't offer any further rollbacks. Irritating. #VisualStudio
Edit: But oh well, I figured out another way to work around my problem.

@arXiv_csCR_bot@mastoxiv.page
2025-06-16 07:29:39

Investigating Vulnerabilities and Defenses Against Audio-Visual Attacks: A Comprehensive Survey Emphasizing Multimodal Models
Jinming Wen, Xinyi Wu, Shuai Zhao, Yanhao Jia, Yuwen Li
arxiv.org/abs/2506.11521

@arXiv_csDC_bot@mastoxiv.page
2025-06-18 08:12:32

Déjà Vu: Efficient Video-Language Query Engine with Learning-based Inter-Frame Computation Reuse
Jinwoo Hwang, Daeun Kim, Sangyeop Lee, Yoonsung Kim, Guseul Heo, Hojoon Kim, Yunseok Jeong, Tadiwos Meaza, Eunhyeok Park, Jeongseob Ahn, Jongse Park
arxiv.org/abs/2506.14107

@seeingwithsound@mas.to
2025-06-04 18:21:19

The neurological basis for non-visual illustration intellectdiscover.com/content/ "this article poses the question of whether there is a theoretical precedent for creating visual imagery in the minds of blind peopl…

@arXiv_astrophHE_bot@mastoxiv.page
2025-06-19 14:43:19

Replaced article(s) found for astro-ph.HE. arxiv.org/list/astro-ph.HE/new
[1/1]:
- Exploring blazars through sonification. Visual and auditory insights into multifrequency variability
Gustavo Magallanes-Guijón, Sergio Mendoza

@arXiv_csRO_bot@mastoxiv.page
2025-06-18 08:34:07

ATK: Automatic Task-driven Keypoint Selection for Robust Policy Learning
Yunchu Zhang, Shubham Mittal, Zhengyu Zhang, Liyiming Ke, Siddhartha Srinivasa, Abhishek Gupta
arxiv.org/abs/2506.13867

@arXiv_qbioNC_bot@mastoxiv.page
2025-06-18 10:10:43

Learning From the Past with Cascading Eligibility Traces
Tokiniaina Raharison Ralambomihanta, Ivan Anokhin, Roman Pogodin, Samira Ebrahimi Kahou, Jonathan Cornford, Blake Aaron Richards
arxiv.org/abs/2506.14598

@blakes7bot@mas.torpidity.net
2025-06-18 18:24:27

Series D, Episode 01 - Rescue
TARRANT: He will if Orac's working. Now come on. We're wasting time. [starts to climb]
[Dayna does not follow, but continues to explore the small room. A hatch slides open in the floor.]
DAYNA: I knew it. Tarrant.
blake.torpidity.net/m/401/422

Claude 3.7 describes the image as: "This image shows a person in what appears to be a science fiction costume or uniform. They're wearing a distinctive outfit with a gray/black base and cream-colored panels on the front, with contrasting trim and a belt. The person is holding what looks like a prop weapon or futuristic gun. 

The setting appears to be a minimalist interior with plain walls, typical of science fiction television productions from the late 1970s or early 1980s based on the visual …

@arXiv_csMM_bot@mastoxiv.page
2025-06-19 08:25:19

Omnidirectional Video Super-Resolution using Deep Learning
Arbind Agrahari Baniya, Tsz-Kwan Lee, Peter W. Eklund, Sunil Aryal
arxiv.org/abs/2506.14803

@arXiv_mathGT_bot@mastoxiv.page
2025-06-13 08:33:00

Visual metrics on boundaries of hyperbolic spaces
Emily Stark
arxiv.org/abs/2506.10108 arxiv.org/pdf/2506.10108

@arXiv_csCR_bot@mastoxiv.page
2025-06-17 10:28:33

Screen Hijack: Visual Poisoning of VLM Agents in Mobile Environments
Xuan Wang, Siyuan Liang, Zhe Liu, Yi Yu, Yuliang Lu, Xiaochun Cao, Ee-Chien Chang
arxiv.org/abs/2506.13205

@arXiv_csCL_bot@mastoxiv.page
2025-06-17 09:20:39

Focusing on Students, not Machines: Grounded Question Generation and Automated Answer Grading
Gérôme Meyer, Philip Breuer
arxiv.org/abs/2506.12066

@poppastring@dotnet.social
2025-05-17 03:37:38

Just published 🚀: A Kind of Blue
#visualstudio

@arXiv_csSD_bot@mastoxiv.page
2025-06-12 07:57:21

Incorporating Linguistic Constraints from External Knowledge Source for Audio-Visual Target Speech Extraction
Wenxuan Wu, Shuai Wang, Xixin Wu, Helen Meng, Haizhou Li
arxiv.org/abs/2506.09792

@arXiv_csRO_bot@mastoxiv.page
2025-06-18 08:51:09

AMPLIFY: Actionless Motion Priors for Robot Learning from Videos
Jeremy A. Collins, Loránd Cheng, Kunal Aneja, Albert Wilcox, Benjamin Joffe, Animesh Garg
arxiv.org/abs/2506.14198

@arXiv_eessIV_bot@mastoxiv.page
2025-06-18 08:48:42

Breaking the Multi-Enhancement Bottleneck: Domain-Consistent Quality Enhancement for Compressed Images
Qunliang Xing, Mai Xu, Jing Yang, Shengxi Li
arxiv.org/abs/2506.14152

@arXiv_csCV_bot@mastoxiv.page
2025-06-16 10:31:39

VGR: Visual Grounded Reasoning
Jiacong Wang, Zijiang Kang, Haochen Wang, Haiyong Jiang, Jiawen Li, Bohong Wu, Ya Wang, Jiao Ran, Xiao Liang, Chao Feng, Jun Xiao
arxiv.org/abs/2506.11991

@arXiv_eessAS_bot@mastoxiv.page
2025-06-12 08:17:01

A Study on Speech Assessment with Visual Cues
Shafique Ahmed, Ryandhimas E. Zezario, Nasir Saleem, Amir Hussain, Hsin-Min Wang, Yu Tsao
arxiv.org/abs/2506.09549

@DiverDoc@mstdn.ca
2025-05-14 16:25:50

However if I towel dry in the shower before getting out, and I am drying my face and hair, I instantly lose visual reference since the towel covers my eyes. So now I only have two points in space unless I use one hand on the towel, and one hand on a grab bar. (3)

@arXiv_csGR_bot@mastoxiv.page
2025-06-18 08:23:30

SkinCells: Sparse Skinning using Voronoi Cells
Egor Larionov, Igor Santesteban, Hsiao-yu Chen, Gene Lin, Philipp Herholz, Ryan Goldade, Ladislav Kavan, Doug Roble, Tuur Stuyck
arxiv.org/abs/2506.14714

@poppastring@dotnet.social
2025-06-11 19:35:30

A post from the archive 📫:
Using Visual Studio to search objects in a memory dump
poppastring.com/blog/using-vis

@arXiv_csHC_bot@mastoxiv.page
2025-06-19 08:20:24

Insights Informed Generative AI for Design: Incorporating Real-world Data for Text-to-Image Output
Richa Gupta, Alexander Htet Kyaw
arxiv.org/abs/2506.15008

@seeingwithsound@mas.to
2025-06-07 14:53:25

Visual mental imagery and #aphantasia lesions map onto a convergent brain network medrxiv.org/content/10.1101/20 by @…

@kurtsh@mastodon.social
2025-06-08 17:42:32

Wow. Total surprise for me... this game looks amazing... & free to play for Xbox Game Pass!
✅ Clockwork Revolution: inXile on Time Travel, Visual Reactivity, that Foulmouthed Doll, and More — Exclusive Interview - Xbox Wire
news.xbox.com/en-us/2025/06/0…

@arXiv_condmatsoft_bot@mastoxiv.page
2025-06-12 09:11:01

Binary Mixtures of Intelligent Active Brownian Particles with Visual Perception
Rajendra Singh Negi, Roland G. Winkler, Gerhard Gompper
arxiv.org/abs/2506.09698

@michabbb@social.vivaldi.net
2025-06-15 10:27:34

• 🔧 Extensible architecture via #PHP and #JavaScript plugins with lazy loading capabilities for performance
• 🎨 Visual #CMS functionality enabling live website content editing and drag & dr…

@arXiv_csCV_bot@mastoxiv.page
2025-06-18 09:10:45

3DGS-IEval-15K: A Large-scale Image Quality Evaluation Database for 3D Gaussian-Splatting
Yuke Xing, Jiarui Wang, Peizhi Niu, Wenjie Huang, Guangtao Zhai, Yiling Xu
arxiv.org/abs/2506.14642

@arXiv_astrophEP_bot@mastoxiv.page
2025-06-11 08:46:15

Characterization of the Visual Binary TOI-6883AB and its dynamical implications for the planetary companion TOI-6883Ab
G. Conzo, F. Campos, F. Conti, I. Sharp
arxiv.org/abs/2506.08798

@arXiv_eessIV_bot@mastoxiv.page
2025-06-19 08:42:52

ABC: Adaptive BayesNet Structure Learning for Computational Scalable Multi-task Image Compression
Yufeng Zhang, Wenrui Dai, Hang Yu, Shizhan Liu, Junhui Hou, Jianguo Li, Weiyao Lin
arxiv.org/abs/2506.15228

@arXiv_csLG_bot@mastoxiv.page
2025-06-05 10:56:53

This arxiv.org/abs/2505.16933 has been replaced.
initial toot: mastoxiv.page/@arXiv_csLG_…

@arXiv_csSE_bot@mastoxiv.page
2025-06-17 11:08:01

DesignCoder: Hierarchy-Aware and Self-Correcting UI Code Generation with Large Language Models
Yunnong Chen, Shixian Ding, YingYing Zhang, Wenkai Chen, Jinzhou Du, Lingyun Sun, Liuqing Chen
arxiv.org/abs/2506.13663

@Mediagazer@mstdn.social
2025-06-04 10:10:51

How Vox built a huge YouTube presence, becoming an incubator that invented its own visual language, as ex-staffers like Johnny Harris grow their own channels (Simon Owens/The Long Story with Simon Owens)
thelongstory.substack.com/p/wh

@arXiv_csCV_bot@mastoxiv.page
2025-06-18 09:36:53

ASCD: Attention-Steerable Contrastive Decoding for Reducing Hallucination in MLLM
Yujun Wang, Jinhe Bi, Yunpu Ma, Soeren Pirk
arxiv.org/abs/2506.14766

@blakes7bot@mas.torpidity.net
2025-06-17 06:18:40

#Blakes7 Series B, Episode 02 - Shadow
ZEN: Information. Main visual is available. [displays Space City on screen.]
VILA: So?
ZEN: You expressed a desire to see what it is like.
blake.torpidity.net/m/202…

Claude Sonnet 4.0 describes the image as: "I can see this is a scene showing someone in white clothing examining what appears to be a complex technological device or control panel with a transparent casing. The device contains intricate circuitry, wiring, and small lights, suggesting it's some kind of advanced computer or navigation system. The setting appears to be inside a spaceship or advanced facility, with sleek metallic surfaces and futuristic architecture visible in the background. The p…

@arXiv_csHC_bot@mastoxiv.page
2025-06-19 08:22:39

Foundation of Affective Computing and Interaction
Changzeng Fu
arxiv.org/abs/2506.15497 arxiv.org/pdf/2506.15497

@arXiv_csMM_bot@mastoxiv.page
2025-06-13 07:53:50

Structured Graph Representations for Visual Narrative Reasoning: A Hierarchical Framework for Comics
Yi-Chun Chen
arxiv.org/abs/2506.10008

@arXiv_csSD_bot@mastoxiv.page
2025-06-17 09:59:29

ViSAGe: Video-to-Spatial Audio Generation
Jaeyeon Kim, Heeseung Yun, Gunhee Kim
arxiv.org/abs/2506.12199 arxiv.org/pd…

@arXiv_csRO_bot@mastoxiv.page
2025-06-13 08:43:00

In-Hand Object Pose Estimation via Visual-Tactile Fusion
Felix Nonnengießer, Alap Kshirsagar, Boris Belousov, Jan Peters
arxiv.org/abs/2506.10787

@inthehands@hachyderm.io
2025-06-10 02:47:02

People keep making the same mistake, again and again and again and again forever, of thinking that it is syntax that makes software development hard.
Oh honey.
Re this from @mathaetaes:
infosec.exchange/@mathaetaes/1
(P.S. Visual coding is actually really cool, and IMO an underexplored PL design space — but is very much coding, and very much tricky for the same reasons as any other kind of coding.)

@arXiv_csGR_bot@mastoxiv.page
2025-06-18 08:21:10

ImmerseGen: Agent-Guided Immersive World Generation with Alpha-Textured Proxies
Jinyan Yuan, Bangbang Yang, Keke Wang, Panwang Pan, Lin Ma, Xuehai Zhang, Xiao Liu, Zhaopeng Cui, Yuewen Ma
arxiv.org/abs/2506.14315

@arXiv_csCV_bot@mastoxiv.page
2025-06-18 09:07:50

VisText-Mosquito: A Multimodal Dataset and Benchmark for AI-Based Mosquito Breeding Site Detection and Reasoning
Md. Adnanul Islam, Md. Faiyaz Abdullah Sayeedi, Md. Asaduzzaman Shuvo, Muhammad Ziaur Rahman, Shahanur Rahman Bappy, Raiyan Rahman, Swakkhar Shatabda
arxiv.org/abs/2506.14629

@arXiv_csHC_bot@mastoxiv.page
2025-06-19 08:19:19

See What I Mean? CUE: A Cognitive Model of Understanding Explanations
Tobias Labarta, Nhi Hoang, Katharina Weitz, Wojciech Samek, Sebastian Lapuschkin, Leander Weber
arxiv.org/abs/2506.14775

@arXiv_csCR_bot@mastoxiv.page
2025-06-12 07:21:41

DAVSP: Safety Alignment for Large Vision-Language Models via Deep Aligned Visual Safety Prompt
Yitong Zhang, Jia Li, Liyi Cai, Ge Li
arxiv.org/abs/2506.09353

@arXiv_csRO_bot@mastoxiv.page
2025-06-18 08:46:29

GAF: Gaussian Action Field as a Dynamic World Model for Robotic Manipulation
Ying Chai, Litao Deng, Ruizhi Shao, Jiajun Zhang, Liangjun Xing, Hongwen Zhang, Yebin Liu
arxiv.org/abs/2506.14135

@seeingwithsound@mas.to
2025-06-09 20:45:48

To ChatGPT: Brain implants in visual cortex such as Neuralink Blindsight cannot directly convey visual textures, shading and smooth surfaces, because simultaneously activating many electrodes above phosphene threshold would cause seizures. Any solutions? chatgpt.com/share/68469685-71c

@arXiv_eessIV_bot@mastoxiv.page
2025-06-13 09:07:30

A novel visual data-based diagnostic approach for estimation of regime transition in pool boiling
Pranay Nirapure, Ayushman Singh, Srikanth Rangarajan, Bahgat Sammakia
arxiv.org/abs/2506.10832

@DiverDoc@mstdn.ca
2025-05-14 16:23:43

We all keep our balance, our awareness of position in space, by analysing feedback we get from receivers in our skin and joints (proprioceptors) and from our visual assessment of our position in space. I am a below knee #amputee, so I have lost the position sensors in my foot, ankle, and leg. I gauge the location of my leg in space through my knee and secondarily, through my hip. 🧵

@arXiv_csCV_bot@mastoxiv.page
2025-06-18 09:15:11

Recognition through Reasoning: Reinforcing Image Geo-localization with Large Vision-Language Models
Ling Li, Yao Zhou, Yuxuan Liang, Fugee Tsung, Jiaheng Wei
arxiv.org/abs/2506.14674

@arXiv_csCV_bot@mastoxiv.page
2025-06-17 09:32:39

UniDet-D: A Unified Dynamic Spectral Attention Model for Object Detection under Adverse Weathers
Yuantao Wang, Haowei Yang, Wei Zhang, Shijian Lu
arxiv.org/abs/2506.12324

@arXiv_csSD_bot@mastoxiv.page
2025-06-17 10:18:09

Video-Guided Text-to-Music Generation Using Public Domain Movie Collections
Haven Kim, Zachary Novack, Weihan Xu, Julian McAuley, Hao-Wen Dong
arxiv.org/abs/2506.12573

@arXiv_qbioNC_bot@mastoxiv.page
2025-06-16 09:34:29

Sparse Autoencoders Bridge The Deep Learning Model and The Brain
Ziming Mao, Jia Xu, Zeqi Zheng, Haofang Zheng, Dabing Sheng, Yaochu Jin, Guoyuan Yang
arxiv.org/abs/2506.11123

@arXiv_csAI_bot@mastoxiv.page
2025-06-03 16:03:11

This arxiv.org/abs/2310.17451 has been replaced.
initial toot: mastoxiv.page/@arXiv_csAI_…

@arXiv_csCV_bot@mastoxiv.page
2025-06-09 10:06:42

Visual Graph Arena: Evaluating Visual Conceptualization of Vision and Multimodal Large Language Models
Zahra Babaiee, Peyman M. Kiasari, Daniela Rus, Radu Grosu
arxiv.org/abs/2506.06242

@arXiv_csRO_bot@mastoxiv.page
2025-06-16 08:01:29

Control Architecture and Design for a Multi-robotic Visual Servoing System in Automated Manufacturing Environment
Rongfei Li
arxiv.org/abs/2506.11387

@arXiv_csMM_bot@mastoxiv.page
2025-06-13 08:07:20

Can Sound Replace Vision in LLaVA With Token Substitution?
Ali Vosoughi, Jing Bi, Pinxin Liu, Yunlong Tang, Chenliang Xu
arxiv.org/abs/2506.10416

@DiverDoc@mstdn.ca
2025-05-14 16:25:16

Our body always wants to have three reference points since we live in 3D space. This can be a combination of two position sensors, and vision. Here is an interesting situation. When I shower, and am standing up, I place my normal leg and foot on the floor of the shower, and kneel on a shower stool. Combine that with my visual reference of my position in space, and I am golden. (2)

@poppastring@dotnet.social
2025-04-23 19:35:26

A post from the archive 📫:
Debug managed Linux core dumps with Visual Studio
poppastring.com/blog/debug-man

@arXiv_csRO_bot@mastoxiv.page
2025-06-16 07:50:49

Gondola: Grounded Vision Language Planning for Generalizable Robotic Manipulation
Shizhe Chen, Ricardo Garcia, Paul Pacaud, Cordelia Schmid
arxiv.org/abs/2506.11261

@blakes7bot@mas.torpidity.net
2025-06-15 12:15:12

Series B, Episode 03 - Weapon
COSER: Now, even if Security trace us to this planet, they'll assume the ship crashed and we died in the explosion. Did you hear what I said?
RASHEL: Yes.
COSER: Well?
RASHEL: It's a very clever plan, sir.
blake.torpidity.net/m/203/2

Claude Sonnet 4.0 describes the image as: "I can see two people in what appears to be a dramatic scene set in a barren, post-apocalyptic landscape. They're positioned among tall, dried grass with bare, twisted trees in the background, creating a desolate atmosphere. Both figures are wearing dark clothing that fits the somber tone of the setting. The scene has the distinctive visual style of 1970s British science fiction television, with its characteristic lighting and outdoor filming techniques…

@arXiv_csMM_bot@mastoxiv.page
2025-06-17 16:40:02

Replaced article(s) found for cs.MM. arxiv.org/list/cs.MM/new
[1/1]:
Multiverse Through Deepfakes: The MultiFakeVerse Dataset of Person-Centric Visual and Conceptual ...

@arXiv_csHC_bot@mastoxiv.page
2025-06-17 09:38:23

TermSight: Making Service Contracts Approachable
Ziheng Huang, Tal August, Hari Sundaram
arxiv.org/abs/2506.12332 arx…

@seeingwithsound@mas.to
2025-06-13 13:54:39

To ChatGPT: Write a pitch against the use of visual-to-auditory sensory substitution for the blind. chatgpt.com/share/684c2ab3-2c6 "Let's stop romanticizing sensory substitution and start prioritizing solutions that actu…

@arXiv_csSD_bot@mastoxiv.page
2025-06-04 07:42:10

Cross-attention and Self-attention for Audio-visual Speaker Diarization in MISP-Meeting Challenge
Zhaoyang Li, Haodong Zhou, Longjie Luo, Xiaoxiao Li, Yongxin Chen, Lin Li, Qingyang Hong
arxiv.org/abs/2506.02621

@arXiv_csRO_bot@mastoxiv.page
2025-06-13 08:00:00

Innovative Adaptive Imaged Based Visual Servoing Control of 6 DoFs Industrial Robot Manipulators
Rongfei Li, Francis Assadian
arxiv.org/abs/2506.10240

@arXiv_csHC_bot@mastoxiv.page
2025-06-11 07:52:35

Stop Misusing t-SNE and UMAP for Visual Analytics
Hyeon Jeon, Jeongin Park, Sungbok Shin, Jinwook Seo
arxiv.org/abs/2506.08725

@arXiv_csCV_bot@mastoxiv.page
2025-06-17 10:13:53

Exploring Audio Cues for Enhanced Test-Time Video Model Adaptation
Runhao Zeng, Qi Deng, Ronghao Zhang, Shuaicheng Niu, Jian Chen, Xiping Hu, Victor C. M. Leung
arxiv.org/abs/2506.12481

@arXiv_csRO_bot@mastoxiv.page
2025-06-13 08:02:40

A Novel Feedforward Youla Parameterization Method for Avoiding Local Minima in Stereo Image Based Visual Servoing Control
Rongfei Li, Francis Assadian
arxiv.org/abs/2506.10252

@seeingwithsound@mas.to
2025-05-26 20:49:28

Object knowledge representation in the human visual cortex requires a connection with the language system journals.plos.org/plosbiology/ "Our experiments reveal the contribution of the vision-la…

@arXiv_csSD_bot@mastoxiv.page
2025-06-03 07:30:02

AVROBUSTBENCH: Benchmarking the Robustness of Audio-Visual Recognition Models at Test-Time
Sarthak Kumar Maharana, Saksham Singh Kushwaha, Baoming Zhang, Adrian Rodriguez, Songtao Wei, Yapeng Tian, Yunhui Guo
arxiv.org/abs/2506.00358

@arXiv_csCV_bot@mastoxiv.page
2025-06-10 19:03:31

This arxiv.org/abs/2505.18675 has been replaced.
initial toot: mastoxiv.page/@arXiv_csCV_…

@seeingwithsound@mas.to
2025-06-11 20:41:39

Hotel stays of individuals with a visual impairment: a qualitative study with a focus on sensory substitution tandfonline.com/doi/full/10.10 "Sensory substitution devices (SSDs) hold the potential to aid individuals …

@arXiv_csCV_bot@mastoxiv.page
2025-06-10 19:04:31

This arxiv.org/abs/2505.18700 has been replaced.
initial toot: mastoxiv.page/@arXiv_csCV_…

@arXiv_csCV_bot@mastoxiv.page
2025-06-10 19:09:11

This arxiv.org/abs/2506.03589 has been replaced.
initial toot: mastoxiv.page/@arXiv_csCV_…

@arXiv_csRO_bot@mastoxiv.page
2025-06-06 09:47:17

This arxiv.org/abs/2505.03448 has been replaced.
initial toot: mastoxiv.page/@arXiv_csRO_…

@arXiv_csRO_bot@mastoxiv.page
2025-06-03 08:05:41

SEMNAV: A Semantic Segmentation-Driven Approach to Visual Semantic Navigation
Rafael Flor-Rodríguez, Carlos Gutiérrez-Álvarez, Francisco Javier Acevedo-Rodríguez, Sergio Lafuente-Arroyo, Roberto J. López-Sastre
arxiv.org/abs/2506.01418

@arXiv_csCV_bot@mastoxiv.page
2025-06-04 14:53:46

This arxiv.org/abs/2505.19028 has been replaced.
initial toot: mastoxiv.page/@arXiv_csCV_…

@arXiv_csCV_bot@mastoxiv.page
2025-06-09 10:05:32

GenIR: Generative Visual Feedback for Mental Image Retrieval
Diji Yang, Minghao Liu, Chung-Hsiang Lo, Yi Zhang, James Davis
arxiv.org/abs/2506.06220

@arXiv_csCV_bot@mastoxiv.page
2025-06-10 19:05:31

This arxiv.org/abs/2505.21036 has been replaced.
initial toot: mastoxiv.page/@arXiv_csCV_…

@arXiv_csCV_bot@mastoxiv.page
2025-06-10 19:03:11

This arxiv.org/abs/2505.18668 has been replaced.
initial toot: mastoxiv.page/@arXiv_csCV_…

@arXiv_csCV_bot@mastoxiv.page
2025-06-10 19:02:21

This arxiv.org/abs/2505.17132 has been replaced.
initial toot: mastoxiv.page/@arXiv_csCV_…

@arXiv_csCV_bot@mastoxiv.page
2025-06-09 10:09:22

CoMemo: LVLMs Need Image Context with Image Memory
Shi Liu, Weijie Su, Xizhou Zhu, Wenhai Wang, Jifeng Dai
arxiv.org/abs/2506.06279