Tootfinder

Opt-in global Mastodon full text search. Join the index!

@mia@hcommons.social
2025-08-19 13:42:42

On small, local language models: 'In a world increasingly dominated by massive models and opaque APIs, we believe there’s still room for small, transparent, controllable systems. Models you can fine-tune, understand and run on your own terms' turing.ac.uk…

@arXiv_csAI_bot@mastoxiv.page
2025-08-19 09:20:39

MAPF-World: Action World Model for Multi-Agent Path Finding
Zhanjiang Yang, Meng Li, Yang Shen, Yueming Li, Lijun Sun
arxiv.org/abs/2508.12087

@arXiv_csCV_bot@mastoxiv.page
2025-08-19 12:06:30

Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model
Xianglong He, Chunli Peng, Zexiang Liu, Boyang Wang, Yifan Zhang, Qi Cui, Fei Kang, Biao Jiang, Mengyin An, Yangyang Ren, Baixin Xu, Hao-Xiang Guo, Kaixiong Gong, Cyrus Wu, Wei Li, Xuchen Song, Yang Liu, Eric Li, Yahui Zhou
arxiv.org/abs/2508.13009

@arXiv_csLG_bot@mastoxiv.page
2025-09-19 10:39:11

Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation
Yujun Zhou, Zhenwen Liang, Haolin Liu, Wenhao Yu, Kishan Panaganti, Linfeng Song, Dian Yu, Xiangliang Zhang, Haitao Mi, Dong Yu
arxiv.org/abs/2509.15194

@arXiv_csSE_bot@mastoxiv.page
2025-09-19 08:23:21

SCoGen: Scenario-Centric Graph-Based Synthesis of Real-World Code Problems
Xifeng Yao, Dongyu Lang, Wu Zhang, Xintong Guo, Huarui Xie, Yinhao Ni, Ping Liu, Guang Shen, Yi Bai, Dandan Tu, Changzheng Zhang
arxiv.org/abs/2509.14281

@arXiv_csRO_bot@mastoxiv.page
2025-07-18 07:46:02

FOUNDER: Grounding Foundation Models in World Models for Open-Ended Embodied Decision Making
Yucen Wang, Rui Yu, Shenghua Wan, Le Gan, De-Chuan Zhan
arxiv.org/abs/2507.12496

@arXiv_csCL_bot@mastoxiv.page
2025-08-19 11:38:40

Leveraging Large Language Models for Predictive Analysis of Human Misery
Bishanka Seal, Rahul Seetharaman, Aman Bansal, Abhilash Nandy
arxiv.org/abs/2508.12669

@Techmeme@techhub.social
2025-07-19 15:10:58

How an open-source approach helped DeepSeek and other Chinese AI companies; Hugging Face: Alibaba's Qwen is now the world's largest open-source AI ecosystem (South China Morning Post)
scmp.com/tech/big-tech/article

@arXiv_csHC_bot@mastoxiv.page
2025-09-19 09:03:51

VisMoDAl: Visual Analytics for Evaluating and Improving Corruption Robustness of Vision-Language Models
Huanchen Wang, Wencheng Zhang, Zhiqiang Wang, Zhicong Lu, Yuxin Ma
arxiv.org/abs/2509.14571

@Mediagazer@mstdn.social
2025-09-19 10:21:04

Q&A with CEO Cristóbal Valenzuela on Runway's "world models" breakthrough, how it differs from typical AI video generation, the Lionsgate partnership, and more (Cristina Criddle/Financial Times)

@arXiv_csAI_bot@mastoxiv.page
2025-08-19 10:00:30

RLNVR: Reinforcement Learning from Non-Verified Real-World Rewards
Rohit Krishnan, Jon Evans
arxiv.org/abs/2508.12165 arxiv.org/pdf/2508.12…

@arXiv_csCV_bot@mastoxiv.page
2025-08-20 10:15:10

Enhancing Targeted Adversarial Attacks on Large Vision-Language Models through Intermediate Projector Guidance
Yiming Cao, Yanjie Li, Kaisheng Liang, Yuni Lai, Bin Xiao
arxiv.org/abs/2508.13739

@privacity@social.linux.pizza
2025-08-19 19:16:17

Highlights from FPF’s July 2025 Technologist Roundtable: AI Unlearning and Technical Guardrails
fpf.org/blog/highlights-from-f

@arXiv_csCL_bot@mastoxiv.page
2025-08-20 09:36:51

ViExam: Are Vision Language Models Better than Humans on Vietnamese Multimodal Exam Questions?
Vy Tuong Dang, An Vo, Quang Tau, Duc Dm, Daeyoung Kim
arxiv.org/abs/2508.13680

@arXiv_csLG_bot@mastoxiv.page
2025-09-19 10:31:11

Credit Card Fraud Detection
Iva Popova, Hamza A. A. Gardi
arxiv.org/abs/2509.15044 arxiv.org/pdf/2509.15044

@arXiv_csCR_bot@mastoxiv.page
2025-08-20 07:51:40

MCPSecBench: A Systematic Security Benchmark and Playground for Testing Model Context Protocols
Yixuan Yang, Daoyuan Wu, Yufan Chen
arxiv.org/abs/2508.13220

@arXiv_csSE_bot@mastoxiv.page
2025-08-19 10:03:50

Strengthening Programming Comprehension in Large Language Models through Code Generation
Xiaoning Ren, Qiang Hu, Wei Ma, Yan Li, Yao Zhang, Lingxiao Jiang, Yinxing Xue
arxiv.org/abs/2508.12620

@arXiv_quantph_bot@mastoxiv.page
2025-08-20 09:37:30

Enhanced Sensitivity and Noise Resilience in Two-Qubit Quantum Magnetometers
S. Nohekhan Shishavan, K. Aghayar Gharehbagh, H. Sedgi Gamichi
arxiv.org/abs/2508.13400

@arXiv_mathOC_bot@mastoxiv.page
2025-08-20 09:22:30

Online Stochastic Packing with General Correlations
Sabri Cetin, Yilun Chen, David A. Goldberg
arxiv.org/abs/2508.13458 arxiv.org/pdf/2508.…

@Techmeme@techhub.social
2025-07-16 04:55:53

Jensen Huang hailed AI models from DeepSeek, Alibaba, and Tencent as "world class" at a Beijing expo and said US licenses for H20 chips "will come very fast" (Reuters)
reuters.com/world/china/nvidia

@arXiv_csIR_bot@mastoxiv.page
2025-08-19 09:30:39

Diagnostic-Guided Dynamic Profile Optimization for LLM-based User Simulators in Sequential Recommendation
Hongyang Liu, Zhu Sun, Tianjun Wei, Yan Wang, Jiajie Zhu, Xinghua Qu
arxiv.org/abs/2508.12645

@arXiv_csIT_bot@mastoxiv.page
2025-08-19 08:58:30

Deep Semantic Inference over the Air: An Efficient Task-Oriented Communication System
Chenyang Wang, Roger Olsson, Stefan Forsström, Qing He
arxiv.org/abs/2508.12748

@arXiv_csRO_bot@mastoxiv.page
2025-08-19 11:28:30

Grounding Actions in Camera Space: Observation-Centric Vision-Language-Action Policy
Tianyi Zhang, Haonan Duan, Haoran Hao, Yu Qiao, Jifeng Dai, Zhi Hou
arxiv.org/abs/2508.13103

@arXiv_statME_bot@mastoxiv.page
2025-08-19 09:59:00

A Systematic Particle Filter for Estimating Time-Varying Parameters in Advection-Diffusion Equations with Source Terms
Andrea Arnold
arxiv.org/abs/2508.12155

@arXiv_csAI_bot@mastoxiv.page
2025-08-19 11:16:00

G²RPO-A: Guided Group Relative Policy Optimization with Adaptive Guidance
Yongxin Guo, Wenbo Deng, Zhenglin Cheng, Xiaoying Tang
arxiv.org/abs/2508.13023

@arXiv_csSD_bot@mastoxiv.page
2025-09-19 07:40:41

Deploying UDM Series in Real-Life Stuttered Speech Applications: A Clinical Evaluation Framework
Eric Zhang (SSHealth Team, AI for Healthcare Laboratory), Li Wei (SSHealth Team, AI for Healthcare Laboratory), Sarah Chen (SSHealth Team, AI for Healthcare Laboratory), Michael Wang (SSHealth Team, AI for Healthcare Laboratory)
arxiv.o…

@arXiv_csCV_bot@mastoxiv.page
2025-07-18 10:19:42

Orbis: Overcoming Challenges of Long-Horizon Prediction in Driving World Models
Arian Mousakhan, Sudhanshu Mittal, Silvio Galesso, Karim Farid, Thomas Brox
arxiv.org/abs/2507.13162

@arXiv_csGR_bot@mastoxiv.page
2025-09-19 08:18:21

WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model via Training-Free Guidance
Chenxi Song, Yanming Yang, Tong Zhao, Ruibo Li, Chi Zhang
arxiv.org/abs/2509.15130

@arXiv_csCL_bot@mastoxiv.page
2025-08-19 11:41:00

CRED-SQL: Enhancing Real-world Large Scale Database Text-to-SQL Parsing through Cluster Retrieval and Execution Description
Shaoming Duan, Zirui Wang, Chuanyi Liu, Zhibin Zhu, Yuhao Zhang, Peiyi Han, Liang Yan, Zewu Penge
arxiv.org/abs/2508.12769

@arXiv_csMA_bot@mastoxiv.page
2025-08-20 07:53:10

Self-Organizing Agent Network for LLM-based Workflow Automation
Yiming Xiong, Jian Wang, Bing Li, Yuhan Zhu, Yuqi Zhao
arxiv.org/abs/2508.13732

@arXiv_condmatsoft_bot@mastoxiv.page
2025-09-19 09:30:31

A General Model for Static Contact Angles
Carlos E Colosqui
arxiv.org/abs/2509.14692 arxiv.org/pdf/2509.14692

@arXiv_mathPR_bot@mastoxiv.page
2025-08-19 10:41:40

Benford behavior resulting from stick and box fragmentation processes
Bruce Fang, Steven J. Miller
arxiv.org/abs/2508.12915 arxiv.org/pdf/2…

@arXiv_eessSP_bot@mastoxiv.page
2025-07-17 07:45:30

Foundation Models for Brain Signals: A Critical Review of Current Progress and Future Directions
Gayal Kuruppu, Neeraj Wagh, Yogatheesan Varatharajah
arxiv.org/abs/2507.11783

@muz4now@mastodon.world
2025-08-08 14:38:03

Check how your mix sounds through multiple popular headphone models with Kali Audio’s new HP-1 Multi-Reference Headphones
#MusicTech #MusicianTips

@arXiv_csAI_bot@mastoxiv.page
2025-08-18 08:39:00

Inclusion Arena: An Open Platform for Evaluating Large Foundation Models with Real-World Apps
Kangyu Wang, Hongliang He, Lin Liu, Ruiqi Liang, Zhenzhong Lan, Jianguo Li
arxiv.org/abs/2508.11452

@arXiv_csSE_bot@mastoxiv.page
2025-08-20 07:48:50

COMPASS: A Multi-Dimensional Benchmark for Evaluating Code Generation in Large Language Models
James Meaden, Michał Jarosz, Piotr Jodłowski, Grigori Melnik
arxiv.org/abs/2508.13757

@arXiv_csRO_bot@mastoxiv.page
2025-09-19 10:12:41

Robot Control Stack: A Lean Ecosystem for Robot Learning at Scale
Tobias Jülg, Pierre Krack, Seongjin Bien, Yannik Blei, Khaled Gamal, Ken Nakahara, Johannes Hechtl, Roberto Calandra, Wolfram Burgard, Florian Walter
arxiv.org/abs/2509.14932

@arXiv_csCE_bot@mastoxiv.page
2025-07-16 08:16:01

Data-Driven Differential Evolution in Tire Industry Extrusion: Leveraging Surrogate Models
Eider Garate-Perez, Kerman López de Calle-Etxabe, Susana Ferreiro
arxiv.org/abs/2507.11191

@arXiv_csHC_bot@mastoxiv.page
2025-09-19 09:47:51

An Evaluation-Centric Paradigm for Scientific Visualization Agents
Kuangshi Ai, Haichao Miao, Zhimin Li, Chaoli Wang, Shusen Liu
arxiv.org/abs/2509.15160

@Techmeme@techhub.social
2025-08-11 22:30:41

Nvidia debuts new Omniverse SDKs and Cosmos world foundation models for robotics devs, including Cosmos Reason, a 7B-parameter reasoning vision language model (Rebecca Szkutak/TechCrunch)
techcrunch.com/2025/08/11/nvid

@arXiv_csCV_bot@mastoxiv.page
2025-08-19 12:05:10

Breaking Reward Collapse: Adaptive Reinforcement for Open-ended Medical Reasoning with Enhanced Semantic Discrimination
Yizhou Liu, Jingwei Wei, Zizhi Chen, Minghao Han, Xukun Zhang, Keliang Liu, Lihua Zhang
arxiv.org/abs/2508.12957

@arXiv_csIR_bot@mastoxiv.page
2025-08-19 08:21:20

A Large-Scale Web Search Dataset for Federated Online Learning to Rank
Marcel Gregoriadis, Jingwei Kang, Johan Pouwelse
arxiv.org/abs/2508.12353

@arXiv_csAI_bot@mastoxiv.page
2025-09-18 09:08:41

From Next Token Prediction to (STRIPS) World Models -- Preliminary Results
Carlos Núñez-Molina, Vicenç Gómez, Hector Geffner
arxiv.org/abs/2509.13389

@arXiv_csCL_bot@mastoxiv.page
2025-07-18 07:32:32

Modeling Open-World Cognition as On-Demand Synthesis of Probabilistic Models
Lionel Wong, Katherine M. Collins, Lance Ying, Cedegao E. Zhang, Adrian Weller, Tobias Gerstenberg, Timothy O'Donnell, Alexander K. Lew, Jacob D. Andreas, Joshua B. Tenenbaum, Tyler Brooke-Wilson
arxiv.org/abs/2507.12547

@arXiv_csSE_bot@mastoxiv.page
2025-09-19 09:49:41

CodeFuse-CR-Bench: A Comprehensiveness-aware Benchmark for End-to-End Code Review Evaluation in Python Projects
Hanyang Guo, Xunjin Zheng, Zihan Liao, Hang Yu, Peng DI, Ziyin Zhang, Hong-Ning Dai
arxiv.org/abs/2509.14856

@arXiv_csRO_bot@mastoxiv.page
2025-07-18 09:55:12

Latent Policy Steering with Embodiment-Agnostic Pretrained World Models
Yiqi Wang, Mrinal Verghese, Jeff Schneider
arxiv.org/abs/2507.13340

@Techmeme@techhub.social
2025-09-18 15:20:43

Q&A with CEO Cristóbal Valenzuela on Runway's "world models" breakthrough, how it differs from typical AI video generation, the Lionsgate partnership, and more (Cristina Criddle/Financial Times)

@arXiv_csSD_bot@mastoxiv.page
2025-09-17 10:02:09

Can Large Audio Language Models Understand Audio Well? Speech, Scene and Events Understanding Benchmark for LALMs
Han Yin, Jung-Woo Choi
arxiv.org/abs/2509.13148

@arXiv_csCV_bot@mastoxiv.page
2025-09-18 10:21:31

Diving into Mitigating Hallucinations from a Vision Perspective for Large Vision-Language Models
Weihang Wang, Xinhao Li, Ziyue Wang, Yan Pang, Jielei Zhang, Peiyi Li, Qiang Zhang, Longwen Gao
arxiv.org/abs/2509.13836

@arXiv_csIR_bot@mastoxiv.page
2025-07-17 08:06:40

Sparse Autoencoders for Sequential Recommendation Models: Interpretation and Flexible Control
Anton Klenitskiy, Konstantin Polev, Daria Denisova, Alexey Vasilev, Dmitry Simakov, Gleb Gusev
arxiv.org/abs/2507.12202

@arXiv_csCL_bot@mastoxiv.page
2025-09-18 10:00:21

Large Language Models Discriminate Against Speakers of German Dialects
Minh Duc Bui, Carolin Holtermann, Valentin Hofmann, Anne Lauscher, Katharina von der Wense
arxiv.org/abs/2509.13835

@arXiv_csAI_bot@mastoxiv.page
2025-09-18 07:45:41

Imagined Autocurricula
Ahmet H. Güzel, Matthew Thomas Jackson, Jarek Luca Liesen, Tim Rocktäschel, Jakob Nicolaus Foerster, Ilija Bogunovic, Jack Parker-Holder
arxiv.org/abs/2509.13341

@arXiv_csRO_bot@mastoxiv.page
2025-08-18 09:08:30

Visuomotor Grasping with World Models for Surgical Robots
Hongbin Lin, Bin Li, Kwok Wai Samuel Au
arxiv.org/abs/2508.11200 arxiv.org/pdf/25…

@arXiv_csSE_bot@mastoxiv.page
2025-07-17 09:34:10

SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?
Xinyi He, Qian Liu, Mingzhe Du, Lin Yan, Zhijie Fan, Yiming Huang, Zejian Yuan, Zejun Ma
arxiv.org/abs/2507.12415

@arXiv_csCL_bot@mastoxiv.page
2025-08-20 09:36:30

CRISP: Persistent Concept Unlearning via Sparse Autoencoders
Tomer Ashuach, Dana Arad, Aaron Mueller, Martin Tutek, Yonatan Belinkov
arxiv.org/abs/2508.13650

@arXiv_csCV_bot@mastoxiv.page
2025-08-20 10:18:40

RICO: Two Realistic Benchmarks and an In-Depth Analysis for Incremental Learning in Object Detection
Matthias Neuwirth-Trapp, Maarten Bieshaar, Danda Pani Paudel, Luc Van Gool
arxiv.org/abs/2508.13878

@arXiv_csRO_bot@mastoxiv.page
2025-09-18 10:06:31

PhysicalAgent: Towards General Cognitive Robotics with Foundation World Models
Artem Lykov, Jeffrin Sam, Hung Khang Nguyen, Vladislav Kozlovskiy, Yara Mahmoud, Valerii Serpiva, Miguel Altamirano Cabrera, Mikhail Konenkov, Dzmitry Tsetserukou
arxiv.org/abs/2509.13903

@arXiv_csSE_bot@mastoxiv.page
2025-09-18 08:46:51

Crash Report Enhancement with Large Language Models: An Empirical Study
S M Farah Al Fahim, Md Nakhla Rafi, Zeyang Ma, Dong Jae Kim, Tse-Hsun (Peter) Chen
arxiv.org/abs/2509.13535

@arXiv_csAI_bot@mastoxiv.page
2025-08-19 10:21:40

GraphCogent: Overcoming LLMs' Working Memory Constraints via Multi-Agent Collaboration in Complex Graph Understanding
Rongzheng Wang, Qizhi Chen, Yihong Huang, Yizhuo Ma, Muquan Li, Jiakai Li, Ke Qin, Guangchun Luo, Shuang Liang
arxiv.org/abs/2508.12379

@arXiv_csCL_bot@mastoxiv.page
2025-09-19 10:23:51

Reasoning over Boundaries: Enhancing Specification Alignment via Test-time Deliberation
Haoran Zhang, Yafu Li, Xuyang Hu, Dongrui Liu, Zhilin Wang, Bo Li, Yu Cheng
arxiv.org/abs/2509.14760

@arXiv_csHC_bot@mastoxiv.page
2025-09-16 10:08:56

The Siren Song of LLMs: How Users Perceive and Respond to Dark Patterns in Large Language Models
Yike Shi, Qing Xiao, Qing (Diane) Hu, Hong Shen, Hua Shen
arxiv.org/abs/2509.10830

@arXiv_csCV_bot@mastoxiv.page
2025-09-19 10:18:21

Seeing 3D Through 2D Lenses: 3D Few-Shot Class-Incremental Learning via Cross-Modal Geometric Rectification
Xiang Tuo, Xu Xuemiao, Liu Bangzhen, Li Jinyi, Li Yong, He Shengfeng
arxiv.org/abs/2509.14958

@arXiv_csLG_bot@mastoxiv.page
2025-08-15 10:08:12

Driving Accurate Allergen Prediction with Protein Language Models and Generalization-Focused Evaluation
Brian Shing-Hei Wong, Joshua Mincheol Kim, Sin-Hang Fung, Qing Xiong, Kelvin Fu-Kiu Ao, Junkang Wei, Ran Wang, Dan Michelle Wang, Jingying Zhou, Bo Feng, Alfred Sze-Lok Cheng, Kevin Y. Yip, Stephen Kwok-Wing Tsui, Qin Cao
arxiv.o…

@arXiv_csRO_bot@mastoxiv.page
2025-09-18 09:56:31

Dual-Actor Fine-Tuning of VLA Models: A Talk-and-Tweak Human-in-the-Loop Approach
Piaopiao Jin, Qi Wang, Guokang Sun, Ziwen Cai, Pinjia He, Yangwei You
arxiv.org/abs/2509.13774

@Techmeme@techhub.social
2025-08-17 15:55:49

NYC-based Protege, which prepares and sells real-world datasets like lab results and sports footage for AI training, raised a $25M Series A led by Footwork (Natasha Mascarenhas/The Information)
theinformation.com/articles/on

@arXiv_csSE_bot@mastoxiv.page
2025-09-19 09:46:31

On the Use of Agentic Coding: An Empirical Study of Pull Requests on GitHub
Miku Watanabe, Hao Li, Yutaro Kashiwa, Brittany Reid, Hajimu Iida, Ahmed E. Hassan
arxiv.org/abs/2509.14745

@arXiv_csCV_bot@mastoxiv.page
2025-08-20 10:16:30

VisionLaw: Inferring Interpretable Intrinsic Dynamics from Visual Observations via Bilevel Optimization
Jailing Lin, Shu Jiang, Qingyuan Zeng, Zhenzhong Wang, Min Jiang
arxiv.org/abs/2508.13792

@arXiv_csCL_bot@mastoxiv.page
2025-08-19 11:42:40

HeteroRAG: A Heterogeneous Retrieval-Augmented Generation Framework for Medical Vision Language Tasks
Zhe Chen, Yusheng Liao, Shuyang Jiang, Zhiyuan Zhu, Haolin Li, Yanfeng Wang, Yu Wang
arxiv.org/abs/2508.12778

@arXiv_csCL_bot@mastoxiv.page
2025-08-20 08:18:59

ProMed: Shapley Information Gain Guided Reinforcement Learning for Proactive Medical LLMs
Hongxin Ding, Baixiang Huang, Yue Fang, Weibin Liao, Xinke Jiang, Zheng Li, Junfeng Zhao, Yasha Wang
arxiv.org/abs/2508.13514

@arXiv_csHC_bot@mastoxiv.page
2025-09-18 08:56:51

DuetUI: A Bidirectional Context Loop for Human-Agent Co-Generation of Task-Oriented Interfaces
Yuan Xu, Shaowen Xiang, Yizhi Song, Ruoting Sun, Xin Tong
arxiv.org/abs/2509.13444

@arXiv_csCV_bot@mastoxiv.page
2025-09-19 10:25:41

Synthetic-to-Real Object Detection using YOLOv11 and Domain Randomization Strategies
Luisa Torquato Niño, Hamza A. A. Gardi
arxiv.org/abs/2509.15045

@arXiv_csRO_bot@mastoxiv.page
2025-09-17 10:41:40

Empowering Multi-Robot Cooperation via Sequential World Models
Zijie Zhao, Honglei Guo, Shengqian Chen, Kaixuan Xu, Bo Jiang, Yuanheng Zhu, Dongbin Zhao
arxiv.org/abs/2509.13095

@arXiv_csCL_bot@mastoxiv.page
2025-08-20 09:58:40

Chunks as Arms: Multi-Armed Bandit-Guided Sampling for Long-Context LLM Preference Optimization
Shaohua Duan, Xinze Li, Zhenghao Liu, Xiaoyuan Yi, Yukun Yan, Shuo Wang, Yu Gu, Ge Yu, Maosong Sun
arxiv.org/abs/2508.13993

@arXiv_csCL_bot@mastoxiv.page
2025-08-19 11:38:50

ToolACE-MT: Non-Autoregressive Generation for Agentic Multi-Turn Interaction
Xingshan Zeng, Weiwen Liu, Lingzhi Wang, Liangyou Li, Fei Mi, Yasheng Wang, Lifeng Shang, Xin Jiang, Qun Liu
arxiv.org/abs/2508.12685

@arXiv_csRO_bot@mastoxiv.page
2025-07-18 08:23:12

VLMgineer: Vision Language Models as Robotic Toolsmiths
George Jiayuan Gao, Tianyu Li, Junyao Shi, Yihan Li, Zizhe Zhang, Nadia Figueroa, Dinesh Jayaraman
arxiv.org/abs/2507.12644

@arXiv_csCV_bot@mastoxiv.page
2025-09-16 12:46:47

OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
Yang Zhou, Yifan Wang, Jianjun Zhou, Wenzheng Chang, Haoyu Guo, Zizun Li, Kaijing Ma, Xinyue Li, Yating Wang, Haoyi Zhu, Mingyu Liu, Dingning Liu, Jiange Yang, Zhoujie Fu, Junyi Chen, Chunhua Shen, Jiangmiao Pang, Kaipeng Zhang, Tong He
arxiv.org/abs/2509.12201…

@arXiv_csSE_bot@mastoxiv.page
2025-09-17 08:49:59

When Large Language Models Meet UAVs: How Far Are We?
Yihua Chen, Xingle Que, Jiashuo Zhang, Ting Chen, Guangshun Li, Jiachi Chen
arxiv.org/abs/2509.12795

@arXiv_csCL_bot@mastoxiv.page
2025-09-19 10:37:01

TextMine: LLM-Powered Knowledge Extraction for Humanitarian Mine Action
Chenyue Zhou, Gürkan Solmaz, Flavio Cirillo, Kiril Gashteovski, Jonathan Fürst
arxiv.org/abs/2509.15098

@arXiv_csRO_bot@mastoxiv.page
2025-08-18 09:24:20

EvoPSF: Online Evolution of Autonomous Driving Models via Planning-State Feedback
Jiayue Jin, Lang Qian, Jingyu Zhang, Chuanyu Ju, Liang Song
arxiv.org/abs/2508.11453

@arXiv_csCV_bot@mastoxiv.page
2025-08-18 09:52:00

ImagiDrive: A Unified Imagination-and-Planning Framework for Autonomous Driving
Jingyu Li, Bozhou Zhang, Xin Jin, Jiankang Deng, Xiatian Zhu, Li Zhang
arxiv.org/abs/2508.11428

@arXiv_csCV_bot@mastoxiv.page
2025-07-18 10:22:32

VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning
Senqiao Yang, Junyi Li, Xin Lai, Bei Yu, Hengshuang Zhao, Jiaya Jia
arxiv.org/abs/2507.13348

@arXiv_csRO_bot@mastoxiv.page
2025-09-18 10:02:41

Agile in the Face of Delay: Asynchronous End-to-End Learning for Real-World Aerial Navigation
Yude Li, Zhexuan Zhou, Huizhe Li, Youmin Gong, Jie Mei
arxiv.org/abs/2509.13816

@arXiv_csAI_bot@mastoxiv.page
2025-09-08 07:39:39

Language-Driven Hierarchical Task Structures as Explicit World Models for Multi-Agent Learning
Brennen Hill
arxiv.org/abs/2509.04731 arxiv.…

@arXiv_csCV_bot@mastoxiv.page
2025-09-18 10:24:41

VSE-MOT: Multi-Object Tracking in Low-Quality Video Scenes Guided by Visual Semantic Enhancement
Jun Du, Weiwei Xing, Ming Li, Fei Richard Yu
arxiv.org/abs/2509.14060

@arXiv_csCL_bot@mastoxiv.page
2025-07-18 09:59:32

Comparing Apples to Oranges: A Dataset & Analysis of LLM Humour Understanding from Traditional Puns to Topical Jokes
Tyler Loakman, William Thorne, Chenghua Lin
arxiv.org/abs/2507.13335

@arXiv_csCV_bot@mastoxiv.page
2025-08-15 10:23:22

From Diagnosis to Improvement: Probing Spatio-Physical Reasoning in Vision Language Models
Tiancheng Han, Yunfei Gao, Yong Li, Wuzhou Yu, Qiaosheng Zhang, Wenqi Shao
arxiv.org/abs/2508.10770

@arXiv_csSE_bot@mastoxiv.page
2025-07-16 10:04:11

An Empirical Study of Multi-Agent RAG for Real-World University Admissions Counseling
Anh Nguyen-Duc, Chien Vu Manh, Bao Anh Tran, Viet Phuong Ngo, Luan Le Chi, Anh Quang Nguyen
arxiv.org/abs/2507.11272

@arXiv_csRO_bot@mastoxiv.page
2025-07-16 10:22:31

Acting and Planning with Hierarchical Operational Models on a Mobile Robot: A Study with RAE UPOM
Oscar Lima, Marc Vinci, Sunandita Patra, Sebastian Stock, Joachim Hertzberg, Martin Atzmueller, Malik Ghallab, Dana Nau, Paolo Traverso
arxiv.org/abs/2507.11345

@arXiv_csCL_bot@mastoxiv.page
2025-09-17 10:38:00

Evaluating LLM Alignment on Personality Inference from Real-World Interview Data
Jianfeng Zhu, Julina Maharjan, Xinyu Li, Karin G. Coifman, Ruoming Jin
arxiv.org/abs/2509.13244

@arXiv_csCV_bot@mastoxiv.page
2025-09-18 10:25:51

An Exploratory Study on Abstract Images and Visual Representations Learned from Them
Haotian Li, Jianbo Jiao
arxiv.org/abs/2509.14149 arxiv…

@arXiv_csCL_bot@mastoxiv.page
2025-09-18 08:49:31

Sparse Neurons Carry Strong Signals of Question Ambiguity in LLMs
Zhuoxuan Zhang, Jinhao Duan, Edward Kim, Kaidi Xu
arxiv.org/abs/2509.13664

@arXiv_csCL_bot@mastoxiv.page
2025-09-17 10:40:30

Towards General Agentic Intelligence via Environment Scaling
Runnan Fang, Shihao Cai, Baixuan Li, Jialong Wu, Guangyu Li, Wenbiao Yin, Xinyu Wang, Xiaobin Wang, Liangcai Su, Zhen Zhang, Shibin Wu, Zhengwei Tao, Yong Jiang, Pengjun Xie, Fei Huang, Jingren Zhou
arxiv.org/abs/2509.13311

@arXiv_csCV_bot@mastoxiv.page
2025-09-18 10:25:41

MARS2 2025 Challenge on Multimodal Reasoning: Datasets, Methods, Results, Discussion, and Outlook
Peng Xu, Shengwu Xiong, Jiajun Zhang, Yaxiong Chen, Bowen Zhou, Chen Change Loy, David A. Clifton, Kyoung Mu Lee, Luc Van Gool, Ruiming He, Ruilin Yao, Xinwei Long, Jirui Huang, Kai Tian, Sa Yang, Yihua Shao, Jin Feng, Yue Zhong, Jiakai Zhou, Cheng Tang, Tianyu Zou, Yifang Zhang, Junming Liang, Guoyou Li, Zhaoxiang Wang, Qiang Zhou, Yichen Zhao, Shili Xiong, Hyeongjin Nam, Jaerin Lee, Jaey…

@arXiv_csCL_bot@mastoxiv.page
2025-08-18 09:13:30

Towards Reliable Multi-Agent Systems for Marketing Applications via Reflection, Memory, and Planning
Lorenzo Jaime Yu Flores, Junyi Shen, Xiaoyuan Gu
arxiv.org/abs/2508.11120

@arXiv_csCV_bot@mastoxiv.page
2025-08-13 10:19:52

VLM-3D: End-to-End Vision-Language Models for Open-World 3D Perception
Fuhao Chang, Shuxin Li, Yabei Li, Lei He
arxiv.org/abs/2508.09061 arx…

@arXiv_csCV_bot@mastoxiv.page
2025-09-15 09:58:51

Realism Control One-step Diffusion for Real-World Image Super-Resolution
Zongliang Wu, Siming Zheng, Peng-Tao Jiang, Xin Yuan
arxiv.org/abs/2509.10122

@arXiv_csCV_bot@mastoxiv.page
2025-08-15 10:22:22

EgoCross: Benchmarking Multimodal Large Language Models for Cross-Domain Egocentric Video Question Answering
Yanjun Li, Yuqian Fu, Tianwen Qian, Qi'ao Xu, Silong Dai, Danda Pani Paudel, Luc Van Gool, Xiaoling Wang
arxiv.org/abs/2508.10729

@arXiv_csCV_bot@mastoxiv.page
2025-09-15 09:54:11

LaV-CoT: Language-Aware Visual CoT with Multi-Aspect Reward Optimization for Real-World Multilingual VQA
Jing Huang, Zhiya Tan, Shutao Gong, Fanwei Zeng, Jianshu Li
arxiv.org/abs/2509.10026

@arXiv_csCV_bot@mastoxiv.page
2025-07-14 10:02:52

DatasetAgent: A Novel Multi-Agent System for Auto-Constructing Datasets from Real-World Images
Haoran Sun, Haoyu Bian, Shaoning Zeng, Yunbo Rao, Xu Xu, Lin Mei, Jianping Gou
arxiv.org/abs/2507.08648