
2025-09-22 10:35:31
Language-Instructed Reasoning for Group Activity Detection via Multimodal Large Language Model
Jihua Peng, Qianxiong Xu, Yichen Liu, Chenxi Liu, Cheng Long, Rui Zhao, Ziyue Li
https://arxiv.org/abs/2509.16054
Tutorial on the Probabilistic Unification of Estimation Theory, Machine Learning, and Generative AI
Mohammed Elmusrati
https://arxiv.org/abs/2508.15719 https://
Synergizing Static Analysis with Large Language Models for Vulnerability Discovery and beyond
Vaibhav Agrawal, Kiarash Ahi
https://arxiv.org/abs/2509.15433 https://
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[3/7]:
- Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Spee...
Jeong Hun Yeo, Minsu Kim, Chae Won Kim, Stavros Petridis, Yong Man Ro
Man I hate the new language for a task one doesn't want to do: "aversive".
Primary reason is that it externalizes the problem. It's fixed mindset framing, that something _else_ has to change to make the situation better, that it's the task's fault for being aversive, instead of our own for not working on the aversion to the needed task.
We have a bad habit lately of attributing relational traits to the party in a relationship, rather than to the relationship itself and it weakens our thinking about things. Noticing this pattern really changes how you think, because you can start attributing things better and noticing the contexts — and contexts are often things that can be changed!
this should keep yous out of trouble for a while :luna_moth: 🎹
#liveCoding
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[5/7]:
- Calibrated and Robust Foundation Models for Vision-Language and Medical Image Tasks Under Distrib...
Behraj Khan, Tahir Qasim Syed, Nouman M. Durrani, Bilal Naseem, Shabir Ahmad, Rizwan Qureshi
…
Reverse Physician-AI Relationship: Full-process Clinical Diagnosis Driven by a Large Language Model
Shicheng Xu, Xin Huang, Zihao Wei, Liang Pang, Huawei Shen, Xueqi Cheng
https://arxiv.org/abs/2508.10492
Diagnostic-Guided Dynamic Profile Optimization for LLM-based User Simulators in Sequential Recommendation
Hongyang Liu, Zhu Sun, Tianjun Wei, Yan Wang, Jiajie Zhu, Xinghua Qu
https://arxiv.org/abs/2508.12645
Finally, what Xia & Lindell call a "separation problem" is, in our view, a feature of our approach and not a bug.
If, e.g., all languages in a family are polysynthetic (or none are), that’s not a statistical artefact – it’s the signal. The outcome is well associated with genealogy, showing that family membership captures something genuinely informative about the process. When the model finds that family explains a large share of the variance, that's not a failure – it's evidence that phylogenetic structure dominates the pattern.
So while Xia & Lindell insist that "autocorrelation due to relationships and distance cannot be captured in family or regional-level analyses", we see that as an empirical question – and we treated it as one.
The real test is whether a mixed model that explicitly represents phylogeny and geography performs worse than their alternative, where the entire shared history of languages and environments is effectively collapsed into a single dimension (an eigenvector).
In other words: we model relationships – Xia & Lindell summarise them into one number per language.
Automating Code Generation for Semiconductor Equipment Control from Developer Utterances with LLMs
Youngkyoung Kim, Sanghyeok Park, Misoo Kim, Gangho Yoon, Eunseok Lee, Simon S. Woo
https://arxiv.org/abs/2509.13055
Talking with Oompa Loompas: A novel framework for evaluating linguistic acquisition of LLM agents
Sankalp Tattwadarshi Swain, Anshika Krishnatray, Dhruv Kumar, Jagat Sesh Challa
https://arxiv.org/abs/2509.07389
Automatic Semantic Alignment of Flow Pattern Representations for Exploration with Large Language Models
Weihan Zhang, Jun Tao
https://arxiv.org/abs/2508.06300 https://
Large Language Models Show Signs of Alignment with Human Neurocognition During Abstract Reasoning
Christopher Pinier, Sonia Acuña Vargas, Mariia Steeghs-Turchina, Dora Matzke, Claire E. Stevenson, Michael D. Nunez
https://arxiv.org/abs/2508.10057
Poseidon: A OneGraph Engine
Brad Bebee, Ümit V. Çatalyürek, Olaf Hartig, Ankesh Khandelwal, Simone Rondelli, Michael Schmidt, Lefteris Sidirourgos, Bryan Thompson
https://arxiv.org/abs/2510.11166
TIGeR: Tool-Integrated Geometric Reasoning in Vision-Language Models for Robotics
Yi Han, Cheng Chi, Enshen Zhou, Shanyu Rong, Jingkun An, Pengwei Wang, Zhongyuan Wang, Lu Sheng, Shanghang Zhang
https://arxiv.org/abs/2510.07181
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[2/7]:
- SLGaussian: Fast Language Gaussian Splatting in Sparse Views
Kangjie Chen, BingQuan Dai, Minghan Qin, Dongbin Zhang, Peihao Li, Yingshuang Zou, Haoqian Wang
Reasoning Pattern Matters: Learning to Reason without Human Rationales
Chaoxu Pang, Yixuan Cao, Ping Luo
https://arxiv.org/abs/2510.12643 https://arxiv.org…
Outsmarting Linear Neural Networks via an Incoherent Light-Driven Optical Extreme Learner with Data Reverberation
Bofeng Liu, Xu Mei, Sadman Shafi, Tunan Xia, Iam-Choon Khoo, Zhiwen Liu, Xingjie Ni
https://arxiv.org/abs/2508.08428
Architecting Resilient LLM Agents: A Guide to Secure Plan-then-Execute Implementations
Ron F. Del Rosario, Klaudia Krawiecka, Christian Schroeder de Witt
https://arxiv.org/abs/2509.08646
One Weird Trick to Untie Landin's Knot
Paulette Koronkevich, William J. Bowman
https://arxiv.org/abs/2507.21317 https://arxiv.org/pdf/2507.21317…
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[4/4]:
- IRL-VLA: Training an Vision-Language-Action Policy via Reward World Model
Jiang, Gao, Wang, Sun, Wang, Heng, Sun, Tang, Zhu, Chai, Wang, Gu, Jiang, Sun
Hidden in the Noise: Unveiling Backdoors in Audio LLMs Alignment through Latent Acoustic Pattern Triggers
Liang Lin, Miao Yu, Kaiwen Luo, Yibo Zhang, Lilan Peng, Dexian Wang, Xuehai Tang, Yuanhe Zhang, Xikang Yang, Zhenhong Zhou, Kun Wang, Yang Liu
https://arxiv.org/abs/2508.02175
Understanding Economic Tradeoffs Between Human and AI Agents in Bargaining Games
Crystal Qian, Kehang Zhu, John Horton, Benjamin S. Manning, Vivian Tsai, James Wexler, Nithum Thain
https://arxiv.org/abs/2509.09071
OFCnetLLM: Large Language Model for Network Monitoring and Alertness
Hong-Jun Yoon, Mariam Kiran, Danial Ebling, Joe Breen
https://arxiv.org/abs/2507.22711 https://
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[4/4]:
- Evaluating the Robustness of Open-Source Vision-Language Models to Domain Shift in Object Captioning
Federico Tavella, Amber Drinkwater, Angelo Cangelosi
Convergence and Divergence of Language Models under Different Random Seeds
Finlay Fehlauer (ETH Zurich), Kyle Mahowald (University of Texas at Austin), Tiago Pimentel (ETH Zurich)
https://arxiv.org/abs/2509.26643
SPADE: A Large Language Model Framework for Soil Moisture Pattern Recognition and Anomaly Detection in Precision Agriculture
Yeonju Lee, Rui Qi Chen, Joseph Oboamah, Po Nien Su, Wei-zhen Liang, Yeyin Shi, Lu Gan, Yongsheng Chen, Xin Qiao, Jing Li
https://arxiv.org/abs/2509.18123
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[1/5]:
- Video-based Sign Language Recognition without Temporal Segmentation
Jie Huang, Wengang Zhou, Qilin Zhang, Houqiang Li, Weiping Li
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[3/5]:
- SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial...
Ziyang Gong, Wenhao Li, Oliver Ma, Songyuan Li, Jiayi Ji, Xue Yang, Gen Luo, Junchi Yan, Rongrong Ji
Multimodal Behavioral Patterns Analysis with Eye-Tracking and LLM-Based Reasoning
Dongyang Guo, Yasmeen Abdrabou, Enkeleda Thaqi, Enkelejda Kasneci
https://arxiv.org/abs/2507.18252
AI Playing Business Games: Benchmarking Large Language Models on Managerial Decision-Making in Dynamic Simulations
Berdymyrat Ovezmyradov
https://arxiv.org/abs/2509.26331 https:…
Beyond Surface-Level Detection: Towards Cognitive-Driven Defense Against Jailbreak Attacks via Meta-Operations Reasoning
Rui Pu, Chaozhuo Li, Rui Ha, Litian Zhang, Lirong Qiu, Xi Zhang
https://arxiv.org/abs/2508.03054
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[5/5]:
- TIGeR: Tool-Integrated Geometric Reasoning in Vision-Language Models for Robotics
Han, Chi, Zhou, Rong, An, Wang, Wang, Sheng, Zhang
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[3/7]:
- AutoDrive-QA: A Multiple-Choice Benchmark for Vision-Language Evaluation in Urban Autonomous Driving
Boshra Khalili, Andrew W. Smyth
STARec: An Efficient Agent Framework for Recommender Systems via Autonomous Deliberate Reasoning
Chenghao Wu, Ruiyang Ren, Junjie Zhang, Ruirui Wang, Zhongrui Ma, Qi Ye, Wayne Xin Zhao
https://arxiv.org/abs/2508.18812
Crosslisted article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[1/1]:
- Sample-efficient Integration of New Modalities into Large Language Models
Osman Batur İnce, André F. T. Martins, Oisin Mac Aodha, Edoardo M. Ponti
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[3/5]:
- CountingFruit: Language-Guided 3D Fruit Counting with Semantic Gaussian Splatting
Fengze Li, Yangle Liu, Jieming Ma, Hai-Ning Liang, Yaochun Shen, Huangxiang Li, Zhijing Wu
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[3/5]:
- RS-OOD: A Vision-Language Augmented Framework for Out-of-Distribution Detection in Remote Sensing
Chenhao Wang, Yingrui Ji, Yu Meng, Yunjian Zhang, Yao Zhu
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[5/5]:
- HAMLET: Switch your Vision-Language-Action Model into a History-Aware Policy
Myungkyu Koo, Daewon Choi, Taeyoung Kim, Kyungmin Lee, Changyeon Kim, Younggyo Seo, Jinwoo Shin
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[6/10]:
- CountingFruit: Language-Guided 3D Fruit Counting with Semantic Gaussian Splatting
Fengze Li, Yangle Liu, Jieming Ma, Hai-Ning Liang, Yaochun Shen, Huangxiang Li, Zhijing Wu
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[2/9]:
- DivScene: Towards Open-Vocabulary Object Navigation with Large Vision Language Models in Diverse ...
Wang, Zhang, Fang, Tian, Yang, Ma, Pan, Song, Yu
Crosslisted article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[1/3]:
- MultiStream-LLM: Bridging Modalities for Robust Sign Language Translation
Marshall Thomas, Edward Fish, Richard Bowden
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[4/9]:
- "Principal Components" Enable A New Language of Images
Xin Wen, Bingchen Zhao, Ismail Elezi, Jiankang Deng, Xiaojuan Qi
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[2/7]:
- LFTR: Learning-Free Token Reduction for Multimodal Large Language Models
Zihui Zhao, Yingxin Li, Yang Li
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[5/7]:
- CoFFT: Chain of Foresight-Focus Thought for Visual Language Models
Zhang, Dong, Zhang, Jia, Dang, Fernando, Liu, Shou
Crosslisted article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[1/2]:
- SGAligner : Cross-Modal Language-Aided 3D Scene Graph Alignment
Binod Singh, Sayan Deb Sarkar, Iro Armeni
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[4/5]:
- VRU-Accident: A Vision-Language Benchmark for Video Question Answering and Dense Captioning for A...
Younggun Kim, Ahmed S. Abdelrahman, Mohamed Abdel-Aty
Crosslisted article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[2/2]:
- CaTS-Bench: Can Language Models Describe Numeric Time Series?
Luca Zhou, Pratham Yashwante, Marshall Fisher, Alessio Sampieri, Zihao Zhou, Fabio Galasso, Rose Yu
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[4/6]:
- Beyond Label Semantics: Language-Guided Action Anatomy for Few-shot Action Recognition
Zefeng Qian, Xincheng Yao, Yifei Huang, Chongyang Zhang, Jiangyong Ying, Hong Sun
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[4/5]:
- Investigating Traffic Accident Detection Using Multimodal Large Language Models
Ilhan Skender, Kailin Tong, Selim Solmaz, Daniel Watzenig
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[2/8]:
- SafeEraser: Enhancing Safety in Multimodal Large Language Models through Multimodal Machine Unlea...
Chen, Deng, Zheng, Yan, Liu, Wu, Jiang, Liu, Hu
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[8/8]:
- The Better You Learn, The Smarter You Prune: Towards Efficient Vision-language-action Models via ...
Jiang, Jiang, Ma, Wen, Li, Zhan, Jia, Liu, Sun, Lang
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[2/5]:
- RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment
Difei Gu, Yunhe Gao, Yang Zhou, Mu Zhou, Dimitris Metaxas