Tootfinder

No exact results. Similar results found.

@arXiv_csCL_bot@mastoxiv.page
2025-07-18 09:40:12

Automating Steering for Safe Multimodal Large Language Models
Lyucheng Wu, Mengru Wang, Ziwen Xu, Tri Cao, Nay Oo, Bryan Hooi, Shumin Deng
https://arxiv.org/abs/2507.13255

Recent progress in Multimodal Large Language Models (MLLMs) has unlocked powerful cross-modal reasoning abilities, but also raised new safety concerns, particularly when faced with adversarial multimodal inputs. To improve the safety of MLLMs during inference, we introduce a modular and adaptive inference-time intervention technology, AutoSteer, without requiring any fine-tuning of the underlying model. AutoSteer incorporates three core components: (1) a novel Safety Awareness Score (SAS) that …

@arXiv_csAI_bot@mastoxiv.page
2025-09-16 11:30:47

When Safe Unimodal Inputs Collide: Optimizing Reasoning Chains for Cross-Modal Safety in Multimodal Large Language Models
Wei Cai, Shujuan Liu, Jian Zhao, Ziyan Shi, Yusheng Zhao, Yuchen Yuan, Tianle Zhang, Chi Zhang, Xuelong Li
https://arxiv.org/abs/2509.12060

Multimodal Large Language Models (MLLMs) are susceptible to the implicit reasoning risk, wherein innocuous unimodal inputs synergistically assemble into risky multimodal data that produce harmful outputs. We attribute this vulnerability to the difficulty of MLLMs maintaining safety alignment through long-chain reasoning. To address this issue, we introduce Safe-Semantics-but-Unsafe-Interpretation (SSUI), the first dataset featuring interpretable reasoning paths tailored for such a cross-modal c…

@heiseonline@social.heise.de
2025-07-11 06:42:29

Europe's first in-house processor design is on its way!
🇪🇺💻 SiPearl has reached an important milestone: the company has sent its CPU design for the Rhea1 processor to TSMC in Taiwan, where the first chips are now being produced.
Read the article: https://

The image shows the Rhea1 processor. The text in the image reads: "Europe's first in-house processor lands at TSMC". Below that: "SiPearl has submitted its CPU design for the Rhea1 processor to the Taiwanese chip manufacturer TSMC. Production of the first European high-performance CPUs has thus begun."

@arXiv_csCV_bot@mastoxiv.page
2025-08-11 10:17:39

Effective Training Data Synthesis for Improving MLLM Chart Understanding
Yuwei Yang, Zeyu Zhang, Yunzhong Hou, Zhuowan Li, Gaowen Liu, Ali Payani, Yuan-Sen Ting, Liang Zheng
https://arxiv.org/abs/2508.06492

Being able to effectively read scientific plots, or chart understanding, is a central part toward building effective agents for science. However, existing multimodal large language models (MLLMs), especially open-source ones, are still falling behind with a typical success rate of 30%-50% on challenging benchmarks. Previous studies on fine-tuning MLLMs with synthetic charts are often restricted by their inadequate similarity to the real charts, which could compromise model training and performa…

@arXiv_csHC_bot@mastoxiv.page
2025-08-12 09:15:12

AdjustAR: AI-Driven In-Situ Adjustment of Site-Specific Augmented Reality Content
Nels Numan, Jessica Van Brummelen, Ziwen Lu, Anthony Steed
https://arxiv.org/abs/2508.06826 htt…

Site-specific outdoor AR experiences are typically authored using static 3D models, but are deployed in physical environments that change over time. As a result, virtual content may become misaligned with its intended real-world referents, degrading user experience and compromising contextual interpretation. We present AdjustAR, a system that supports in-situ correction of AR content in dynamic environments using multimodal large language models (MLLMs). Given a composite image comprising the o…

@arXiv_csNI_bot@mastoxiv.page
2025-08-13 09:27:42

Dynamic Uncertainty-aware Multimodal Fusion for Outdoor Health Monitoring
Zihan Fang, Zheng Lin, Senkang Hu, Yihang Tao, Yiqin Deng, Xianhao Chen, Yuguang Fang
https://arxiv.org/abs/2508.09085

Outdoor health monitoring is essential to detect early abnormal health status for safeguarding human health and safety. Conventional outdoor monitoring relies on static multimodal deep learning frameworks, which requires extensive data training from scratch and fails to capture subtle health status changes. Multimodal large language models (MLLMs) emerge as a promising alternative, utilizing only small datasets to fine-tune pre-trained information-rich models for enabling powerful health status…

@arXiv_csAR_bot@mastoxiv.page
2025-09-12 07:32:59

Combating the Memory Walls: Optimization Pathways for Long-Context Agentic LLM Inference
Haoran Wu, Can Xiao, Jiayi Nie, Xuan Guo, Binglei Lou, Jeffrey T. H. Wong, Zhiwen Mo, Cheng Zhang, Przemyslaw Forys, Wayne Luk, Hongxiang Fan, Jianyi Cheng, Timothy M. Jones, Rika Antonova, Robert Mullins, Aaron Zhao
https://arxiv.org/abs/2509.09505

LLMs now form the backbone of AI agents for a diverse array of applications, including tool use, command-line agents, and web or computer use agents. These agentic LLM inference tasks are fundamentally different from chatbot-focused inference -- they often have much larger context lengths to capture complex, prolonged inputs, such as entire webpage DOMs or complicated tool call trajectories. This, in turn, generates significant off-chip memory traffic for the underlying hardware at the inferenc…

@arXiv_astrophEP_bot@mastoxiv.page
2025-09-09 07:57:12

JWST-TST DREAMS: Secondary Atmosphere Constraints for the Habitable Zone Planet TRAPPIST-1 e
Ana Glidden, Sukrit Ranjan, Sara Seager, Néstor Espinoza, Ryan J. MacDonald, Natalie H. Allen, Caleb I. Cañas, David Grant, Amélie Gressier, Kevin B. Stevenson, Natasha E. Batalha, Nikole K. Lewis, Douglas Long, Hannah R. Wakeford, Lili Alderson, Ryan C. Challener, Knicole Colón, Jingcheng Huang, Zifan Lin, Dana R. Louie, Elijah Mullens, Kristin S. Sotzen, Jeff A. Valenti, D…

The TRAPPIST-1 system offers one of the best opportunities to characterize temperate terrestrial planets beyond our own solar system. Within the TRAPPIST-1 system, planet e stands out as highly likely to sustain surface liquid water if it possesses an atmosphere. Recently, we reported the first JWST/NIRSpec PRISM transmission spectra of TRAPPIST-1 e, revealing significant stellar contamination, which varied between the four visits. Here, we assess the range of planetary atmospheres consistent w…

@arXiv_csCV_bot@mastoxiv.page
2025-08-12 12:47:13

Spatial-ORMLLM: Improve Spatial Relation Understanding in the Operating Room with Multimodal Large Language Model
Peiqi He, Zhenhao Zhang, Yixiang Zhang, Xiongjun Zhao, Shaoliang Peng
https://arxiv.org/abs/2508.08199

Precise spatial modeling in the operating room (OR) is foundational to many clinical tasks, supporting intraoperative awareness, hazard avoidance, and surgical decision-making. While existing approaches leverage large-scale multimodal datasets for latent-space alignment to implicitly learn spatial relationships, they overlook the 3D capabilities of MLLMs. However, this approach raises two issues: (1) Operating rooms typically lack multiple video and audio sensors, making multimodal 3D data diff…

@arXiv_csCV_bot@mastoxiv.page
2025-08-12 12:46:03

MedReasoner: Reinforcement Learning Drives Reasoning Grounding from Clinical Thought to Pixel-Level Precision
Zhonghao Yan, Muxi Diao, Yuxuan Yang, Jiayuan Xu, Kaizhou Zhang, Ruoyan Jing, Lele Yang, Yanxi Liu, Kongming Liang, Zhanyu Ma
https://arxiv.org/abs/2508.08177

Accurately grounding regions of interest (ROIs) is critical for diagnosis and treatment planning in medical imaging. While multimodal large language models (MLLMs) combine visual perception with natural language, current medical-grounding pipelines still rely on supervised fine-tuning with explicit spatial hints, making them ill-equipped to handle the implicit queries common in clinical practice. This work makes three core contributions. We first define Unified Medical Reasoning Grounding (UMRG…
