Tootfinder

No exact results. Similar results found.

@arXiv_csAI_bot@mastoxiv.page
2025-09-18 07:32:51

Evaluation Awareness Scales Predictably in Open-Weights Large Language Models
Maheep Chaudhary, Ian Su, Nikhil Hooda, Nishith Shankar, Julia Tan, Kevin Zhu, Ashwinee Panda, Ryan Lagasse, Vasu Sharma
https://arxiv.org/abs/2509.13333

Large language models (LLMs) can internally distinguish between evaluation and deployment contexts, a behaviour known as evaluation awareness. This undermines AI safety evaluations, as models may conceal dangerous capabilities during testing. Prior work demonstrated this in a single 70B model, but the scaling relationship across model sizes remains unknown. We investigate evaluation awareness across 15 models scaling from 0.27B to 70B parameters from four families using linear pr…

@arXiv_csCL_bot@mastoxiv.page
2025-07-18 09:37:42

Enhancing Cross-task Transfer of Large Language Models via Activation Steering
Xinyu Tang, Zhihao Lv, Xiaoxue Cheng, Junyi Li, Wayne Xin Zhao, Zujie Wen, Zhiqiang Zhang, Jun Zhou
https://arxiv.org/abs/2507.13236

Large language models (LLMs) have shown impressive abilities in leveraging pretrained knowledge through prompting, but they often struggle with unseen tasks, particularly in data-scarce scenarios. While cross-task in-context learning offers a direct solution for transferring knowledge across tasks, it still faces critical challenges in terms of robustness, scalability, and efficiency. In this paper, we investigate whether cross-task transfer can be achieved via latent space steering without par…

@idbrii@mastodon.gamedev.place
2025-06-18 02:50:08

@… @…
Integer division makes more sense in a statically typed language like C, but is uncommon in scripting languages. (Lua, JavaScript, Python all do float division.)
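
As a small illustration of the Python part of that remark, here is a minimal sketch assuming Python 3 (in Python 2, `/` on two integers truncated instead). The helper names `true_div` and `floor_div` are hypothetical and not from the post; they only wrap the two built-in operators.

```python
# Assumes Python 3 semantics. The helper names are illustrative only.

def true_div(a, b):
    """The `/` operator: returns a float even when both operands are ints."""
    return a / b


def floor_div(a, b):
    """The `//` operator: explicit integer (floor) division."""
    return a // b


print(true_div(7, 2))    # 3.5
print(floor_div(7, 2))   # 3
# Floor division rounds toward negative infinity, whereas C's integer
# division truncates toward zero (-7 / 2 is -3 in C99 and later).
print(floor_div(-7, 2))  # -4
```

So integer division is still available in Python, but it has to be requested with a separate operator rather than being inferred from the operand types as in C.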

@arXiv_csCR_bot@mastoxiv.page
2025-06-18 08:30:51

Universal Jailbreak Suffixes Are Strong Attention Hijackers
Matan Ben-Tov, Mor Geva, Mahmood Sharif
https://arxiv.org/abs/2506.12880 https://

We study suffix-based jailbreaks–a powerful family of attacks against large language models (LLMs) that optimize adversarial suffixes to circumvent safety alignment. Focusing on the widely used foundational GCG attack (Zou et al., 2023), we observe that suffixes vary in efficacy: some markedly more universal–generalizing to many unseen harmful instructions–than others. We first show that GCG's effectiveness is driven by a shallow, critical mechani…

@arXiv_csAI_bot@mastoxiv.page
2025-06-18 08:04:43

What's in the Box? Reasoning about Unseen Objects from Multimodal Cues
Lance Ying, Daniel Xu, Alicia Zhang, Katherine M. Collins, Max H. Siegel, Joshua B. Tenenbaum
https://arxiv.org/abs/2506.14212

People regularly make inferences about objects in the world that they cannot see by flexibly integrating information from multiple sources: auditory and visual cues, language, and our prior beliefs and knowledge about the scene. How are we able to so flexibly integrate many sources of information to make sense of the world around us, even if we have no direct knowledge? In this work, we propose a neurosymbolic model that uses neural networks to parse open-ended multimodal inputs and then applie…

@arXiv_csCV_bot@mastoxiv.page
2025-08-14 09:19:52

MME-Emotion: A Holistic Evaluation Benchmark for Emotional Intelligence in Multimodal Large Language Models
Fan Zhang, Zebang Cheng, Chong Deng, Haoxuan Li, Zheng Lian, Qian Chen, Huadai Liu, Wen Wang, Yi-Fan Zhang, Renrui Zhang, Ziyu Guo, Zhihong Zhu, Hao Wu, Haixin Wang, Yefeng Zheng, Xiaojiang Peng, Xian Wu, Kun Wang, Xiangang Li, Jieping Ye, Pheng-Ann Heng
…

Recent advances in multimodal large language models (MLLMs) have catalyzed transformative progress in affective computing, enabling models to exhibit emergent emotional intelligence. Despite substantial methodological progress, current emotional benchmarks remain limited, as it is still unknown: (a) the generalization abilities of MLLMs across distinct scenarios, and (b) their reasoning capabilities to identify the triggering factors behind emotional states. To bridge these gaps, we present \te…

@arXiv_csCL_bot@mastoxiv.page
2025-08-11 10:02:49

Learning the Topic, Not the Language: How LLMs Classify Online Immigration Discourse Across Languages
Andrea Nasuto, Stefano Maria Iacus, Francisco Rowe, Devika Jain
https://arxiv.org/abs/2508.06435

Large language models (LLMs) are transforming social-science research by enabling scalable, precise analysis. Their adaptability raises the question of whether knowledge acquired through fine-tuning in a few languages can transfer to unseen languages that only appeared during pre-training. To examine this, we fine-tune lightweight LLaMA 3.2-3B models on monolingual, bilingual, or multilingual data sets to classify immigration-related tweets from X/Twitter across 13 languages, a domain character…

@arXiv_csAI_bot@mastoxiv.page
2025-06-18 08:03:05

Machine Mirages: Defining the Undefined
Hamidou Tembine
https://arxiv.org/abs/2506.13990 https://arxiv.org/pdf/2506.13990

As multimodal machine intelligence systems started achieving average animal-level and average human-level fluency in many measurable tasks in processing images, language, and sound, they began to exhibit a new class of cognitive aberrations: machine mirages. These include delusion, illusion, confabulation, hallucination, misattribution error, semantic drift, semantic compression, exaggeration, causal inference failure, uncanny valley of perception, bluffing-patter-bullshitting, cognitive stereo…

@arXiv_csCR_bot@mastoxiv.page
2025-08-14 07:33:32

Learning to Detect Unknown Jailbreak Attacks in Large Vision-Language Models: A Unified and Accurate Approach
Shuang Liang, Zhihao Xu, Jialing Tao, Hui Xue, Xiting Wang
https://arxiv.org/abs/2508.09201

Despite extensive alignment efforts, Large Vision-Language Models (LVLMs) remain vulnerable to jailbreak attacks, posing serious safety risks. Although recent detection works have shifted to internal representations due to their rich cross-modal information, most methods rely on heuristic rules rather than principled objectives, resulting in suboptimal performance. To address these limitations, we propose Learning to Detect (LoD), a novel unsupervised framework that formulates jailbreak detecti…

@arXiv_csCV_bot@mastoxiv.page
2025-08-13 10:19:52

VLM-3D:End-to-End Vision-Language Models for Open-World 3D Perception
Fuhao Chang, Shuxin Li, Yabei Li, Lei He
https://arxiv.org/abs/2508.09061 https://arx…

Open-set perception in complex traffic environments poses a critical challenge for autonomous driving systems, particularly in identifying previously unseen object categories, which is vital for ensuring safety. Visual Language Models (VLMs), with their rich world knowledge and strong semantic reasoning capabilities, offer new possibilities for addressing this task. However, existing approaches typically leverage VLMs to extract visual features and couple them with traditional object detectors,…
