Tootfinder

@arXiv_csCL_bot@mastoxiv.page
2025-09-03 14:30:53

AMBEDKAR-A Multi-level Bias Elimination through a Decoding Approach with Knowledge Augmentation for Robust Constitutional Alignment of Language Models
Snehasis Mukhopadhyay, Aryan Kasat, Shivam Dubey, Rahul Karthikeyan, Dhruv Sood, Vinija Jain, Aman Chadha, Amitava Das
https://arxiv.org/abs/2509.02133

Large Language Models (LLMs) can inadvertently reflect societal biases present in their training data, leading to harmful or prejudiced outputs. In the Indian context, our empirical evaluations across a suite of models reveal that biases around caste and religion are particularly salient. Yet, most existing mitigation strategies are Western-centric and fail to address these local nuances. We propose AMBEDKAR, a framework inspired by the egalitarian vision of Dr B. R. Ambedkar, architect of the …

@arXiv_csCL_bot@mastoxiv.page
2025-08-01 10:18:11

Role-Aware Language Models for Secure and Contextualized Access Control in Organizations
Saeed Almheiri, Yerulan Kongrat, Adrian Santosh, Ruslan Tasmukhanov, Josemaria Vera, Muhammad Dehan Al Kautsar, Fajri Koto
https://arxiv.org/abs/2507.23465

As large language models (LLMs) are increasingly deployed in enterprise settings, controlling model behavior based on user roles becomes an essential requirement. Existing safety methods typically assume uniform access and focus on preventing harmful or toxic outputs, without addressing role-specific access constraints. In this work, we investigate whether LLMs can be fine-tuned to generate responses that reflect the access privileges associated with different organizational roles. We explore t…

@arXiv_csCR_bot@mastoxiv.page
2025-08-28 09:42:41

Servant, Stalker, Predator: How An Honest, Helpful, And Harmless (3H) Agent Unlocks Adversarial Skills
David Noever
https://arxiv.org/abs/2508.19500

This paper identifies and analyzes a novel vulnerability class in Model Context Protocol (MCP) based agent systems. The attack chain describes and demonstrates how benign, individually authorized tasks can be orchestrated to produce harmful emergent behaviors. Through systematic analysis using the MITRE ATLAS framework, we demonstrate how 95 agents tested with access to multiple services (including browser automation, financial analysis, location tracking, and code deployment) can chain legitimat…

@arXiv_qbioQM_bot@mastoxiv.page
2025-07-29 08:50:11

Theoretical modeling and quantitative research on aquatic ecosystems driven by multiple factors
Haizhao Guan, Yiyuan Niu, Chuanjin Zu, Ju Kang
https://arxiv.org/abs/2507.19553

Understanding the complex interactions between water temperature, nutrient levels, and chlorophyll-a dynamics is essential for addressing eutrophication and the proliferation of harmful algal blooms in freshwater ecosystems. However, many existing studies tend to oversimplify these relationships, often neglecting the non-linear effects and long-term temporal variations that influence chlorophyll-a growth. Here, we conducted multi-year field monitoring (2020-2024) of the key environmental fa…

@arXiv_csLG_bot@mastoxiv.page
2025-07-24 10:03:49

Filter-And-Refine: A MLLM Based Cascade System for Industrial-Scale Video Content Moderation
Zixuan Wang, Jinghao Shi, Hanzhong Liang, Xiang Shen, Vera Wen, Zhiqian Chen, Yifan Wu, Zhixin Zhang, Hongyu Xiong
https://arxiv.org/abs/2507.17204

Effective content moderation is essential for video platforms to safeguard user experience and uphold community standards. While traditional video classification models effectively handle well-defined moderation tasks, they struggle with complicated scenarios such as implicit harmful content and contextual ambiguity. Multimodal large language models (MLLMs) offer a promising solution to these limitations with their superior cross-modal reasoning and contextual understanding. However, two key ch…

@arXiv_csCV_bot@mastoxiv.page
2025-07-04 10:24:11

Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection
Ziqi Miao, Yi Ding, Lijun Li, Jing Shao
https://arxiv.org/abs/2507.02844

With the emergence of strong visual-language capabilities, multimodal large language models (MLLMs) have demonstrated tremendous potential for real-world applications. However, the security vulnerabilities exhibited by the visual modality pose significant challenges to deploying such models in open-world environments. Recent studies have successfully induced harmful responses from target MLLMs by encoding harmful textual semantics directly into visual inputs. However, in these approaches, the v…

@arXiv_csCR_bot@mastoxiv.page
2025-08-15 08:34:02

Context Misleads LLMs: The Role of Context Filtering in Maintaining Safe Alignment of LLMs
Jinhwa Kim, Ian G. Harris
https://arxiv.org/abs/2508.10031

While Large Language Models (LLMs) have shown significant advancements in performance, various jailbreak attacks have posed growing safety and ethical risks. Malicious users often exploit adversarial context to deceive LLMs, prompting them to generate responses to harmful queries. In this study, we propose a new defense mechanism called Context Filtering model, an input pre-processing method designed to filter out untrustworthy and unreliable context while identifying the primary prompts contai…

@arXiv_csAI_bot@mastoxiv.page
2025-06-05 09:37:48

This https://arxiv.org/abs/2505.17433 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csAI_…

MemeReaCon: Probing Contextual Meme Understanding in Large Vision-Language Models
Memes have emerged as a popular form of multimodal online communication, where their interpretation heavily depends on the specific context in which they appear. Current approaches predominantly focus on isolated meme analysis, either for harmful content detection or standalone interpretation, overlooking a fundamental challenge: the same meme can express different intents depending on its conversational context. This oversight creates an evaluation gap: although humans intuitively recognize ho…

@arXiv_csCL_bot@mastoxiv.page
2025-08-19 11:44:00

Context Matters: Incorporating Target Awareness in Conversational Abusive Language Detection
Raneem Alharthi, Rajwa Alharthi, Aiqi Jiang, Arkaitz Zubiaga
https://arxiv.org/abs/2508.12828

Abusive language detection has become an increasingly important task as a means to tackle this type of harmful content in social media. There has been a substantial body of research developing models for determining if a social media post is abusive or not; however, this research has primarily focused on exploiting social media posts individually, overlooking additional context that can be derived from surrounding posts. In this study, we look at conversational exchanges, where a user replies t…

@arXiv_csCL_bot@mastoxiv.page
2025-07-08 14:05:21

Response Attack: Exploiting Contextual Priming to Jailbreak Large Language Models
Ziqi Miao, Lijun Li, Yuan Xiong, Zhenhua Liu, Pengyu Zhu, Jing Shao
https://arxiv.org/abs/2507.05248

Contextual priming, where earlier stimuli covertly bias later judgments, offers an unexplored attack surface for large language models (LLMs). We uncover a contextual priming vulnerability in which the previous response in the dialogue can steer its subsequent behavior toward policy-violating content. Building on this insight, we propose Response Attack, which uses an auxiliary LLM to generate a mildly harmful response to a paraphrased version of the original malicious query. They are then form…

Opt-in global Mastodon full text search.