AMBEDKAR-A Multi-level Bias Elimination through a Decoding Approach with Knowledge Augmentation for Robust Constitutional Alignment of Language Models
Snehasis Mukhopadhyay, Aryan Kasat, Shivam Dubey, Rahul Karthikeyan, Dhruv Sood, Vinija Jain, Aman Chadha, Amitava Das
https://arxiv.org/abs/2509.02133
Role-Aware Language Models for Secure and Contextualized Access Control in Organizations
Saeed Almheiri, Yerulan Kongrat, Adrian Santosh, Ruslan Tasmukhanov, Josemaria Vera, Muhammad Dehan Al Kautsar, Fajri Koto
https://arxiv.org/abs/2507.23465
Servant, Stalker, Predator: How An Honest, Helpful, And Harmless (3H) Agent Unlocks Adversarial Skills
David Noever
https://arxiv.org/abs/2508.19500
Theoretical modeling and quantitative research on aquatic ecosystems driven by multiple factors
Haizhao Guan, Yiyuan Niu, Chuanjin Zu, Ju Kang
https://arxiv.org/abs/2507.19553
Filter-And-Refine: A MLLM Based Cascade System for Industrial-Scale Video Content Moderation
Zixuan Wang, Jinghao Shi, Hanzhong Liang, Xiang Shen, Vera Wen, Zhiqian Chen, Yifan Wu, Zhixin Zhang, Hongyu Xiong
https://arxiv.org/abs/2507.17204
Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection
Ziqi Miao, Yi Ding, Lijun Li, Jing Shao
https://arxiv.org/abs/2507.02844
Context Misleads LLMs: The Role of Context Filtering in Maintaining Safe Alignment of LLMs
Jinhwa Kim, Ian G. Harris
https://arxiv.org/abs/2508.10031
This https://arxiv.org/abs/2505.17433 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csAI_…
Context Matters: Incorporating Target Awareness in Conversational Abusive Language Detection
Raneem Alharthi, Rajwa Alharthi, Aiqi Jiang, Arkaitz Zubiaga
https://arxiv.org/abs/2508.12828
Response Attack: Exploiting Contextual Priming to Jailbreak Large Language Models
Ziqi Miao, Lijun Li, Yuan Xiong, Zhenhua Liu, Pengyu Zhu, Jing Shao
https://arxiv.org/abs/2507.05248