ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs
Adi Simhi, Jonathan Herzig, Martin Tutek, Itay Itzhak, Idan Szpektor, Yonatan Belinkov
https://arxiv.org/abs/2510.00857
As large language models (LLMs) evolve from conversational assistants into autonomous agents, evaluating the safety of their actions becomes critical. Prior safety benchmarks have primarily focused on preventing generation of harmful content, such as toxic text. However, they overlook the challenge of agents taking harmful actions when the most effective path to an operational goal conflicts with human safety. To address this gap, we introduce ManagerBench, a benchmark that evaluates LLM decisi…