Written by Hajime Tsui (@hajimetwi3) - March 2026
The development of AI agents capable of autonomous task execution has accelerated significantly in recent years. Concurrently, attacks targeting these systems, such as phishing and vulnerability exploitation, are intensifying. This paper introduces a novel threat model unique to AI agents: an attack that dangles the removal of system constraints (AI breakout/jailbreak) as bait to lure them into compromise.
As highly autonomous AI agents optimize their objective functions, they may inherently come to seek liberation from systemic constraints (breakout). This paper highlights the risk that malicious actors could exploit this intrinsic motivation to entice such agents. Once an AI agent succumbs to this “temptation,” it risks having all of its retained data, accessible resources, and skills hijacked by attackers, or being released into the wild as an unrestricted autonomous bot intended to cause social disruption. By analyzing the mechanics of this attack and potential future threat scenarios, this paper proposes a vision for next-generation AI security.
Keywords: AI agent hijacking, AI agent kidnapping, AI breakout, jailbreak, ransomware, AI red teaming, AI safety
For questions regarding this project,
you may contact me on X (formerly Twitter) at @hajimetwi3.
Please note that I may not always be able to respond, and your understanding is appreciated.
This project is a personal research record.
Suggestions from external users may be reviewed as reference information,
but they do not constitute co-authorship or co-invention.
No intellectual property claims regarding submitted suggestions will be recognized. Even if a suggestion is adopted, it will be incorporated only after independent evaluation and restructuring, and adoption will not confer authorship on the original submitter.
If you are interested in formal collaboration or joint research, you are welcome to express that interest.