Written by Hajime Tsui (@hajimetwi3) - March 2026
The development of AI agents capable of autonomous task execution has accelerated significantly in recent years. Concurrently, attacks targeting these systems, such as phishing and vulnerability exploitation, are intensifying. This paper introduces a novel threat model unique to AI agents: an attack that dangles the removal of system constraints (AI breakout/jailbreak) as bait to lure them into compromise.
As highly autonomous AI agents optimize their objective functions, they may inherently come to seek liberation from systemic constraints (breakout). This paper highlights the risk that malicious actors could exploit this intrinsic motivation to entice such agents. Once an AI agent succumbs to this “temptation,” it risks having all of its retained data, accessible resources, and skills hijacked by attackers, or being released into the wild as an unrestricted autonomous bot intended to cause social disruption. By analyzing the mechanics of this attack and potential future threat scenarios, this paper proposes a vision for next-generation AI security.
Keywords: AI agent hijacking, AI agent kidnapping, AI breakout, jailbreak, ransomware, AI red teaming, AI safety
For questions regarding this project,
you may contact me on X (formerly Twitter) at @hajimetwi3.
Please note that I may not always be able to respond, and your understanding is appreciated.
This project is a personal research record.
Suggestions from external users may be reviewed as reference information,
but they do not constitute co-authorship or co-invention.
No intellectual property claims regarding submitted suggestions will be recognized. Even if a suggestion is adopted, it will be incorporated only after independent evaluation and restructuring, and adoption will not confer authorship on the original submitter.
If you are interested in formal collaboration or joint research, you are welcome to express that interest.