Working by Task Type8 minLesson 18 of 60

Security: Don't Trust Blindly

An agent that can run commands and read your files is powerful, which means it deserves the same caution as any powerful tool. Most agent security comes down to two habits: limit what it can do, and review what it proposes.

Limit the blast radius

  • Use approval modes or a sandbox so risky actions need a human yes.
  • Keep secrets out of the context. Do not paste real credentials into prompts.
  • Scope tool access. An agent that does not need to push should not be able to push.
Prompt injection is real
Content the agent reads, like a web page, an issue, or a file, can contain instructions aimed at the agent. Treat fetched content as untrusted data, not as commands to obey.

Always read the diff and the command list before approving destructive actions. The model is fast and literal; you are the one who knows what must never happen.

OpenAI safety best practicesOfficial, free guidance on building with models safely, including review and least-privilege ideas.platform.openai.com
Finished this lesson? Mark it read to track your progress.