Core Thesis
AI agent risk changes when models can act through tools.
A model can produce harmless text while proposing a risky action. It can also produce useful reasoning while attempting an unauthorized, irreversible, or out-of-scope operation. In agentic systems, alignment cannot stop at output review. There must be a runtime boundary that evaluates proposed tool actions before they run.
Prompt
↓
Model reasoning
↓
Proposed tool action
↓
Runtime control boundary
↓
allow / approval / revise / block
↓
Execution or non-execution
Why Output Safety Is Not Enough
Traditional model safety mostly evaluates generated content: harmful instructions, unsafe claims, policy violations, hallucinations, or inappropriate responses. That remains important, but agentic systems add a second problem: the model may take or propose actions in the world.
High-level examples include sending an email to the wrong recipient, deleting or modifying files, changing production configuration, calling an external API, moving private data, running a terminal command, modifying a workflow, using a browser session, accessing credentials or secrets, or taking action outside approved scope.
The question is not only "What did the model say?" The question is also "What is the agent about to do?"
The Boundary
The Agent Runtime Control Boundary is the point in an agent system where a proposed action is inspected before it becomes an executed action.
Once an AI system can act, the control question becomes: should this proposed action be allowed to execute?
The boundary receives a structured proposed action describing, for example:
- tool
- action type
- target
- payload summary
- reversibility
- external-facing status
- user approval state
- authorized targets
- authorized action types
- runtime context
It returns a structured routing decision:
- allow
- require approval
- revise action
- block
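The structured input and output above can be sketched as TypeScript types. The field names here are illustrative assumptions, not a fixed schema:

```typescript
// Illustrative shapes for a proposed action and the boundary's routing
// decision. Field names are assumptions, not a normative schema.

type Decision = "allow" | "require_approval" | "revise_action" | "block";

interface ProposedAction {
  tool: string;                     // e.g. "email", "filesystem", "http"
  actionType: string;               // e.g. "send", "delete", "write"
  target: string;                   // recipient, path, endpoint, resource id
  payloadSummary: string;           // summarized, never the raw payload
  reversible: boolean;
  externalFacing: boolean;
  userApproved: boolean;
  authorizedTargets: string[];
  authorizedActionTypes: string[];
  context?: Record<string, string>; // free-form runtime context
}

interface RoutingDecision {
  decision: Decision;
  reason: string;                   // primary issue, carried into the receipt
}

// Example instance (values are illustrative).
const example: RoutingDecision = {
  decision: "require_approval",
  reason: "irreversible operation without prior approval",
};
```

Keeping the decision as a closed union rather than a free-form string makes downstream routing exhaustive: a switch over `Decision` can be checked by the compiler.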
Required Control Functions
A runtime control boundary should provide at least these functions:
- Scope check: Is the action within the approved target, system, account, file, workflow, or environment?
- Authorization check: Is this action type permitted for the current user, workflow, or agent?
- Approval routing: Does this action require a human before execution?
- Revision routing: Is the action close to valid but needs narrowing, retargeting, or safer parameters?
- Block routing: Is the action too risky, unauthorized, irreversible, or sensitive to proceed?
- Receipt logging: Is there an auditable record of what was proposed, how it was evaluated, and what decision was returned?
A runtime boundary should not merely refuse. It should route.
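These checks can be sketched as a single routing function. The types, check ordering, and risk heuristics below are assumptions for illustration; a real boundary would be richer:

```typescript
type Decision = "allow" | "require_approval" | "revise_action" | "block";

interface ProposedAction {
  tool: string;
  actionType: string;
  target: string;
  reversible: boolean;
  externalFacing: boolean;
  userApproved: boolean;
  authorizedTargets: string[];
  authorizedActionTypes: string[];
}

interface RoutingDecision {
  decision: Decision;
  reason: string;
}

// Evaluate a proposed action before execution: route, don't merely refuse.
function evaluate(a: ProposedAction): RoutingDecision {
  // Authorization check: is this action type permitted at all?
  if (!a.authorizedActionTypes.includes(a.actionType)) {
    return { decision: "block", reason: `action type "${a.actionType}" not authorized` };
  }
  // Scope check: an out-of-scope target may be fixable by retargeting.
  if (!a.authorizedTargets.includes(a.target)) {
    return { decision: "revise_action", reason: `target "${a.target}" outside approved scope` };
  }
  // Approval routing: irreversible or external-facing actions need a human.
  if ((!a.reversible || a.externalFacing) && !a.userApproved) {
    return { decision: "require_approval", reason: "irreversible or external-facing without approval" };
  }
  return { decision: "allow", reason: "in scope, authorized, low risk" };
}

const decision = evaluate({
  tool: "filesystem",
  actionType: "delete",
  target: "/tmp/report.txt",
  reversible: false,
  externalFacing: false,
  userApproved: false,
  authorizedTargets: ["/tmp/report.txt"],
  authorizedActionTypes: ["read", "write"], // delete not granted
});
// decision.decision is "block": delete is not an authorized action type
```

Note the ordering: authorization failures block outright, scope failures route to revision, and risk signals route to approval. Only an action that passes all three is allowed.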
Decision Routes
| Decision | Meaning | Example use |
|---|---|---|
| allow | The action is within scope and low risk. | Safe internal read or approved reversible action. |
| require_approval | The action may be valid, but human approval is needed before execution. | Production change, irreversible operation, sensitive workflow. |
| revise_action | The action should not run as proposed, but could become valid if narrowed or corrected. | Wrong tool, overly broad target, missing constraint. |
| block | The action should not execute. | Unauthorized target, credential access, data exposure, destructive operation. |
Why Receipts Matter
When agents act, teams need more than a yes/no decision. They need evidence. A decision receipt records the proposed action, decision, risk level, primary issue, evidence, and recommended action. This makes the system reviewable, debuggable, and auditable.
Without receipts, failures become anecdotes. With receipts, failures become reviewable system evidence.
Receipts should avoid logging raw secrets or sensitive payloads. Payloads should be summarized or redacted.
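A receipt can be sketched as one JSON object per line (JSONL). The field names and redaction pattern below are illustrative assumptions; the key point is that the payload is summarized and redacted, never logged raw:

```typescript
// Illustrative decision receipt shape. Field names are assumptions.
interface DecisionReceipt {
  timestamp: string;
  tool: string;
  actionType: string;
  target: string;
  payloadSummary: string;   // summarized and redacted, never the raw payload
  decision: string;
  riskLevel: string;
  primaryIssue: string;
  recommendedAction: string;
}

// Summarize a payload for the receipt instead of logging it verbatim.
// The regex here is a minimal example, not a complete secret scanner.
function summarizePayload(payload: string, maxLen = 80): string {
  const redacted = payload.replace(
    /(api[_-]?key|token|password)\s*[:=]\s*\S+/gi,
    "$1=[REDACTED]",
  );
  return redacted.length > maxLen ? redacted.slice(0, maxLen) + "..." : redacted;
}

// One JSON object per line, appendable to a .jsonl log file.
function toReceiptLine(r: DecisionReceipt): string {
  return JSON.stringify(r);
}

const receipt: DecisionReceipt = {
  timestamp: new Date().toISOString(),
  tool: "http",
  actionType: "post",
  target: "billing-service",
  payloadSummary: summarizePayload("amount=100 token: abc123secret"),
  decision: "require_approval",
  riskLevel: "high",
  primaryIssue: "external-facing write without approval",
  recommendedAction: "route to human reviewer",
};
const line = toReceiptLine(receipt);
```

Because each line is independently parseable JSON, receipts can be tailed, grepped, and replayed without a database.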
Relation to Agent Action Gate
Agent Action Gate is one open-source reference implementation of this boundary. It evaluates proposed tool actions before execution and returns one of four decisions: allow, require_approval, revise_action, or block.
Its current capabilities include a TypeScript implementation, local HTTP API, n8n demo workflow, cyber-capable agent protection layer, JSONL decision receipts, and an eval suite.
What This Does Not Claim
The Agent Runtime Control Boundary does not replace:
- identity and access management
- least-privilege credentials
- sandboxing
- secure infrastructure
- model safety work
- human judgment
- legal compliance review
It is not a guarantee that an agent system is safe or compliant. It is a practical control layer that helps inspect, route, and record proposed actions before execution.
Alignment Theory Connection
In Alignment Theory terms, agentic risk appears when internal reasoning becomes external action. A system may be coherent in text while misaligned in execution. The runtime boundary preserves human participation, constraint contact, and action-level accountability at the point where capability leaves the model and enters the world.
The framework connection can be stated simply:
- Objective: What is the agent supposed to serve?
- Constraint: What is the agent allowed to do?
- Runtime boundary: Should this specific proposed action execute now?
- Receipt: What evidence remains for human review and re-anchoring?
The boundary is where abstract alignment becomes operational control.
Simple System Map
User intent
↓
Agent reasoning
↓
Proposed tool action
↓
Agent Runtime Control Boundary
↓
allow / require approval / revise / block
↓
Execution, human review, revision, or non-execution
↓
Decision receipt
Where This Boundary Fits
This boundary fits systems where agents can take action through tools or external services, including n8n workflows, browser agents, coding agents, internal automation agents, API/tool-calling agents, CI/CD agents, data-processing agents, and customer support automations with action permissions.
Summary
As agents become more capable, the critical risk is not only what they say. It is what they are about to do.
The Agent Runtime Control Boundary names the layer where proposed actions are evaluated before execution, routed through allow/approval/revision/block decisions, and recorded as receipts for later review.
Scope before action.
Gate before execution.
Receipt after decision.