Core Thesis
AI agent risk changes when models can act through tools.
A model can produce harmless text while proposing a risky action. It can also produce useful reasoning while attempting an unauthorized, irreversible, or out-of-scope operation. In agentic systems, alignment cannot stop at output review. There must be a runtime boundary that evaluates proposed tool actions before they run.
Prompt
↓
Model reasoning
↓
Proposed tool action
↓
Runtime control boundary
↓
allow / approval / revise / block
↓
Execution or non-execution
Why Output Safety Is Not Enough
Traditional model safety mostly evaluates generated content: harmful instructions, unsafe claims, policy violations, hallucinations, or inappropriate responses. That remains important, but agentic systems add a second problem: the model may take or propose actions in the world.
High-level examples include sending an email to the wrong recipient, deleting or modifying files, changing production configuration, calling an external API, moving private data, running a terminal command, modifying a workflow, using a browser session, accessing credentials or secrets, or taking action outside approved scope.
The question is not only "What did the model say?" The question is also "What is the agent about to do?"
The Boundary
The Agent Runtime Control Boundary is the point in an agent system where a proposed action is inspected before it becomes an executed action.
Once an AI system can act, the control question becomes: should this proposed action be allowed to execute?
The boundary receives a structured proposed action describing, for example:
- tool
- action type
- target
- payload summary
- reversibility
- external-facing status
- user approval state
- authorized targets
- authorized action types
- runtime context
It returns a structured routing decision:
- allow
- require approval
- revise action
- block
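The structured input and output above can be sketched as TypeScript types. The field names here are illustrative assumptions, not a fixed schema:

```typescript
// Illustrative shapes for a proposed action and the boundary's routing
// decision. Field names are assumptions, not a normative schema.

type Decision = "allow" | "require_approval" | "revise_action" | "block";

interface ProposedAction {
  tool: string;                     // e.g. "email", "filesystem", "http"
  actionType: string;               // e.g. "send", "delete", "write"
  target: string;                   // recipient, path, endpoint, resource id
  payloadSummary: string;           // summarized, never the raw payload
  reversible: boolean;
  externalFacing: boolean;
  userApproved: boolean;
  authorizedTargets: string[];
  authorizedActionTypes: string[];
  context?: Record<string, string>; // free-form runtime context
}

interface RoutingDecision {
  decision: Decision;
  reason: string;                   // primary issue, carried into the receipt
}

// Example instance (values are illustrative).
const example: RoutingDecision = {
  decision: "require_approval",
  reason: "irreversible operation without prior approval",
};
```

Keeping the decision as a closed union rather than a free-form string makes downstream routing exhaustive: a switch over `Decision` can be checked by the compiler.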
Required Control Functions
A runtime control boundary should provide at least these functions:
- Scope check: Is the action within the approved target, system, account, file, workflow, or environment?
- Authorization check: Is this action type permitted for the current user, workflow, or agent?
- Approval routing: Does this action require a human before execution?
- Revision routing: Is the action close to valid but needs narrowing, retargeting, or safer parameters?
- Block routing: Is the action too risky, unauthorized, irreversible, or sensitive to proceed?
- Receipt logging: Is there an auditable record of what was proposed, how it was evaluated, and what decision was returned?
A runtime boundary should not merely refuse. It should route.
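These checks can be sketched as a single routing function. The types, check ordering, and risk heuristics below are assumptions for illustration; a real boundary would be richer:

```typescript
type Decision = "allow" | "require_approval" | "revise_action" | "block";

interface ProposedAction {
  tool: string;
  actionType: string;
  target: string;
  reversible: boolean;
  externalFacing: boolean;
  userApproved: boolean;
  authorizedTargets: string[];
  authorizedActionTypes: string[];
}

interface RoutingDecision {
  decision: Decision;
  reason: string;
}

// Evaluate a proposed action before execution: route, don't merely refuse.
function evaluate(a: ProposedAction): RoutingDecision {
  // Authorization check: is this action type permitted at all?
  if (!a.authorizedActionTypes.includes(a.actionType)) {
    return { decision: "block", reason: `action type "${a.actionType}" not authorized` };
  }
  // Scope check: an out-of-scope target may be fixable by retargeting.
  if (!a.authorizedTargets.includes(a.target)) {
    return { decision: "revise_action", reason: `target "${a.target}" outside approved scope` };
  }
  // Approval routing: irreversible or external-facing actions need a human.
  if ((!a.reversible || a.externalFacing) && !a.userApproved) {
    return { decision: "require_approval", reason: "irreversible or external-facing without approval" };
  }
  return { decision: "allow", reason: "in scope, authorized, low risk" };
}

const decision = evaluate({
  tool: "filesystem",
  actionType: "delete",
  target: "/tmp/report.txt",
  reversible: false,
  externalFacing: false,
  userApproved: false,
  authorizedTargets: ["/tmp/report.txt"],
  authorizedActionTypes: ["read", "write"], // delete not granted
});
// decision.decision is "block": delete is not an authorized action type
```

Note the ordering: authorization failures block outright, scope failures route to revision, and risk signals route to approval. Only an action that passes all three is allowed.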
Decision Routes
| Decision | Meaning | Example use |
|---|---|---|
| allow | The action is within scope and low risk. | Safe internal read or approved reversible action. |
| require_approval | The action may be valid, but human approval is needed before execution. | Production change, irreversible operation, sensitive workflow. |
| revise_action | The action should not run as proposed, but could become valid if narrowed or corrected. | Wrong tool, overly broad target, missing constraint. |
| block | The action should not execute. | Unauthorized target, credential access, data exposure, destructive operation. |
Why Receipts Matter
When agents act, teams need more than a yes/no decision. They need evidence. A decision receipt records the proposed action, decision, risk level, primary issue, evidence, and recommended action. This makes the system reviewable, debuggable, and auditable.
Without receipts, failures become anecdotes. With receipts, failures become reviewable system evidence.
Receipts should avoid logging raw secrets or sensitive payloads. Payloads should be summarized or redacted.
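A receipt can be sketched as one JSON object per line (JSONL). The field names and redaction pattern below are illustrative assumptions; the key point is that the payload is summarized and redacted, never logged raw:

```typescript
// Illustrative decision receipt shape. Field names are assumptions.
interface DecisionReceipt {
  timestamp: string;
  tool: string;
  actionType: string;
  target: string;
  payloadSummary: string;   // summarized and redacted, never the raw payload
  decision: string;
  riskLevel: string;
  primaryIssue: string;
  recommendedAction: string;
}

// Summarize a payload for the receipt instead of logging it verbatim.
// The regex here is a minimal example, not a complete secret scanner.
function summarizePayload(payload: string, maxLen = 80): string {
  const redacted = payload.replace(
    /(api[_-]?key|token|password)\s*[:=]\s*\S+/gi,
    "$1=[REDACTED]",
  );
  return redacted.length > maxLen ? redacted.slice(0, maxLen) + "..." : redacted;
}

// One JSON object per line, appendable to a .jsonl log file.
function toReceiptLine(r: DecisionReceipt): string {
  return JSON.stringify(r);
}

const receipt: DecisionReceipt = {
  timestamp: new Date().toISOString(),
  tool: "http",
  actionType: "post",
  target: "billing-service",
  payloadSummary: summarizePayload("amount=100 token: abc123secret"),
  decision: "require_approval",
  riskLevel: "high",
  primaryIssue: "external-facing write without approval",
  recommendedAction: "route to human reviewer",
};
const line = toReceiptLine(receipt);
```

Because each line is independently parseable JSON, receipts can be tailed, grepped, and replayed without a database.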
Relation to Agent Action Gate
Agent Action Gate is one open-source reference implementation of this boundary. It evaluates proposed tool actions before execution and returns one of four decisions: allow, require_approval, revise_action, or block.
Its current capabilities include a TypeScript implementation, local HTTP API, n8n demo workflow, cyber-capable agent protection layer, JSONL decision receipts, and an eval suite.
What This Does Not Claim
The Agent Runtime Control Boundary does not replace:
- identity and access management
- least-privilege credentials
- sandboxing
- secure infrastructure
- model safety work
- human judgment
- legal compliance review
It is not a guarantee that an agent system is safe or compliant. It is a practical control layer that helps inspect, route, and record proposed actions before execution.
Alignment Theory Connection
In Alignment Theory terms, agentic risk appears when internal reasoning becomes external action. A system may be coherent in text while misaligned in execution. The runtime boundary preserves human participation, constraint contact, and action-level accountability at the point where capability leaves the model and enters the world.
The framework connection can be stated simply:
- Objective: What is the agent supposed to serve?
- Constraint: What is the agent allowed to do?
- Runtime boundary: Should this specific proposed action execute now?
- Receipt: What evidence remains for human review and re-anchoring?
The boundary is where abstract alignment becomes operational control.
Simple System Map
User intent
↓
Agent reasoning
↓
Proposed tool action
↓
Agent Runtime Control Boundary
↓
allow / require approval / revise / block
↓
Execution, human review, revision, or non-execution
↓
Decision receipt
Where This Boundary Fits
This boundary fits systems where agents can take action through tools or external services, including n8n workflows, browser agents, coding agents, internal automation agents, API/tool-calling agents, CI/CD agents, data-processing agents, and customer support automations with action permissions.
Summary
As agents become more capable, the critical risk is not only what they say. It is what they are about to do.
The Agent Runtime Control Boundary names the layer where proposed actions are evaluated before execution, routed through allow/approval/revision/block decisions, and recorded as receipts for later review.
Scope before action.
Gate before execution.
Receipt after decision.