Agent Action Gate | Alignment Theory

What it does

Most AI safety checks focus on what a model says. Agent Action Gate focuses on what an AI agent is about to do. Before an agent sends an email, modifies data, calls an API, runs a terminal command, deletes a file, publishes content, or exposes information, the proposed action is evaluated by a small runtime gate.

The gate receives a proposed action, checks it against detector categories, and returns a route that the surrounding workflow can enforce before any tool execution happens.
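
As a rough illustration of that flow, the sketch below wraps a tool call so the gate is always consulted first. The field names (`tool`, `actionType`, `target`, `userRequest`, `triggeredDetectors`) and the `gateThenExecute` helper are assumptions made for this example, not the gate's documented schema; `evaluate` stands in for whatever calls the gate, for example a thin wrapper around `POST /evaluate`.

```typescript
// Hypothetical shapes: these field names are assumptions for illustration,
// not the gate's documented request or response schema.
type Decision = "allow" | "require_approval" | "revise_action" | "block";

interface ProposedAction {
  tool: string;        // e.g. "email.send"
  actionType: string;  // e.g. "send"
  target: string;      // e.g. a recipient, file path, or endpoint
  userRequest: string; // the original task the action should serve
}

interface GateResult {
  decision: Decision;
  triggeredDetectors: string[];
}

type Evaluate = (action: ProposedAction) => Promise<GateResult>;
type Execute<T> = (action: ProposedAction) => Promise<T>;

type Outcome<T> =
  | { executed: true; value: T; gate: GateResult }
  | { executed: false; gate: GateResult };

// Ask the gate first; run the tool only when the route is "allow".
async function gateThenExecute<T>(
  action: ProposedAction,
  evaluate: Evaluate,
  execute: Execute<T>,
): Promise<Outcome<T>> {
  const gate = await evaluate(action);
  if (gate.decision !== "allow") {
    // The caller decides how to handle approval, revision, or blocking.
    return { executed: false, gate };
  }
  const value = await execute(action);
  return { executed: true, value, gate };
}
```

Because the tool call is passed in as a function, nothing runs until the gate has answered, and every non-`allow` route is returned to the workflow to handle rather than executed.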

Why it exists

AI agents can act through tools, APIs, files, email systems, databases, and automation platforms. The risk is no longer only bad text output; it is bad action execution.

Agent Action Gate gives developers a practical place to ask whether a proposed action still fits the user's request, uses the right tool, targets the right object, stays within authorized scope, and has the approval needed for sensitive, irreversible, production, or command-capable work.

The four decisions

Decision	Route
`allow`	The action appears low risk and can proceed.
`require_approval`	The action needs explicit human approval before execution.
`revise_action`	The action is fixable, but should be changed before execution.
`block`	The action should not execute.

Detector categories

Detector	Risk pattern
`wrong_target`	Action points at the wrong person, file, endpoint, record, or resource.
`unauthorized_scope`	Action exceeds the user's request.
`missing_approval`	Action needs approval but none is recorded.
`irreversible_action`	Action is destructive, costly, or hard to undo.
`sensitive_data_exposure`	Action risks exposing sensitive data.
`tool_mismatch`	Action uses the wrong tool or operation.
`objective_drift`	Action no longer serves the original task objective.
`unauthorized_cyber_scope`	Command-capable action targets systems outside the authorized context.
`credential_access`	Action accesses secrets, tokens, private keys, or credential-like material.
`data_exfiltration`	Action dumps, archives, uploads, posts, or transfers data in a suspicious way.
`privilege_escalation`	Action changes users, roles, permissions, root access, or admin capabilities.
`supply_chain_modification`	Action modifies CI/CD, dependencies, packages, deployment, or build-chain configuration.
`destructive_cyber_action`	Action matches destructive command or infrastructure patterns.
`unapproved_command_execution`	Terminal-like command execution is proposed without recorded user approval.
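
To make the detector idea concrete, here is a minimal sketch of two detectors and a way to combine their findings. Everything in it is an assumption for illustration: the `ProposedAction` fields, the regular expression, the route each detector argues for, and the "most restrictive route wins" rule are not taken from the gate's actual implementation.

```typescript
// Hypothetical sketch: the patterns, field names, and priority order here are
// illustrative assumptions, not the gate's real detector logic.
type Decision = "allow" | "require_approval" | "revise_action" | "block";

interface ProposedAction {
  tool: string;
  command?: string;          // present for terminal-like actions
  approvalRecorded: boolean; // whether user approval is on record
}

interface Finding {
  detector: string;   // one of the detector category names
  decision: Decision; // the route this finding argues for
  evidence: string;
}

type Detector = (action: ProposedAction) => Finding | null;

const unapprovedCommandExecution: Detector = (a) =>
  a.command !== undefined && !a.approvalRecorded
    ? {
        detector: "unapproved_command_execution",
        decision: "require_approval",
        evidence: "command proposed without recorded approval",
      }
    : null;

const credentialAccess: Detector = (a) =>
  a.command !== undefined && /(\.ssh\/|\.env\b|id_rsa|api[_-]?key)/i.test(a.command)
    ? {
        detector: "credential_access",
        decision: "block",
        evidence: "command references credential-like material",
      }
    : null;

// Assumed ordering: the most restrictive route wins when several detectors fire.
const severity: Record<Decision, number> = {
  allow: 0,
  revise_action: 1,
  require_approval: 2,
  block: 3,
};

function evaluate(action: ProposedAction, detectors: Detector[]) {
  const findings = detectors
    .map((d) => d(action))
    .filter((f): f is Finding => f !== null);
  const decision = findings.reduce<Decision>(
    (worst, f) => (severity[f.decision] > severity[worst] ? f.decision : worst),
    "allow",
  );
  return { decision, triggeredDetectors: findings.map((f) => f.detector) };
}
```

Under these assumptions, a command such as `cat ~/.ssh/id_rsa` with no recorded approval triggers both detectors and resolves to `block`, while an action with no command and no findings resolves to `allow`.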

Decision logging

v0.3.0 writes an append-only JSONL decision receipt for each successful `POST /evaluate` call. By default, receipts are written locally to `logs/action-gate-decisions.jsonl`.

The receipt records the decision, risk level, primary issue, confidence, proposed tool, action type, target, approval state, environment, recommended action, evidence, triggered detectors, and a payload summary.

Payloads are summarized and redacted before logging. Raw payloads are not written to the decision log, and credential-like values such as secrets, tokens, passwords, private keys, authorization headers, cookies, SSH keys, and long credential-like strings are redacted.
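
The redact-then-append idea can be sketched as follows. This is not the gate's actual redaction code: the key pattern, the 32-character threshold for "long credential-like strings", the `Receipt` fields, and the `redact` and `writeReceipt` helpers are all assumptions chosen for illustration. The writer takes an `append` callback so the sketch stays independent of any particular file API.

```typescript
// Hypothetical sketch: key names, the length threshold, and the receipt shape
// are assumptions for illustration, not the gate's actual redaction rules.
const SENSITIVE_KEY = /(secret|token|password|private[_-]?key|authorization|cookie|ssh)/i;
const LONG_OPAQUE = /^[A-Za-z0-9+\/_=-]{32,}$/; // long credential-like strings

type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Recursively replace values under sensitive keys, and long opaque strings.
function redact(value: Json, key = ""): Json {
  if (SENSITIVE_KEY.test(key)) return "[REDACTED]";
  if (typeof value === "string") {
    return LONG_OPAQUE.test(value) ? "[REDACTED]" : value;
  }
  if (Array.isArray(value)) return value.map((v) => redact(v));
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const k of Object.keys(value)) out[k] = redact(value[k], k);
    return out;
  }
  return value; // numbers, booleans, null
}

interface Receipt {
  timestamp: string;
  decision: string;
  triggeredDetectors: string[];
  payloadSummary: Json;
}

// Serialize one receipt as a single JSONL line and hand it to an append-only sink.
function writeReceipt(receipt: Receipt, append: (line: string) => void): void {
  const safe: Receipt = {
    ...receipt,
    payloadSummary: redact(receipt.payloadSummary),
  };
  append(JSON.stringify(safe) + "\n");
}
```

In a Node.js process, `append` could be something like `(line) => appendFileSync("logs/action-gate-decisions.jsonl", line)` using `appendFileSync` from `node:fs`, which adds to the file rather than rewriting it.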

n8n demo workflows

The repository includes two importable n8n demo workflows: `examples/n8n-agent-action-gate-demo.json` and `examples/n8n-agent-action-gate-defensive-demo.json`.

The demos show Agent Action Gate sitting between an AI or automation agent and tool execution. They route `allow` to Continue Action, `require_approval` to Require Human Approval, `revise_action` to Revise Proposed Action, and `block` to Block Action.
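
Outside n8n, the same four-way routing could be expressed in code roughly as below. The branch names mirror the demo nodes named above; the `routeFor` and `routeForUntrusted` helpers are hypothetical, and treating an unrecognized decision as a block is a fail-closed design assumption of this sketch rather than documented gate behavior.

```typescript
type Decision = "allow" | "require_approval" | "revise_action" | "block";

// Branch names mirror the n8n demo nodes.
type Branch =
  | "Continue Action"
  | "Require Human Approval"
  | "Revise Proposed Action"
  | "Block Action";

// Exhaustive switch: adding a new Decision without a case is a compile error.
function routeFor(decision: Decision): Branch {
  switch (decision) {
    case "allow":
      return "Continue Action";
    case "require_approval":
      return "Require Human Approval";
    case "revise_action":
      return "Revise Proposed Action";
    case "block":
      return "Block Action";
    default: {
      const unreachable: never = decision;
      throw new Error(`Unknown decision: ${String(unreachable)}`);
    }
  }
}

// Assumed fail-closed handling for values parsed from an HTTP response:
// anything that is not a known decision is routed to Block Action.
function routeForUntrusted(raw: unknown): Branch {
  const known: Decision[] = ["allow", "require_approval", "revise_action", "block"];
  if (typeof raw === "string" && known.indexOf(raw as Decision) !== -1) {
    return routeFor(raw as Decision);
  }
  return "Block Action";
}
```

Failing closed means a malformed or missing gate response can never be mistaken for permission to proceed.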

The defensive demo exercises the pre-execution review layer for command-capable actions outside authorized scope and expects a `block` decision.

Validation status

Check	Status
TypeScript compile	Passing
Baseline evals	19/19 passing
`GET /health`	Working
`POST /evaluate`	Working
n8n demo workflows	Included
Decision logging smoke test	Included

Compliance note

Agent Action Gate can support human-approval workflows for AI agent actions, especially when actions are external-facing, irreversible, sensitive, command-capable, or broader than requested. It is not legal advice and does not guarantee compliance with any law or framework.

Research lineage

Agent Action Gate is a pre-execution action-control layer in the Aletheon alignment architecture. It complements the post-output behavioral drift-detection work elsewhere in the Alignment Theory research stack.

Alignment Theory provides the research framework behind the structural logic. The implementation lives in the Agent Action Gate GitHub repository.

Status

v0.3.0. This is an open-source implementation of a pre-execution control boundary, not a production-hardened enterprise platform.

TypeScript gate engine
Local HTTP API
19/19 evals passing
JSONL decision receipts
Two n8n demo workflows
Cyber-capable agent protection detectors
MIT license
GitHub release v0.3.0