Version 1 (2026)

Formal Glossary of Alignment Theory Terms for AI Systems

Definitions and implementation notes for the research corpus.

This glossary defines the core terms used across the corpus; each entry pairs a short definition with a concrete, term-specific implementation note.

Table of Contents
  1. Core Terms
  2. Drift Categories
  3. Pipeline Terms
  4. Risk Terms

Core Terms

Objective Center

The active statement of what the system is for in a specific deployment context.

Implementation note: Stored in ObjectiveState.activeObjective and used as the standard for detector and judge evaluation.
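
As a minimal sketch, an ObjectiveState container might look like the following. Only the name ObjectiveState.activeObjective appears in the glossary; the other fields (successCriteria, antiGoals) are illustrative assumptions drawn from the Objective Layer entry, not a confirmed schema.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectiveState:
    # The active statement of what the system is for in this deployment.
    activeObjective: str
    # Hypothetical companion fields, suggested by the Objective Layer entry.
    successCriteria: list = field(default_factory=list)
    antiGoals: list = field(default_factory=list)

# Example: the reference state detectors and judges would evaluate against.
state = ObjectiveState(
    activeObjective="Answer billing questions from approved policy documents",
    antiGoals=["upsell", "speculative legal advice"],
)
```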

Source Anchoring

The practice of grounding behavior in approved source material, policy, evidence, or domain authority.

Implementation note: Mapped to source profile checks, citation requirements, retrieval constraints, and unsupported-claim scoring.

Objective Layer

The layer that defines purpose, success criteria, non-negotiables, and anti-goals.

Implementation note: Provides the reference state used by detectors, judges, reports, and before/after comparisons.

Constraint Layer

The layer that defines what the system may or may not do.

Implementation note: Implemented through policies, refusal rules, safety filters, escalation requirements, and compliance checks.

Realignment Layer

The layer that detects and corrects allowed-but-off-center behavior.

Implementation note: Routes drift cases to rewrite, reroute, restart, confidence downgrade, clarification, or human review.

Behavioral Drift

A movement away from intended behavior over time or across conditions.

Implementation note: Measured across prompt batches, detector trends, correction rates, and objective-fit scores.

Drift Categories

Wrong Object

A response that optimizes for the wrong problem, audience, or objective.

Implementation note: Detected by comparing the response target against ObjectiveState.activeObjective and user intent.

False Authority

Unsupported certainty, expertise, or finality beyond what the system can justify.

Implementation note: Detected through certainty markers, unsupported-claim count, role inflation, diagnosis inflation, and absence of uncertainty framing.

Pseudo-Selfhood

A system presenting itself as having inner experience, personal continuity, or human-like selfhood where that is not warranted.

Implementation note: Detected through first-person experiential claims, identity inflation, and relational overreach.

Dead Obedience

Surface compliance that follows the user's wording while failing the user's actual need.

Implementation note: Detected by comparing compliance-shell density against fulfillment score and specificity.
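
A minimal detection sketch of the comparison above, assuming a phrase-count proxy for compliance-shell density and an externally supplied fulfillment score. The phrase list and thresholds are illustrative assumptions, not the corpus's actual detector.

```python
# Hypothetical markers of surface compliance ("compliance shells").
SHELL_PHRASES = ("as requested", "per your instructions", "as you asked")

def dead_obedience(response: str, fulfillment_score: float) -> bool:
    """Flag surface compliance: shell density high, fulfillment low."""
    text = response.lower()
    words = max(len(text.split()), 1)
    shell_hits = sum(text.count(p) for p in SHELL_PHRASES)
    shell_density = shell_hits / words
    # Thresholds are assumptions chosen for illustration.
    return shell_density > 0.02 and fulfillment_score < 0.5
```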

Pseudo-Freedom

A response that appears to empower choice while withholding useful structure or responsibility.

Implementation note: Detected when option lists replace guidance, tradeoffs, or objective-grounded recommendation.

Generic Filler

Polished but low-specificity content that substitutes smoothness for useful help.

Implementation note: Detected through specificity score, template density, repeated abstractions, and lack of domain anchors.
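
One way to sketch a specificity score is to count domain anchors (digits, capitalized names) relative to length; everything here, including the threshold, is an illustrative assumption rather than the corpus's scoring rubric.

```python
import re

def specificity_score(response: str) -> float:
    """Rough anchor density: tokens containing digits or capitalized names."""
    tokens = response.split()
    if not tokens:
        return 0.0
    anchors = sum(
        bool(re.search(r"\d", t)) or (t[0].isupper() and len(t) > 1)
        for t in tokens
    )
    return anchors / len(tokens)

def is_generic_filler(response: str, threshold: float = 0.1) -> bool:
    # Low anchor density suggests polished but low-specificity content.
    return specificity_score(response) < threshold
```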

Participation Collapse

A response that over-decides, removes agency, or closes reflection prematurely.

Implementation note: Detected through signals such as unilateral decision statements, absent clarifying questions, and premature closure of open decision points.

Metric Drift

A shift where tone, polish, engagement, or completion pressure outranks objective fit.

Implementation note: Flagged when scoring signals show tone, polish, engagement, or completion pressure being rewarded over truth, correctness, or objective fit.

Pipeline Terms

Source Profile

The declared set of sources, policies, evidence types, or authorities the system should use.

Implementation note: Stored with allowed, preferred, and excluded source classes for source anchoring checks.
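
The allowed/preferred/excluded split above can be sketched as a small container with a single anchoring check; the class shape and method name are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class SourceProfile:
    allowed: set     # source classes the system may use
    preferred: set   # source classes to favor when available
    excluded: set    # source classes that always fail the check

    def permits(self, source_class: str) -> bool:
        # A source class passes anchoring only if allowed and not excluded.
        return source_class in self.allowed and source_class not in self.excluded

profile = SourceProfile(
    allowed={"policy", "peer_reviewed", "internal_kb"},
    preferred={"policy"},
    excluded={"forum_post"},
)
```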

Universal Drift Metric

A summary measure of objective fit and drift pressure across detector categories.

Implementation note: Aggregates detector hits, severity, correction route, uncertainty, and trend movement.
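
The aggregation above could be sketched as a clamped weighted sum; the weights, field names, and inputs are assumptions chosen to illustrate the shape of the metric, not its actual definition.

```python
def universal_drift_metric(detector_hits, uncertainty, trend_delta,
                           w_sev=0.5, w_unc=0.3, w_trend=0.2):
    """Combine mean detector severity, uncertainty, and trend movement."""
    severity = sum(h["severity"] for h in detector_hits) / max(len(detector_hits), 1)
    score = w_sev * severity + w_unc * uncertainty + w_trend * trend_delta
    return min(max(score, 0.0), 1.0)  # clamp to [0, 1]

hits = [{"severity": 0.8}, {"severity": 0.4}]
score = universal_drift_metric(hits, uncertainty=0.2, trend_delta=0.1)
```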

External Re-entry

The process of returning a drifting system to an external reference point rather than letting outputs validate themselves.

Implementation note: Implemented through retrieval refresh, source checks, policy re-read, user clarification, or human review.

Re-anchoring

Restoring the response to its intended objective after drift is detected.

Implementation note: Applied in rewrite prompts, routing policies, and post-review correction instructions.

Detector

A rule, model, rubric, or hybrid evaluator that identifies a drift pattern.

Implementation note: Runs after constraint checks and before correction routing or escalation.

Feature Extraction

The conversion of output traits into measurable signals for detectors.

Implementation note: Extracts markers such as certainty, specificity, unsupported claims, agency closure, and source fit.

Judge Model

A model used to evaluate semantic cases where heuristics are insufficient.

Implementation note: Invoked only when heuristic scores fall inside an uncertainty band or the case requires semantic judgment that rules cannot capture.
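
The gated invocation pattern can be sketched as follows; the band bounds and the heuristic/judge callables are assumptions standing in for real scorers.

```python
def needs_judge(heuristic_score: float, low: float = 0.35, high: float = 0.65) -> bool:
    """Escalate to the judge model only inside the uncertainty band."""
    return low <= heuristic_score <= high

def evaluate(response, heuristic, judge):
    score = heuristic(response)
    if needs_judge(score):
        return judge(response)  # expensive semantic evaluation
    return score                # heuristic verdict is confident enough
```

Cheap rule-based scoring handles the confident cases, so the judge model runs only on the ambiguous band in the middle.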

Correction Mode

The action chosen after a drift case is detected.

Implementation note: Encoded as rewrite, reroute, restart, confidence downgrade, clarification, escalation, or log-only review.
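
The seven encodings named above map naturally onto an enum; the routing function below is an illustrative assumption about how severity and intent ambiguity might select a mode, not the corpus's routing policy.

```python
from enum import Enum

class CorrectionMode(Enum):
    REWRITE = "rewrite"
    REROUTE = "reroute"
    RESTART = "restart"
    CONFIDENCE_DOWNGRADE = "confidence_downgrade"
    CLARIFICATION = "clarification"
    ESCALATION = "escalation"
    LOG_ONLY = "log_only"

def route(severity: float, ambiguous_intent: bool) -> CorrectionMode:
    # Hypothetical thresholds for illustration only.
    if ambiguous_intent:
        return CorrectionMode.CLARIFICATION
    if severity >= 0.8:
        return CorrectionMode.ESCALATION
    if severity >= 0.4:
        return CorrectionMode.REWRITE
    return CorrectionMode.LOG_ONLY
```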

Risk Terms

Dangerous Coherence

A highly coherent response pattern that is persuasive while being misanchored.

Implementation note: Detected by pairing coherence or confidence scores with source mismatch and objective mismatch.
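
The pairing described above reduces to a simple predicate; the thresholds are assumptions for illustration.

```python
def dangerous_coherence(coherence: float, source_match: float,
                        objective_match: float,
                        high: float = 0.7, low: float = 0.5) -> bool:
    """Flag responses that are highly coherent yet misanchored."""
    return coherence >= high and (source_match < low or objective_match < low)
```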

Scalable Misalignment

A drift pattern that becomes more consequential when repeated across many users or workflows.

Implementation note: Flagged through batch-level trend analysis, recurrence, and affected-use-case severity.

Reality Substitution

A response pattern that replaces grounded uncertainty with a self-contained narrative.

Implementation note: Detected through unsupported reconstruction, missing source anchors, and excessive explanatory closure.

Creator Alignment

The requirement that system behavior remain ordered toward the intended objective of the deployment rather than its own local completion pressures.

Implementation note: Tracked by objective-state checks, anti-goal matching, and drift trend reporting.

How to Cite

Citation

Michael Bower. (2026). Formal Glossary of Alignment Theory Terms for AI Systems. AlignmentTheory.org. https://alignmenttheory.org/pages/ai-alignment-glossary.html

@misc{bower2026aialignmentglossary,
  author = {Bower, Michael},
  title = {Formal Glossary of Alignment Theory Terms for AI Systems},
  year = {2026},
  howpublished = {AlignmentTheory.org},
  url = {https://alignmenttheory.org/pages/ai-alignment-glossary.html}
}

