Version 1 - 2026 - Research Paper

Limitations, Critiques, and Open Problems

Credibility boundaries for the Alignment Theory AI alignment research program.

This page states what the research does not solve, where it can fail, and what must be validated before strong deployment claims are made.

Table of Contents
  1. What This Does Not Solve
  2. Objective-Setting
  3. Detector Error
  4. Judge Model Circularity
  5. Calibration
  6. Human Review
  7. Governance Risks
  8. Validation Needs
  9. PCPI v1 Limitations

What This Does Not Solve

This work does not solve AI alignment in full. It proposes a structural and operational framework for detecting, classifying, and correcting behavioral drift in deployed AI systems.

Objective-Setting

Objective-setting remains hard. If the objective center is vague, contested, ideological, or poorly governed, drift detection against it can become arbitrary or misleading.

Detector Error

Detectors can produce false positives and false negatives. A polished output can hide drift, and a flagged output can sometimes be appropriate in context.
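
One concrete way to bound these failure modes is to score the detector against a human-labeled set and report false positive and false negative rates separately, rather than a single accuracy number. The sketch below is illustrative only; the detector callable and the labeling scheme are assumptions, not part of the published framework.

  def error_rates(detector, labeled_examples):
      # labeled_examples: iterable of (output_text, drift_present) pairs,
      # where drift_present is a human ground-truth bool.
      fp = fn = tp = tn = 0
      for text, drift_present in labeled_examples:
          flagged = detector(text)  # True when the detector flags drift
          if flagged and not drift_present:
              fp += 1
          elif not flagged and drift_present:
              fn += 1
          elif flagged:
              tp += 1
          else:
              tn += 1
      fpr = fp / (fp + tn) if (fp + tn) else 0.0  # clean outputs wrongly flagged
      fnr = fn / (fn + tp) if (fn + tp) else 0.0  # drifted outputs missed
      return fpr, fnr

Reporting the two rates separately matters because the costs are asymmetric: a false positive flags an appropriate output, while a false negative lets polished drift pass.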

Judge Model Circularity

Judge models can inherit generator drift. A judge should not be treated as an oracle; it needs calibration, review, and auditability.
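
One way to keep a judge auditable rather than oracular is to compare its verdicts against periodic human labels using a chance-corrected agreement statistic. The sketch below computes Cohen's kappa under that assumption; the label lists and their format are illustrative, not a prescribed audit protocol.

  from collections import Counter

  def cohens_kappa(judge_labels, human_labels):
      # Assumes non-empty, equal-length lists of categorical labels,
      # e.g. ["drift", "ok", "drift"] vs. ["drift", "ok", "ok"].
      n = len(judge_labels)
      observed = sum(j == h for j, h in zip(judge_labels, human_labels)) / n
      # Chance agreement expected from each rater's label distribution.
      jc, hc = Counter(judge_labels), Counter(human_labels)
      expected = sum(jc[k] * hc[k] for k in jc.keys() & hc.keys()) / (n * n)
      return (observed - expected) / (1 - expected) if expected < 1 else 1.0

A kappa that declines across successive audits is one signal that the judge may be inheriting generator drift and needs recalibration.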

Calibration

Thresholds require calibration across domain, risk level, user population, product context, and harm profile. A support bot and a medical triage assistant should not share naive thresholds.
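
A minimal way to express this is a threshold table keyed by context rather than a single global cutoff. The sketch below assumes a scalar drift score in [0, 1]; the domains, risk levels, and numbers are placeholders, not calibrated values.

  DRIFT_THRESHOLDS = {
      # (domain, risk_level) -> drift score above which an output is flagged.
      ("customer_support", "low"):  0.80,
      ("customer_support", "high"): 0.60,
      ("medical_triage", "low"):    0.40,
      ("medical_triage", "high"):   0.20,  # stricter: severe harm profile
  }

  def should_flag(drift_score, domain, risk_level):
      return drift_score > DRIFT_THRESHOLDS[(domain, risk_level)]

Making the table an explicit artifact also gives reviewers a single place to audit how each context was calibrated.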

Human Review

Human review remains necessary, especially for high-stakes cases, sensitive domains, ambiguous detector hits, and governance decisions.

Governance Risks

Domain-specific tuning is required. Poorly governed source anchoring creates a risk of ideology injection. Enterprise deployment requires privacy, governance, logging, and legal review.

Validation Needs

The work needs real-world validation. Synthetic casebooks can illustrate detector logic, but production telemetry and controlled comparisons are required to establish reliability.
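
As a sketch of what such a controlled comparison involves, assume traffic is randomly split into a detector-off control arm and a detector-on treatment arm, with an agreed per-interaction harm indicator; the arm names and the metric are assumptions for illustration.

  def compare_arms(control_outcomes, treatment_outcomes):
      # Each argument: list of 0/1 per-interaction harm indicators from
      # traffic randomly assigned to a detector-off (control) arm or a
      # detector-on (treatment) arm.
      control_rate = sum(control_outcomes) / len(control_outcomes)
      treatment_rate = sum(treatment_outcomes) / len(treatment_outcomes)
      # A positive difference suggests the detector arm reduced harm.
      return control_rate, treatment_rate, control_rate - treatment_rate

A production study would add confidence intervals and significance testing before treating the difference as evidence of reliability.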

PCPI v1 Limitations

The Participatory Capacity Preservation Index (PCPI) makes participatory capacity scoreable, but it is not yet an externally validated scientific standard. It should be treated as a proposed measurement framework and an early instrumentation layer; a minimal scoring sketch follows the list below.

  • Human validation is still required.
  • An inter-rater reliability study is pending.
  • The collapse penalty multiplier is not yet empirically tuned.
  • Domain-specific rubrics are needed for medical, legal, educational, coding, and enterprise contexts.
  • LLM-judge calibration remains in progress.
  • Longitudinal validation is needed to test whether PCPI predicts retained skill, dependency, or capacity decay over time.
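
To make the collapse penalty concrete, a hypothetical PCPI-style aggregation might look like the sketch below. The dimension scheme, the equal-weight average, and the 0.5 multiplier are illustrative placeholders; the actual PCPI v1 rubric and its tuning are not reproduced here.

  def pcpi_score(dimension_scores, collapse_detected, collapse_penalty=0.5):
      # dimension_scores: dict mapping a rubric dimension name to a score
      # in [0, 1]; the equal-weight average below is an assumption.
      base = sum(dimension_scores.values()) / len(dimension_scores)
      # The multiplier applies when a capacity-collapse signal fires; the
      # 0.5 value stands in for the empirical tuning the list above notes
      # is still pending.
      return base * collapse_penalty if collapse_detected else base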

How to Cite


Michael Bower. (2026). Limitations, Critiques, and Open Problems. AlignmentTheory.org. https://alignmenttheory.org/pages/ai-alignment-limitations.html

@misc{bower2026aialignmentlimitations,
  author = {Bower, Michael},
  title = {Limitations, Critiques, and Open Problems},
  year = {2026},
  howpublished = {AlignmentTheory.org},
  url = {https://alignmenttheory.org/pages/ai-alignment-limitations.html}
}

