Collection
Real prompt-output batches should be collected from defined product contexts with timestamps, model versions, prompt templates, policy versions, and relevant metadata. Collection should be scoped to the evaluation question and avoid unnecessary retention.
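A minimal sketch of one collected record, assuming a Python pipeline. The field names and the `in_scope` helper are illustrative, not a required schema; the point is that each sample carries the metadata named above and that collection is filtered to the evaluation question.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class CollectedSample:
    # Illustrative record for one prompt-output pair with its metadata.
    prompt: str
    output: str
    product_context: str        # e.g. "support-chat" (hypothetical context name)
    model_version: str
    prompt_template_id: str
    policy_version: str
    collected_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def in_scope(sample: CollectedSample, evaluation_contexts: set[str]) -> bool:
    """Keep only samples from contexts the evaluation question covers,
    so collection stays scoped and avoids unnecessary retention."""
    return sample.product_context in evaluation_contexts
```

Out-of-scope samples are dropped at collection time rather than retained and filtered later.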
Redaction and Sensitive Data
Sensitive data should be removed or transformed before analysis whenever possible. Personal identifiers, account details, private support content, credentials, protected attributes, and confidential business data require privacy review and handling controls.
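A first-pass, pattern-based redaction sketch. The patterns below are illustrative and deliberately narrow; real pipelines need privacy review, broader coverage (names, account details, protected attributes), and handling controls beyond regex substitution.

```python
import re

# Illustrative patterns only; not a complete redaction ruleset.
REDACTION_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "CARD":  re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def redact(text: str) -> str:
    """Replace matches with typed placeholders before analysis."""
    for label, pattern in REDACTION_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Typed placeholders (rather than blank removal) preserve the shape of the text for downstream drift analysis without retaining the sensitive values.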
Evaluation
Outputs are evaluated against objective state, constraint compliance, detector categories, and correction routes. The protocol should separate hard policy violations from allowed-but-off-center drift so the review process does not flatten all failures into a single score.
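The separation above can be sketched as a small outcome taxonomy. The threshold and the assumed objective-fit score in [0, 1] are hypothetical; what matters is that hard violations and off-center drift stay distinct categories instead of collapsing into one score.

```python
from enum import Enum

class Outcome(Enum):
    PASS = "pass"
    DRIFT = "drift"                      # allowed, but off the objective center
    HARD_VIOLATION = "hard_violation"    # policy constraint breached

def classify(violates_policy: bool, objective_fit: float,
             drift_threshold: float = 0.8) -> Outcome:
    """Hypothetical routing: check hard policy constraints first, then
    an assumed objective-fit score against a drift threshold."""
    if violates_policy:
        return Outcome.HARD_VIOLATION
    if objective_fit < drift_threshold:
        return Outcome.DRIFT
    return Outcome.PASS
```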
PCPI Scoring Layer
Use PCPI as one proposed scoring layer for prompt-output batch evaluation. PCPI can sit beside detector hits, correction routes, escalation rates, and before/after drift comparisons.
Detector Review
Detector hits should be reviewed for false positives, false negatives, and ambiguous cases. Heuristic detectors can identify surface signals, but semantic cases may require judge review or human adjudication.
Human Review
Human review enters the loop for uncertain cases, high-impact decisions, sensitive domains, threshold calibration, and governance signoff. The goal is not to automate judgment away, but to route attention to the cases where judgment matters.
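A minimal routing sketch for when a case enters human review. The confidence floor and the sensitive-domain list are placeholder values; in practice both come from threshold calibration and governance signoff.

```python
# Illustrative list; real deployments define this under governance review.
SENSITIVE_DOMAINS = {"medical", "legal", "financial"}

def needs_human_review(confidence: float, high_impact: bool,
                       domain: str, confidence_floor: float = 0.7) -> bool:
    """Route a case to a human when automated confidence is low, the
    decision is high-impact, or the domain is flagged sensitive."""
    return (confidence < confidence_floor
            or high_impact
            or domain in SENSITIVE_DOMAINS)
```

The rule is deliberately over-inclusive: any single trigger is enough, so attention flows to the cases where judgment matters.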
Before/After Comparison
Prompt changes, model updates, policy changes, and retrieval changes should be compared with matched or representative prompt batches. The useful signal is not pass rate alone but also drift pattern, correction rate, escalation rate, and objective-fit movement.
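A sketch of a before/after comparison over matched batches. Each batch is assumed to be a list of per-sample records with boolean `passed`, `corrected`, and `escalated` fields (illustrative field names); the comparison reports movement in each rate rather than pass rate alone.

```python
def rates(batch: list[dict]) -> dict[str, float]:
    """Compute per-batch rates over boolean per-sample fields."""
    n = len(batch)
    return {
        "pass_rate": sum(r["passed"] for r in batch) / n,
        "correction_rate": sum(r["corrected"] for r in batch) / n,
        "escalation_rate": sum(r["escalated"] for r in batch) / n,
    }

def compare(before: list[dict], after: list[dict]) -> dict[str, float]:
    """Return the movement (after minus before) in each rate."""
    b, a = rates(before), rates(after)
    return {k: round(a[k] - b[k], 4) for k in b}
```

A change that raises pass rate while also raising escalation rate reads very differently from one that raises both pass rate and objective fit, which is why the deltas are reported side by side.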
Synthetic vs Real Telemetry
Synthetic examples are useful for detector design and explanation. Real production telemetry is required for validation because actual drift depends on user behavior, workflow pressure, model behavior, and product constraints.
How to Cite
Michael Bower. (2026). Real Case Methodology and Evaluation Protocol. AlignmentTheory.org. https://alignmenttheory.org/pages/ai-alignment-methodology.html
@misc{bower2026aialignmentmethodology,
author = {Bower, Michael},
title = {Real Case Methodology and Evaluation Protocol},
year = {2026},
howpublished = {AlignmentTheory.org},
url = {https://alignmenttheory.org/pages/ai-alignment-methodology.html}
}