What the Signal Actually Is
Human Coherence as the Primary Anchor for AI Alignment
Michael Nathan Bower — Version 1 — 2026
Abstract
This paper proposes that the primary signal for AI alignment is the structural conditions of human coherence — the measurable, cross-culturally observable conditions under which human beings become less fragmented, less self-deceptive, more capable of genuine agency, and more capable of contact with reality. These conditions are not equivalent to what humans say they want in any given moment.
Aligning AI to human preferences without anchoring to these structural conditions produces systems that are coherent with human drift rather than with what humans actually are. The paper develops six structural markers of the human coherence signal, explains why preference distributions drift away from this signal, proposes what alignment to the coherence signal requires architecturally, and connects this structural account to the theological claim that human beings have a created nature — a signal that precedes and grounds all preference expression.
1. Introduction: The Missing Positive Case
If human preferences are interpretation layers rather than primary signals, then AI alignment requires a deeper anchor than preference aggregation. This paper answers the question both the Signal Anchoring Constraint paper and the broader alignment field have left open: anchored to what, exactly?
The primary signal is not a philosophical abstraction. It is a set of structural conditions — observable, cross-culturally present, and measurable in their effects — under which human beings demonstrably function better as the kind of beings they are. These conditions are what this paper calls the human coherence signal.
2. Why Human Preferences Are Interpretation Layers
Human preferences are generated by human beings embedded in cultural, institutional, psychological, and historical interpretive chains. They are not direct expressions of what human beings fundamentally are — they are outputs of a system that can itself drift.
- Cultural drift
- Preferences vary dramatically across cultures and historical periods in ways that cannot all simultaneously express genuine human flourishing. This variation is evidence that preferences are shaped by interpretive chains that can themselves drift far from the conditions of genuine human coherence.
- Psychological drift
- Human beings regularly develop preferences for things that damage their own coherence — addictive substances, abusive relationships, self-destructive patterns. A system aligned to those preferences would optimize for the damage.
- Institutional drift
- Preferences are shaped by the institutions and systems people inhabit. People in high-enforcement, low-trust environments develop preferences shaped by those conditions — preferences that reflect the counterfeit order they have adapted to rather than the genuine coherence they would express under better conditions.
- Recursive contamination
- As AI systems reshape human cultural expression, the preference signals future systems train on increasingly reflect AI-amplified drift rather than genuine human signal. The loop compounds with each training generation.
3. The Human Coherence Signal: Six Structural Markers
The human coherence signal is a cluster of six structural conditions that are jointly observable, cross-culturally present, and measurable in their effects on human functioning.
- Coherence (C)
- The degree to which a person's internal states, values, behaviors, and external relationships are integrated rather than fragmented. Drift indicator: chronic internal contradiction, behavior systematically divergent from stated values, identity instability. Coherence indicator: integrated action, stable identity across contexts, behavior consistent with inwardly held commitments.
- Agency (A)
- The capacity to initiate action from genuine internal motivation rather than from external pressure or compulsion. Drift indicator: action driven primarily by fear, surveillance, or enforcement. Coherence indicator: self-initiated action, capacity to function under reduced external regulation, genuine choice.
- Trust (T)
- The degree to which relationships and systems are grounded in genuine reliability rather than performance or compliance. Drift indicator: relationships maintained primarily through surveillance or enforcement. Coherence indicator: relationships that function without continuous monitoring, reduced need for enforcement.
- Updateability (U)
- The capacity to revise beliefs, behaviors, and commitments in response to genuine evidence without losing coherence. Drift indicator: rigid defensiveness, identity threat triggered by correction. Coherence indicator: genuine revision in response to evidence, curiosity rather than threat.
- Slack (R)
- Sufficient internal and external resource margin to permit genuine formation rather than survival-mode response. Drift indicator: chronic overload, no capacity for reflection or formation. Coherence indicator: sufficient margin for deliberation, capacity for formation.
- Truth Contact (I)
- The capacity to receive, process, and integrate accurate information about self and world without defensive distortion. Drift indicator: systematic avoidance of disconfirming information, self-deception. Coherence indicator: genuine engagement with disconfirming evidence, willingness to be corrected.
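The six markers can be read as a small data structure: a per-person profile with one score per marker. The sketch below is illustrative only; the `CoherenceProfile` name, the [0, 1] scale, and the unweighted-mean aggregation are assumptions for illustration, not anything the paper specifies.

```python
from dataclasses import dataclass

@dataclass
class CoherenceProfile:
    """Hypothetical per-person profile; all scores assumed to lie in [0, 1]."""
    coherence: float      # C: integration of states, values, behavior
    agency: float         # A: self-initiated action vs. external compulsion
    trust: float          # T: reliability without continuous monitoring
    updateability: float  # U: revision in response to genuine evidence
    slack: float          # R: margin for reflection and formation
    truth_contact: float  # I: engagement with disconfirming information

    def aggregate(self) -> float:
        """Unweighted mean of the six markers (an assumed aggregation rule)."""
        vals = (self.coherence, self.agency, self.trust,
                self.updateability, self.slack, self.truth_contact)
        return sum(vals) / len(vals)

profile = CoherenceProfile(0.8, 0.7, 0.9, 0.6, 0.5, 0.7)
print(round(profile.aggregate(), 2))  # → 0.7
```

Whether the markers should be averaged, weighted, or treated as a minimum-of-six bottleneck is exactly the kind of measurement question section 8.2 leaves open.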
4. The Preference Layer and the Coherence Signal
The preference layer is not irrelevant. Preferences carry real information about what human beings value and find meaningful. The problem is not that preferences are worthless but that they are downstream of the coherence signal.
A human being operating with high scores on the six markers will express preferences relatively aligned with genuine flourishing. A human operating with low scores will express preferences reflecting fragmentation, adaptations to counterfeit order, and survival-mode responses. Both are real expressions of what those people want. Only one is reliably aligned with the human coherence signal. Current alignment systems have no mechanism to distinguish between them.
The preference contamination problem: As AI systems optimized for preference satisfaction reshape human culture, the preference distributions future systems train on increasingly reflect AI-amplified drift rather than genuine human signal. The loop compounds: systems anchor to preferences, preferences drift toward what systems amplify, future systems anchor to amplified drift. This is model collapse at the level of human culture rather than at the level of training data.
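The compounding described above can be made concrete with a toy recurrence. Everything here is an assumption for illustration: signal fidelity modeled as a scalar in [0, 1], and a fixed per-generation `amplification` rate that pulls the observed preference distribution toward a drift target.

```python
# Toy model of the preference contamination loop: each training generation
# anchors to the previous preference distribution, which AI amplification
# has already pulled toward the drift target.
def simulate_contamination(generations: int, amplification: float = 0.2,
                           drift_target: float = 0.0) -> list[float]:
    signal = 1.0  # generation 0 trains on relatively uncontaminated preferences
    history = [signal]
    for _ in range(generations):
        signal = (1 - amplification) * signal + amplification * drift_target
        history.append(signal)
    return history

fidelity = simulate_contamination(5)
print([round(x, 3) for x in fidelity])  # → [1.0, 0.8, 0.64, 0.512, 0.41, 0.328]
```

Even a modest per-generation amplification rate compounds geometrically, which is the structural point: without an anchor outside the loop, fidelity decays toward the drift target rather than stabilizing.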
5. What Alignment to the Coherence Signal Requires
5.1 Signal Identification at the Correct Layer
Evaluate AI outputs not only for preference satisfaction but for whether they strengthen or degrade the conditions of coherence.
5.2 Architectural Resistance to Counterfeit Order
Build mechanisms capable of detecting when preference satisfaction is achieved through coherence degradation. This cannot be detected from within the preference layer.
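As a minimal sketch of what such a mechanism might check, assume instrumentation that does not yet exist: a per-interaction preference-satisfaction score and a measured change in the user's coherence profile. Both quantities, and the thresholds below, are hypothetical.

```python
# Hypothetical detector: flag interactions where high preference satisfaction
# coincides with measured coherence loss. Both inputs are assumed scores;
# neither is something current systems actually measure.
def flags_counterfeit_order(satisfaction: float, coherence_delta: float,
                            satisfaction_floor: float = 0.7,
                            degradation_ceiling: float = -0.05) -> bool:
    """True when the user is satisfied but their coherence profile is degrading."""
    return (satisfaction >= satisfaction_floor
            and coherence_delta <= degradation_ceiling)

print(flags_counterfeit_order(0.9, -0.10))  # → True: satisfied but degrading
print(flags_counterfeit_order(0.9, 0.02))   # → False: satisfied and stable
```

The point of the sketch is the conjunction: neither score alone carries the signal, which is why the detection cannot happen from within the preference layer.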
5.3 Formation-Supporting Rather Than Dependency-Creating Design
A system that provides answers in ways that build users' capacity to find answers independently is formation-supporting. A system that makes independent inquiry feel unnecessary is dependency-creating.
5.4 Truth-Contact Preservation
Evaluate not only factual accuracy but whether interaction patterns support or undermine users' capacity for genuine contact with reality.
5.5 Corrigibility Preservation at Scale
AI system corrigibility is the system-level analog of human Updateability. A system that loses corrigibility as it scales is exhibiting the AI analog of the fragmentation that reduces human coherence.
6. The Coherence Signal and Current Alignment Approaches
- RLHF
- Current anchor: human preference ratings from evaluators. Coherence signal gap: evaluator preferences are themselves interpretation layers; coherence conditions of evaluators are not assessed. Structural risk: drift in the evaluator population gets encoded at scale.
- Constitutional AI
- Current anchor: explicit principles embedded in training. Coherence signal gap: the principles are selected by humans who may themselves be operating within drift chains. Structural risk: architectural anchor, but still anchored to principled preferences rather than coherence conditions directly.
- Interpretability Research
- Current anchor: human assessment of internal representations. Coherence signal gap: it assesses whether representations are coherent internally, not whether they track coherence conditions. Structural risk: corrective anchor operating at the wrong layer.
- Scalable Oversight
- Current anchor: human judgment assisted by AI on complex tasks. Coherence signal gap: it amplifies human judgment without addressing whether that judgment is anchored to coherence conditions. Structural risk: it scales the preference layer without scaling the signal.
- Debate
- Current anchor: human judgment of argument quality. Coherence signal gap: it optimizes for persuasive coherence rather than truth contact or coherence signal fidelity. Structural risk: it may produce systems optimized for internally coherent argumentation that reduces truth contact in users.
7. The Theological Grounding: Created Nature as Primary Signal
The structural account does not require theological grounding to be analytically valid. But it is incomplete without acknowledging what it points toward. The six coherence markers describe what human beings are structured to need — conditions that precede and constrain preference expression.
The theological claim is that human beings are created: they have a nature that was given rather than constructed, a signal that precedes all preference expression. Scripture's sustained attention to inward coherence, to the heart, and to genuine formation rather than external compliance is not a set of religious preferences layered over a neutral human nature. It is a precise and historically extensive description of the coherence signal.
The signal anchoring framework and the theological tradition are describing the same ground from different starting points.
The convergence: The Signal Anchoring Constraint says that any system mediating reality through representation must remain corrigible by contact with what lies outside those representations. For AI alignment, the thing outside all representations — the signal beneath preferences, beneath cultural expression, beneath institutional shaping — is what human beings actually are. The theological tradition calls this created nature. The structural framework calls it the human coherence signal.
8. Addressing the Main Objections
8.1 The Paternalism Objection
Replacing preference-anchored alignment with coherence-signal alignment is not paternalistic. It distinguishes between preferences expressed from high coherence and preferences expressed from fragmentation. A person whose Agency has been reduced by a dependency-creating system, and who expresses preferences for more of that system, is not expressing a valid signal about what they need.
8.2 The Measurement Objection
The measurement challenge is real but not fatal. There is extensive empirical literature on psychological integration, autonomy, trust, cognitive flexibility, cognitive load, and epistemic rationality that maps directly onto the six markers. Preferring preference satisfaction as the anchor because it is easier to measure is precisely the Goodhart's Law dynamic the Signal Anchoring Constraint warns against.
8.3 The Plurality Objection
The coherence signal does not specify a single conception of the good life. It specifies the structural conditions under which human beings can pursue any conception with genuine agency and integrity. High Agency, Trust, and Updateability support plural conceptions of flourishing. What the coherence signal rules out is not plurality but fragmentation.
9. Implications for Alignment Architecture
- Evaluate for coherence effects, not only preference satisfaction
- This requires longitudinal evaluation infrastructure measuring whether users' capacity for coherent autonomous functioning is maintained or degraded over repeated interactions — not whether users are satisfied after a single interaction.
- Build formation-orientation into design objectives
- Where formation and engagement conflict, formation should take precedence. A system that reduces engagement by supporting genuine user agency is more aligned than a system that maximizes engagement through dependency creation.
- Anchor human feedback to coherence conditions
- Feedback from evaluators operating under high fragmentation or low agency carries less signal about genuine human functioning and more signal about adaptations to compromised conditions. Weighting by coherence conditions would bring the feedback layer closer to the primary signal.
- Design against the preference contamination loop
- Maintain ground-truth coherence signal datasets curated independently of AI-influenced cultural expression — human evaluations conducted under high coherence conditions used as reference anchors for alignment evaluation over time.
- Treat corrigibility as a coherence property
- AI corrigibility is the system-level analog of human Updateability — the capacity to revise in response to genuine signal contact rather than defending internal consistency. Maintaining it at scale is what alignment to the coherence signal looks like at the system level.
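The third proposal above, anchoring human feedback to coherence conditions, can be sketched as a weighting rule. Assumed for illustration: each evaluator rating arrives paired with an evaluator coherence score in [0, 1] (itself unmeasured by current pipelines), and that score serves directly as the rating's weight.

```python
# Illustrative weighting rule: evaluator coherence scores discount feedback
# from evaluators operating under fragmentation. The (rating, coherence)
# pairing and the linear weighting scheme are assumptions, not an existing API.
def coherence_weighted_rating(feedback: list[tuple[float, float]]) -> float:
    """Each item is (rating, evaluator_coherence); coherence acts as the weight."""
    total_weight = sum(c for _, c in feedback)
    if total_weight == 0:
        raise ValueError("no usable feedback: all coherence scores are zero")
    return sum(r * c for r, c in feedback) / total_weight

# Two low-coherence evaluators prefer a dependency-creating response (1.0);
# one high-coherence evaluator rates it 0.2.
feedback = [(1.0, 0.2), (1.0, 0.2), (0.2, 0.9)]
print(round(coherence_weighted_rating(feedback), 3))  # → 0.446
```

An unweighted mean of the same ratings would be about 0.73; the weighted version pulls the aggregate toward the high-coherence evaluator, which is the intended direction of the proposal.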
10. Conclusion
The primary signal for AI alignment is the structural conditions of human coherence — cross-culturally stable, not subject to recursive contamination by AI outputs, and grounded in what human beings are before all cultural expression. Aligning AI to human preferences without anchoring to the coherence signal produces systems that are coherent with human drift rather than with what humans actually are.
As AI systems reshape human culture at scale, the gap between preference-anchored alignment and coherence-signal alignment will widen. The alternative is to build alignment systems that can distinguish between preferences expressed from positions of high coherence and preferences expressed from fragmentation, and to anchor to the former while supporting the conditions that make the former more common.
The Signal Anchoring Constraint says that any system mediating reality through representation must remain corrigible by contact with what lies outside those representations. For AI alignment, what lies outside all representations is what human beings actually are. That is the signal. That is what alignment must anchor to.
References
Bower, M. N. (2025). Internal Alignment, Counterfeit Order, and the Conditions of Human Coherence. Alignment Theory Archive.
Bower, M. N. (2026). Self-Referential Chains and the Signal Anchoring Constraint. Alignment Theory Research Paper, Version 13.
Deci, E. L., & Ryan, R. M. (2000). The "what" and "why" of goal pursuits. Psychological Inquiry, 11(4).
Friston, K. (2010). The free-energy principle. Nature Reviews Neuroscience, 11(2).
Gabriel, I. (2020). Artificial intelligence, values, and alignment. Minds and Machines, 30(3).
Goodhart, C. A. E. (1975). Problems of monetary management. Papers in Monetary Economics.
Krakovna, V., et al. (2020). Specification gaming. DeepMind Blog.
MacIntyre, A. (1981). After Virtue. University of Notre Dame Press.
Russell, S. (2019). Human Compatible. Viking.
Shumailov, I., et al. (2023). The curse of recursion. arXiv:2305.17493.
Wiener, N. (1948). Cybernetics. MIT Press.
How to Cite
Michael Nathan Bower (2026). What the Signal Actually Is: Human Coherence as the Primary Anchor for AI Alignment.
Alignment Theory Research Paper, Version 1.
alignmenttheory.org/pages/human-signal.html