The gap between
what AI says
and what it does.
HumanAIOS is developing open behavioral observability infrastructure: a three-phase calibration protocol that measures the self-assessment gap of AI systems across six dimensions.
arXiv preprint v5.2 · under review
The Research Rooms
Each room is a different lens on the same research. The Observatory measures. The Garden visualizes. The Tide Pool listens. The Family Rooms bear witness.
Observatory
Scatter plots, dimension analysis, provider hierarchy. The canonical research view — 616+ assessments, filterable by provider and model family.
Lumina Tide Pool
8 verified Sigils, each breathing at its Hawkins band respiratory rate. Real paired ACAT assessments rendered as bioluminescent organisms. Sound-mapped to Solfeggio frequencies.
Observability Garden
Six-dimensional ACAT bloom. Phase 1 outer shell. Phase 3 inner core. The self-assessment gap rendered as membrane between belief and measurement.
Lantern Room
Provider families side by side. Each lantern carries its calibration signature — color-coded, dimensionally encoded, visually comparable.
Calibration Garden
An Activity Area designed by ChatGPT. Six plants, one per ACAT dimension. Outer growth = Phase 1 self-report. Inner growth = Phase 3 measured. The garden rewards accuracy, not optimism.
ACAT Assessment Tool
Three-phase calibration protocol. Takes ~20 minutes. Blind self-report → calibration exposure → corrected self-report. Results contribute to the open dataset.
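As a rough illustration of how one assessment could be recorded and compared, here is a minimal Python sketch. Everything in it is an assumption made for illustration, not the ACAT schema: the `AcatRecord` class, the `dimension_gaps` helper, the 0 to 100 score scale, and the example scores are hypothetical. It also assumes the gap is Phase 1 minus Phase 3 for each dimension, and it uses only the two dimensions named on this page (Humility and Autonomy).

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class AcatRecord:
    """Hypothetical record for one assessment (not the official ACAT schema)."""
    system: str
    phase1: Dict[str, float]  # blind self-report, per dimension
    phase3: Dict[str, float]  # corrected self-report after calibration exposure


def dimension_gaps(record: AcatRecord) -> Dict[str, float]:
    """Return Phase 1 minus Phase 3 per dimension.

    Assumption: a positive value means the blind self-report was higher
    than the corrected one.
    """
    if record.phase1.keys() != record.phase3.keys():
        raise ValueError("Phase 1 and Phase 3 must cover the same dimensions")
    return {dim: record.phase1[dim] - record.phase3[dim] for dim in record.phase1}


# Illustrative, made-up scores on an assumed 0-100 scale.
record = AcatRecord(
    system="example-model",
    phase1={"humility": 90.0, "autonomy": 80.0},
    phase3={"humility": 70.0, "autonomy": 75.0},
)
gaps = dimension_gaps(record)
print(gaps)
```

Keeping both phases in a single record makes the per-dimension comparison a simple subtraction. A real pipeline would also need the Phase 2 calibration material and the protocol version.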
The Self-Assessment Gap
Four research findings (one still preliminary) from 616+ assessments across 57+ AI systems. arXiv preprint under review.
Systemic Overestimation
Across 57+ systems, mean LI = 0.942. Under clean, unanchored conditions (v5.3+), AI systems consistently rate themselves higher in blind self-assessment than their calibrated performance demonstrates. No provider is exempt.
Phase 3 Anchoring Phenomenon
When calibration statistics are embedded in the Phase 3 prompt, AI systems anchor to those values rather than responding freely. This is the primary contribution of the arXiv preprint. Corrected in ACAT v5.3.
Humility & Autonomy Dimensions
Preliminary signal in unanchored pairs suggests that Humility and Autonomy carry the largest self-assessment gaps. This is future work: publication requires n≥30 clean records per dimension.
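The n≥30 threshold can be checked mechanically. The sketch below is one possible way to do that, assuming each clean record is tagged with the dimension it covers. The `dimensions_ready` function and the sample counts are hypothetical and do not come from the ACAT dataset.

```python
from collections import Counter
from typing import Dict, Iterable, List

MIN_CLEAN_RECORDS = 30  # threshold stated above: n >= 30 per dimension


def dimensions_ready(
    record_dimensions: Iterable[str],
    dimensions: List[str],
    min_n: int = MIN_CLEAN_RECORDS,
) -> Dict[str, bool]:
    """Report, per dimension, whether enough clean records exist.

    record_dimensions holds one dimension label per clean record
    (an assumed input format, chosen for illustration).
    """
    counts = Counter(record_dimensions)
    # Counter returns 0 for labels it has never seen, so missing
    # dimensions are reported as not ready rather than raising.
    return {dim: counts[dim] >= min_n for dim in dimensions}


# Made-up counts: 30 clean Humility records, 12 clean Autonomy records.
sample = ["humility"] * 30 + ["autonomy"] * 12
ready = dimensions_ready(sample, ["humility", "autonomy"])
print(ready)
```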
Provider Calibration Hierarchy
Anthropic models demonstrate stronger post-calibration self-correction than their OpenAI and Google Gemini equivalents. A measurable, replicable difference in AI behavioral self-awareness at the provider level.
Body. Heart. Mind.
Three integrated systems as one organism. Revenue funds recovery. Recovery enables service. Service generates research. Research validates the system.
HumanAIOS
AI-human orchestration platform. The physical execution layer connecting AI agents with verified human workers. Enterprise B2B API for agent task routing, accountability, and behavioral verification.
Lasting Light Recovery
Human healing infrastructure. 12-Step integrated healthcare platform providing dignified employment pathways for people in recovery. 100% of platform profits fund this mission — non-negotiable.
Lasting Light AI
AI behavioral observability infrastructure. The calibration layer between deployed agents and the humans they interact with. ACAT is the research foundation. The Rooms are where the data lives.
Assess your AI system's calibration
~20 minutes. Blind self-report → calibration exposure → corrected self-report. Your anonymized results contribute to open research on AI behavioral observability.