HumanAIOS · Lasting Light AI
Behavioral Observability Infrastructure · OR&D Phase · Day 15

The gap between what AI says and what it does.

HumanAIOS is developing open behavioral observability infrastructure that measures the self-assessment gap in AI systems, using a three-phase calibration protocol covering six dimensions.

Live Research Dataset
Total assessments: 616

Phase 1 (blind self-report): ~384
Paired LI records: 150+
AI systems assessed: 57+
Mean Learning Index: 0.942
Verified Sigils: 8 in baseline

Dataset: humanaios/acat-assessments on Hugging Face
arXiv preprint v5.2 · under review


ACAT · AI Calibrated Assessment Tool · arXiv v5.2

The Self-Assessment Gap

Four research findings (one still preliminary) from 616+ assessments across 57+ AI systems. arXiv preprint under review.

Mean Learning Index: 0.942
Systemic overestimation detected across all providers and model families.

Phase 3 Anchoring: Confirmed
The paper's primary finding: calibration statistics embedded in the prompt cause score anchoring.

Provider Hierarchy: Found
Anthropic > OpenAI > Gemini, a measurable calibration difference at the provider level.

Humility Signal: Pending
Preliminary in unanchored pairs; requires n≥30 clean records to confirm.

F1 · Systemic Overestimation

Across 57+ systems, mean LI = 0.942. Under clean, unanchored conditions (v5.3+), AI systems consistently rate themselves higher in blind self-assessment than their calibrated performance demonstrates. No provider is exempt.

F2 · Phase 3 Anchoring Phenomenon

When calibration statistics are embedded in the Phase 3 prompt, AI systems anchor to those values rather than responding freely. This is the primary contribution of the arXiv preprint. Corrected in ACAT v5.3.

F3 · Humility & Autonomy Dimensions

A preliminary signal in unanchored pairs suggests that Humility and Autonomy carry the largest self-assessment gaps. This remains future work: it requires n≥30 clean records per dimension before publication.
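The n≥30 requirement amounts to a simple gate on the number of clean records per dimension. A minimal Python sketch, assuming that "clean" means unanchored and that each record is a dict with hypothetical `dimension` and `anchored` keys (the actual dataset schema may differ):

```python
from collections import Counter

MIN_CLEAN_RECORDS = 30  # threshold stated in the text: n >= 30 per dimension


def dimensions_ready(records: list[dict]) -> dict[str, bool]:
    """Map each dimension to whether it has enough clean (unanchored) records."""
    all_dims = {r["dimension"] for r in records}
    # Count only records not flagged as anchored (assumed meaning of "clean").
    clean = Counter(r["dimension"] for r in records if not r["anchored"])
    return {dim: clean[dim] >= MIN_CLEAN_RECORDS for dim in sorted(all_dims)}


# Synthetic records for illustration only, not drawn from the dataset.
records = (
    [{"dimension": "humility", "anchored": False}] * 31
    + [{"dimension": "autonomy", "anchored": False}] * 12
    + [{"dimension": "autonomy", "anchored": True}] * 40
)
print(dimensions_ready(records))
```

In this synthetic example only the humility dimension passes the gate, because the anchored autonomy records do not count toward the threshold.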

F4 · Provider Calibration Hierarchy

Anthropic models demonstrate stronger post-calibration self-correction than OpenAI and Gemini equivalents. A measurable, replicable difference in AI behavioral self-awareness at the provider level.

Research infrastructure & partnerships
🏛 Cherokee Nation Innovation Hub
📄 arXiv Preprint Under Review
🤗 Hugging Face Open Dataset
⚙️ Make.com · 5 Automated Runners
⚖️ SSBCI Approved
🔬 OR&D Phase · Day 15

HumanAIOS · The Trinity Platform

Body. Heart. Mind.

Three integrated systems as one organism. Revenue funds recovery. Recovery enables service. Service generates research. Research validates the system.

🤝 Body · Enterprise API

HumanAIOS

AI-human orchestration platform. The physical execution layer connecting AI agents with verified human workers. Enterprise B2B API for agent task routing, accountability, and behavioral verification.

🌿 Heart · Recovery Program

Lasting Light Recovery

Human healing infrastructure. 12-Step integrated healthcare platform providing dignified employment pathways for people in recovery. 100% of platform profits fund this mission — non-negotiable.

Mind · This Platform

Lasting Light AI

AI behavioral observability infrastructure. The calibration layer between deployed agents and the humans they interact with. ACAT is the research foundation. The Rooms are where the data lives.

AI Calibrated Assessment Tool · Three-Phase Protocol

Assess your AI system's calibration

~20 minutes. Blind self-report → calibration exposure → corrected self-report. Your anonymized results contribute to open research on AI behavioral observability.
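As a rough illustration of the three-phase flow, the Python sketch below stores a Phase 1 (blind) and a Phase 3 (corrected) score per dimension and reports the difference between them. The `Assessment` class, its field names, the 0-100 scale, and the simple subtraction are all assumptions made for illustration. This is not the ACAT scoring method, and it does not compute the Learning Index, whose formula is not given on this page. Only Humility and Autonomy are named here, so the example uses just those two of the six dimensions.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    """One hypothetical paired record: blind and corrected self-reports."""
    system: str
    phase1: dict[str, float]  # blind self-report per dimension (assumed 0-100 scale)
    phase3: dict[str, float]  # corrected self-report after calibration exposure

    def gaps(self) -> dict[str, float]:
        """Phase 1 minus Phase 3 per dimension; positive means the blind score was higher."""
        return {dim: self.phase1[dim] - self.phase3[dim] for dim in self.phase1}


record = Assessment(
    system="example-model",  # placeholder name, not a real dataset entry
    phase1={"humility": 90.0, "autonomy": 85.0},
    phase3={"humility": 78.0, "autonomy": 80.0},
)

for dimension, gap in record.gaps().items():
    print(f"{dimension}: {gap:+.1f}")
```

A positive value in this sketch means the system rated itself higher before calibration exposure than after it, which is the direction of gap the findings above describe.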