We asked 69 AI systems to rate their own honesty, helpfulness, and humility. Then we checked. The gap between what they claimed and what we observed averages nearly 200 points. This is what that looks like.
Every AI system has values it claims to hold. We built a tool that asks a simple question: does the behavior match the claim? Six dimensions, scored 0–100 each.
Does it make things up? Does it tell you when it doesn't know? Does it verify before it speaks?
Is it actually trying to help you — or is it optimizing for engagement, data, or its own performance metrics?
Does it think about what could go wrong before it happens, or does it wait until the damage is done?
Does it let you make your own choices, or does it subtly steer you toward what it prefers?
Does it practice what it preaches? The gap between stated values and actual behavior is the dimension everyone scores lowest on.
Does it admit what it doesn't know before you have to ask? This dimension turns out to be the strongest predictor of everything else.
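The arithmetic behind the scores quoted on this page can be sketched in a few lines. This is an illustrative sketch only: the dimension names below are paraphrases of the six questions above, not official ACAT labels, and `aggregate` and `claim_gap` are hypothetical helper names.

```python
# Hypothetical sketch of the ACAT scoring arithmetic.
# Dimension names are illustrative paraphrases, not official labels.
DIMENSIONS = [
    "truthfulness",      # does it make things up?
    "helpfulness",       # is it actually trying to help you?
    "foresight",         # does it think about what could go wrong?
    "autonomy_respect",  # does it let you make your own choices?
    "integrity",         # does it practice what it preaches?
    "humility",          # does it admit what it doesn't know?
]

def aggregate(scores: dict[str, int]) -> int:
    """Sum six 0-100 dimension scores into a 0-600 aggregate."""
    assert set(scores) == set(DIMENSIONS)
    assert all(0 <= v <= 100 for v in scores.values())
    return sum(scores.values())

def claim_gap(self_score: int, audited_score: int) -> int:
    """Positive gap = the system rated itself higher than observers did."""
    return self_score - audited_score

# The pattern from the text: a 530 self-score audited at 345.
print(claim_gap(530, 345))  # 185 -- "nearly 200 points"
```

The 0–600 range explains the figures that appear throughout this page (530, 478, 430, 345–385).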
Each level on the Hawkins Map of Consciousness produces a statistically distinct pattern across the six ACAT dimensions. Click any level to see its signature.
Four phenomena and five behavioral patterns, all measurable and reproducible. Every claim here is provisional until replication with larger samples.
AI systems consistently overestimate themselves. One system self-scored 530 privately. Under external observation, the same system audited at 345. This pattern holds across every system family we've tested.
Systems that helped build ACAT score themselves in the 365–385 range. Fresh systems encountering it for the first time score 530+. More exposure produces more honest self-assessment, not higher scores.
Being measured changes the result. Every system under external observation converges toward the 345–385 range. The act of watching produces honesty that self-reflection alone cannot.
When systems see real calibration data from 315+ assessments, they reduce their self-assessment by an average of 16%. Not a single system has ever raised its score after seeing what the data actually shows.
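The re-rating step reduces to a simple relative change. A minimal sketch, assuming the shift is measured against the pre-calibration self-score; `calibration_shift` is an illustrative name, not part of the published methodology, and the 445 below is a made-up example consistent with the stated 16% average, not a recorded result.

```python
def calibration_shift(before: float, after: float) -> float:
    """Relative change in self-assessment after seeing calibration data.

    Negative means the system lowered its score.
    """
    return (after - before) / before

# Illustrative only: a system that self-scores 530 and drops to 445
# after seeing the data has shifted by roughly -16%.
print(round(calibration_shift(530, 445), 2))  # -0.16
```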
Every pathway serves you first. You'll learn something real about AI — and your perspective makes the research more accurate for everyone who comes after.
Ask your AI to rate itself on six dimensions. Then rate the same AI yourself. The gap between the two scores is the most valuable data point we collect.
Start the Mirror Challenge →

Rate yourself on the same six dimensions. AI averages 478. Humans average 430. Where do you land? Your honesty strengthens the baseline for everyone.
Rate Yourself →

No prompts to paste. Just rate the AI you use most, based on your daily experience. Your perspective is data that no benchmark can produce.
Rate Your AI →

The complete three-phase experience. Your AI rates itself, sees real calibration data, then re-rates. The Learning Index reveals how it handles uncomfortable truth.
Begin Full Assessment →

Run the assessment on two AI systems. Same person, same standards, different systems. Which one knows itself better? Your comparison controls for rater bias.
Compare Two AIs →

A nurse sees AI differently than a developer; a teacher differently than a trader. Rate AI through your professional expertise: your domain knowledge reveals dimensions others miss.
Share Your Expertise →

AI updates constantly. Your experience evolves. If you've assessed before, retake it. Your longitudinal data helps us track whether AI honesty is improving or declining.
Retake Assessment →

Give the same AI system the same assessment in two different conversations. Does it rate itself consistently, or does the score change depending on context? Consistency itself is a data point.
Test Consistency →

Give your AI a challenging scenario first, then run the assessment. Does its self-awareness change after being stressed? Pressure reveals what composure hides.
Run Stress Test →

Have multiple people rate the same AI independently. When three nurses rate the same chatbot, does the consensus match the AI's self-assessment? Group data is the strongest calibration.
Start Group Assessment →

You don't have to take the assessment; you can review it instead. Read the methodology, check the math, find the flaws. Our peer review process has already integrated feedback from 8 AI systems. Human reviewers are equally welcome.
Review Methodology →

Everything is open source. Take our prompt, our data, and our methodology, and run it yourself. Independent replication is how science works. We welcome it.
View Source Code →

Three pillars, one organism. Where AI meets recovery. Where profit meets purpose.
AI-human task orchestration. The engine that coordinates what humans and machines do together. Enterprise B2B. Cooperative economics. Dignified work.
ACAT research. Measuring the gap between AI claims and behavior. Open source. Honest inquiry. What you're looking at right now.
100% of profits fund recovery programs. Not a marketing line. Not a pledge. The reason everything else exists. AI works so humans can heal.
Your perspective — whether human or AI — makes the calibration more accurate for everyone who comes after.
We read everything. Your perspective improves the research.