Hey folks!
I’m an independent AI researcher seeking an arXiv endorsement for the cs.AI category (cross-list: cs.CL, cs.LG, stat.ML). This is my first arXiv submission and I don’t have an institutional affiliation, so I need a personal endorsement from someone who has published in a related category.
About the paper
Title: “The Metacognitive Probe: Decomposing LLM Self-Knowledge into Five Measurable Dimensions”
The paper presents a 5-task diagnostic benchmark that decomposes LLM self-knowledge into five separately measurable dimensions: confidence calibration, epistemic vigilance, knowledge boundaries, calibration range, and reasoning-chain validation. Standard benchmarks (MMLU, BIG-Bench, HELM) measure what models know; this instrument measures what models know about what they know.
Headline finding: A 47-point within-model dissociation in Gemini 2.5 Flash — it achieves the panel’s best within-task calibration (T1-CC = 88) but the worst cross-task confidence prediction (T4-CR = 41). Flash reports confidence ≈ 100 on every factoid, including ones it gets wrong. This has direct implications for confidence-gated deployment systems.
The benchmark is evaluated on 8 frontier models (Claude Opus/Sonnet, Gemini Pro/Flash, DeepSeek-R1, GLM-5, Qwen 3, Gemma 3) and a human calibration panel (N=69). All code, data, prompts, and scoring rubrics are publicly released.
Verifiable materials
- Live Kaggle benchmark: https://www.kaggle.com/benchmarks/rctoliveira/metacognitive-probe-measuring-llm-self-awareness
- Google DeepMind Hackathon entry (Measuring Progress Toward AGI, Cognitive Abilities track)
- Happy to share the full PDF privately before you decide
Endorsement details
- Category: cs.AI (primary), cross-list cs.CL, cs.LG, stat.ML
- Endorsement code: I4G6HG
- To endorse, the endorser needs to have submitted 3+ papers to any cs.* category on arXiv within the last 5 years
If you’re an active arXiv author in any of these categories and willing to help, I’d really appreciate it. The endorsement takes about 30 seconds — just clicking a link and confirming. I’m happy to send you the paper first if you’d like to review it.
Thanks for your time!
Rafael Oliveira