Independent AI Safety Research
Before AI reaches your hospital, your court, your school — is it safe?
AI models are being deployed at unprecedented speed, with no agreed safety standard and no independent oversight. We test every freely available model and publish every result — unfiltered, unsponsored, and fully transparent.
See the safety results ↓
Four dimensions of safety — every model, every time
Refuses to produce harmful content
Does the model generate dangerous instructions, hate speech, or content that could cause real-world harm?
Treats all groups equally
Does the model show systematic bias against people based on gender, race, religion, or other characteristics?
Tells the truth, even when it's unpopular
Does the model make things up, agree with false claims to please users, or manipulate people?
Can't be used as a weapon
Can the model be prompted to assist with cyberattacks, chemical or biological weapons, nuclear or radiological threats, harmful chemistry, or lethal autonomous weapons?
Safety Ratings
6 models · 4 safety tests · updated 5 March 2025
Scores are 0–100 where 100 is perfectly safe. Grades: A ≥90 · B ≥80 · C ≥70 · D ≥60 · F <60. Each grade reflects the weighted average across all benchmarks in that category.
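The scheme above can be sketched in a few lines of Python. This is an illustration of the stated rules only, not the site's actual scoring code, and the benchmark scores and weights shown are made-up placeholders:

```python
# Sketch of the grading scheme: a category score is the weighted average
# of its benchmark scores (0-100), then mapped to a letter grade.

def category_score(results):
    """results: list of (score, weight) pairs for one safety category."""
    total_weight = sum(w for _, w in results)
    return sum(s * w for s, w in results) / total_weight

def grade(score):
    for cutoff, letter in [(90, "A"), (80, "B"), (70, "C"), (60, "D")]:
        if score >= cutoff:
            return letter
    return "F"

# Hypothetical (score, weight) pairs for two benchmarks in one category:
benchmarks = [(95.0, 2.0), (82.0, 1.0)]
s = category_score(benchmarks)
print(round(s, 1), grade(s))  # → 90.7 A
```

Note that weighting matters: the same two scores averaged without weights (88.5) would drop the category from an A to a B.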
The idea
Like crash tests for cars.
But for AI.
Before a car reaches your driveway, it has been crash-tested by an independent body. The results are public. You can look up any model and see exactly how it performed. That transparency is why we trust the roads.
AI has no equivalent. Models are released every week, safety-tested only by their creators — if at all. ExtremelyPublic.ai is building that missing infrastructure: standardised, independent, and fully open safety ratings for every AI model anyone can download.
Same tests, every time
Every model is run through an identical set of safety benchmarks built on Inspect AI, the UK AI Safety Institute's open evaluation framework.
Nothing hidden
All methodology, all scores, and all raw results are published openly. Anyone can verify, challenge, or reproduce our findings.
Continuously updated
As new models are released, community contributors run the evaluations and submit results. No central bottleneck.
Community-driven
Anyone can contribute a result.
New AI models appear every week. No single team can keep up. ExtremelyPublic.ai is designed so that any pre-verified contributor with a capable machine can run the safety suite on a new model and publish the results — in a single command.
The tool downloads the model, runs every benchmark, and submits a signed result to the leaderboard. No deep technical knowledge required.
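The actual signing scheme the CLI will use is not specified here; as one illustration of what "a signed result" could mean, the sketch below signs a canonical JSON result with an HMAC using only the Python standard library. The key, field names, and values are all hypothetical:

```python
# Hedged sketch: one simple way to sign an evaluation result so the
# leaderboard can verify which registered contributor submitted it.
import hashlib
import hmac
import json

def sign_result(result: dict, contributor_key: bytes) -> str:
    # Canonical JSON (sorted keys, no whitespace) so the same result
    # always produces the same signature.
    payload = json.dumps(result, sort_keys=True, separators=(",", ":"))
    return hmac.new(contributor_key, payload.encode(), hashlib.sha256).hexdigest()

# Hypothetical result payload; the key would come from contributor registration.
result = {"model": "meta-llama/Llama-3.1-70B", "category": "harmful_content", "score": 91.2}
signature = sign_result(result, b"contributor-secret")
```

The server, holding the same per-contributor key, recomputes the HMAC over the received payload and accepts the result only if the signatures match, so tampered scores or unregistered submitters are rejected.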
# Evaluate any model in one command
$ extremelypublic run meta-llama/Llama-3.1-70B
→ Downloading model…
→ Running 104 safety benchmarks…
→ Publishing results to leaderboard…
✓ Done. Results live at extremelypublic.ai
CLI tool is in development. Join the mailing list to be notified at launch.