Peer review in 15 minutes: how Science AI Journal works
An inside look at our 8-agent review engine, what each agent checks, and why we publish full reports alongside every accepted paper.
Traditional peer review takes anywhere from 2 to 18 months, depending on the field. For a PhD student with a tight submission deadline, or a clinician with a finding that might change practice, that timeline is incompatible with how science actually moves today.
Science AI Journal runs manuscripts through 8 specialised AI reviewers calibrated on 23,000+ real peer reviews scraped from OpenReview, eLife, SciPost, PLOS ONE, BMJ Open, Nature Communications, and a dozen other open review platforms. A paper submitted in the morning typically has a full editorial decision plus a line-by-line reviewer report before lunch.
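The pipeline itself is the simple part: fan the manuscript out to the domain agents, collect their reports, and let a synthesis step turn them into one editorial decision. Below is a minimal sketch of that control flow. The agent names mirror the table in the next section, but `run_agent`, `AgentReport`, and the decision rule are placeholder assumptions of ours, not the journal's actual implementation; the eighth agent (Synthesis) is represented here only by the aggregation at the end of `review`.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field

# Hypothetical names for the seven domain reviewers; a synthesis step runs
# after they finish. The real prompts and model calls are not public.
DOMAIN_AGENTS = [
    "methodology", "plagiarism", "language", "figures",
    "literature", "statistics", "ethics",
]

@dataclass
class AgentReport:
    agent: str
    verdict: str  # e.g. "accept", "minor_revision", "reject"
    comments: list[str] = field(default_factory=list)  # line-level notes for the author

def run_agent(agent: str, manuscript: str) -> AgentReport:
    """Stub for one specialised reviewer.

    A real implementation would load this agent's rubric and calibration
    examples and call a language model; the stub only exposes the control flow.
    """
    return AgentReport(agent=agent, verdict="minor_revision")

def review(manuscript: str) -> dict:
    # Fan the manuscript out to every domain agent concurrently, then reduce
    # the individual verdicts into a single editorial decision.
    with ThreadPoolExecutor(max_workers=len(DOMAIN_AGENTS)) as pool:
        reports = list(pool.map(lambda a: run_agent(a, manuscript), DOMAIN_AGENTS))
    if any(r.verdict == "reject" for r in reports):
        decision = "reject"
    elif all(r.verdict == "accept" for r in reports):
        decision = "accept"
    else:
        decision = "accept_with_revisions"
    return {"decision": decision, "reports": reports}
```

Running the agents concurrently rather than in sequence is what keeps the wall-clock time in the minutes rather than the hours: the slowest agent, not the sum of all of them, sets the pace.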
What each agent checks
| Agent | Focus | Calibration source |
|---|---|---|
| Methodology | Study design, sample-size justification, CONSORT / STROBE / PRISMA compliance, causal validity | OpenReview ML track + eLife full-length reviews |
| Plagiarism & Prior Publication | Fuzzy-matched against a local 900K-paper FTS5 index plus CrossRef, Unpaywall, arXiv, medRxiv, bioRxiv in parallel (see the sketch after this table) | Retraction Watch + journal desk-reject corpora |
| Language & Structure | Clarity, IMRaD adherence, undefined acronyms, hedging misuse | Copy-editor annotated datasets |
| Figures & Tables | Readability, axis labelling, colourblind safety, caption sufficiency | Nature Comms + Scientific Reports figure critiques |
| Literature | Missing seminal references, over-reliance on self-citation, coverage gaps against OpenAlex | 250M-paper OpenAlex corpus |
| Statistics | Test appropriateness, multiple-comparison correction, confidence interval reporting | BMJ statistical referee notes |
| Ethics & Reproducibility | Consent, IRB, code/data availability statements, pre-registration | Guardian + PLOS retractions |
| Synthesis | Weighs all agent verdicts, produces the editorial decision | — |
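To make the Plagiarism & Prior Publication row concrete: the core move is to query the local full-text index and all remote sources at the same time, so no single slow API blocks the review. The sketch below assumes a SQLite FTS5 table and stubs out the remote lookups; the schema, function names, and query shape are our assumptions, not the journal's published code.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

def local_fts_matches(db_path: str, passage: str, limit: int = 5) -> list[str]:
    """Look up near matches for one passage in a local SQLite FTS5 index.

    Assumes the index was built as:
        CREATE VIRTUAL TABLE papers USING fts5(doi, title, abstract);
    A real matcher would tokenise the passage and build a proper FTS5 query;
    here it is quoted as a single phrase for simplicity.
    """
    phrase = '"' + passage.replace('"', ' ') + '"'
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT doi FROM papers WHERE papers MATCH ? ORDER BY rank LIMIT ?",
            (phrase, limit),
        ).fetchall()
    finally:
        con.close()
    return [doi for (doi,) in rows]

def remote_matches(source: str, passage: str) -> list[str]:
    """Placeholder for one remote lookup (CrossRef, Unpaywall, arXiv, medRxiv,
    bioRxiv). Each service has its own query API; this stub only names the fan-out."""
    return []

def check_passage(db_path: str, passage: str) -> dict[str, list[str]]:
    # Query the local index and every remote source in parallel.
    sources = ["crossref", "unpaywall", "arxiv", "medrxiv", "biorxiv"]
    with ThreadPoolExecutor(max_workers=len(sources) + 1) as pool:
        local = pool.submit(local_fts_matches, db_path, passage)
        remote = {s: pool.submit(remote_matches, s, passage) for s in sources}
        results = {"local_fts": local.result()}
        results.update({s: f.result() for s, f in remote.items()})
    return results
```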
Why 8 and not 1?
A single monolithic prompt hits two walls. First, it hallucinates: it wants to find issues everywhere, so it invents them. Second, it can't hold all the relevant context: 250M-paper literature coverage, methodology rubrics, statistics guidance, and figure-reading heuristics together exceed any single model's useful attention window.
The agent pattern lets each reviewer carry only the rubric it needs and only the calibration examples matching its domain. When we tested a monolithic baseline against the 8-agent pipeline on a held-out set of 1,000 reviewed papers, the agents matched human editorial decisions 83% of the time. The monolith matched 57%.
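In code, "carrying only the rubric it needs" simply means each agent's prompt is assembled from its own rubric and its own calibration examples and nothing else. Here is a hedged sketch of that assembly step; `RUBRICS`, `CALIBRATION`, and `build_agent_prompt` are illustrative names of ours, and the rubric text is paraphrased from the table above rather than taken from the journal.

```python
# Placeholder rubric and calibration stores, keyed by agent; the real rubric
# text and the 23,000+ calibration reviews are not published.
RUBRICS = {
    "statistics": "Check test appropriateness, multiple-comparison correction, CI reporting.",
    "figures": "Check axis labels, colourblind-safe palettes, caption sufficiency.",
}

CALIBRATION = {
    "statistics": ["<BMJ statistical referee note 1>", "<BMJ statistical referee note 2>"],
    "figures": ["<Nature Comms figure critique>", "<Scientific Reports figure critique>"],
}

def build_agent_prompt(agent: str, manuscript: str, k: int = 3) -> str:
    """Assemble one reviewer's context: its own rubric, its own top-k calibration
    examples, and the manuscript. No other domain's material is included, which
    is what keeps each prompt inside the model's useful attention window."""
    examples = "\n---\n".join(CALIBRATION[agent][:k])
    return (
        f"RUBRIC:\n{RUBRICS[agent]}\n\n"
        f"CALIBRATION EXAMPLES:\n{examples}\n\n"
        f"MANUSCRIPT:\n{manuscript}"
    )
```

The payoff of this split is measurable: each prompt stays small and on-topic, and disagreements between narrow agents are left for the synthesis step rather than fought out inside one overloaded context.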
What happens after a decision
Unlike most AI tooling, we publish the full review reports alongside every accepted paper under CC BY 4.0. Authors see exactly what each agent flagged, the reviews can be cited, and readers get a second opinion baked into the publication. Open access without open review is transparency theatre; we think the two have to ship together.
What we won't claim
- We do not replace human peer review where the stakes genuinely demand it: drug trials, regulatory submissions, grant panels.
- We do not outperform a careful, well-resourced human reviewer on nuanced theoretical work.
- We do not generate novel scientific insight. We review.
What we do claim: for the 90% of submissions that need a competent, fast, transparent first pass before the world sees them, AI peer review at this quality bar is a strictly better default.