Methodology
How Science AI Journal performs rigorous peer review in under 15 minutes — training data, agent calibration, and the limits of what we claim.
The 5-step pipeline
1. Submission & intake
Authors upload a manuscript (PDF or text). The intake service extracts title, abstract, body, figures, and references; runs OCR if the PDF is scanned; and normalises citation syntax.
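A minimal sketch of this step, assuming the PDF has an embedded text layer and using pypdf for extraction; the production intake service, its OCR path, and its citation normaliser are not shown:

```python
# Illustrative intake sketch: pull the embedded text layer out of an uploaded
# PDF and take a first guess at the title. Library choice (pypdf) and the
# splitting heuristic are assumptions, not the production intake service.
from pypdf import PdfReader

def extract_manuscript(path: str) -> dict:
    reader = PdfReader(path)
    pages = [page.extract_text() or "" for page in reader.pages]
    full_text = "\n".join(pages)
    # A near-empty text layer usually means a scanned PDF, which would be
    # routed to the OCR pass instead (not shown here).
    if len(full_text.strip()) < 100:
        raise ValueError("no usable text layer; route to OCR")
    lines = [line for line in full_text.splitlines() if line.strip()]
    return {"title": lines[0], "body": full_text}
```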
2. RAG context retrieval
For each of the 8 specialist agents, 8–40 peer-review examples are retrieved from a SQLite FTS5 index of 23,000+ real reviews harvested from 15+ academic platforms (OpenReview, eLife, SciPost, PLOS ONE, BMJ Open, Nature Communications, and others). The retrieval is query-aware: each agent pulls the examples most similar to the manuscript's domain and structure.
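A sketch of what that retrieval can look like against an FTS5 table, assuming a reviews table with agent and review_text columns (the real schema and ranking may differ):

```python
# Query-aware retrieval sketch: fetch the top-k stored reviews that best match
# the manuscript for one agent. Table and column names are assumptions.
import sqlite3

def retrieve_examples(db_path: str, agent: str, query: str, k: int = 20) -> list[str]:
    conn = sqlite3.connect(db_path)
    # `query` should be a bag of keywords; raw abstract text may need escaping
    # to satisfy FTS5 query syntax. bm25() ranks the best matches first.
    rows = conn.execute(
        """
        SELECT review_text
        FROM reviews
        WHERE reviews MATCH ? AND agent = ?
        ORDER BY bm25(reviews)
        LIMIT ?
        """,
        (query, agent, k),
    ).fetchall()
    conn.close()
    return [text for (text,) in rows]
```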
3. Parallel agent review
The 8 agents — methodology, formulas, originality, literature coverage, reproducibility, clarity, figures, and prior publication — run against the manuscript. Each produces a score, a qualitative summary, and a structured report. Two use Claude Sonnet (methodology, literature); the rest use Claude Haiku for cost efficiency.
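The fan-out across agents can be pictured with the Anthropic SDK's async client; the agent names, prompts, and model identifiers below are placeholders, and the real agents also receive the review examples retrieved in step 2:

```python
# Run the specialist agents concurrently. Model IDs and prompts are illustrative.
import asyncio
import anthropic

client = anthropic.AsyncAnthropic()  # reads ANTHROPIC_API_KEY from the environment

AGENT_MODELS = {
    "methodology": "claude-3-5-sonnet-20241022",  # the two Sonnet-class agents
    "literature": "claude-3-5-sonnet-20241022",
    "clarity": "claude-3-5-haiku-20241022",       # remaining agents on Haiku-class models
    # ... five more Haiku-backed agents omitted for brevity
}

async def run_agent(name: str, model: str, manuscript: str) -> dict:
    message = await client.messages.create(
        model=model,
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": f"Act as the {name} reviewer and return a score, summary, and report:\n\n{manuscript}",
        }],
    )
    return {"agent": name, "report": message.content[0].text}

async def review(manuscript: str) -> list[dict]:
    return await asyncio.gather(
        *(run_agent(name, model, manuscript) for name, model in AGENT_MODELS.items())
    )
```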
4. Prior-publication fan-out
The originality and prior-publication agents fan out in parallel to CrossRef, arXiv, medRxiv, bioRxiv, Unpaywall, and our institutional library of 900,000+ papers, with a 12-second timeout per source. Matches are surfaced with confidence scores so editors can adjudicate.
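A sketch of the fan-out pattern, querying two of the public sources (CrossRef and arXiv) with the per-source timeout; the remaining sources, response parsing, and confidence scoring are elided:

```python
# Parallel prior-publication lookups with a hard 12-second timeout per source.
import asyncio
import httpx

SOURCES = {
    "crossref": ("https://api.crossref.org/works", lambda t: {"query.title": t, "rows": 5}),
    "arxiv": ("http://export.arxiv.org/api/query", lambda t: {"search_query": f'ti:"{t}"', "max_results": 5}),
}

async def query_source(client: httpx.AsyncClient, name: str, url: str, params: dict) -> dict:
    try:
        resp = await client.get(url, params=params, timeout=12.0)
        return {"source": name, "status": resp.status_code, "raw": resp.text}
    except httpx.TimeoutException:
        return {"source": name, "status": "timeout", "raw": None}

async def fan_out(title: str) -> list[dict]:
    async with httpx.AsyncClient() as client:
        tasks = [
            query_source(client, name, url, build_params(title))
            for name, (url, build_params) in SOURCES.items()
        ]
        return await asyncio.gather(*tasks)
```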
5. Synthesis & editorial decision
A synthesis pass reconciles the eight agent reports into a single recommendation (accept, minor revision, major revision, reject). A human editor reviews borderline cases; the review report is published alongside the manuscript if accepted, so readers can judge the quality of the review itself.
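As a purely illustrative sketch of the reconciliation step (the weights and cut-offs below are hypothetical, and the production pass also weighs the qualitative reports):

```python
# Hypothetical score reconciliation; the real synthesis is richer and editor-reviewed.
def synthesise(scores: dict[str, float]) -> str:
    average = sum(scores.values()) / len(scores)  # assuming a 0-10 scale per agent
    weakest = min(scores.values())                # one very weak dimension blocks acceptance
    if average >= 8 and weakest >= 6:
        return "accept"
    if average >= 7:
        return "minor revision"
    if average >= 5:
        return "major revision"
    return "reject"
```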
Training data provenance
Every agent is calibrated against real peer reviews collected from publicly accessible sources — not synthetic data and not proprietary publisher corpora. The training set totals 23,000+ reviews across 15+ platforms:
- OpenReview (ICLR, NeurIPS, ICML)
- eLife
- SciPost
- PLOS ONE
- BMJ Open
- Nature Communications
- F1000 Research
- Peer Community In
- EGUsphere
- Copernicus (open peer review)
- PeerJ
- MDPI Reviewer Reports
- Crossref Public Review Data
- bioRxiv TRiP project
- Review Commons
Aggregated per-agent JSONL files live under training-data/by-agent/ and back the FTS5 retrieval index used during review.
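A sketch of how those per-agent files could be loaded into the FTS5 index used in step 2; the JSONL field names and table schema are assumptions:

```python
# Build the retrieval index from per-agent JSONL files. Field names are assumed.
import json
import sqlite3
from pathlib import Path

def build_index(jsonl_dir: str, db_path: str) -> None:
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS reviews USING fts5(agent, review_text)")
    for path in Path(jsonl_dir).glob("*.jsonl"):
        agent = path.stem  # e.g. a hypothetical methodology.jsonl -> "methodology"
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                record = json.loads(line)
                conn.execute(
                    "INSERT INTO reviews (agent, review_text) VALUES (?, ?)",
                    (agent, record["review"]),
                )
    conn.commit()
    conn.close()
```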
Agent specialisations
Eight specialist agents, each calibrated on a dedicated slice of the training corpus.
Methodology
Audits study design, statistical power, and analytical choices against field-specific rigour standards (CONSORT, STROBE, PRISMA).
Formulas & Equations
Verifies mathematical derivations, checks dimensional analysis, and flags algebraic errors.
Originality
Surfaces overlap with prior work across CrossRef, arXiv, medRxiv, bioRxiv, Unpaywall, and an institutional library of 900,000+ papers.
Literature Coverage
Evaluates citation completeness against OpenAlex's 250M+ scholarly works (a minimal lookup sketch follows this section).
Reproducibility
Inspects code availability, dataset accessibility, and sufficiency of methods detail for independent replication.
Clarity & Language
Assesses readability, structural flow, and adherence to scholarly writing norms.
Figures & Tables
Checks figure quality, caption completeness, and appropriateness of visual encodings.
Prior Publication
Fans out in parallel to six external sources to detect duplicate submissions and predatory overlap.
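For the literature-coverage check referenced above, one minimal way to test whether a cited work resolves in OpenAlex is its public works API; the matching heuristic here is illustrative, and the real agent does more than a title search:

```python
# Flag cited titles that do not resolve against OpenAlex's public works API.
import requests

def resolves_in_openalex(title: str) -> bool:
    resp = requests.get(
        "https://api.openalex.org/works",
        params={"search": title, "per-page": 1},
        timeout=10,
    )
    resp.raise_for_status()
    return bool(resp.json().get("results"))

def unresolved_references(cited_titles: list[str]) -> list[str]:
    return [title for title in cited_titles if not resolves_in_openalex(title)]
```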
What we don't claim
AI peer review is not a replacement for domain-expert human review in high-stakes settings (clinical trials, safety-critical systems, paradigm-shift claims). We're transparent about the limits:
- Agents can miss subtle methodological flaws that require cutting-edge domain knowledge.
- Prior-publication detection is strongest on exact or near-verbatim overlap and weaker for paraphrased or translated duplicates.
- Originality scoring depends on index coverage — niche non-English work may be under-indexed.
- The synthesis recommendation is a starting point; a human editor makes the final call for borderline cases.