Methodology

How Science AI Journal performs rigorous peer review in under 15 minutes — training data, agent calibration, and the limits of what we claim.

The 5-step pipeline

  1. Submission & intake

    Authors upload a manuscript (PDF or text). The intake service extracts title, abstract, body, figures, and references; runs OCR if the PDF is scanned; and normalises citation syntax.

  2. RAG context retrieval

    For each of the 8 specialist agents, 8–40 peer-review examples are retrieved from a SQLite FTS5 index of 23,000+ real reviews harvested from 15+ academic platforms (OpenReview, eLife, SciPost, PLOS ONE, BMJ Open, Nature Communications, and others). Retrieval is query-aware: each agent pulls the examples most similar to the manuscript's domain and structure.

  3. Parallel agent review

    The 8 agents — methodology, formulas, originality, literature coverage, reproducibility, clarity, figures, and prior publication — run against the manuscript. Each produces a score, a qualitative summary, and a structured report. Two use Claude Sonnet (methodology, literature); the rest use Claude Haiku for cost efficiency.

  4. Prior-publication fan-out

    The originality and prior-publication agents fan out to CrossRef, arXiv, medRxiv, bioRxiv, Unpaywall, and our 900,000+ paper institutional library in parallel (12-second timeouts). Matches are surfaced with confidence scores so editors can adjudicate.

  5. Synthesis & editorial decision

    A synthesis pass reconciles the eight agent reports into a single recommendation (accept, minor revision, major revision, reject). A human editor reviews borderline cases; the review report is published alongside the manuscript if accepted, so readers can judge the quality of the review itself.
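The synthesis step can be pictured as a small scoring function over the eight agent reports. This is a hypothetical sketch only: the 0–10 scale, the thresholds, and the borderline rule below are invented for illustration, not the pipeline's actual logic.

```python
# Hypothetical synthesis sketch: reduce eight agent scores to one
# recommendation and flag borderline cases for the human editor.
# Score scale (0-10) and all thresholds are invented for illustration.

def synthesize(scores: dict[str, float]) -> dict:
    mean = sum(scores.values()) / len(scores)
    if mean >= 8.0:
        rec = "accept"
    elif mean >= 6.5:
        rec = "minor revision"
    elif mean >= 4.5:
        rec = "major revision"
    else:
        rec = "reject"
    # Borderline if agents strongly disagree, or the mean sits
    # close to a decision threshold.
    spread = max(scores.values()) - min(scores.values())
    near_threshold = min(abs(mean - t) for t in (8.0, 6.5, 4.5)) < 0.5
    return {
        "recommendation": rec,
        "mean": round(mean, 2),
        "needs_editor": spread > 4.0 or near_threshold,
    }

report = synthesize({
    "methodology": 7.0, "formulas": 8.0, "originality": 6.0,
    "literature": 7.5, "reproducibility": 5.5, "clarity": 8.0,
    "figures": 7.0, "prior_publication": 9.0,
})
```

Whatever the real aggregation looks like, the key property is the last field: the recommendation is only a starting point, and disagreement or near-threshold means route the case to a human editor.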

Training data provenance

Every agent is calibrated against real peer reviews collected from publicly accessible sources — not synthetic data and not proprietary publisher corpora. The training set totals 23,000+ reviews across 15+ platforms:

  • OpenReview (ICLR, NeurIPS, ICML)
  • eLife
  • SciPost
  • PLOS ONE
  • BMJ Open
  • Nature Communications
  • F1000 Research
  • Peer Community In
  • EGUsphere
  • Copernicus (open peer review)
  • PeerJ
  • MDPI Reviewer Reports
  • Crossref Public Review Data
  • bioRxiv TRiP project
  • Review Commons

Aggregated per-agent JSONL files live under training-data/by-agent/ and back the FTS5 retrieval index used during review.
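The retrieval side can be sketched directly against SQLite. In this minimal, hypothetical example the two sample records and their field names are invented; only the FTS5 index and the rank-ordered MATCH query mirror the setup described above.

```python
import json
import sqlite3

# Hypothetical sketch of a per-agent JSONL file backing the FTS5
# retrieval index. The records and field names below are invented;
# real files live under training-data/by-agent/.
jsonl = """\
{"platform": "OpenReview", "text": "The ablation lacks a clear baseline."}
{"platform": "eLife", "text": "Statistical power is not reported for the main analysis."}
"""

db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE reviews USING fts5(platform, text)")
db.executemany(
    "INSERT INTO reviews (platform, text) VALUES (:platform, :text)",
    [json.loads(line) for line in jsonl.splitlines()],
)

# Query-aware retrieval: match terms drawn from the manuscript and
# return the best-ranked examples (in FTS5, ORDER BY rank is best-first).
rows = db.execute(
    "SELECT platform FROM reviews WHERE reviews MATCH ? ORDER BY rank LIMIT 8",
    ("statistical power",),
).fetchall()
```

Here only the eLife record matches both query terms, so it alone comes back; in the real index the same query shape would return the 8–40 nearest examples for a given agent.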

Agent specialisations

Eight specialist agents, each calibrated on a dedicated slice of the training corpus.

Methodology

Audits study design, statistical power, and analytical choices against field-specific rigour standards (CONSORT, STROBE, PRISMA).

Formulas & Equations

Verifies mathematical derivations, checks dimensional analysis, and flags algebraic errors.

Originality

Surfaces overlap with prior work across CrossRef, arXiv, medRxiv, bioRxiv, Unpaywall, and an institutional library of 900,000+ papers.

Literature Coverage

Evaluates citation completeness against OpenAlex's 250M+ scholarly works.

Reproducibility

Inspects code availability, dataset accessibility, and sufficiency of methods detail for independent replication.

Clarity & Language

Assesses readability, structural flow, and adherence to scholarly writing norms.

Figures & Tables

Checks figure quality, caption completeness, and appropriateness of visual encodings.

Prior Publication

Fans out in parallel to six external sources to detect duplicate submissions and predatory overlap.
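The fan-out pattern is concurrent queries with a hard per-source deadline: every source is asked at once, and any source that misses its deadline is dropped rather than blocking the review. A minimal sketch under stated assumptions: the pipeline's timeout is 12 seconds, shortened here so the demo finishes instantly, and `query_source` is a stand-in for the real API clients, not their actual interfaces.

```python
import asyncio

TIMEOUT_S = 0.2  # 12.0 in the real pipeline; shortened for the demo

async def query_source(name: str, delay: float) -> dict:
    """Stand-in for a real client (CrossRef, arXiv, ...)."""
    await asyncio.sleep(delay)  # simulates the HTTP round trip
    return {"source": name, "matches": []}

async def fan_out() -> list[dict]:
    # Simulated latencies: two fast sources, one that misses the deadline.
    delays = {"crossref": 0.01, "arxiv": 0.02, "slow-mirror": 5.0}
    tasks = {
        name: asyncio.create_task(
            asyncio.wait_for(query_source(name, d), timeout=TIMEOUT_S)
        )
        for name, d in delays.items()
    }
    results = []
    for task in tasks.values():
        try:
            results.append(await task)
        except asyncio.TimeoutError:
            continue  # a slow or dead source never blocks the review
    return results

hits = asyncio.run(fan_out())
```

The design choice this illustrates: total latency is bounded by the deadline, not by the slowest of the six sources, which is what makes a sub-15-minute review feasible even when one index is down.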

What we don't claim

AI peer review is not a replacement for domain-expert human review in high-stakes settings (clinical trials, safety-critical systems, paradigm-shift claims). We're transparent about the limits:

  • Agents can miss subtle methodological flaws that require cutting-edge domain knowledge.
  • Prior-publication detection is excellent for exact overlap but weaker for paraphrased or translated duplicates.
  • Originality scoring depends on index coverage — niche non-English work may be under-indexed.
  • The synthesis recommendation is a starting point; a human editor makes the final call for borderline cases.

