Question 1

Is the dataset downloadable?

Accepted Answer

A summary aggregate is available at /api/dataset/summary as JSON. The full reviews are derived from public open-review pages; we publish parsed JSONL tranches on request for academic collaborators, under the terms of each source platform's licence.

Question 2

How is the data used?

Accepted Answer

Exclusively for Retrieval-Augmented-Generation (RAG) calibration of the 8-agent review pipeline. We prepend 8-40 real peer-review examples matched by discipline and review-type to every agent prompt so the agent's rubric matches human reviewer expectations.

Question 3

Do you re-host reviewer comments that were posted under pseudonyms?

Accepted Answer

Only where the source platform's licence and policy explicitly allow it. OpenReview review text is CC BY; eLife and PLOS transparent reviews are CC BY; other sources are handled case-by-case. Where we cannot republish, we extract rubric patterns (not verbatim text) into the calibration corpus.

Question 4

Does the dataset include the papers themselves?

Accepted Answer

No — the dataset is the reviews, not the manuscripts. For manuscript context we query the relevant open-access paper at runtime from the platform's API.

Question 5

How do you keep calibration fresh?

Accepted Answer

Scrapers re-run on rolling schedules; a monthly calibration job spot-checks agent outputs against a held-out set of 2,000 reviews and re-balances the RAG mix per agent if drift exceeds 5%.

Question 6

Can my institution contribute a tranche?

Accepted Answer

Yes — if your journal or conference has transparent review records you want included, email editor@scienceaijournal.com. We credit source and preserve licence terms.

Platform	Reviews	Focus
OpenReview	~7,400	ML, CS, AI conferences (NeurIPS, ICLR, ACL)
eLife	~2,100	Life sciences — full-length open review
SciPost	~1,900	Physics, CS, mathematics
PLOS ONE	~2,300	Multidisciplinary
BMJ Open	~1,600	Medicine, public health
Nature Communications	~1,100	Multidisciplinary (transparent review)
F1000Research	~1,100	Life sciences + replication
Copernicus journals	~900	Earth & environmental sciences
EMBO Press	~700	Molecular biology, genetics
MDPI transparent journals	~1,200	Multidisciplinary open review
Royal Society Open Science	~800	Multidisciplinary
Other open-review venues	~1,900	15+ smaller journals with public review

Science AI Journal Training Corpus

Sources

How the corpus maps onto the 8 agents

Discipline coverage

Calibration method

Frequently asked questions