Duplicate Publication Checker
Free Duplicate Publication Checker for Academic Papers
Salami slicing and prior-publication overlap are top reasons papers get rejected — or, worse, retracted post-publication. Our duplicate-publication checker cross-references your title and abstract against eight sources, including CrossRef, arXiv, medRxiv, bioRxiv, Unpaywall, and a 900,000-paper institutional library, in under 30 seconds. Free, no signup.
What it checks
Eight independent sources, two retrieval strategies. Title-fuzzy matching catches near-identical titles (token-set ratio ≥ 0.6 plus n-gram overlap). Abstract-fuzzy matching catches the case where the title was rewritten between submissions but the underlying study is the same — common in salami-sliced papers where a single dataset is split into three publications.
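The two strategies can be sketched in a few lines of standard-library Python. This is a minimal illustration of the matching idea, not our production scorer: the 0.6 token-set cutoff comes from the text above, while the 0.25 n-gram cutoff and the helper names are illustrative assumptions.

```python
import re
from difflib import SequenceMatcher

def tokens(text: str) -> list[str]:
    """Lowercase and keep only alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def token_set_ratio(a: str, b: str) -> float:
    """Fuzzy ratio over sorted unique token sets, so word order
    and repeated words don't affect the score."""
    sa = " ".join(sorted(set(tokens(a))))
    sb = " ".join(sorted(set(tokens(b))))
    return SequenceMatcher(None, sa, sb).ratio()

def ngram_overlap(a: str, b: str, n: int = 3) -> float:
    """Jaccard overlap of character n-grams of the normalized strings."""
    def grams(s: str) -> set[str]:
        s = " ".join(tokens(s))
        return {s[i:i + n] for i in range(len(s) - n + 1)}
    ga, gb = grams(a), grams(b)
    return len(ga & gb) / len(ga | gb) if ga | gb else 0.0

def is_candidate(a: str, b: str) -> bool:
    # 0.6 is the token-set threshold named above; the 0.25 n-gram
    # threshold is an illustrative assumption.
    return token_set_ratio(a, b) >= 0.6 and ngram_overlap(a, b) >= 0.25
```

A reordered, lightly reworded title (the classic salami-slicing signature) passes both filters, while an unrelated title fails the token-set cutoff outright.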
For every flagged match we return: the matching paper's title, DOI or preprint ID, journal/server name, year, similarity score, and a short rationale. You decide whether each match is a true duplicate (rejection risk), a legitimate self-citation (cite it), a related-but-distinct work (cite it as related), or a false positive.
- CrossRef — 130M+ DOI-assigned articles across tens of thousands of journals
- arXiv — physics, math, CS, quantitative biology preprints
- medRxiv — medical preprints (clinical, public health, RCTs)
- bioRxiv — biology preprints (genomics, neuroscience, ecology)
- Unpaywall — open-access version index covering 50M+ articles
- Our 900,000-paper institutional library (EBSCO, OpenAlex, KKU)
- Pre-press registrations from venues that share metadata
- Cross-reference: the manuscript's own citation list (catches the 'forgot you already published this' case)
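Because results come from eight heterogeneous sources, each hit is normalized into the single record shape described above before triage. A minimal sketch, assuming illustrative field names and cutoffs (not our actual schema — and the final call on every match is always the author's):

```python
from dataclasses import dataclass

@dataclass
class Match:
    """One flagged match, normalized across all eight sources."""
    title: str
    identifier: str   # DOI or preprint ID
    venue: str        # journal or preprint-server name
    year: int
    similarity: float # similarity score in [0.0, 1.0]
    rationale: str    # short human-readable reason for the flag

def triage_hint(m: Match) -> str:
    """Rough triage suggestion; the cutoffs are illustrative
    assumptions, mirroring the four outcomes listed above."""
    if m.similarity >= 0.9:
        return "possible true duplicate: review before submitting"
    if m.similarity >= 0.6:
        return "related work: consider citing"
    return "likely false positive"
```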
How it works (5 steps)
The full check runs as part of our free Pre-Check tool. No signup, no upload, just paste-and-submit.
- Step 1. Paste your title and abstract — Open the Pre-Check tool, paste the working title and abstract of the manuscript you want to check.
- Step 2. CrossRef scan (130M+ records) — We cross-reference your title against CrossRef's full registry of DOI-assigned articles, scoring matches by token overlap and fuzzy similarity.
- Step 3. Preprint scan (arXiv, medRxiv, bioRxiv) — Parallel scan of the three largest preprint servers in physics/CS, medicine, and biology. Catches the case where you posted a preprint and forgot, or where a co-author posted without telling you.
- Step 4. Unpaywall + institutional library scan — Unpaywall covers open-access copies; our 900K-paper KKU institutional library covers paywalled content that isn't in CrossRef. Together they catch grey-literature duplicates and conference proceedings.
- Step 5. Confidence score + flagged matches — We return a 0-100 confidence score per match, plus the matching title, DOI/URL, and overlap percentage. You decide what's a true duplicate, a legitimate self-citation, or unrelated.
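The CrossRef step can be reproduced against Crossref's public REST API (`api.crossref.org/works`). The endpoint and its `query.bibliographic` parameter are real; the extraction helper below is an illustrative sketch of flattening a response into the fields we report, not our actual pipeline:

```python
import json
import urllib.parse
import urllib.request

CROSSREF = "https://api.crossref.org/works"

def crossref_search(title: str, rows: int = 5) -> dict:
    """Query Crossref's public REST API for bibliographic matches."""
    qs = urllib.parse.urlencode({"query.bibliographic": title, "rows": rows})
    with urllib.request.urlopen(f"{CROSSREF}?{qs}", timeout=10) as resp:
        return json.load(resp)

def extract_matches(payload: dict) -> list[dict]:
    """Flatten a Crossref works response into reportable fields.
    Crossref returns 'title' and 'container-title' as lists, and the
    year inside issued.date-parts."""
    out = []
    for item in payload.get("message", {}).get("items", []):
        out.append({
            "title": (item.get("title") or [""])[0],
            "doi": item.get("DOI", ""),
            "venue": (item.get("container-title") or [""])[0],
            "year": (item.get("issued", {}).get("date-parts") or [[None]])[0][0],
        })
    return out
```

In use: `extract_matches(crossref_search("your working title"))` yields candidates that can then be scored by the fuzzy-matching step.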
Why it matters (the editorial reality)
Three failure modes that this check exists to catch:
Salami slicing — splitting one study into the 'least publishable units' to inflate publication count. Reviewers detect it, editors reject for it, and the COPE (Committee on Publication Ethics) flowchart for handling it is explicit. Our checker finds the upstream parts before an editor or reviewer does.
Forgotten preprint — you posted a working draft to arXiv or bioRxiv 18 months ago, your advisor encouraged you to develop it further, and now you're submitting the polished version. Most journals are fine with this if disclosed. Some aren't. Either way, the editor will check, and you should know what they'll find before they find it.
Multi-language re-publication — a paper published in Turkish, Chinese, or Spanish, then translated and resubmitted to an English-language journal, is one of the most-cited retraction causes in the 2010-2025 Retraction Watch dataset. Our library coverage includes non-English venues that traditional checks miss.
When to use it
Four moments in the submission cycle where running this check is high-leverage:
- Pre-submission, after the manuscript is ~80% done — gives time to add a citation or restructure if a match surfaces.
- Before resubmitting to a second venue after a rejection — the previous submission may have been indexed.
- Before responding to a reviewer's prior-publication concern — bring receipts. A confidence score is more persuasive than 'we don't think so'.
- When taking over a co-authored manuscript from a lab member who has left — catches the 'silent submission' case where a former co-author posted to a preprint server without telling the group.
What we DON'T do
Explicit limitations, because false confidence is worse than known gaps:
- Not a full-text plagiarism scanner. We match titles and abstracts, not body paragraphs. If a paper paraphrases your entire methods section, we won't catch it.
- Not iThenticate / Turnitin / Crossref Similarity Check. Those are body-text similarity tools that journals run internally; we're complementary to (not a replacement for) them.
- Not a substitute for institutional integrity review. If a match surfaces and you're unsure, talk to your research office.
- We don't store your manuscript. Title + abstract are POSTed to our API, scanned in-memory, and discarded — no DB write. Zero retention.
- Confidence scores are bounded by source coverage. If a duplicate exists in a venue we don't index (some predatory journals, or paywalled journals that never deposit metadata with CrossRef), we will miss it.
Is it really free?
Yes. The duplicate check is bundled into our free Pre-Check tool — no signup, no rate-limit beyond a reasonable per-IP throttle, no upsell. We make money on the optional AI Review service ($10 single, $15/mo unlimited) for the full peer-review report on your PDF; the Pre-Check + duplicate check are free standalones we publish to seed trust.
If you find a duplicate via our tool, that's the entire value — we hope you remember us when you're choosing a peer-review service. No payment, ever, for the duplicate check.