Finding 17,000 research gaps across 250 million papers
The data pipeline behind our gap finder — from OpenAlex ingestion to citation-network deltas, and why most 'AI gap-finders' hallucinate.
Every PhD student eventually stares at the same wall: what's actually missing in my field? The honest answer requires reading five years of abstracts across 10+ sub-disciplines, tracing citation thickets, and spotting where the network gets thin. That's a six-month task, and nobody does it properly.
Our research gap finder pulls that six months down to a 30-second query across 17,000+ gaps derived from the 250-million-paper OpenAlex corpus.
What "gap" means here
We use the term precisely: a gap is a question or methodology cluster where the citation network suggests unmet demand. Three signals combine:
- Topic demand — how often a concept is cited relative to how often it's written about. A high cite/write ratio marks a load-bearing concept that is starved of fresh work.
- Author migration — do senior researchers in adjacent fields keep citing this topic without publishing on it? That's latent attention waiting for an infrastructure paper.
- Methodology drift — has the dominant method in a subfield shifted (e.g., random-effects → hierarchical Bayesian) without the older literature being re-analysed under the new method? Each missing re-analysis is a gap.
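To make the first signal concrete, here is a minimal sketch of a cite/write ratio on toy data. The record layout, the function names, and the exact formula (citations received divided by papers written) are assumptions for illustration only; the post does not spell out how the production pipeline weights or normalises this signal.

```python
from collections import Counter

# Hypothetical toy records: "topics" are the concepts a paper is about,
# "cited_topics" are the concepts of the papers it cites (one entry per citation).
papers = [
    {"id": "p1", "topics": ["A"], "cited_topics": ["B", "B", "C"]},
    {"id": "p2", "topics": ["A"], "cited_topics": ["B"]},
    {"id": "p3", "topics": ["C"], "cited_topics": ["B", "A"]},
    {"id": "p4", "topics": ["B"], "cited_topics": ["A"]},
]


def topic_demand(papers):
    """Return {concept: citations received / papers written} (assumed formula)."""
    written = Counter()
    cited = Counter()
    for paper in papers:
        written.update(set(paper["topics"]))  # count each paper once per concept
        cited.update(paper["cited_topics"])   # count every citation
    concepts = sorted(set(written) | set(cited))
    # A concept nobody writes about gets a denominator of 1 to avoid dividing by zero.
    return {c: cited[c] / max(written[c], 1) for c in concepts}


def rank_by_demand(papers):
    """Sort by descending ratio, breaking ties on concept name for stable output."""
    ratios = topic_demand(papers)
    return sorted(ratios.items(), key=lambda item: (-item[1], item[0]))


print(rank_by_demand(papers))
# [('B', 4.0), ('A', 1.0), ('C', 1.0)]
```

Concept B is cited four times but written about once, so it tops the ranking. The explicit tie-break on concept name is there so that equal ratios always come out in the same order.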
We don't ask an LLM to identify gaps. LLMs hallucinate gaps the same way they hallucinate citations — confidently and wrongly. Instead, the gap detection runs as a deterministic citation-graph computation. The LLM's only job is to write a human-readable summary of each gap after the fact, and those summaries are grounded by quoting the three most-cited papers that surround it.
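The grounding step can be pictured roughly as follows. This is a sketch under assumptions: the function names and the shape of the returned context are hypothetical, the DOIs are placeholders, and the field names (`doi`, `title`, `cited_by_count`) are chosen to resemble OpenAlex work records rather than taken from the actual pipeline.

```python
def top_cited(papers, k=3):
    """Pick the k most-cited papers; ties break on DOI so the result is stable."""
    return sorted(papers, key=lambda p: (-p["cited_by_count"], p["doi"]))[:k]


def build_summary_context(gap_label, surrounding_papers):
    """Assemble the only material a summariser would be allowed to quote from."""
    anchors = top_cited(surrounding_papers)
    quoted = [f'[{p["doi"]}] {p["title"]}' for p in anchors]
    return {
        "gap": gap_label,
        "anchor_dois": [p["doi"] for p in anchors],
        "quoted_sources": "\n".join(quoted),
    }


# Placeholder records, not real papers.
surrounding = [
    {"doi": "10.0000/example.a", "title": "Paper A", "cited_by_count": 120},
    {"doi": "10.0000/example.b", "title": "Paper B", "cited_by_count": 45},
    {"doi": "10.0000/example.c", "title": "Paper C", "cited_by_count": 120},
    {"doi": "10.0000/example.d", "title": "Paper D", "cited_by_count": 300},
]

context = build_summary_context("example gap", surrounding)
print(context["anchor_dois"])
# ['10.0000/example.d', '10.0000/example.a', '10.0000/example.c']
```

The point of the design is that the gap and its anchor papers are fixed before any text is generated, so the summary can be checked against the DOIs it quotes.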
Why most "AI gap finders" are snake oil
If a tool answers "what gaps exist in [field]" by generating prose from nothing but the field name, it is manufacturing confident fiction. The test is simple: ask the same tool the same question twice. If the gaps drift, it's a hallucination engine. If the gaps are identical and cite specific DOIs you can verify, it's doing graph work underneath.
Our output is reproducible because it's derived from the graph — same corpus, same gaps, same order.
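The "ask twice" test is easy to automate. The sketch below fingerprints two runs of any gap finder and compares them; the record layout and function names are hypothetical, and the DOIs are placeholders.

```python
import hashlib
import json


def fingerprint(gaps):
    """Hash an ordered list of gap records; list order is part of the hash."""
    payload = json.dumps(gaps, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def same_output(run_a, run_b):
    """True only if both runs return the same gaps, with the same DOIs, in the same order."""
    return fingerprint(run_a) == fingerprint(run_b)


# Placeholder output from two runs of the same query.
first_run = [
    {"gap": "example gap 1", "dois": ["10.0000/example.a", "10.0000/example.b"]},
    {"gap": "example gap 2", "dois": ["10.0000/example.c"]},
]
second_run = [dict(record) for record in first_run]  # identical content
reordered_run = list(reversed(first_run))            # same gaps, different order

print(same_output(first_run, second_run))     # True
print(same_output(first_run, reordered_run))  # False
```

Because JSON arrays keep their order, a tool that returns the same gaps in a shuffled order still fails the check, which matches the "same order" part of the claim.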
A taste of what the graph found
- Preclinical-to-clinical methodology gap in spinal cord injury regeneration — 400+ rodent studies, 11 completed Phase II trials, zero published translation methodology papers.
- Replication of mask-study statistical methods under Omicron variants — 80% of the cited literature predates BA.5.
- Systematic reviews of retracted papers in oncology — retraction rate in the field has tripled since 2018, but the corresponding meta-analyses never got rerun.
Each of these corresponds to a canonical URL on Science AI Journal with the full methodology write-up, top adjacent papers, and a suggested study design. Browse them at /research-gaps.
What this isn't
We won't tell you a gap is worth chasing. Novelty is necessary but not sufficient for good research. We just point at where the network is thin — your judgement is what matters after that.
→ Browse research gaps · Submit a manuscript addressing a gap