Finding 17,000 research gaps across 250 million papers
The data pipeline behind our gap finder — from OpenAlex ingestion to citation-network deltas, and why most "AI gap finders" hallucinate.
Every PhD student eventually stares at the same wall: what's actually missing in my field? The honest answer requires reading five years of abstracts across 10+ sub-disciplines, tracing citation thickets, and spotting where the network gets thin. That's a six-month task, and nobody does it properly.
Our research gap finder compresses those six months into a 30-second query across 17,000+ gaps derived from the 250-million-paper OpenAlex corpus.
What "gap" means here
We use the term precisely: a gap is a question or methodology cluster where the citation network suggests unmet demand. Three signals combine (a code sketch of the first follows the list):
- Topic demand — how often a concept is cited relative to how often it's written about. A high cite/write ratio marks a load-bearing concept starved of fresh work.
- Author migration — do senior researchers in adjacent fields keep citing this topic without publishing on it? That's latent attention waiting for an infrastructure paper.
- Methodology drift — has the dominant method in a subfield shifted (e.g., random-effects → hierarchical Bayesian) without the older literature being re-analysed under the new method? Each missing re-analysis is a gap.
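To make the first signal concrete, here's a minimal sketch of the cite/write computation. It assumes work records shaped loosely like OpenAlex's (a list of concept tags plus a `cited_by_count`); the field shapes are simplified for illustration, and the production pipeline runs over the full graph, but the ratio itself really is this simple:

```python
from collections import defaultdict

def topic_demand(works):
    """Cite/write ratio per concept. `works` is an iterable of dicts
    shaped loosely like OpenAlex work records, e.g.
    {"concepts": ["hierarchical Bayes"], "cited_by_count": 41}."""
    cites = defaultdict(int)   # citations received by works tagged with a concept
    writes = defaultdict(int)  # number of works tagged with that concept
    for work in works:
        for concept in work["concepts"]:
            cites[concept] += work["cited_by_count"]
            writes[concept] += 1
    # A high ratio flags a concept that is heavily cited but rarely
    # written about directly: load-bearing, and starved of fresh work.
    return {c: cites[c] / writes[c] for c in writes}
```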
We don't ask an LLM to identify gaps. LLMs hallucinate gaps the same way they hallucinate citations — confidently and wrong. Instead, the gap detection runs as a deterministic citation-graph computation. The LLM's only job is to write a human-readable summary of each gap after the fact, and those summaries are grounded by quoting the three most-cited papers that surround it.
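Here's a hedged sketch of that grounding step, with illustrative field names rather than our production schema. The gap's neighboring papers come out of the graph computation, and the prompt confines the LLM to those anchors:

```python
def grounded_summary_prompt(gap_signals, neighbors):
    """Build a summarization prompt anchored to real papers.
    `neighbors` are the papers adjacent to the gap in the citation
    graph; the keys ("title", "doi", "cited_by_count") are illustrative."""
    top3 = sorted(neighbors, key=lambda p: p["cited_by_count"], reverse=True)[:3]
    anchors = "\n".join(
        f'- "{p["title"]}" (DOI: {p["doi"]}, {p["cited_by_count"]} citations)'
        for p in top3
    )
    return (
        "Write a human-readable summary of this research gap. "
        "Quote only the anchor papers below; do not invent citations.\n"
        f"Gap signals: {gap_signals}\n"
        f"Anchor papers:\n{anchors}"
    )
```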
Why most "AI gap finders" are snake oil
If a tool answers "what gaps exist in [field]" by generating prose from nothing but the field name, it is manufacturing confident fiction. The test is simple: ask the same tool the same question twice. If the gaps drift, it's a hallucination engine. If the gaps are identical and cite specific DOIs you can verify, it's doing graph work underneath.
Our output is reproducible because it's derived from the graph — same corpus, same gaps, same order.
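If you want to run the drift test yourself, it fits in a few lines. Assume the tool under audit is any callable returning an ordered list of (gap_id, doi) pairs; the function name here is a stand-in, not any tool's real API:

```python
def passes_drift_test(gap_finder, query, runs=3):
    """Ask the same question `runs` times. A graph-backed tool returns
    identical gaps, in identical order, with identical DOIs every time;
    a hallucination engine drifts."""
    results = [tuple(gap_finder(query)) for _ in range(runs)]
    return all(result == results[0] for result in results)
```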
A taste of what the graph found
- Preclinical-to-clinical methodology gap in spinal cord injury regeneration — 400+ rodent studies, 11 completed Phase II trials, zero published translation methodology papers.
- Replication of mask-study statistical methods under Omicron variants — 80% of the cited literature predates BA.5.
- Systematic reviews of retracted papers in oncology — the field's retraction rate has roughly tripled since 2018, but the corresponding meta-analyses were never rerun.
Each of these corresponds to a canonical URL on Science AI Journal with the full methodology write-up, top adjacent papers, and a suggested study design. Browse them at /research-gaps.
What this isn't
We won't tell you a gap is worth chasing. Novelty is necessary but not sufficient for good research. We just point at where the network is thin — your judgement is what matters after that.
→ Browse research gaps · Submit a manuscript addressing a gap