How many data gaps are there in Computer Science?

225 quality-filtered data gaps from 185 distinct Computer Science papers in our local library.

How were these Computer Science gaps identified?

These gaps were extracted from the limitations, future-work and other gap-stating passages of Computer Science papers in our institutional library, then quality-filtered: boilerplate was removed, near-duplicates were collapsed, and only gaps scoring 4 or better for substance were kept.
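The page does not say how the quality filter is implemented, so the following is only a minimal sketch of the three steps named above (boilerplate removal, near-duplicate collapsing, and a substance threshold of 4). The `Gap` record, the `BOILERPLATE_PHRASES` list, the 12-word cutoff, and the use of `difflib.SequenceMatcher` with a 0.9 similarity threshold are all assumptions made for illustration, not the pipeline actually used.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Gap:
    """Hypothetical record for one extracted gap statement."""
    paper: str
    text: str
    score: int  # substance score; the scale itself is an assumption


# Hypothetical examples of generic phrases that carry no specific gap.
BOILERPLATE_PHRASES = ("more research is needed", "further work is required")


def is_boilerplate(text: str) -> bool:
    """Treat short statements built around a generic phrase as boilerplate."""
    lowered = text.lower()
    has_phrase = any(phrase in lowered for phrase in BOILERPLATE_PHRASES)
    return has_phrase and len(lowered.split()) < 12


def near_duplicate(a: str, b: str, threshold: float = 0.9) -> bool:
    """Compare two statements by character-level similarity ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def quality_filter(gaps: list[Gap], min_score: int = 4) -> list[Gap]:
    """Keep substantive, non-boilerplate gaps, collapsing near-duplicates.

    Gaps are visited highest score first, so the best-scoring copy of a
    near-duplicate group is the one that survives.
    """
    kept: list[Gap] = []
    for gap in sorted(gaps, key=lambda g: g.score, reverse=True):
        if gap.score < min_score or is_boilerplate(gap.text):
            continue
        if any(near_duplicate(gap.text, other.text) for other in kept):
            continue
        kept.append(gap)
    return kept
```

Visiting gaps in descending score order is a design choice of this sketch: it lets one pass handle both the threshold and the choice of which duplicate to keep.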

Computer Science · 185 papers

Data gaps in Computer Science

This page lists 225 open data research questions in Computer Science (gaps in available data, datasets, benchmarks, or measurements), extracted from 185 papers in our local library. Below are representative open questions, each linked to the paper that raised it.

Representative open questions

Showing 30 of 225: one per source paper, highest-quality first.

Target discovery and drug design in the era of artificial intelligence (2026) · doi
Graph neural networks (GNNs) are often trained on small, curated datasets and may not generalize well to larger, more diverse chemical spaces; more comprehensive training datasets are needed to improve GNN performance.
AN ITERATIVE GLMM–XGBOOST ALGORITHM WITH GROUP-AWARE CONDITIONAL PERMUTATION IMPORTANCE FOR EXPLAINING MULTILEVEL ITEM RESPONSE DATA (2026) · doi
Simulation Study 1 evaluates parameter recovery and prediction accuracy across ICC levels (0.00 to 0.25) and sample sizes, but does not test scenarios with extreme ICC values (>0.50) or highly imbalanced cluster sizes, which are common in educational and longitudinal item response studies.
William & Mary: Comprehensive AI Governance & Activity Inventory (2026) · doi
No documentation exists regarding how the AI tool review process (which approved ChatGPT Edu, Microsoft Copilot, Google Gemini, and Google Workspace AI tools) evaluates or tracks the security and accessibility outcomes post-deployment across these enterprise platforms in production use.
Can deep learning-based segmentation and classification improve the detection of renal cortical abnormalities? (2026) · doi
The study acknowledges that DMSA scintigraphy underestimates scarring in 30% of kidneys, particularly polar scars which are especially susceptible to misinterpretation. The deep learning-based segmentation and classification models were trained on DMSA images with this inherent underestimation bias, but the paper does not investigate how this label noise affects model performance or propose methods to correct for DMSA underestimation in polar scar detection.
Leveraging machine learning to enhance aerosol classification using Single-Particle Mass Spectrometry (2026) · doi
Class imbalance remains a fundamental constraint for Soot (0.8% support) and biological particles (Bacteria, Snomax, Agar, Hazelnut), where limited training data prevents model development despite their high atmospheric significance. Targeted approaches must be developed to overcome the scarcity of labeled single-particle mass spectra for these underrepresented but scientifically critical aerosol types.
Quantum Information Framework for Neural Network Generalization: A Comprehensive Experimental Analysis (2026) · doi
The modular arithmetic dataset generation supports only three operations (x+y, x²+y, x³+xy) with a fixed modulus of 97; systematic investigation of how operation complexity, modulus size, and arithmetic structure affect quantum information metrics during neural network training is absent.
AI in Cybersecurity: A Systematic Review and Conceptual Audit Model (2026) · doi
The paper identifies that conventional cybersecurity auditing uses binary compliance checks (control 'implemented' or 'not implemented') but does not quantify how frequently real-world audits fail to detect critical vulnerabilities despite passing binary checks, nor does it provide empirical thresholds for when risk-calibrated auditing should override binary compliance decisions.
Research on a strongly generalizable fault diagnosis method based on adversarial transfer learning (2026) · doi
The HDAL model demonstrated superior noise robustness compared to DANN and DSAN (90.156% accuracy with noise vs. 84.384% and 86.329% respectively), but the paper does not investigate the specific noise types, frequency ranges, or signal-to-noise ratios tested. Future work should systematically characterize the types of sensor noise present in nuclear reactor operational transients and evaluate HDAL performance across varying noise conditions relevant to real plant instrumentation.
Machine-learning-based reconstruction of Ming-dynasty defensive corridors in Yuxian (2026) · doi
The study identifies that highly suitable defense corridor areas overlap with high-value zones of kernel density and visibility density, but does not quantify the magnitude of spatial overlap, provide statistical correlation coefficients, or test whether kernel density bandwidth selection and visibility raster resolution significantly affect the overlap assessment.
LLM-Powered Silent Bug Fuzzing in Deep Learning Libraries via Versatile and Controlled Bug Transfer (2026) · doi
TransFuzz's bug transfer mechanism relies on API similarity matching to select target APIs for test migration, but the paper lacks analysis of which bug categories transfer successfully across API boundaries versus those that fail (code migration failures). Characterizing the transferability landscape of different silent bug types in deep learning libraries would enable more targeted bug transfer strategies.
From unstructured text to structured reasoning: a hybrid knowledge graph for Indonesian sentencing analysis (2026) · doi
The study evaluated entity extraction on only corruption and narcotics offenses from Indonesian court decisions; applicability to other offense categories (theft, assault, environmental crimes) and cross-jurisdictional legal systems with different statutory structures and epistemological frameworks remains untested.
Securing Fog-assisted IoT: An Adaptable and Efficient Threat Identification Approach (2026) · doi
The evaluation uses four existing datasets without specifying which specific IoT attack types (e.g., DDoS variants, zero-day exploits, protocol-specific attacks) are represented in each dataset. The generalizability of the DEL approach to emerging and unknown threat categories in fog-IoT systems remains unvalidated.
A Robust Hybrid Deep Learning Model for Multiclass Depression Classification from Speech Audio (2026) · doi
EEG data are available in the dataset repository but were not utilized; multimodal fusion integrating EEG with audio signals for improved robustness in multiclass depression severity classification remains unexplored and is explicitly positioned as a future research direction.
On the interface between linguistics, computer science and psychiatry: analyzing textual key-factors affecting BERT-based classification of schizophrenia in social media texts (2026) · doi
The r/AskDocs subreddit analysis revealed lexical overfitting with only 18 samples; this requires expansion to larger samples across multiple health-adjacent social media contexts to quantify the degree to which explicit disorder-related vocabulary inflates classification performance and to establish minimum dataset sizes needed to control lexical bias in mental-health NLP models.
Predicting Employee Attrition: A Machine Learning Approach in Human Resource Analytics (2026) · doi
While the paper notes that Overtime emerges as a major predictor in Gradient Boosting (rank 3) for attrition, reflecting workload effects, it does not empirically measure the threshold at which overtime hours transition from being a retention factor to a significant attrition driver, or examine sector-specific variation in this relationship.
Enhancing Breast Cancer Diagnosis through Machine Learning: A Robust Approach for Early Detection (2026) · doi
The breast cancer diagnosis system was trained and validated primarily on the Wisconsin Breast Cancer Dataset supplemented by unspecified 'other datasets'. Cross-validation across diverse datasets representing different patient demographics, ethnicities, age groups, and clinical environments to measure generalization capabilities to real-world scenarios has not been performed.
GaussianSeal: Rooting Adaptive Watermarks for 3D Gaussian Generation Model (2026) · doi
While the paper compares GaussianSeal against GaussianMarker and post-generation methods like 3DGS+HiDDeN on specific datasets (Chair, Lego, Hotdog, Mic), comprehensive evaluation across diverse 3D object categories with varying geometric complexity, texture density, and scale properties is not provided. Generalization of watermark capacity and robustness to complex real-world 3D scenes remains unexplored.
Scour depth prediction using machine learning and explainable AI: assessment of bridge vulnerability (2026) · doi
The study compared Gradient Boosting, XGBoost, CatBoost, Random Forest, and ANN-based models on a single scour dataset; cross-validation across multiple independent bridge scour datasets with different geological, hydrodynamic, and pier geometry characteristics is needed to confirm model generalizability for practical bridge vulnerability assessment.
A Motion-Based Compression and Tracking System for Video Camera Trap-Based Insect Behaviour Studies (2026) · doi
The proposed motion-based compression system was evaluated on four datasets (Ratnayake et al. 2020, Voort der van Driessche et al., Navid et al., and Nest Monitoring), but systematic evaluation across camera trap deployments with varying environmental conditions (wind patterns, illumination changes, vegetation density) and different insect taxa beyond Honey bees, Syrphidae, Lepidoptera, and Vespidae has not been conducted. Dataset diversity limitations prevent generalization of compression efficiency and behavioral detection accuracy claims.
Hepatitis C Diagnosis using Supervised Machine Learning Algorithms and Ensemble Learning Techniques (2026) · doi
The computational efficiency analysis (Tables 8 and 9) measures training time, prediction latency, and model size for hepatitis C classifiers but does not evaluate memory consumption during inference or power requirements for mobile/point-of-care hepatitis C diagnostic deployment scenarios.
Understanding the Dynamics of Trust and Engagement in E-Commerce Recommender Systems: Trends and Influences (2026) · doi
Gamified feedback mechanisms for group shopping with real-time collaborative recommendations are mentioned as engagement-enhancing features, but no empirical data exists on optimal game mechanics, reward structures, or how group dynamics affect individual trust and satisfaction in recommender systems.
A general framework for Gaussian Splatting-based human-centric volumetric videos (2026) · doi
Feedforward generative networks for 3D Gaussian Splatting that directly predict per-frame dynamic Gaussian attributes (position, rotation, scale, color) from multi-view video remain underdeveloped, with current attempts still struggling to match optimization-based quality due to unresolved challenges in constructing large-scale high-quality dynamic datasets and improving network generalization ability.
Uncertainty Assessment in Deep Learning-based Plant Trait Retrievals from Hyperspectral data (2026) · doi
The representativeness and quantity limitations of current datasets constrain the ability of machine learning algorithms to capture the full complexity and diversity of real-world vegetation data in uncertainty assessment. Future work should focus on expanding training datasets with more diverse land cover types and environmental conditions to reduce distributional biases in deep learning-based trait predictions.
UNLOCKING SECURE SEARCH: KEY-AGGREGATE TECHNIQUES FOR ENCRYPTED CLOUD DATA RETRIEVAL (2026) · doi
The experimental evaluation lacks testing on realistic large-scale cloud datasets with millions of documents and dynamic insertion/deletion operations; scalability of the key-aggregate mechanism and optimized indexing to production-scale encrypted cloud environments remains unvalidated.
Criminal Face Sketch Recognition and Construction (2026) · doi
The drag-and-drop sketch construction interface is described qualitatively as requiring no artistic expertise, but no user study data, error rates for untrained users, or comparison of sketch quality produced by forensic artists versus novice users is provided to validate usability claims.
Smart Prediction of Weather-Induced Flight Delays Applying Deep Learning (2026) · doi
The SkyPulse AI system currently relies on a static dataset and does not ingest live flight or weather APIs. Integration with real-time data providers such as OpenSky Network or OpenWeatherMap is needed to enable genuine session-based flight delay prediction rather than batch processing on historical data.
ENHANCING IOT SECURITY USING LIGHTWEIGHT BLOCKCHAIN FOR DATA INTEGRITY AND TRACEABILITY (2026) · doi
The edge node validation process and its latency characteristics were not empirically characterized across different edge computing hardware configurations (e.g., ARM processors, Raspberry Pi variants, industrial gateways). The paper lacks specification of which edge devices and processing architectures were used in the experimental evaluation.
Deep Learning Based Fish Species and Freshness Detection Using Convolutional Neural Networks (2026) · doi
The dataset composition and size are not specified in the paper; expanding the training dataset to include multiple fish species beyond those currently represented, with images captured under diverse environmental conditions (temperature variations, humidity levels, different lighting), is needed to improve the generalization capability of the deep learning model.
Cleansera: A Context-Aware, Algorithm-Centric Data Cleaning System with RAG-Enhanced Intelligence (2026) · doi
The paper states future work will evaluate Cleansera's performance and empirical efficacy across industry datasets, but does not specify which industry types (healthcare, finance, e-commerce, manufacturing), dataset sizes, data quality profiles, or cleaning rule complexities will be tested. Comparative performance evaluation against existing data cleaning systems on standardized benchmarks is not outlined.
Artificial Intelligence and Multi-Omics for Anticancer Drug Development and Repurposing (2026) · doi
Data heterogeneity across clinical datasets (imaging, genomic profiles, morphological features) lacks standardized integration protocols for multi-omics drug discovery pipelines. The disparity between high-volume data generation and capacity to normalize and integrate these diverse modalities for AI-driven drug repurposing remains unresolved, particularly for proprietary clinical response datasets.

Working on one of these gaps? Review it with us.

Science AI Journal reviews manuscripts in one pass with 8 specialised AI agents calibrated on 69,000+ real peer reviews.


Tools for your next paper

Pre-Check: Is your paper ready? A Tier 1–5 acceptance probability in seconds.
Journal Finder: A ranked shortlist of target journals from a 17,500-venue index.
Duplicate Publication Checker: Check prior-publication & salami-slicing risk across 8 sources.
Citation Generator (Free): A DOI or title → APA, MLA, Vancouver, Chicago, BibTeX, RIS. No signup.
Graphical Abstract Maker (Free): Turn your findings into a colorblind-safe graphical abstract.
AI Review: 8 specialist agents return an editor-ready review of your full PDF.

Compare the category — Honest roundups of the AI research tools, ours listed alongside the alternatives.

Other gap types in Computer Science

Methodology gaps · Validation gaps · Application gaps · Scalability gaps · Theory gaps
