Validation gaps in Computer Science
This page collects 436 open validation questions in Computer Science (gaps in reproducing, validating, or independently confirming findings), extracted from 323 papers in our local library. Below are representative open questions, each linked to the paper that raised it.
Representative open questions
Showing 30 of 436 — one per source paper, highest-quality first.
- Federated learning for privacy-preserving skin cancer classification using deep neural networks (2026) · doi
The model is evaluated on the ISIC dataset using accuracy, AUC, and F1-score, but the robustness of these metrics to changing input conditions (e.g., varying image quality) is not assessed.
- Power-Optimized AI-Enhanced Telepresence Robots: A Validated Multi-Modal Framework for Sustainable Remote Learning in Higher Education (2026) · doi
The telepresence robot system was tested exclusively over a 50 Mbps Wi-Fi connection; performance degradation at lower bandwidths (e.g., the 5 Mbps typical of rural classrooms) remains uninvestigated, so the reported NLP accuracy and latency may be overestimated for resource-constrained educational environments.
- Turbulence closure in Reynolds-averaged Navier–Stokes and flow inference around a cylinder using physics-informed neural networks and sparse experimental data (2026) · doi
The Reynolds-force model was trained and validated exclusively on cylinder-flow data at Re ≈ 300–300,000, with the near-wall region remaining laminar; generalization of the trained neural-network closure to separated flows around bluff bodies of other geometries, or to flows with highly turbulent near-wall regions, has not been demonstrated.
- FPGA-Enabled Machine Learning Applications in Earth Observation: A Systematic Review (2026) · doi
Coarse-Grained Reconfigurable Arrays (CGRAs) with hardened arithmetic units and systolic array PE configurations remain underexplored for FPGA-enabled ML in Earth observation. A detailed comparative analysis between CGRA-based implementations (e.g., AMD Versal AI Engines) and traditional FPGA-only solutions for highly quantized semantic segmentation models on remote sensing datasets is needed to establish performance benchmarks.
- A novel deep learning approach for accurate and efficient design of LNOI power splitters (2026) · doi
The DNN model was trained exclusively on Lumerical EME/FDTD simulation data for LNOI power splitters; validation against experimentally fabricated and characterized devices with measured insertion loss, extinction ratios, and spectral response across wavelength ranges has not been demonstrated.
- Can deep learning-based segmentation and classification improve the detection of renal cortical abnormalities? (2026) · doi
The paper identifies that substantial categorization difficulties arise when multiple regions of scarring are present within a single kidney, but does not provide specific analysis of model performance stratified by scar multiplicity or location. Future work should evaluate DenseNet205 and DenseNet121_Self-ONN_FPN performance on kidneys with single versus multiple scarred regions to determine if architectural modifications are needed for complex multi-scar cases.
- A Machine Learning Perspective on FinTech-Driven Inclusion: Addressing Algorithm Bias in Credit Scoring Systems in Developing Economies (2026) · doi
The study validates the algorithmic fairness and explainability framework using structural equation modeling (SEM) with composite reliability (CR = 0.84-0.90) and average variance extracted (AVE = 0.57-0.64), but does not report performance metrics on actual credit scoring datasets from developing economies to demonstrate bias reduction in real-world FinTech lending systems.
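For reference, the two reliability statistics quoted above are conventionally computed from standardized factor loadings. The sketch below uses the usual Fornell-Larcker formulas and assumes uncorrelated error terms. The loadings are made-up illustrative values, not figures from the paper, and `composite_reliability` and `average_variance_extracted` are hypothetical helper names.

```python
def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances).

    Assumes standardized loadings, so each error variance is 1 - loading^2.
    """
    total = sum(loadings)
    error_variance = sum(1.0 - l * l for l in loadings)
    return total ** 2 / (total ** 2 + error_variance)


def average_variance_extracted(loadings):
    """AVE = mean of the squared standardized loadings."""
    return sum(l * l for l in loadings) / len(loadings)


# Illustrative loadings for one construct (assumed values, not from the paper).
example_loadings = [0.8, 0.8, 0.8, 0.8]
cr = composite_reliability(example_loadings)
ave = average_variance_extracted(example_loadings)
print(f"CR = {cr:.3f}, AVE = {ave:.3f}")
```

Both statistics describe the measurement quality of a survey instrument; neither says anything about bias reduction on real lending data, which is the gap this entry identifies.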
- Recent advances in spatial light modulator-based three-dimensional optical imaging (invited) (2026) · doi
Single-shot Fresnel incoherent correlation holography (FINCH) systems using deep learning-based phase-shifting technology have been demonstrated, but their robustness across varying illumination conditions, sample types, and noise levels in real-world microscopy applications remains unvalidated. Systematic comparison of deep learning reconstruction accuracy versus traditional phase-shifting methods under non-ideal experimental conditions is needed.
- Umjetna inteligencija: od kritičkih promišljanja do njezine primjene u kaznenom pravosuđu [Artificial intelligence: from critical reflections to its application in criminal justice] (2026) · doi
The Dutch risk assessment algorithms (Top 600, Top 400, and Project Sensing) for predicting serious crime and repeat offenses lack documented validation studies comparing algorithmic predictions against actual crime outcomes; the accuracy and the false-positive and false-negative rates of these profiling systems remain unpublished in peer-reviewed literature.
- Human productive capacities, institutions, and environmental sustainability in Sub-Saharan Africa (2026) · doi
The KRLS machine learning estimator and MEDSEM mediation analysis were validated only through fixed-effects regression robustness checks; alternative nonlinear methodologies should be tested to confirm whether the identified nonlinearity in human productive capacity effects on climate action and ecological reserves is robust across different estimators.
- Prediction of sedimentation concentration profiles in inclined suspension systems: A data-driven neural network framework (2026) · doi
The ANN model was trained and tested exclusively within a single experimental domain (glycerin–water 92% v/v, glass microspheres 212–800 µm, 20% v/v solids), meaning high accuracy reflects interpolation rather than extrapolation; generalization to other rheologies, particle morphologies, volumetric concentrations, or field-scale conditions in inclined suspension systems remains untested.
- Ento-Linguistics: Language, Ambiguity, and Scientific Communication in Entomology: How Terminology Networks Shape Understanding of Insect Biology (And Vice-Versa) (2026) · doi
The CACE scoring framework (Clarity, Appropriateness, Consistency, Evolvability) fixes the high-entropy threshold at 2.0 bits and k-means clustering at k_max = 5; validation of these hyperparameter choices across different entomological subdisciplines (morphology, behavior, ecology, systematics) and corpus sizes is not reported.
- Large language model based machine translation for universal multilingual understanding and translation quality enhancement (2026) · doi
Hallucination in LLM-based machine translation is noted as a significant limitation, especially for low-resource languages and domain-specific text, but the paper does not characterize the frequency, types, or severity of hallucinations across language pairs, nor does it provide methods to detect and mitigate semantic hallucinations in translation outputs.
- Leveraging machine learning to enhance aerosol classification using Single-Particle Mass Spectrometry (2026) · doi
While the dataset contains 18,827 labeled spectra across 20 aerosol types, generalization performance of the machine learning framework on SPMS measurements from different atmospheric environments, seasons, or geographic locations has not been evaluated. Cross-dataset validation is needed to assess whether the supervised and semi-supervised models maintain classification fidelity for soot and feldspars across diverse real-world deployment scenarios.
- High-Dimensional Perception with the Double Machine Learning Lens Model (2026) · doi
The DML-LM framework is presented as modality-agnostic but has only been validated on textual data; the paper does not provide empirical validation of how the double machine learning lens model integrates or weights multiple modalities (text, audio, video) when modeling social perception, nor how cues in one modality interact with or override cues in another.
- Quantum Information Framework for Neural Network Generalization: A Comprehensive Experimental Analysis (2026) · doi
The quantum information framework is evaluated exclusively on synthetic datasets (spiral data and modular arithmetic operations); the generalization behavior of von Neumann entropy, purity, and effective rank metrics has not been validated on real-world datasets with natural data distributions, limiting claims about the framework's applicability to practical neural network training scenarios.
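To make the three metrics named above concrete, here is a minimal sketch that computes them from the eigenvalue spectrum of a density-like matrix. It assumes the eigenvalues are already available and non-negative, and it takes effective rank to be the exponential of the entropy, which is one common definition and may differ from the one used in the paper. `spectrum_metrics` is a hypothetical helper name.

```python
import math


def spectrum_metrics(eigenvalues):
    """Return von Neumann entropy (nats), purity, and effective rank.

    `eigenvalues` are non-negative eigenvalues of a density-like matrix;
    they are renormalized to sum to 1 before the metrics are computed.
    """
    total = sum(eigenvalues)
    probs = [ev / total for ev in eigenvalues]
    # Zero eigenvalues contribute nothing to the entropy (0 * log 0 := 0).
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    purity = sum(p * p for p in probs)
    effective_rank = math.exp(entropy)  # assumed definition: exp(entropy)
    return {"entropy": entropy, "purity": purity, "effective_rank": effective_rank}


mixed = spectrum_metrics([0.25, 0.25, 0.25, 0.25])  # maximally mixed, 4 levels
pure = spectrum_metrics([1.0, 0.0, 0.0, 0.0])       # pure state
print(mixed)
print(pure)
```

The two test spectra are the extreme cases: a maximally mixed state has the largest entropy and lowest purity for its dimension, while a pure state has zero entropy and purity 1.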
- Federated learning for fair autism spectrum disorder screening across age-heterogeneous populations (2026) · doi
Mutual Information-based feature selection reduced dimensionality by approximately 35% in federated autism screening, but the paper does not investigate whether different non-IID partitioning schemes or age-specific heterogeneity levels require different feature selection thresholds. The generalizability of this 35% reduction across varying degrees of client heterogeneity is unexplored.
- AI in Cybersecurity: A Systematic Review and Conceptual Audit Model (2026) · doi
The model claims to balance AI-driven technology adoption with adequate cybersecurity safeguards, yet provides no validation dataset, benchmark, or test cases demonstrating how organizations with different maturity levels, infrastructure complexity, or resource constraints should implement the Anti-Sheriff framework operationally.
- Advances of artificial intelligence applications to low-carbon metallurgy of iron and steel (2026) · doi
Machine learning approaches for modeling blast furnace thermal state and productivity prediction using support vector machines and kernel-based methods have been established, but systematic comparison of their accuracy under different burden compositions, wind conditions, and reduction kinetics across multiple furnace geometries is absent.
- Large Language Models for Combinatorial Optimization: A Systematic Review (2026) · doi
The auto-formulation of optimization problems from natural language descriptions (references [156], [179], and [180] in the review) requires further evaluation on problem classes beyond those covered in the NL4Opt Competition, particularly complex multi-objective and large-scale combinatorial optimization problems with domain-specific constraints.
- Physics-based inverse modelling of dichroic glass: machine learning emulation of the Lycurgus Cup (2026) · doi
The tree-based surrogate (emulator) model systematically underpredicts transmission at deep absorption minima (green band underpredicted by ~33%, red band by ~7%), particularly where training data exhibit maximal stochastic noise. Alternative machine learning architectures or hybrid physics-informed neural networks should be evaluated to improve emulator accuracy in low-transmission regimes critical for dichroic glass prediction.
- Explainable machine learning for tracking spatial variation in leaf chlorophyll fluorescence within temperate deciduous forest canopies (2026) · doi
The Random Forest and XGBoost models for predicting chlorophyll fluorescence parameters were trained exclusively on temperate deciduous forest datasets. These models require retraining and validation across diverse ecosystems (croplands, grasslands, wetlands) to assess transferability and determine whether spectral reflectance-ChlF relationships remain consistent across different plant functional types and canopy architectures.
- Low-sample supervised fault diagnosis for fixed-wing UAVs based on multi-scale adaptive state-aware sequence learning (2026) · doi
Multi-class F1-scores range from 0.7880 to 0.8867 across four flight dates (July 12, 13, 21, and 23), which may indicate seasonal, environmental, or aircraft-specific factors affecting fault diagnosis. Systematic evaluation is needed to test the model's generalization across different UAV platforms, flight conditions, sensor degradation patterns, and long-term temporal drift in sensor data.
- Simulation-based inference captures non-Markovian effects as exemplified in protein production kinetics through cell division (2026) · doi
While the paper references posterior approximation failures in simulation-based inference (Hermans et al., 2022; Falkiewicz et al., 2023), the specific reliability of SBI posterior approximations for stochastic gene expression models with non-Markovian dynamics has not been systematically quantified. The calibration of neural network-based likelihood approximators for non-Markovian biochemical kinetics requires dedicated uncertainty quantification benchmarks.
- Multi-Source Data Fusion and Machine Learning for Soybean Crop Price Forecasting in India (2026) · doi
The experimental setup acknowledges that 'some randomness still exists' in ensemble methods (RF, XGBoost) despite setting random seeds in numpy and TensorFlow, but does not quantify the variance across multiple runs or provide confidence intervals for model performance estimates. The reproducibility and stability of predictions across different random initializations need explicit investigation.
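One way to quantify the run-to-run variance described above is to repeat training over several seeds and report the mean, the standard deviation, and an approximate confidence interval. The sketch below is a minimal illustration: `train_and_score` is a hypothetical stand-in for the real training pipeline (here it only returns a noisy placeholder score), and the interval uses a normal approximation, which is an assumption; with few runs a t-based interval would be more appropriate.

```python
import random
import statistics


def train_and_score(seed: int) -> float:
    """Hypothetical stand-in for one full training run; returns a test score.

    Replace with the real pipeline. Here the score is a fixed baseline
    plus seed-dependent noise, purely for illustration.
    """
    rng = random.Random(seed)
    return 0.85 + rng.gauss(0.0, 0.01)


def summarize_runs(seeds):
    """Return mean, sample std, and a normal-approximation 95% CI of the mean."""
    scores = [train_and_score(s) for s in seeds]
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)  # sample standard deviation (n - 1)
    half_width = 1.96 * std / len(scores) ** 0.5
    return {
        "n": len(scores),
        "mean": mean,
        "std": std,
        "ci95": (mean - half_width, mean + half_width),
    }


summary = summarize_runs(range(10))
print(summary)
```

Reporting the spread alongside the point estimate makes it possible to tell whether a difference between two models exceeds the noise introduced by random initialization.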
- Interpretable deep generative ensemble learning for single-cell omics with Hydra (2026) · doi
The automated cell type annotation module trains neural network classifiers for only five epochs on balanced datasets, but no ablation study or sensitivity analysis is provided to determine whether this fixed epoch count is optimal across different single-cell dataset sizes, cell type compositions, or feature dimensionalities.
- Comparative analysis of deep learning algorithms for rolling element bearing fault classification under variable loads and speeds (2026) · doi
The robustness analysis used purely additive white Gaussian noise at 1 dB, 3 dB, and 5 dB SNR levels, but this does not represent the full range of mechanically-induced disturbances in industrial bearing systems such as load torque ripple, rotational speed fluctuation, structural resonance, or shaft misalignment. Future work should evaluate deep learning models for rolling element bearing fault classification under these realistic non-stationary mechanical phenomena with frequency drift, harmonic amplification, and impulsive components.
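For context, the kind of noise injection described above can be sketched as follows: white Gaussian noise is scaled so that the ratio of signal power to noise power matches a target SNR in decibels. This is an illustration under stated assumptions, not the paper's code; the sine wave stands in for a vibration signal, and `add_awgn` and `measured_snr_db` are hypothetical helper names.

```python
import math
import random


def add_awgn(signal, snr_db, rng=None):
    """Return `signal` plus white Gaussian noise at the requested SNR (dB)."""
    if rng is None:
        rng = random.Random(0)
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB of `noisy` relative to `clean`."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


# Placeholder signal: a 50 Hz sine sampled at 10 kHz for 2 seconds (assumed values).
sample_rate = 10_000
clean = [math.sin(2 * math.pi * 50 * t / sample_rate) for t in range(20_000)]

for target in (1, 3, 5):
    noisy = add_awgn(clean, target, rng=random.Random(target))
    print(f"target {target} dB, measured {measured_snr_db(clean, noisy):.2f} dB")
```

Noise of this kind is stationary and broadband, which is why it cannot stand in for load torque ripple, speed fluctuation, or other structured disturbances listed in the entry.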
- The adoption of artificial intelligence methods in entrepreneurship research: current state and pathways forward (2026) · doi
The cascading term substitution problem in VOSviewer—where excluding multi-word terms causes shorter term variants to capture occurrences of excluded terms (e.g., 'artificial neural network' → 'neural network' → 'network')—is identified but lacks systematic evaluation of how this substitution cascade distorts co-occurrence networks and clustering in entrepreneurship AI literature reviews.
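The cascade described above can be reproduced with a toy longest-match term counter. This is an illustrative sketch only and is not VOSviewer's actual term-extraction algorithm; `count_terms`, the sample sentence, and the vocabulary are assumptions made for the example.

```python
def count_terms(tokens, vocabulary, excluded=frozenset()):
    """Count term occurrences, matching the longest non-excluded term first."""
    active = sorted(
        (term for term in vocabulary if term not in excluded),
        key=lambda term: -len(term.split()),
    )
    counts = {}
    i = 0
    while i < len(tokens):
        for term in active:
            words = term.split()
            if tokens[i:i + len(words)] == words:
                counts[term] = counts.get(term, 0) + 1
                i += len(words)  # consume the matched words
                break
        else:
            i += 1  # no term starts at this token
    return counts


tokens = "an artificial neural network and a social network".split()
vocabulary = {"artificial neural network", "neural network", "network"}

baseline = count_terms(tokens, vocabulary)
one_excluded = count_terms(tokens, vocabulary, {"artificial neural network"})
two_excluded = count_terms(
    tokens, vocabulary, {"artificial neural network", "neural network"}
)
print(baseline)
print(one_excluded)
print(two_excluded)
```

In this toy setting, each exclusion hands the occurrence to the next shorter variant, so the count for the generic term "network" grows even though the text has not changed.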
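- El marco regulatorio europeo de la inteligencia artificial y su impacto en el sistema judicial español [The European regulatory framework for artificial intelligence and its impact on the Spanish judicial system] (2026) · doi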
The paper finds that the AI tools currently deployed for judicial anonymization and document summarization, together with the VioGén gender violence tracking system, lack the standardized audit and user-control mechanisms recommended by the 2018 European Ethics Charter. Validation protocols for algorithmic bias detection and discrimination prevention in these existing judicial AI tools remain underdeveloped.
- A two-stage deep learning model for risk identification in green supply chain finance (2026) · doi
The GAN-SAE synthetic data augmentation component is applied to the training set, but the paper does not report how synthetic data quality or distribution fidelity affects model robustness when applied to out-of-sample industries in the 2021-2024 external validation; ablation studies isolating GAN contribution versus SAE feature extraction are needed for green supply chain finance applications.