Artificial Intelligence Applications in Cleft Lip and Palate Diagnosis, Prediction, and Treatment: A Systematic Review and Meta-Analysis

Abstract

Objective

Cleft lip and cleft palate are common craniofacial abnormalities, causing significant functional, esthetic, and psychosocial issues if not treated early. Artificial intelligence (AI) is increasingly explored in cleft lip and/or palate (CL/P); however, the quality, consistency, and clinical readiness of the available evidence remain unclear. This study systematically reviewed the existing literature on AI applications in CL/P care and performed an exploratory meta-analysis.

Design

A comprehensive search was performed on PubMed, WOS, Scopus, and IEEE Xplore until November 2025, following the PECO question: “In patients with CL/P (P), how do AI approaches (E), compared with conventional diagnostic methods or human intelligence (C), perform in terms of diagnostic accuracy, predictive performance, or treatment outcomes (O)?”. Studies applying AI to human CL/P data for diagnostic, predictive, or treatment purposes were included. Risk of bias was assessed using appropriate checklists (eg, QUADAS-2). Due to study heterogeneity, random-effects meta-analyses were limited to subgroups evaluating CL/P detection on panoramic radiographs and prediction of orthognathic surgery using lateral cephalograms.

Results

The search identified 548 articles; after screening and full-text review, 52 studies were included. These studies addressed diagnosis, prediction, and treatment planning. Meta-analysis of CL/P detection indicated pooled sensitivity, specificity, and accuracy of 87%, 89%, and 90%, respectively. Prediction of orthognathic surgery showed pooled sensitivity and specificity of 87% and 86%, respectively.

Conclusion

AI is increasingly applied in CL/P management, and the available evidence suggests high accuracy and consistency in diagnosis, prediction, and treatment evaluation, often approaching expert-level performance.

Keywords

artificial intelligence, cleft lip, cleft palate, craniofacial abnormalities, deep learning, machine learning

Introduction

Cleft lip and/or palate (CL/P) causes many functional, esthetic, and psychosocial challenges for children born with it. Orofacial clefts encompass a range of anomalies, including cleft lip (CL), cleft palate (CP), and cleft lip and palate (CLP), which may occur in isolation or as part of a syndrome. Most cases are nonsyndromic, accounting for approximately 70% to 80% of CL/CLP and 50% of CP.¹ CL arises from incomplete fusion of the frontonasal and maxillary processes during the 4th to 5th week of intrauterine life, while CP results from failed fusion of the palatal shelves between the 8th and 12th weeks. CLP involves varying degrees of lip and palatal separation.¹ The condition affects about 1 in every 500 to 1000 live births globally (≈1 in 700 newborns), with the highest prevalence among Asian populations, followed by Caucasian and African populations. Unilateral clefts, particularly on the left side, are more prevalent than bilateral clefts and occur more frequently in males (2:1 ratio).²

CL/P has a multifactorial etiology, involving complex interactions between genetic predisposition and environmental influences such as maternal smoking, alcohol use, and nutritional deficiencies. CL/P is frequently associated with dental anomalies and skeletal and soft-tissue deformities, including missing or malformed teeth and restricted maxillary growth due to scar tissue formation.² These structural and functional impairments can negatively affect oral health, speech, and facial growth, as well as patients' psychological wellbeing and quality of life. CL/P management involves a multidisciplinary team, including surgeons, dentists, orthodontists, speech therapists, geneticists, pediatricians, and other specialists, and often requires long-term follow-up, as additional surgeries may be needed to address post-treatment complications.^3,4

The progress of artificial intelligence (AI) has had a significant impact on pediatric care. It has become a highly valuable tool in dentistry and craniofacial research because of its capacity to process large datasets and identify complicated patterns beyond human capability.^5,6 By using machine learning (ML) algorithms such as neural networks, decision trees, and random forests, AI systems can analyze various forms of clinical, imaging, and genetic data to perform diagnosis, outcome prediction, and treatment planning.⁷ In patients with CL/P, AI has been employed for prenatal detection, etiological investigation, landmark identification on radiographic and 3-dimensional (3D) images, and prediction of surgical needs.¹ These applications can enhance diagnostic accuracy, reduce human error, and provide clinicians with evidence-based insights, ultimately supporting personalized care and clinical decision making.

Despite its growing potential, the application of AI in CL/P care remains relatively novel, and the available literature is limited.² Therefore, a systematic review and meta-analysis were conducted to map the past literature, synthesize current knowledge on AI applications in prediction, diagnosis, and treatment, assess its impact on clinical practice, and identify gaps and opportunities for future research.

This systematic review and meta-analysis aim to evaluate the use of AI and ML in patients with CLP, focusing on their performance compared to conventional diagnostic methods or human expertise in terms of detection and diagnostic accuracy, predictive ability, and treatment outcomes.

Methods and Materials

Protocol and Registration

This systematic review follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA), as well as its extension for Diagnostic Test Accuracy (PRISMA-DTA) guidelines.⁸ Its protocol was registered at PROSPERO (CRD420251181349).

Eligibility Criteria

We address the following PECO question: in patients with cleft lip and/or palate (P), how do AI or ML approaches (E), compared with conventional diagnostic methods or human intelligence (C), perform in terms of diagnostic accuracy, predictive performance, or treatment-related outcomes (O)?

Inclusion criteria:

Original studies investigating the application of AI or ML methods in CLP care, including detection, diagnosis, and assessment of treatment outcomes.

Studies reporting quantifiable performance metrics using AI or ML techniques in patients with CL/P.

Diagnostic accuracy studies, prediction model studies, retrospective cross-sectional studies, and cohorts that employed AI or neural network approaches in CL/P populations.

Exclusion criteria:

Studies not specifically focusing on CL/P cases.

Studies whose full text could not be retrieved.

Studies lacking a clearly defined study population or an insufficient description of the data source.

Review articles, editorials, commentaries, and preprints.

Information Sources and Search

A comprehensive search was performed across PubMed, Web of Science, Scopus, and IEEE Xplore up to October 2025. The search was conducted using Medical Subject Headings (MeSH) and adapted keywords. The search queries are listed in Supplemental Table S1. Relevant articles were selected for further review, and the reference lists of the included studies were searched manually to identify any additional eligible papers.

Study Selection

EndNote 21.5 (Clarivate, Philadelphia, USA) was used for reference management. After removing duplicates, all titles and abstracts were screened in triplicate by three reviewers (NA, FM, and SM). Then, three researchers (NA, FM, and SA) independently assessed the full texts of all studies that passed screening. At each stage, any discrepancies were resolved through discussion with a fourth reviewer (AE).

Data Extraction

Five reviewers (SM, SA, FM, MK, and NA) independently collected data from the included studies. Afterward, a sixth reviewer (AE) read the extracted data to check for discrepancies. Collected data items included bibliographic details (author name and publication year), study objective, datasets, image modality or sample type, ML task, AI model or algorithm architecture, and results.

Risk of Bias and Applicability

Five reviewers (NA, SA, SM, FM, and MK) independently evaluated risk of bias using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool for diagnostic accuracy studies, the Prediction model Risk Of Bias Assessment Tool (PROBAST) for prediction studies, and the JBI Critical Appraisal Tools for cross-sectional and cohort studies. Disagreements among these reviewers were resolved by consensus through discussion with a 6th investigator (AE). The QUADAS-2 tool evaluates risk of bias across 4 domains (patient selection, index test, reference standard, and flow and timing) while also considering applicability concerns. Patient selection is judged as high or unclear risk when data splitting methods are poorly described or inappropriate exclusions are made. Index test bias is suggested when model construction lacks sufficient detail, reproducibility is not reported, or robustness is not assessed. Bias related to the reference standard occurs when it is vaguely defined or based on a single examiner. Flow and timing are considered low risk if all data are included in the analysis, each dataset is assessed against a reference standard, and the same standard is consistently applied; however, using multiple reference standards within a single study increases the risk of bias.⁹ The PROBAST tool assesses risk of bias and applicability in prediction model studies across 4 domains: participants, predictors, outcome, and analysis. Bias may arise from inappropriate participant selection, poorly defined or measured predictors, unclear or inconsistently assessed outcomes, or inadequate analytical methods.¹⁰ The JBI Critical Appraisal Tools evaluate risk of bias in cross-sectional and cohort studies by examining participant selection, measurement of exposures and outcomes, identification and handling of confounders, and appropriateness of statistical analysis.¹¹

Meta-Analysis

Studies with homogeneous methodology and similar tasks were selected for the meta-analyses. Because each meta-analysis included only a small number of studies, joint pooling of sensitivity and specificity using diagnostic odds ratios and summary receiver operating characteristic (SROC) curves was not possible. Sensitivity and specificity were therefore analyzed separately. Random-effects models with restricted maximum likelihood (REML) estimation were used to pool sensitivity, specificity, and accuracy for diagnosing cleft palate and for predicting the need for orthognathic surgery from radiographs. Forest plots were generated to show the estimate from each study and the overall pooled estimate. STATA 17.0 software (StataCorp LP, Lakeway Drive, College Station, TX, USA) was used to perform the analyses. P-values less than .05 were considered significant.
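The separate pooling of a single proportion (for example, sensitivity) under a random-effects model can be illustrated with a minimal sketch. It is not the authors' STATA analysis: it assumes a logit transformation with a 0.5 continuity correction, and it swaps in the closed-form DerSimonian-Laird estimator of between-study variance in place of REML, which requires iterative fitting. The function name `pool_proportions` and the example counts are hypothetical.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def inv_logit(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def pool_proportions(events, totals):
    """Random-effects pooling of proportions on the logit scale.

    Uses the DerSimonian-Laird estimator of tau^2 (a simpler stand-in
    for REML) and a 0.5 continuity correction in every study so that
    proportions of exactly 0 or 1 remain finite after the logit.
    """
    y, v = [], []
    for e, n in zip(events, totals):
        e_c = e + 0.5            # corrected event count
        non_e_c = n - e + 0.5    # corrected non-event count
        y.append(logit(e_c / (e_c + non_e_c)))
        v.append(1 / e_c + 1 / non_e_c)  # approximate variance of the logit

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q
    w = [1 / vi for vi in v]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))

    # DerSimonian-Laird between-study variance, truncated at zero
    df = len(y) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights and pooled estimate with a 95% interval
    w_re = [1 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {
        "pooled": inv_logit(mu),
        "ci_low": inv_logit(mu - 1.96 * se),
        "ci_high": inv_logit(mu + 1.96 * se),
        "tau2": tau2,
    }


# Hypothetical counts (true positives, diseased cases) for three studies
result = pool_proportions(events=[29, 24, 26], totals=[30, 30, 30])
print({k: round(val, 3) for k, val in result.items()})
```

Pooling on the logit scale keeps the back-transformed estimate and its interval inside the 0 to 1 range, which matters when study-level sensitivities are close to 100%.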

Results

Study Selection

The initial search yielded 548 articles, which were screened to remove duplicates and records with irrelevant titles. This process left 64 articles for full-text review. After full-text screening, 12 articles were excluded for the following reasons: focusing on other orthodontic problems rather than cleft lip or palate (n = 3), not using machine learning methods (n = 8), or the full text not being retrievable (n = 1). Finally, 52 studies were included in the systematic review (Figure 1). Six of these were included in the meta-analysis; the remaining studies were excluded from it because of methodological heterogeneity, including different image modalities and AI tasks.
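The selection counts reported above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers given in this paragraph; the variable names are illustrative.

```python
# Counts taken from the study selection paragraph
full_text_reviewed = 64
exclusions = {
    "not focused on cleft lip or palate": 3,
    "no machine learning method": 8,
    "full text not retrievable": 1,
}
in_meta_analysis = 6

excluded = sum(exclusions.values())
included = full_text_reviewed - excluded
narrative_only = included - in_meta_analysis  # reviewed but not pooled

print(f"excluded at full text: {excluded}")
print(f"included in review: {included}")
print(f"included but not pooled: {narrative_only}")
```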

Figure 1.

PRISMA Flow Diagram of the Search Strategy.

Study Characteristics

The studies summarized in Tables 1 to 3 addressed CLP across 3 primary domains: (1) diagnosis, detection, and classification; (2) prediction and genetic risk assessment; and (3) surgical outcome assessment and treatment planning (Figure 2). Most included studies relied on imaging data, particularly 2D facial photographs,^12–22 radiographs,^23–32 and computed tomography and cone beam computed tomography (CT/CBCT) images,^3,33–39 while others employed advanced modalities such as 3D intraoral scans,^40–43 stereophotogrammetry,⁴⁴ and ultrasound.^22,37,45,46 A subset of studies also utilized nonimaging data,⁴⁷ including genomic,^48–55 epigenomic,^48,50 lipidomic,⁵⁶ and questionnaire-based clinical information^4,57 (Figure 3).

Figure 2.

Categorization of the Included Studies.

Figure 3.

The Data Modalities Used in the Included Studies.

Table 1.

Study Characteristics: Diagnosis, Detection, and Classification.

Authors	Objective	Dataset (Train/Validation/Test)	Image Modality	ML Task (Classification, Object Detection, Segmentation)	Model (Algorithm Architecture)	Performance/Results (Accuracy: Acc, Sensitivity: Sens, Specificity: Spec, Area Under the Curve: AUC, Precision: Pre, Pearson's r, F-1 Measure)
Hayajneh et al¹³	To develop an AI-based method for evaluating facial asymmetry in CLP	Dataset 1: 65 CL/P 55 Normal Dataset 2: 55 non-CL/P facial deformities 55 Normal	Facial images	Classification and detection	Automatic inpainting model and Deep CNN	The proposed facial anomaly appraisal method achieves a 92% correlation with human scores and processes images in less than 1s
Hayajneh et al¹²	Development and validation of CleftGAN to generate realistic synthetic cleft lip images	514 (514/NA/514)	Frontal photographs	Image generation	-StyleGAN2, -StyleGAN3-t, -StyleGAN3-r	CleftGAN generated high-quality, diverse cleft lip images using only 514 real samples. StyleGAN3-t performed best, producing images closest to the real ones
Mahedia et al⁶¹	To evaluate the quality of ChatGPT-generated responses for patient education regarding cleft lip repair	-	-	AI-generated medical content regarding cleft repair	ChatGPT-4	ChatGPT responses achieved an overall mean rating of 2.9/4, with the highest scores for clarity and content quality (3.1) and lowest for trustworthiness (2.7) Readability analysis showed a 10th grade literacy level While the answers were accurate and free from harmful information, they lacked citations
Tageldin et al²⁵	To evaluate the accuracy of AI cephalometric landmark identification in cleft palate patients	112	Lateral cephalometric radiographs	Landmark identification and object detection	WebCeph™	The coordinates of A-point, ANS, and/or showed statistically significant differences between AI and expert methods, with a mean difference ranging between −0.86 ± 2.15 and 3.15 ± 6.07 mm
Agaronyan et al⁴⁰	To develop a geometric deep learning model for automated landmarking of maxillary arches on 3D oral scans from newborns with cleft lip and palate	50 (90/NA/10)	Digitalized standard stereolithography models	Landmark detection and localization	CNN	Acc = 94.44% with an absolute mean error of 1.676 ± 0.959 mm on a set of 100 models AI had the potential to accurately quantify the maxillary arch morphometric features
Ali et al¹⁹	To compare the performance of 4 CNNs in classifying 7 distinct prosthodontic scenarios of the maxilla, including cleft palate	960 (576/192/192)	2D intraoral occlusal images	Image classification	- VGG16 - Inception-ResNet-V2 - DenseNet-201 - Xception	Overall Acc for all prosthodontic scenarios for VGG16, Inception-ResNet-V2, DenseNet-201, and Xception, respectively: 0.92, 0.90, 0.94, and 0.95 Cleft palate cases: - VGG16: PCP = 0.98, Pre = 0.77, Recall = 0.98, F1-score = 0.86, AUC = 0.99 - Inception-ResNet-V2: PCP = 0.92, Pre = 0.79, Recall = 0.92, F1-score = 0.84, AUC = 0.98 - DenseNet-201: PCP = 1.00, Pre = 0.82, Recall = 1.00, F1-score = 0.90, AUC = 1.00 - Xception: PCP = 0.96, Pre = 0.84, Recall = 0.96, F1-score = 0.91, AUC = 0.99
Nantha et al⁴⁵	To classify CLP types (bilateral, unilateral, and palate-only) using multimodal data	29: 7 Bilateral, 7 unilateral, 11 palate-only	Ultrasound tongue imaging and synchronized speech spectrograms	Classification	- Vision Transformers - Siamese Neural Networks	- Overall accuracy: 82.76% - Palate-only: Pre = 90.00%, Recall = 82.00%, F1-score = 86.00% - Unilateral: Pre = 82.00%, Recall = 82.00%, F1-score = 82.00% - Bilateral: Pre = 75.00%, Recall = 86.00%, F1-score = 80.00%
Zhang et al³³	To develop and evaluate MaxCNet for personalized maxilla completion and cleft defect volume estimation	36 (plus 440 synthetic scans via B-spline deformation) (22/22/14)	Cone beam computed tomography	Segmentation	- MaxCNet - Symmetric Normalization - VoxelMorph - U-Net - Fully Convolutional Networks - Mirror registration - Denoising Autoencoder	- MaxCNet: - Dice similarity coefficient: 0.90 ± 0.02 (restored maxilla), 0.84 ± 0.04 (cleft defect) - RVE: 0.09 ± 0.08 - Average Hausdorff distance (AHD): 0.30 ± 0.08 mm
Kamei et al²⁶	To assess skeletal maturation (CVM) and to detect CVA	CVM: 960 (816/NA/144) CVA: 309 (261/NA/48)	Lateral cephalogram	Object detection, Classification	MobileNet	CVM: - Average Acc = 74.5% - CLP patients exhibited a significantly lower mean skeletal maturity than non-CLP patients (P = .03) - Delayed growth was more prevalent in CLP patients - Growth acceleration was less common in CLP patients - CVA: - Detection Acc = 83% CVA was more prevalent in CLP patients (20.18%) compared to non-CLP patients (11.87%) Most common CVA: Fusion anomalies, particularly between C2 and C3
Shehab et al⁶⁰	Improving the readability of cleft lip and palate patient educational materials using AI and assessing interactive content utilization	Websites of craniofacial teams	-	Natural Language Processing, specifically text simplification or text rephrasing	OpenAI's ChatGPT-4	AI-enhanced readability: Significantly improved readability Achieved a median 6th grade level (specifically, 6.7) Flesch-Kincaid Reading Ease score rose to 75.6 (average) Enhanced clarity: median clarity score rose to 85.3. All improvements were statistically significant (P < .01) The AI modification helped align readability with recommended standards
Rosero et al¹⁷	To enhance Facial Landmark Detection for patients with repaired CLP	3837 (2577/634/626)	2D facial images	Facial landmark detection	MobileNetV2	The study proposed a strategy that improves facial landmark detection MobileNetV2 achieved a statistically significant 13.7% reduction in normalized mean square error, lowering error from 2.417 to 2.086
Hayajneh et al¹⁴	To measure the severity of cleft lip and to provide a method for gauging baseline facial deformity and postsurgical change	125 (61 pediatric cleft lip, 64 normal children)	Facial images	Object detection and classification	StyleGAN2-based Generative Adversarial Network	- Pixelwise Subtraction Error-based scores correlated best with human ratings: Overall Pearson's r = 0.89 (machine scores vs. 145 human raters). Cleft faces in oral/nasal region = 0.887 - Outperformed alternative metrics (Learned Perceptual Image Patch Similarity [LPIPS] and Structural Similarity Index Measure [SSIM]) - Heatmaps localized cleft anomalies effectively - Average computation time ≈ 135 s per image
Kuwada et al²⁷	To diagnose the presence of a cleft palate in panoramic radiographs	491 (344/87/60)	Panoramic radiographs	Object Detection, Classification	DetectNet VGG-16	- Model A (DetectNet): AUC: 0.95 (vs. Radiologist (Rad) 1: 0.70, Rad2: 0.63) Sens: 0.96 (29/30) (vs. Rad1: 0.73, Rad2: 0.50) Spec: 0.93 (28/30) (vs. Rad1: 0.66, Rad2: 0.76) Acc: 0.95 (57/60) (vs. Rad1: 0.70, Rad2: 0.63) - Model B (VGG-16): AUC: 0.93 (vs. Rad1: 0.70, Rad2: 0.63) Sens: 1.00 (30/30) (vs. Rad1: 0.73, Rad2: 0.50) Spec: 0.86 (26/30) (vs. Rad1: 0.66, Rad2: 0.76) Acc: 0.93 (56/60) (vs. Rad1: 0.70, Rad2: 0.63)
Kuwada et al²⁸	Detecting both UCAs and BCAs on panoramic radiographs	491 (353 UCA, 93 BCA, 210 normal)	Panoramic radiographs	Object Detection, Classification	DetectNet	- Model U (UCA and normal): Sens: 0.60 (vs. Radiologist (Rad): 0.93, Dental Resident (DR): 0.83) Pre: 0.94 (vs. Rad: 0.98, DR: 0.98) F-measure: 0.73 (vs. Rad: 0.95, DR: 0.89) Model B (BCA and normal): Sens: 0.73 (vs. Rad: 0.93, DR: 0.83) Pre: 0.84 (vs. Rad: 0.98, DR: 0.98) F-measure: 0.77 (vs. Rad: 0.95, DR: 0.89) Model C1 (UCA, BCA, and normal): Sens: 0.80 (vs. Rad: 0.93, DR: 0.83) Pre: 0.97 (vs. Rad: 0.98, DR: 0.98) F-measure: 0.87 (vs. Rad: 0.95, DR: 0.89) Model C2 (UCA, BCA, and normal): Sens: 0.88—highest among models (vs. Rad: 0.93, DR: 0.83) Pre: 0.98—highest among models, almost same as human observers (vs. Rad: 0.98, DR: 0.98) F-measure: 0.92—highest among models, almost the same as human observers (vs. Rad: 0.95, DR: 0.89)
Miranda et al³	To assess the severity of CLP on 3D surface models	190 (133/19/38)	Cone beam computed tomography	Detection, classification, and segmentation	CNN	High sensitivity observed for AI predictions in all severity classes: Acc = 0.816 AUC = 0.948 (SD 0.15) Pre = 0.823 (SD 0.95) F-1 score = 0.817 (SD 0.59)
Xu et al³⁶	To predict and locate 3D cephalometric landmarks in patients with CLP	150 (150/NA/42)	Computed tomography	Detection and classification	PointNet++ (Graph Convolutional Neural Network)	The GCN-based 3D cephalometry system appears suitable for cleft lip/palate cases Mean Distance Error (MDE):1.33 mm Standard Distance Deviation: 1.12 mm. Success Detection Rate (SDR) (at 2 mm error margin): 9 landmarks showed SDRs over 90% of the time 3 landmarks showed SDRs under 70% of the time
Alam et al²³	To investigate the variation in lip morphology and NLA among NSCLP individuals	123 (92 NSCLP, 31 noncleft controls)	Lateral cephalograms	Detection and classification	WebCeph	- Upper lip to E line and NLA significantly differed between NSCLP and NC individuals - No significant gender or side differences found - The mean error for the upper lip to E line and lower lip to E line was about 0.299 mm and 2.8° for NLA, which is considered acceptable
Sayadi et al¹⁸	To detect cleft lip and nasal deformities and place nasolabial markings to guide surgical design	345 (276/NA/69)	2D facial photographs	Object detection, landmark localization	High-Resolution Net	Normalized Mean Error (NME) for all 21 landmarks ranged from 0.029 to 0.055. - Best: cleft-side alare (lowest NME) - Worst: cleft-side cphi (highest NME) All values were within accepted AI benchmarks
Woodsend et al⁴¹	Identifying dental landmarks on 3D digital dental models with application to CLP outcome measurement	Total: 239 models (161 CLP)	3D intraoral scans and digitized dental models	Object Detection, Classification, and Segmentation	ALR software	- Tooth detection Acc = 79.7% (95% CI: 0.269-0.325) - Landmark placement error (mean): ALR: 0.389 mm vs. Humans: 0.376 mm (mean difference = 0.013 mm) - ALR performance is comparable to or better than human experts, with consistent reproducibility
Kuwada et al²⁹	To detect CA and classify it as with or without CP	593 (402/101/90) (383 CA, 210 normal)	Panoramic radiographs	Object Detection and Classification	DetectNet	- Model 1 (CA with and without CP): Acc = 71.7%, Recall = 71.7%, Pre = 74.5%, F-measure = 70.9% Model 2 (CA with and without CP, normal group): Acc = 82.2% (better than human observers and model 1), Recall = 82.1%, Pre = 84.0%, F-measure = 81.8% (12/30) Model 2 had overall higher accuracy than model 1 and human observers
McCullough, 2021⁶⁸	To measure facial landmarks and assign severity grades	800 (640/80/80)	Facial photographs	Detection and classification	Visual Geometry Group network (VGGFace), ResNet, MobileNet	- ResNet (best performer): MSE 24.41; cleft width ratio correlation 0.943; nostril width ratio correlation 0.879; severity correlation 0.892 - MobileNet: MSE 36.66; cleft width ratio 0.901; nostril width ratio 0.705; severity correlation 0.860 - MobileNet50: MSE 46.72; cleft width ratio 0.888; nostril width ratio 0.601; severity correlation 0.777 - VGG: MSE 60.23; cleft width ratio 0.849; nostril width ratio 0.701; severity correlation 0.879 - VGGFace: MSE 155.97; cleft width ratio 0.855; nostril width ratio 0.569; severity correlation 0.880
Alam et al²⁴	To assess the sagittal jaw relationship of nonsyndromic CLP compared to noncleft patients	123 (29 bilateral CLP, 41 unilateral CLP, 9 unilateral cleft lip alveolus, 13 unilateral cleft lip, and 31 noncleft)	Lateral cephalograms	Object detection and classification	WebCeph software	- Significant reduction in SNA, ANB angles and Wits appraisal in CLP (especially bilateral and unilateral CLP) compared to noncleft group (P < .005) - SNB angle did not differ significantly - No significant gender differences found - High intra-rater reliability for measurements (intraclass correlation coefficient = 0.916-0.990), indicating the robustness of AI-driven landmark identification
Wang et al⁵⁹	To quantify the 3D asymmetry of the maxilla in unilateral CLP and investigate defect factors responsible for maxillary variability	60 (24/3/33)	Cone beam computed tomography	Detection, segmentation	3D U-Net	- Autosegmentation achieved Dice scores of 0.92 (maxilla) and 0.77 (defect) - Processing time: ∼1 min per CBCT (plus ∼5 min refinement), reducing total time from ∼10 h to minutes - Significant maxillary asymmetry found on the cleft side, mainly at the pyriform aperture and alveolar crest
Wu et al⁴⁴	Identifying the 3D midfacial reference plane in children with unrepaired cleft lip and enable measurement of facial symmetry	50 subjects (35 unilateral cleft lip, 10 bilateral cleft lip, 5 controls)	3D stereophotogrammetry	Detection and classification	Not mentioned	Rankings (1-5, best to worst): Manual methods received better mean rankings (Direct Placement 2.43, Manual Landmark 2.54). Among computer-based methods, the Deformation method performed best overall (2.66)
Wu, 2014⁶⁹	Assessing and ranking the severity of unrepaired CLP	40 (35 CLP, 5 normal)	3D infant face meshes	Classification	Linear regression, SVM regression, RankBoost, RankNet	Using all facial features (400): RankBoost: Best—(Pearson Correlation ≈ 0.77, Kendall τ test ≈ 0.62). Using selected features (top 5): RankNet: Best—(Pearson correlation ≈ 0.84, Kendall τ test ≈ 0.70)

AI: artificial intelligence; ALR: Automated Landmark Recognition; ANB: the anteroposterior relationship between the maxilla and mandible; BCA: bilateral cleft alveoli; CA: cleft alveolus; CBCT: cone-beam computed tomography; CleftGAN: Generative Adversarial Network; CLP: cleft lip and palate; CNN: Convolutional Neural Network; CP: cleft palate; CVA: cervical vertebral anomalies; GCN: Graph Convolutional Neural Network; MaxCNET: Maxume Estimation Network; MDE: mean distance error; MSE: mean squared error; NLA: nasolabial angle; NME: normalized mean error; NSCLP: nonsyndromic cleft lip and/or palate; PCP: percentage of correct predictions; RVE: Relative volume error; SNA: Sella-Nasion to A-point; SNB: Sella-Nasion to B-point; SNP: single nucleotide polymorphism; SVM: Support Vector Machine; U/BCLP: unilateral/bilateral CLP; UCA: unilateral cleft alveoli; VGG: Visual Geometry Group.
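The performance metrics abbreviated in Table 1 (Sens, Spec, Acc, Pre, F1) all derive from the same 2 × 2 confusion matrix. The sketch below shows the standard definitions, using the counts implied by the DetectNet row for Kuwada et al²⁷ (Sens 29/30, Spec 28/30). The function name `diagnostic_metrics` is hypothetical, and the false-negative and false-positive counts are inferred from those fractions rather than reported directly.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Standard diagnostic accuracy metrics from a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)               # recall among cleft cases
    specificity = tn / (tn + fp)               # correct rejections among normals
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    precision = tp / (tp + fp)                 # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "f1": f1,
    }


# Counts inferred from Sens 29/30 and Spec 28/30 (30 cleft, 30 normal images)
detectnet = diagnostic_metrics(tp=29, fn=1, tn=28, fp=2)
for name, value in detectnet.items():
    print(f"{name}: {value:.3f}")
```

Under these counts, accuracy is 57/60 = 0.95, matching the tabulated value. Precision and F1 are derived quantities that the table does not report for this study.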

Table 2.

Study Characteristics: Prediction and Genetic Risk Assessment.

Authors, Year	Objective	Dataset (Train/Validation/Test)	Sample Type	ML Task (Classification, Object Detection, Segmentation)	Model (Algorithm Architecture)	Performance/Results (Accuracy: Acc, Sensitivity: Sens, Specificity: Spec, Area Under the Curve: AUC, Precision: Pre, Pearson's r, F-1 Measure)
Jia et al⁵⁶	To identify lipid biomarkers for prenatal diagnosis of nonsyndromic (ns) CLP	106 (60/46/NA) - Discovery (training) cohort: 30 nsCLP vs. 30 controls - Validation cohort: 20 nsCLP vs. 20 controls - Additional validation cohort: 3 nsCLP vs. 3 controls	Serum-based lipidomics data	Classification	- Naive Bayes - SVM - RF - Decision tree - Logistic regression - AdaBoost - k-Nearest Neighbors	- Discovery phase (untargeted lipidomics): from over 1300 lipids detected, 103 were shortlisted, and the top 35 lipid features were then tested with 7 ML models Best model: Naive Bayes; AUC: 0.95 Sens: 90% Spec: 87% - Validation phase (targeted lipidomics): targeted analysis confirmed 16 out of the 35 lipid features were reliable. Again tested all 7 classifiers Best again: Naive Bayes; AUC: 0.97 Sens: 80% Spec: 84%
He et al⁴⁶	Development and validation of CLP-Net for automatic detection of CLP in first-trimester	418 (394 normal, 24 CLP) (320/320/74 normal 24 CLP)	3D ultrasound volumes	Plane localization (3D spatial regression and similarity analysis)	CLP-Net	Acceptance rate by radiologist: - Mid Sagittal Plane: 95% - Retronasal triangle plane and maxillary axial plane: 70% - CLP-Net model significantly improved localization speed, particularly for junior radiologists, with their visual acceptance ratio increasing from 70% to 93%
Dai et al⁴⁸	To develop “DeepFace” which evaluates the functional impact of genetic variants (particularly SNPs) associated with orofacial clefts	204 ChIP-seq datasets from embryonic craniofacial tissues (14990/1805/1798)	Not applicable: genomic sequence + epigenomic signal (from ChIP-seq)	-Regression -Binary classification -Variant effect prediction	DeepFace	Prediction of epigenetic signals: Pearson correlation (r): 0.50-0.83, area under the precision-recall curve: 0.54-0.81 Best prediction: Narrow-peak histone marks Identified 6 SNPs with significant linear correlation between their predicted functional impact and developmental time course
Li et al⁵⁸	Detect ultrasound images of fetal lips and classify them as normal or abnormal	1365 (908/221/NA)	Ultrasound images	Object detection and classification	- Object detection models: Yolov3, Yolov4, Yolov5, Faster R-CNN (ResNet50 and VGG versions) - Image classification networks: VGG16, ResNet34, ResNet50, GoogleNet, and ShuffleNet	The Yolov5-ECA model demonstrated superior performance in both detection and classification of fetal lip ultrasound images compared to other popular models - Fetal Lip Detection Results: Pre: 0.944, Recall: 0.916, F1-score: 0.930 - Fetal Lip Classification Results: Acc: 0.925, Pre: 0.933, Recall: 0.912, Spec: 0.931
Kang et al⁴⁹	To predict the genetic risk of nonsyndromic CLP using	262 (180/20/62)	Genomic SNP data	Classification	Main model: GANNE Compared with: - PRS - RF - SVM - XGBoost - LR - LGBM - ADA - ANN	GANNE performed best with 10 SNPs; AUC: 0.882, Acc: 74.2% F1-score: 0.756. Performance decreased slightly with 16 or 92 SNPs. Genes such as RUNX2, MTHFR, PVRL1, TBX22, and TGFB3 significantly contributed to predicting nonsyndromic CL/P risk
Xiao et al⁵⁰	To integrate epigenomic datasets from human oral epithelial cells to identify functional nonsyndromic CLP variants	Not applicable	- Chinese Han Genome-wide association studies data - Epigenomic dataset (generated from human oral epithelial cell line (HIOEC) cells): RNA-seq, ATAC-seq, H3K27ac ChIP-seq, DLO Hi-C	Classification	Gapped k-mer SVM (gkm-SVM)	- 254 Potential functional risk SNPs were identified - gkm-SVM classifier highlighted rs174570 as a risk SNP (rs560789 neutral by ML but validated functionally) - Functional validation: rs560789 and rs174570 risk alleles reduced epithelial enhancer activity, impaired SOX2 binding, and affected target gene FADS1 (reduced oral epithelial cell migration and proliferation)
Machado et al⁵¹	To predict the genetic risk of nonsyndromic cleft lip with or without cleft palate in the Brazilian population	1588 (1111/NA/476) (722 patients, 866 healthy controls)	Genotyping of 72 SNPs from individuals	Classification	RF and NN	- RF and NN revealed high-score importance for 13 SNPs with Acc = 99% and 94%, respectively. - Selected genes were mostly involved in tissue and epithelium development, neural tube closure, and metabolism of methionine, folate, and homocysteine
Wang et al³⁷	Automatic recognition and classification of fetal facial ultrasound standard planes to detect CLP	1293 (five-fold cross-validation; one dataset is used as the test set, and the remaining four parts are the training set)	Fetal facial ultrasound images	Detection and classification	Texture feature fusion method (LH-SVM)	Acc = 94.67% Pre = 94.27% Sens = 93.88% F1-score = 94.08%
Qadeer et al⁴	To predict nonsyndromic CLP in early pregnancy	1000	Questionnaire	Classification	- MLP - K-Nearest Neighbors (KNN) - Decision Tree Classifier - SVM - Random Forest	MLP: Acc = 98.3%, AUC = 98% Decision Tree Classifier: Acc = 85.22% RF: Acc = 76.4% KNN: Acc = 96.44% SVM: Acc = 97.52%
Shafi et al⁵⁷	To predict the occurrence of CLP before birth	1000	Questionnaire	Classification	- MLP - KNN - Decision Tree Classifier - SVM - RF	MLP: Acc = 92.6%, AUC: 0.98, Recall, Pre, and F-measure = 0.89 Decision Tree Classifier: Acc = 88.14%, Recall, Pre, and F-measure = 0.88 RF: Acc = 85.77%, Recall, Pre, and F-measure = 0.86 KNN: Acc = 89.72%, Recall, Pre, and F-measure = 0.9 SVM: Acc = 90.69%, Recall, Pre, and F-measure = 0.92
Liu et al⁵³	To investigate gene-gene interactions among cell adhesion genes contributing to the risk of nonsyndromic cleft lip with or without cleft palate	806 Chinese case-parent trios	Genome dataset from a GWAS	Gene pathway analysis	Logic regression	Two-way interaction (Conditional Logistic Regression): Identified a statistically significant ACTN1 × CTNNB1 interaction Specific SNP pair: rs17252114 (CTNNB1) and rs1274944 (ACTN1). Achieved a P-value of .0002 Showed a negative/antagonistic interaction: The risk ratio (RR) of rs1274944 decreased from 2.04 to 1.42 when individuals also carried one risk allele at rs17252114. This suggests the presence of one risk allele can alleviate the effect of the other Multiway interactions (Logic Regression): Identified higher order interactions involving ACTN1, CTNNB1, and CDH1 LR results supported the ACTN1 × CTNNB1 interaction found by logistic regression. These SNPs (rs17252114 and rs1274944) were present in the best 3-, 4-, and 5-SNP logic expressions LR can delineate complex Boolean combinations of SNPs associated with altered disease risk (eg, lower risk if carrying specific minor alleles at rs409228(CTNNB1) AND rs1274944(ACTN1) AND NOT rs4783676(CDH1), OR specific minor allele at rs10490822(CTNNB1))
Machado et al⁵²	To investigate genome-wide loci for nonsyndromic cleft lip with or without cleft palate	1697 (831 CLP and 866 healthy)	DNA obtained from oral mucosa	Gene pathway analysis	Logistic regression	The study identified rs7552 in 2p24.2 as a significant susceptibility marker for nonsyndromic cleft lip with or without palate (NSCLP), with the AA genotype showing increased risk (OR = 1.71, 95% CI: 1.31–2.24)
Zhang et al⁵⁴	To evaluate the prediction performance of models to assess genetic risk for NSCLP	587 infants	Blood samples and GWAS Catalog database from Han and Uyghur Chinese populations	Classification	SVM, LR, NB, RF, KNN, DT, and ANN	AUC: - Han: SVM = 0.89, LR = 0.90, NB = 0.87, RF = 0.89, KNN = 0.75, DT = 0.74, ANN = 0.85. - Uyghur: SVM = 0.64, LR = 0.62, NB = 0.60, RF = 0.54, KNN = 0.57, DT = 0.54, ANN = 0.51
Li et al⁵⁵	To investigate gene-gene interactions among WNT family contributing to the risk of CLP	895 trios of Asian ancestry, 681 trios of European ancestry	Case-parent trios from GWAS	Gene pathway analysis	RF Trio Logic Regression	Gene-gene interaction between markers in WNT5B and MAFB (Asian trios: empiric p-values =0.0076, European trios =0.018). Epistatic interaction between markers in WNT5A, IRF6, and C1orf107 was found in Asian trios, and markers in the 8q24 region and WNT5B in European trios

ADA: Adaptive Boosting; ANB: the anteroposterior relationship between the maxilla and mandible; ANN: Artificial Neural Network; CNN: Convolutional Neural Network; CleftGAN: Generative Adversarial Network; CBCT: cone-beam computed tomography; GCN: Graph Convolutional Neural Network; LGBM: Light Gradient Boosting Machine; LR: Logistic Regression; MaxCNET: Maxume Estimation Network; MDE: mean distance error; MLP: multilayered perceptron; MSE: mean squared error; NME: Normalized Mean Error; PRS: Polygenic Risk Score; RF: Random Forest; SNA: Sella-Nasion to A-point; SNB: Sella-Nasion to B-point; SNP: single nucleotide polymorphism; SVM: Support Vector Machine; U/BCLP: unilateral/bilateral CLP; VGG: Visual Geometry Group; XGBoost: Extreme Gradient Boosting.

Table 3.

Study Characteristics: Surgical Planning or Postsurgical Assessment.

Authors	Objective	Dataset (Train/Validation/Test)	Image Modality	ML Task (Classification, Object Detection, Segmentation)	Model (Algorithm Architecture)	Performance/Results (Accuracy: Acc, Sensitivity: Sens, Specificity: Spec, Area Under the Curve: AUC, Precision: Pre, Pearson's r, F-1 Measure)
Rosero et al¹⁶	Postsurgical lip symmetry assessment in patients with repaired cleft lip	CLP transformation experiment: 146 (2298/578/510) Temporal misalignment experiment: 146 (128,000/29,000/31,000)	2D images	Classification	Siamese Convolutional Neural Network	CLP transformation: Acc = 75.34, Pre = 63.53, Pearson's r = 0.31 Temporal misalignment: Acc = 69.41, Pre = 70.04, Pearson's r = 0.27
Jeon et al⁴⁷	To predict the timing of palatoplasty in infants with CLP based on longitudinal physical growth data (age, height, weight) and clinical parameters	111 (84/NA/21)	Not applicable: clinical and anthropometric tabular data	Regression	Tree-based ML models (Random Forest, CatBoost, Light Gradient Boosting Machine, and Extreme Gradient Boosting)	Best model: CatBoost Root mean square error: 1.59 months; mean actual palatoplasty age: 12.8 ± 1.8 months Mean predicted palatoplasty age: 12.8 ± 1.0 months
Santos et al⁴²	- To evaluate the performance of smartphone scanning applications in acquiring 3D meshes of cleft palate models - To validate an ML tool for computing automated presurgical plate	15 (5 unilateral CLP, 5 bilateral CLP, and 5 isolated CP)	3D scans of palate models using smartphone scanning applications (KIRI Engine and Scaniverse) and an intraoral scanner (Medit i500) as a control	Detection, image generation	DiffusionNet	- ML tool capable of automating presurgical plate generation for UCLP & BCLP - However, required manual landmarking for ICP cases - Best plate accuracy (KIRI scans, no mirror used): 0.18 ± 0.05 mm (compared to control: 0.16 ± 0.08 mm). - Demonstrated high morphology recognition - Promising for clinical translation in cleft care - symmet
Lingens et al²⁰	Predicting 2D landmarks for 3D intraoral reconstruction in CLP	Real-world data: 13,014 images Synthetic data: 100 images	2D images	Landmark detection, model reconstruction	CNN	Average reconstruction error: 0.55-0.73 mm Fitted convergence: 0.340 mm (theoretical optimum = 0.550 mm) Root mean squared error: 0.022 (synthetic test), 0.138 (real-data test) Acceptable precision: < 0.5 mm
Lim et al³²	To assess the accuracy of ML-assisted prediction of the need for orthognathic surgery	245 (62 surgery, 183 nonsurgery) (196/196/49)	Lateral cephalograms	Landmark detection, classification	SVM and Feature Importance Analysis (FIA)	- SVM: AUC = 0.84, Acc = 83.7%, Sens = 83.3%, Spec = 83.8%. - FIA revealed 10 predictors: A to N-perpendicular, L1 to A-Pog, Pog to N-perpendicular, L1 to Lower-occlusal plane, Cleft type, U1 to Upper-occlusal plane, IMPA, gonial angle, anteroposterior facial height ratio, and ANB with an accumulated importance of 64.51%
Fujii et al³⁵	Assess alveolar bone grafts after secondary alveolar bone grafting	7	CT scans at 3 time points: ∼1 month before surgery 1 day after surgery 6 months after surgery	Segmentation	Synapse Vincent software (Fuji Photo Film Co., Ltd., Japan)	- Immediately after surgery: Bone Volume (ICC): 0.95, Density (HU): 0.99 - 6 months post-op: ICC: 0.81, HU: 0.57 - AI-based method showed excellent reliability for measuring bone graft volume and density at day 1, but 6 months post-op, consistency decreased due to greater variability and measurement challenges over time
Rosero et al¹⁵	Postsurgical lip abnormalities detection in patients with repaired CLP	Adult faces: 1235 Children's faces: 456	2D frontal facial photographs	Object detection and classification	Siamese CNN MobileNetV2	Siamese LT-v2: Acc = 0.89 ± 0.03 Baseline CNN (MobileNetV2): Acc = 0.60 ± 0.06 This represents a 29% absolute increase in accuracy for the Siamese LT-v2 model over the baseline model on real patient data
Schnabel et al⁴³	To develop a data-driven plate computation of presurgical orthopedic CLP treatment	397 (317/NA/80) (283 UCLP and 114 BCLP)	3D intraoral scans	Landmarking and segmentation	DiffusionNet	Average landmark prediction distance: UCLP: 1.69 ± 1.85 mm. BCLP: 1.70 ± 1.28 mm
Zhang et al³⁴	To assess how alveolar bone grafting affects maxillary growth in UCLP	64	Spiral CT	Segmentation	3D U-net	- Dice similarity coefficient: 88% for maxilla segmentation and 83% for cleft segmentation - Autosegmentation of one sample took only several minutes - The study found that alveolar bone grafting in UCLP patients leads to asymmetric maxillary growth, with length increases on the cleft side and width increases on the non-cleft side
Chen et al²¹	To guide cleft lip surgery by visually predicting a repaired lip and nose	Train: CelebA dataset; over 160,000 Test: clinical patient datasets; 10 (CleftLip10) and 24 (CleftLip24)	Frontal face images	Image Inpainting	Single-stage end-to-end multi-task image inpainting framework	- Higher “valid possibility” (success rate) in repairing cleft lip images: - CleftLip10: 0.500 for this method vs. 0.233 for others - CleftLip24: 0.333 for this method vs. 0.222–0.319 for others - Surgeon assessment: three cleft lip surgeons ranked the generated results highest for naturalness, with average rankings 1.267 (CleftLip10) and 1.208 (CleftLip24), better than all other models - Performed best in complex situations, such as severe cleft lips and large medical equipment
Zhang et al³⁸	Automatic estimation of the bony alveolar cleft volume of CLP	21 (11/NA/10)	Cone beam computed tomography	Object detection and segmentation	3D U-Net	Cleft Volume Estimation Acc (eV - Relative Error): this method significantly outperformed others in accuracy Proposed method: 8%, SyN: 14%, VM: 15%, B-Spline: 21% Dice similarity coefficient for the incomplete maxilla segmentation: 0.88 ± 0.03 DSC for the cleft defect segmentation: 0.83 ± 0.05
Seo, 2021³⁹	To investigate 3D facial soft tissue changes after bimaxillary orthognathic surgery	34 Korean young adult patients	Cone beam computed tomography	Landmark detection	ON3D program	Baseline differences (cleft vs. noncleft, pre-/postsurgery): Cleft patients had more posterior hard/soft tissue landmarks, wider alar/philtrum dimensions, more obtuse nasal tip angle, and inferior columella position Soft-to-hard tissue movement ratios (post-bimaxillary surgery): Cleft showed higher ratios at the nose/upper lip (greater soft tissue change from maxillary advancement, likely due to scar tissue) Distinctive soft tissue responses in Cleft: Nose: wider alar/base, shorter columella, more obtuse nasal angle Upper lip: upper part advanced, lower part stretched downward (limited forward gain); increased philtrum and upper lip height Bilateral landmarks: downward subalare and crista philtrum, backward/downward cheilion
Lin et al³⁰	To predict the need for orthognathic surgery in patients with repaired unilateral CLP	56	Lateral cephalograms	Classification	Boruta method The XGBoost Algorithm	Acc = 87.4% Sens = 97.83% Spec = 90.00% F1-score = 0.714
Lim et al³¹	To identify prognostic factors and develop predictive models for the need for OGS in children with CLP	126 patients	Lateral cephalograms obtained at 7 (T1), 10 (T2), and 15 years of age (T3)	Classification	Multivariable logistic regression	- OGS prediction at T1 & T2: AUC = 0.91 (95%CI (0.85-0.96)), Sens = 0.78, Spec = 0.87 - Skeletal discrepancy (ANB < 0°) at T1 and T2, respectively: AUC = 0.95 (95%CI (0.97-1.00)) and 0.99 (95%CI (0.85-0.96)), Sens = 0.96 and 0.78, Spec = 0.92 and 0.87 - Overjet < 0 mm at T1: AUC = 0.91 (95%CI (0.85-0.96)), Sens = 0.78, Spec = 0.87 - Significant predictive factors: number of clefts, number of missing teeth, duration of maxillary expansion, maxillary length, mandibular prominence angle (SNB)
Li et al²²	To develop a robotic surgery assistant for the localization of surgical markers and incisions on 2D facial images	2568 facial images (1568, 500, 500)	2D facial images	Marker localization	CLPNet	- CLPNet and its variants (CLPNet-Baseline, CLPNet-Light) significantly outperformed traditional methods and a general deep learning method (VGG) in surgical marker localization - Lowest mean distance error: CLPNet achieved the best MDE of 6.91, compared to VGG and CLPNet-Baseline - Lowest FR: CLPNet showed the lowest FR of 19.8%, demonstrating its robustness, significantly better than VGG and CLPNet-Baseline

ANB: The anteroposterior relationship between the maxilla and mandible; CBCT: Cone Beam Computed Tomography; CleftGAN: Generative Adversarial Network; CNN: Convolutional Neural Network; FR: Failure Rate; GCN: Graph Convolutional Neural Network; HU: Hounsfield Units; ICC: Intraclass Correlation Coefficients; OGS: orthognathic surgery; MaxCNET: Maxume Estimation Network; MDE: mean distance error; MSE: mean squared error; NME: normalized mean error; SNA: Sella-Nasion to A-point; SNB: Sella-Nasion to B-point; SNP: single nucleotide polymorphism; SVM: Support Vector Machine; U/BCLP: unilateral/bilateral CLP; VGG: Visual Geometry Group.

The included studies encompassed 4 broad categories of AI tasks. First, computer vision tasks were the most prevalent: classification (n = 29),^{3,4,13–16,19,23,24,26–32,36,37,41,44,45,48–51,54,56–58} object detection (n = 25),^{3,13–15,17,18,20,23–29,32,36–42,44,58,59} and segmentation (n = 8),^{3,33–35,38,41,43,59} including 2 studies that combined all 3 major vision tasks.^{3,41} Second, generative and reconstructive approaches were investigated in a smaller number of studies (n = 5), including image generation, inpainting, and model reconstruction.^{12,13,20,21,42} Third, natural language processing (NLP) was applied in 2 studies for medical text simplification and content generation.^{60,61} Finally, bioinformatics-driven tasks appeared in 6 studies, involving regression-based prediction (n = 4)^{32,46–48} and gene pathway analysis (n = 3).^{52,53,55} This distribution demonstrates that while computer vision remains the cornerstone of AI research in craniofacial disorders, emerging applications in generative modeling, NLP, and genomics signal a broadening scope of innovation.

Regarding the number of cases, 13 studies analyzed very large datasets of 1000 samples or more, with the largest exceeding 13,000 clinical images.^{4,15,17,20,22,37,51,52,57,58} Twenty-seven studies reported large datasets of between 101 and 999 cases.^{3,12–14,18,19,23–29,31,32,36,41,43,46–49,53–56} Smaller datasets of 100 cases or fewer were used in 12 studies,^{21,30,33–35,38–40,42,44,45,59} typically in exploratory or pilot designs. Two studies did not specify a sample size and instead used qualitative sources.^{60,61} Overall, the included studies exhibited striking variability in sample size, ranging from as few as 7 subjects to more than 13,000 cases, underscoring the heterogeneity in data availability across AI research in craniofacial disorders.

Deep learning approaches were the most frequently applied (n = 21), with CNN-based models being the predominant choice,^{3,13,15–18,20–22,26,33,36,38,40,46,48,54,59} most often using ResNet⁵⁸ and VGG derivatives.^{19,27,58} GANs (n = 3),^{12,14,49} transformers, and other advanced architectures (n = 6) were less common but represented recent trends. Classical ML methods (n = 31), including SVMs,^{4,32,37,50,54,57} random forests,^{4,47,51,54,55,57} logistic regression,^{31,52–55} and boosting algorithms, were mainly applied to genetic and tabular datasets.^{49,56} A smaller number of studies incorporated large language models or commercial AI software (n = 9).

The majority of studies included in this systematic review were undertaken in the United States^{3,12–18,40,44,48,55,60,61} and China,^{22,33,34,36–38,46,50,53,54,56,58,59} with additional investigations originating from Japan,^{27–29,31,35} Korea,^{30,32,39,47,49} and the United Kingdom,^{21,41} and the remainder from Switzerland, Egypt, Saudi Arabia, Brazil, Pakistan, India, and Thailand.

Risk of Bias and Applicability

We assessed the included studies using QUADAS-2, PROBAST, and JBI. The results of the risk of bias assessment are presented in Supplemental Figures S3 to S6. Unclear risk was caused by inconsistent patient enrollment procedures, missing patient data from exclusions, ambiguous index tests, and ambiguous information on reference standards. High risk resulted from failing to guarantee that every patient received a reference standard and from not all patients receiving the same reference standard. High applicability concerns arose from the misalignment of the included patient groups with the specific setting of our review question, as well as from variations in the techniques and interpretation of index tests, especially with regard to imaging modalities.

QUADAS-2

Nine studies had a low risk of bias in all 4 domains (Supplemental Figure S3). Low risk of bias was observed for index tests (n = 17) and reference standards (n = 30). In the patient selection domain, 20 studies were assessed as low risk of bias, 8 as unclear, and 2 as high risk. The flow and timing domain was mostly graded as low risk (n = 19), while 5 articles were graded as unclear and 6 as high risk of bias. Regarding applicability, the majority of studies (n = 28) had low risk across all domains; of the remaining 2, one had a high risk in both the index test and the reference standard domains, and the other was at high risk in the patient selection and reference standard domains.

PROBAST

Three studies were rated as low risk in all domains (Supplemental Figure S4). Three studies showed an unclear status in the analysis domain, while 3 others were high risk. In the predictors domain, only 2 studies were graded as high risk; both were also unclear in the participants and data sources domain.

JBI

All articles had a low risk of bias (Supplemental Figures S5 and S6).

Risk of bias assessment was not applied to certain studies, such as those evaluating AI-based patient education tools,^{60,61} image reconstruction/inpainting,^{12,13} and genomic studies identifying variants.^{50,52,53} These types of studies do not involve predictive modeling, diagnostic accuracy testing, or clinical interventions; therefore, established RoB tools are not directly applicable. Instead, their findings were integrated narratively into the discussion, emphasizing their contributions, strengths, and limitations in context.

Overall, the evidence base showed moderate confidence, with most studies having low to moderate risk of bias but important limitations in generalizability. Common concerns included unclear patient selection and data source, incomplete reference standards, and inconsistencies in flow and timing.

Qualitative Synthesis

Upon assessing the full text of the included studies, the overall accuracy of AI models for classifying and detecting CL/P and its subtypes ranged from 74.5% to 95%, while segmentation and localization models achieved Dice coefficients of 0.84 to 0.92. Correlation with expert assessments was notably strong (Pearson's r range = 0.89-0.94). Across modalities, AUC values typically ranged from 0.88 to 0.97 for classification tasks; accuracies of greater than 90% were common in both genetic and imaging studies; and F1-scores of 0.75 to 0.94 indicated strong model balance (Tables 1-3).
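The accuracy, sensitivity, specificity, precision, and F1-score values summarized here all derive from a 2 × 2 confusion matrix. The following is a minimal sketch of those definitions; the function name and the counts are hypothetical and are not taken from any included study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute common diagnostic metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # recall: true positives among all diseased
    specificity = tn / (tn + fp)          # true negatives among all non-diseased
    precision = tp / (tp + fp)            # positive predictive value
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts, for illustration only
metrics = diagnostic_metrics(tp=45, fp=5, fn=5, tn=45)
```

Because accuracy depends on class balance while sensitivity and specificity do not, studies with very different case-to-control ratios can report similar accuracies for quite different error profiles.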

Meta-Analysis

Meta-analyses were performed as exploratory quantitative summaries of narrowly defined subsets of studies. Supplemental Figure S1 illustrates the forest plots of the pooled accuracy (Supplemental Figure S1-A), sensitivity (Supplemental Figure S1-B), and specificity (Supplemental Figure S1-C) for studies reporting the diagnostic ability of AI to detect unilateral and bilateral CL/P on panoramic radiographs. Using random-effects models, the results showed a pooled accuracy of 0.90 (95% CI: 0.82-0.97), a pooled sensitivity of 0.87 (95% CI: 0.75-0.99), and a pooled specificity of 0.89 (95% CI: 0.77-1.01).

Supplemental Figure S2 illustrates the forest plots of the pooled sensitivity (Supplemental Figure S2-A) and specificity (Supplemental Figure S2-B) for studies predicting the need for orthognathic surgery from lateral cephalograms of patients with CL/P. The results indicated an overall sensitivity of 0.87 (95% CI: 0.75-0.98) and an overall specificity of 0.86 (95% CI: 0.83-0.89).
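For readers unfamiliar with random-effects pooling, the following is a minimal sketch of one common approach: inverse-variance weighting with the DerSimonian-Laird estimate of between-study variance, together with the I² heterogeneity statistic. The choice of estimator, the pooling of untransformed proportions, the function name, and the input values are all assumptions made for illustration; they are not the data or the exact procedure of this review.

```python
import math


def pool_random_effects(estimates, std_errors, z=1.96):
    """DerSimonian-Laird random-effects pooling of untransformed estimates.

    Returns (pooled estimate, (ci_low, ci_high), I2 in percent).
    """
    k = len(estimates)
    w = [1.0 / se ** 2 for se in std_errors]              # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0       # between-study variance
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0   # share of variance due to heterogeneity
    return pooled, (pooled - z * se_pooled, pooled + z * se_pooled), i2


# Hypothetical study-level specificities and standard errors
pooled, ci, i2 = pool_random_effects([0.80, 0.95, 0.99], [0.02, 0.02, 0.02])
```

With these hypothetical inputs the upper confidence limit exceeds 1, which can happen whenever proportions near 1 are pooled on the raw scale with a normal approximation. Pooling on a logit or similar transformed scale keeps the limits inside the 0 to 1 range.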

Discussion

This systematic review demonstrated that AI is increasingly being applied in the management of cleft lip and palate, with current applications focused on 3 main areas: CL/P diagnosis, detection, and classification; CL/P prediction and genetic risk assessment; and CL/P surgical outcome assessment and treatment planning. The findings show that AI works best as a second reader by prioritizing suspicious cases, standardizing radiographic landmarking, reducing operator variability, and facilitating early detection of patients needing orthognathic surgery, while final diagnosis and treatment planning remain clinician-led. The pooled accuracy for both meta-analyses should be interpreted as decision-support performance rather than autonomous decision making.

CL/P Diagnosis, Detection, and Classification

Alam et al²³ demonstrated that AI-assisted (WebCeph software) cephalometric analysis offers accurate evaluation of nasolabial angles and upper lip projection in nonsyndromic patients with CL/P, providing a reproducible and time-efficient alternative to traditional manual measurements. In a subsequent investigation, Alam et al applied AI techniques to assess sagittal skeletal parameters, uncovering significant reductions in SNA (Sella-Nasion to A-point), ANB (the anteroposterior relationship between the maxilla and mandible), and Wits indices among cleft subjects, while SNB remained largely unaffected, underscoring the precision of AI-based craniofacial quantification.²⁴ Kamei et al²⁶ expanded these findings by implementing CNN-based MobileNet models, which achieved high accuracy in cervical vertebral maturation staging and vertebral anomaly detection, simultaneously revealing delayed skeletal maturation in unilateral cleft patients. Conversely, Tageldin et al²⁵ reported that fully automated AI landmark identification can produce notable deviations in critical points, particularly subnasale, resulting in meaningful reductions in nasolabial angle measurements, highlighting the necessity for expert validation. Collectively, these studies emphasize that AI markedly improves the speed, consistency, and objectivity of cephalometric and craniovertebral analyses. Comparative evaluation indicates that Kamei et al's CNN models demonstrate superior performance in skeletal assessment, whereas Alam et al's AI platform is more effective for soft tissue and sagittal measurements.^{23,24,26} Deep neural networks have also been used to automatically identify 21 essential nasolabial landmarks that are required for surgical planning, achieving accuracies close to expert manual annotations.¹⁸ Recently, a 3D graph convolutional network for cephalometric landmarking reported a mean localization error of 1.3 mm, with its performance relying heavily on image quality and severity of deformity.³⁶ Taken together, the evidence supports the integration of AI as a transformative adjunct in orthodontic diagnostics, offering enhanced quantitative precision.

Regarding CLP classification, the 3 studies by Kuwada et al reported high accuracy for DetectNet in diagnosing CLP (AUC = 0.95) and in classifying unilateral and bilateral clefts on panoramic radiographs, with performance almost matching that of human raters.^{27–29} The exploratory meta-analysis of DetectNet suggested a high pooled diagnostic accuracy of 0.90; however, the notable heterogeneity (I² = 96.27%) indicates that these estimates reflect variability across datasets and are not directly generalizable. This underscores the need for standardized protocols, larger multicenter datasets, and external validation before these models can be reliably implemented in clinical workflows.^{27–29} Separately, Nantha et al integrated a ViT with a Siamese network to classify CLP types from multimodal ultrasound and speech data, achieving an overall accuracy of 82.76% and demonstrating particular effectiveness for bilateral CLP cases.⁴⁵

Moreover, Zhang et al utilized MaxCNet for cascaded registration on CBCTs for personalized maxilla completion and cleft defect volume estimation. It achieved a Dice similarity coefficient of 0.84 ± 0.04 on estimated cleft defects and a relative volume error of 0.09 ± 0.08.³³
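As an illustration of the two segmentation metrics mentioned here, the sketch below computes a Dice similarity coefficient and a relative volume error on flattened binary masks. The masks are hypothetical, and the relative volume error is assumed to be the absolute volume difference divided by the reference volume, which may differ from the exact definition used in the cited study.

```python
def dice_coefficient(pred, truth):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two flattened binary masks."""
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(pred) + sum(truth)
    return 1.0 if total == 0 else 2.0 * intersection / total


def relative_volume_error(pred, truth):
    """Assumed definition: |V_pred - V_truth| / V_truth, counting voxels."""
    return abs(sum(pred) - sum(truth)) / sum(truth)


# Hypothetical 7-voxel masks (1 = voxel labeled as cleft defect)
predicted = [1, 1, 1, 0, 0, 1, 1]
reference = [1, 1, 0, 0, 1, 1, 0]

dice = dice_coefficient(predicted, reference)
volume_error = relative_volume_error(predicted, reference)
```

The two metrics answer different questions: Dice penalizes spatial mismatch, whereas the volume error can be zero for two masks of equal size that do not overlap at all.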

Deep learning models have also proved useful for detecting facial asymmetry. Wang et al employed deep learning on CBCT scans to quantify asymmetry in unilateral CP and found significant hypoplasia on the cleft side.⁵⁹ Similarly, Hayajneh et al introduced a rapid ML framework for detecting and rating facial anomalies on 2D images, achieving strong agreement (92%) with human evaluations.¹³ Wu et al tested midfacial reference plane methods on 3D meshes, showing that the automated deformation method matched the best manual approaches.⁴⁴

AI is also applied to CLP severity measurement, ranging from objective surgical outcome assessment to automated severity scoring. Miranda et al demonstrated the utility of deep learning for 3D clinical image evaluation of cleft-related deformities, reaching an accuracy of 0.81.³ At the same time, Hayajneh et al introduced a StyleGAN2-based unsupervised anomaly detection model that correlated strongly with human ratings (Pearson's r = 0.89), offering a reliable and scalable tool for real-time clinical measurement.¹⁴
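Agreement between model-derived severity scores and human ratings is reported here as Pearson's r. A minimal sketch of that calculation follows; the scores are hypothetical and are not drawn from the cited studies.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical model scores and mean human ratings for 5 images
model_scores = [1, 2, 3, 4, 5]
human_ratings = [2, 4, 5, 4, 5]
r = pearson_r(model_scores, human_ratings)
```

Pearson's r measures linear association only, so a high value shows that the model ranks and spaces cases similarly to raters, not that the two scales agree in absolute terms.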

Additionally, 2 studies evaluated the application of AI and large language models for patient education and communication about CLP. One study reported that ChatGPT-4 improved the readability and clarity of patient educational materials, bringing them to a 6th-grade reading level.⁶⁰ According to Mahedia et al, ChatGPT can also help patients with CLP learn about their condition by providing information that is generally accurate and simple to comprehend, although its responses were written at a relatively high reading level (grade 10.87) and included no verified references. Because of these limitations, AI should be used to supplement, not replace, conversations between clinicians and patients.⁶¹

Collectively, these AI models show considerable potential, in some cases approaching expert judgment, in CLP diagnosis and analysis, although their widespread use will depend on larger, more diverse datasets and improved model interpretability to ensure clinical reliability.^{17,18,27,29,36} Models such as DetectNet, MobileNet, and U-Net were the most effective and have become benchmark tools for radiographic and CBCT-based tasks, while GANs and Transformers show potential for esthetic evaluation and multimodal fusion.

CL/P Prediction and Genetic Risk Assessment

AI is gradually being adopted in the prenatal diagnosis of CL/P, although the field remains at an early stage. For example, a model that worked only on 2D images of fetal lips reached approximately 92% accuracy in distinguishing normal from abnormal lips, despite the variable and noisy characteristics of ultrasound data.⁵⁸ He et al developed CLP-Net, a reinforcement-learning framework that autonomously navigates 3D first-trimester ultrasound volumes to locate the mid-sagittal, retronasal-triangle, and maxillary-axial planes used for CLP screening. By integrating spatial-anatomical feedback during training, the model mimicked expert decision-making and produced planes consistently judged acceptable by clinicians.⁴⁶ In contrast, Wang et al employed a more conventional pipeline, extracting texture-based descriptors (Local Binary Patterns and Histograms of Oriented Gradients) from mid-gestation 2D fetal facial ultrasound images and classifying them via a support vector machine. Their approach demonstrated that carefully engineered features can still perform reliably even with limited data. Collectively, these studies illustrate the methodological transition from handcrafted texture analysis to autonomous, anatomy-aware learning systems that promise greater standardization and reproducibility in prenatal imaging.³⁷ However, these approaches share common limitations: too few abnormal cases to learn from and variation in ultrasound quality between clinics. AI is therefore beginning to make cleft prediction faster and more consistent, but for now it remains a tool that supports clinicians rather than one that replaces them.^{37,46}
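To make the notion of a handcrafted texture descriptor concrete, the sketch below computes a basic 8-neighbor Local Binary Pattern (LBP) histogram for a small grayscale image. It illustrates the general technique only: the neighbor ordering, the greater-than-or-equal threshold, and the toy image are assumptions, and the cited pipeline's actual parameters and its fusion with HOG features are not reproduced.

```python
# Neighbor offsets, clockwise from the top-left pixel (ordering is a convention)
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, r, c):
    """8-bit LBP code: bit i is set when neighbor i is >= the center pixel."""
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist


# Toy 3 x 3 image with a bright top row (hypothetical intensities)
image = [
    [9, 9, 9],
    [1, 5, 1],
    [1, 1, 1],
]
features = lbp_histogram(image)
```

The resulting histogram is a fixed-length feature vector, which is what allows a classical classifier such as a support vector machine to be trained on images of modest number and varying content.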

Shafi et al and Qadeer et al used ML to predict nonsyndromic CLP risk in embryos based on maternal and environmental questionnaire data from Pakistan. Shafi et al reached 92.6% accuracy with a Multilayer Perceptron model, while Qadeer et al improved performance to 98.3% by applying SMOTE (Synthetic Minority Oversampling Technique) to expand the dataset. Both studies emphasized that accurate prediction models could guide preventive measures (including regularly visiting doctors and avoiding nonprescribed drugs) for expecting mothers and healthcare providers.^{4,57}
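The core idea of SMOTE is to create synthetic minority-class samples by interpolating between an existing minority sample and one of its nearest minority neighbors. The following is a simplified sketch of that idea; the function name, parameters, and data points are hypothetical, and the implementation used in the cited study is not known from this review.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base sample (squared Euclidean distance)
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Hypothetical minority-class samples with 2 numeric features
minority_samples = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
new_samples = smote_sketch(minority_samples, n_new=6)
```

In practice, oversampling is usually applied only to the training split, so that synthetic points derived from test cases do not leak into the evaluation and inflate reported accuracy.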

Three studies explored the application of ML methods for SNP-based genetic risk prediction of nonsyndromic CLP; however, they raised concerns regarding overfitting, population specificity, and clinical applicability, with each stating the need for further cross-ethnic validation before translation into genetic counseling or personalized prevention.^{49,51,54}

Zhang et al evaluated 43 previously reported SNPs from DNA extracted from blood samples in Han and Uyghur Chinese infants, finding that Logistic Regression outperformed other ML methods, while underscoring the influence of folic acid and vitamin A genes such as MTHFR and RBP4. The predictive power of this study varied by ethnicity, raising concerns about its generalizability across populations.⁵⁴ Machado et al examined 72 SNPs in a Brazilian cohort and reported a high predictive accuracy (up to 99% with random forest), identifying a 13-SNP panel as most informative; however, they noted that ancestry-specific effects limited extrapolation to other populations and called for validation before clinical use.⁵¹ Kang et al, working with a Korean cohort, introduced a novel genetic-algorithm-optimized neural networks ensemble (GANNE), which achieved an AUC of 0.882 and an accuracy of 74.2% with a 10-SNP model. This article emphasized the need for larger datasets and external validation to ensure clinical utility.⁴⁹

Three studies assessed gene-gene interaction analysis combined with ML techniques. Li et al examined gene-gene (G × G) interaction among WNT pathway genes and Genome-Wide Association Studies (GWAS)-identified regions (MAFB, IRF6) in Asian and European case-parent trios. They reported robust evidence of G × G interaction between markers in WNT5B and MAFB in both ancestral groups.⁵⁵ Liu et al focused on G × G interactions within the cell adhesion gene pathway (ACTN1, CTNNB1, and CDH1) in Chinese case-parent trios. Conditional logistic regression identified a significant 2-way ACTN1 × CTNNB1 interaction, and logic regression provided evidence for higher-order interactions involving CDH1.⁵³ Machado et al investigated 5 GWAS-reported loci in a large Brazilian case-control population, identifying rs7552 (in FAM49A on 2p24.2) as a risk marker and confirming several highly significant SNP-SNP interactions involving rs7552.⁵² The main controversy in G × G interaction studies lies in distinguishing true interactions from false positives due to linkage disequilibrium and in validating results across different populations.
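Logic regression searches for Boolean combinations of binary predictors, here carrier status at individual SNPs. As a minimal sketch, the example expression summarized for Liu et al in Table 2 (rs409228 AND rs1274944 AND NOT rs4783676, OR rs10490822) can be written as a predicate over carrier flags. The dictionary encoding, the function name, and the example genotypes are hypothetical, and no fitted coefficients from the study are reproduced.

```python
def matches_lower_risk_expression(carries):
    """Evaluate the Boolean SNP expression on a dict of minor-allele carrier flags."""
    return (
        carries["rs409228"]          # CTNNB1
        and carries["rs1274944"]     # ACTN1
        and not carries["rs4783676"] # CDH1
    ) or carries["rs10490822"]       # CTNNB1


# Hypothetical individuals, for illustration only
person_a = {"rs409228": True, "rs1274944": True, "rs4783676": False, "rs10490822": False}
person_b = {"rs409228": True, "rs1274944": True, "rs4783676": True, "rs10490822": False}
person_c = {"rs409228": False, "rs1274944": False, "rs4783676": True, "rs10490822": True}

results = [matches_lower_risk_expression(p) for p in (person_a, person_b, person_c)]
```

In a fitted logic regression model, the truth value of such an expression enters a regression as a single binary covariate, which is how multiway interactions are captured without enumerating every SNP combination.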

Replication is particularly challenging in admixed groups, where risk alleles may be population-specific, so all detected interactions need independent confirmation and functional validation before being considered biologically meaningful in CLP. Collectively, Multilayer Perceptron, GANNE, Random Forest, Naive Bayes, and YOLOv5-based models showed the best performance for genetic risk prediction, questionnaire-based screening, and prenatal ultrasound detection.

CL/P Surgical Outcome Assessment and Treatment Planning

Rosero et al developed a deep learning Siamese CNN model to automatically evaluate how well lip symmetry was restored after CL surgery. Trained on simulated images of healthy faces, the model could assess surgical success without using anatomical landmarks, reaching approximately 75% accuracy and showing a correlation of 0.31 with surgeons’ esthetic evaluations.¹⁶ In related work, Rosero et al combined a Siamese CNN and a Transformer network (Siamese LT-v2) to detect residual lip irregularities following surgery. By comparing each patient's real lip image with a “normalized” version generated by the model, they achieved 89% accuracy in identifying abnormal outcomes, a 29% improvement over the baseline network (MobileNetV2).¹⁵ Taken together, these studies by Rosero et al addressed how symmetrical the lips are immediately after surgery, how natural they appear later on, and whether further corrective surgery might be needed.^15,16

Four studies explored how AI can be applied to evaluate surgical outcomes, chiefly bone grafting, in patients with CL/P, each using a different approach. Seo et al analyzed soft-tissue changes after orthognathic surgery and found that the ratio of soft-to-hard-tissue movement was higher around the nose and lips in patients with cleft, suggesting a distinct pattern of soft-tissue response compared with noncleft individuals.³⁹ Fujii et al focused on automatic bone segmentation and evaluation of secondary alveolar bone grafting (SABG) outcomes. They reported that AI-based analysis significantly improved the speed and accuracy of bone volume and density measurements compared with manual methods, achieving excellent interobserver reliability within a much shorter processing time.³⁵ In a similar direction, Zhang et al applied deep learning to automatically reconstruct the alveolar cleft and measure maxillary growth before and 1 year after bone grafting. Their results showed a significant reduction in the dimensional difference between the cleft and noncleft sides, along with improved maxillary growth on the cleft side after SABG.³⁴ In another study, Zhang et al refined a 3D U-Net model combined with nonrigid CT registration and confirmed that AI can accurately quantify jaw growth and postsurgical changes with high efficiency and minimal error.³⁸ Taken together, Fujii et al and Zhang et al focused mainly on bone outcomes, showing that AI provides an effective tool for the quantitative evaluation of grafted bone and maxillary development, while Seo et al emphasized esthetic and soft-tissue outcomes.^34,35,38,39
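Once a cleft defect and the grafted bone have been segmented, volume-based outcome measures reduce to simple arithmetic on the voxel masks. The sketch below assumes binary 3D masks and a known voxel spacing. The function names, the toy masks, and the "fill ratio" definition are illustrative assumptions and do not reproduce the pipeline of any cited study.

```python
def mask_volume_mm3(mask, spacing_mm):
    """Volume of a binary 3D mask: foreground voxel count times voxel volume."""
    sx, sy, sz = spacing_mm
    voxels = sum(value for plane in mask for row in plane for value in row)
    return voxels * sx * sy * sz


def fill_ratio(bone_mask, defect_mask, spacing_mm):
    """Grafted bone volume divided by the original defect volume (assumed metric)."""
    defect_volume = mask_volume_mm3(defect_mask, spacing_mm)
    if defect_volume == 0:
        raise ValueError("defect mask contains no foreground voxels")
    return mask_volume_mm3(bone_mask, spacing_mm) / defect_volume


# Toy 2 x 2 x 2 masks with 0.5 mm isotropic voxels (hypothetical data).
defect_mask = [[[1, 1], [1, 1]], [[1, 1], [1, 1]]]  # 8 defect voxels
bone_mask = [[[1, 1], [1, 0]], [[1, 1], [1, 0]]]    # 6 voxels filled with bone
spacing = (0.5, 0.5, 0.5)

defect_volume = mask_volume_mm3(defect_mask, spacing)
ratio = fill_ratio(bone_mask, defect_mask, spacing)
```

In practice the masks would come from a segmentation model and the spacing from the image header; the arithmetic itself is fast, so any gain in measurement accuracy depends on the quality of the segmentation.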

Studies have also investigated the use of AI for predicting the need for orthognathic surgery. Lim et al developed a deep learning CNN model to automatically locate important cephalometric landmarks in patients with CLP. Trained on real patient images, the model showed high precision, placing landmarks within 2 mm of expert measurements in 90% of cases, which makes cephalometric analysis and assessment of the need for orthognathic surgery faster and more consistent.³¹ In comparison, Lin et al used Boruta and XGBoost algorithms to predict which patients would later require orthognathic surgery. By analyzing longitudinal cephalometric data, their model achieved 87.4% accuracy using only 4 key measurements (ANB, PP-FH, Combination Factor, and Facial Convexity Angle). Together, these studies show how AI can enhance both the assessment and prediction of the need for orthognathic surgery in patients with CLP, with pooled meta-analytic sensitivity and specificity of 87% and 86%.^30,32
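The "within 2 mm in 90% of cases" figure corresponds to what is often called a successful detection rate. A minimal sketch of that calculation follows, assuming landmark coordinates are already expressed in millimetres; the function name and the coordinates are hypothetical and are not data from the cited study.

```python
import math


def successful_detection_rate(predicted, reference, threshold_mm=2.0):
    """Fraction of landmarks whose Euclidean error is at most threshold_mm."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty landmark lists of equal length")
    hits = sum(
        1
        for pred, ref in zip(predicted, reference)
        if math.dist(pred, ref) <= threshold_mm
    )
    return hits / len(predicted)


# Hypothetical 2D landmark positions in mm (model output vs expert annotation).
predicted = [(10.0, 20.0), (31.5, 40.0), (50.0, 63.0), (70.0, 80.5)]
reference = [(10.0, 21.0), (30.0, 40.0), (50.0, 60.0), (70.0, 80.0)]

sdr = successful_detection_rate(predicted, reference)  # errors: 1.0, 1.5, 3.0, 0.5 mm
```

The 2 mm threshold is a convention; reporting the rate at several thresholds, alongside the mean radial error, gives a fuller picture of landmarking performance.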

Regarding the application of advanced computational tools to the surgical workflow, Li et al and Chen et al focused on preoperative surgical planning from 2D patient images, generating a guidance image for CL surgery and outperforming state-of-the-art methods or baselines in their respective tasks.^21,22 In contrast, Lingens et al aimed to enhance treatment planning by demonstrating rapid 3D intraoral reconstruction from a single smartphone image. Using real and synthetic datasets, they trained a CNN to predict 2D landmarks and generate 3D models. The method performed well on synthetic data but was less accurate on real images because of a domain gap, suggesting that larger real datasets could greatly improve performance and support low-cost, noninvasive clinical applications.²⁰

Santos et al and Schnabel et al addressed the 3D digital workflow for orthopedic plates. Santos et al explored the use of smartphone scanning apps to create 3D models of newborn palates for fabricating presurgical orthopedic (PSO) plates. The results indicated that the KIRI Engine app, particularly for unilateral CLP models, achieved accuracy comparable to professional intraoral scanners.⁴² Schnabel et al presented a fully automated, data-driven pipeline using a DiffusionNet to design PSO plates from 3D intraoral scans. The pipeline produced accurately fitting plates in under 3 min, and these plates were qualitatively and quantitatively similar to those based on manual expert annotations.⁴³ Overall, Siamese CNN, DiffusionNet, 3D U-Net, and XGBoost were the most effective models for surgical planning and prediction.

Augmented reality (AR) is also a promising adjunct in cleft surgery that can improve surgical marking and intraoperative guidance by projecting patient-specific landmarks and preoperative plans directly onto the surgical field.⁶² Recent studies have shown that AR-guided marking improves precision and supports better symmetry in cleft lip repair.⁶³ Portable smartphone- or projector-based AR systems may be especially valuable in low- and middle-income countries, where access to advanced surgical planning tools is limited, by providing low-cost support for accurate and standardized treatment. Although challenges such as cost and technical training remain, AR represents an important future direction for improving accessibility and precision in cleft surgical care.⁶⁴

Recent developments in deep learning have substantially advanced the diagnosis, detection, classification, and treatment planning of CLP. The overall findings reinforce a practical shift toward using AI for triage, workflow efficiency, and long-term monitoring, especially in landmark identification, graft evaluation, and presurgical planning.⁶⁵ Consistent with recent orthodontic reviews, AI has improved time efficiency and reproducibility, even though it does not outperform experts.^65,66 Moreover, as in radiology, dermatology, and ophthalmology, AI performs best in image-based detection and classification tasks.⁶⁷

A limitation of the available research is the predominance of small, single-center datasets and the lack of external validation across diverse populations, imaging modalities, and clinical settings, which restricts the generalizability and real-world applicability of current AI models. This also raises the possibility of overfitting, where models perform well during internal testing but fail to remain accurate in real practice. Therefore, larger multicenter datasets and independent validation cohorts are needed before AI models can be reliably brought into clinical care. Future AI development can shift from single-task models, such as landmark detection, toward multimodal systems that integrate genetic, imaging, and clinical parameters, such as CBCT, cephalometrics, intraoral scans, and clinical history. This shift can enhance prediction accuracy, enable personalized treatment planning, and support evidence-based decision making in CL/P management.

Conclusion

In conclusion, AI is increasingly being applied in the management of CLP, with current applications focused mainly on 3 domains: (1) CL/P diagnosis, detection, and classification; (2) CL/P prediction and genetic risk assessment; and (3) CL/P surgical outcome assessment and treatment planning. Across these areas, AI has demonstrated reliable accuracy and consistency in anatomical landmarking, radiographic analysis, image-based classification, and pre- and postoperative evaluation, approaching and sometimes matching expert-level performance.

Supplemental Material

Supplemental material, sj-docx-1-cpc-10.1177_10556656261460187 for Artificial Intelligence Applications in Cleft Lip and Palate Diagnosis, Prediction, and Treatment: A Systematic Review and Meta-Analysis by Nozhan Azimi, DDS, Sahar Akbari Iraj, DDS, MSc, Fateme Mazaheri, DDS, MSc, Sarina Maddahi, DDS, MSc, Mohammad Mahdi Khanmohammadi, DDS, Ali Azadi, DDS, and Asghar Ebadifar, DDS, MSc in The Cleft Palate Craniofacial Journal

Footnotes

ORCID iDs

Nozhan Azimi

Mohammad Mahdi Khanmohammadi

Ali Azadi

Author Contributions

NA: Conceptualization, Methodology, Investigation, Writing—Original Draft, Writing—Review & Editing, Project administration.

SA: Investigation, Writing—Original Draft, Writing—Review & Editing.

FM: Investigation, Writing—Original Draft, Writing—Review & Editing.

SM: Investigation, Writing—Original Draft, Writing—Review & Editing.

MK: Investigation, Writing—Original Draft, Writing—Review & Editing.

AA: Investigation, Formal analysis.

AE: Investigation, Writing—Review & Editing, Supervision, Project administration.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

All data supporting the findings of this study are available within the paper and upon reasonable request.

Supplemental Material

Supplemental material for this article is available online.

References

1. Almoammar. Harnessing the power of artificial intelligence in cleft lip and palate: an in-depth analysis from diagnosis to treatment, a comprehensive review. Children. 2024;11(2):140.

2. Huqh MZU, Abdullah, Wong, Jamayet, Alam, Rashid, Husein, Ahmad WMAW, Eusufzai, Prasadh, et al. Clinical applications of artificial intelligence and machine learning in children with cleft lip and palate: a systematic review. Int J Environ Res Public Health. 2022;19(17):10860.

3. Miranda, Choudhari, Barone, Anchling, Hutin, Gurgel, Al Turkestani, Yatabe, Bianchi, Aliaga-Del Castillo, et al. Interpretable artificial intelligence for classification of alveolar bone defect in patients with cleft lip and palate. Sci Rep. 2023;13(1):15861. doi: 10.1038/s41598-023-43125-7

4. Qadeer, Bukhari, Iqbal. Machine learning in the prediction of nonsyndromic orofacial cleft in Pakistan. 2021 International Conference on Innovative Computing (ICIC). 2021:1–6. doi: 10.1109/ICIC53490.2021.9692976

5. Azimi, Talebi Rafsanjan, Khanmohammadi Khorami, Ebadifar, Azadi. Applications of machine learning in image analysis to identify craniosynostosis: a systematic review and meta-analysis. Orthod Craniofac Res. 2025;28(5):733–751. doi: 10.1111/ocr.12918

6. Najary, Azadi, Shamszadeh, Shams, Nokhbatolfoghahaei. Evaluation of artificial intelligence-generated information on cell culture and laboratory protocols in the field of tissue engineering: a comparison between GPT-3.5, Claude-Instant, and Microsoft Copilot: AI in Cell Lab. Regenerat Reconstruct Restorat (Triple R). 2024;9(11).

7. Zambrano CDB, Jiménez, Rodríguez AGM, Rincón EHH. Revolutionizing cleft lip and palate management through artificial intelligence: a scoping review. Oral Maxillofac Surg. 2025;29(1):79.

8. McInnes, Moher, Thombs, McGrath, Bossuyt, Clifford, Cohen, Deeks, Gatsonis, Hooft, et al. Preferred reporting items for a systematic review and meta-analysis of diagnostic test accuracy studies: the PRISMA-DTA statement. JAMA. 2018;319(4):388–396.

9. Whiting, Rutjes, Westwood, Mallett, Deeks, Reitsma, Leeflang, Sterne, Bossuyt, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011;155(8):529–536. doi: 10.7326/0003-4819-155-8-201110180-00009

10. Wolff, Moons KGM, Riley, Whiting, Westwood, Collins, Reitsma, Kleijnen, Mallett, et al. PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Ann Intern Med. 2019;170(1):51–58. doi: 10.7326/m18-1376

11. M H. JBI Critical appraisal checklist for systematic reviews and research syntheses. J Can Health Libr Assoc. 2024;45(3):180–183. doi: 10.29173/jchla29801

12. Hayajneh, Serpedin, Shaqfeh, Glass, Stotland. Adapting a style based generative adversarial network to create images depicting cleft lip deformity. Sci Rep. 2025;15(1):3614.

13. Hayajneh, Serpedin, Stotland. Facial anomaly appraisal using discrepancy optimization-driven automatic inpainting. IEEE J Biomed Health Inform. 2025;29(11):8423–8435.

14. Hayajneh, Shaqfeh, Serpedin, Stotland. Unsupervised anomaly appraisal of cleft faces using a StyleGAN2-based model adaptation technique. PLoS ONE. 2023;18(8):e0288228.

15. Rosero, Salman, Hallac, Busso. Lip abnormality detection for patients with repaired cleft lip and palate: a lip normalization approach. Proceedings of the 26th International Conference on Multimodal Interaction. 2024:479–479. doi: 10.1145/3678957.3685726

16. Rosero, Salman, Harrison, Kane, Busso, Hallac. Deep learning-based assessment of lip symmetry for patients with repaired cleft lip. Cleft Palate Craniofacial J. 2025;62(2):289–299.

17. Rosero, Salman, Sisman, Hallac, Busso. Enhanced facial landmarks detection for patients with repaired cleft lip and palate. 2024 IEEE 18th International Conference on Automatic Face and Gesture Recognition (FG). 2024:1–10.

18. Sayadi, Hamdan, Zhangli, Vyas. Harnessing the power of artificial intelligence to teach cleft lip surgery. Plastic Reconstruct Surg–Global Open. 2022;10(7):e4451.

19. Ali, Sumita, Wakabayashi. Advancing maxillofacial prosthodontics by using pre-trained convolutional neural networks: image-based classification of the maxilla. J Prosthodont. 2024;33(7):645–654.

20. Lingens, Lill, Nalabothu, Benitez, Mueller, Gross, Solenthaler. Evaluation of synthetic training data for 3D intraoral reconstruction of cleft patients from single images. Int J Comput Assist Radiol Surg. 2025;20(7):1–9.

21. Chen, Atapour-Abarghouei, Kerby, Sainsbury, Butterworth, Shum. A feasibility study on image inpainting for non-cleft lip generation from patients with cleft lip. 2022 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI). 2022:1–4.

22. Cheng, Mei, Chen. CLPNet: cleft lip and palate surgery support with deep learning. 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). 2019;2019:3666–3672.

23. Alam, Alfawzan, Akhter, Alswairki, Chaudhari. Evaluation of lip morphology and nasolabial angle in non-syndromic cleft lip and/or palate and non-cleft individuals. Appl Sci. 2021;12(1):357.

24. Alam, Alfawzan, Haque, Mok, Marya, Venugopal, Siddiqui. Sagittal jaw relationship of different types of cleft and non-cleft individuals. Front Pediatr. 2021;9:651951.

25. Tageldin, Yacout, Eid, Abdelhafiz. Accuracy of cephalometric landmark identification by artificial intelligence platform versus expert orthodontist in unilateral cleft palate patients: a retrospective study. Int Orthodontics. 2025;23(2):100990.

26. Kamei, Batra, Singh, Arora, Kaushik. Development of an artificial intelligence-based algorithm for the assessment of skeletal age and detection of cervical vertebral anomalies in patients with cleft lip and palate. Cleft Palate Craniofacial J. 2024;63(2):263–272.

27. Kuwada, Ariji, Kise, Fukuda, Nishiyama, Funakoshi, Takeuchi, Sana, Kojima, Ariji. Deep-learning systems for diagnosing cleft palate on panoramic radiographs in patients with cleft alveolus. Oral Radiol. 2023;39(2):349–354.

28. Kuwada, Ariji, Kise, Fukuda, Ota, Ohara, Kojima, Ariji. Detection of unilateral and bilateral cleft alveolus on panoramic radiographs using a deep-learning system. Dentomaxillofacial Radiol. 2023;52(8):20210436.

29. Kuwada, Ariji, Kise, Funakoshi, Fukuda, Kuwada, Gotoh, Ariji. Detection and classification of unilateral cleft alveolus with and without cleft palate on panoramic radiographs using a deep learning system. Sci Rep. 2021;11(1):16044.

30. Lin, Kim P-J, Baek S-H, Kim H-G, Kim S-W, Chung J-H. Early prediction of the need for orthognathic surgery in patients with repaired unilateral cleft lip and palate using machine learning and longitudinal lateral cephalometric analysis data. J Craniofacial Surg. 2021;32(2):616–620.

31. Lim, Tanikawa, Kogo, Yamashiro. Determination of prognostic factors for orthognathic surgery in children with cleft lip and/or palate. Orthod Craniofac Res. 2021;24(Suppl 2):153–162.

32. Lim S-W, Kim H-G, Baek S-H. Accuracy of machine learning-assisted prediction of the future need for orthognathic surgery in patients with cleft lip and palate. Korean J Orthod. 2025;55(5):365–379.

33. Zhang, Pei, Guo, Chen, Zha. Adaptable cascaded registration for personalized maxilla completion and cleft defect volume estimation. Med Phys. 2024;51(6):4283–4296.

34. Zhang, Qin, Zhou, Chen. Machine learning in 3D auto-filling alveolar cleft of CT images to assess the influence of alveolar bone grafting on the development of maxilla. BMC Oral Health. 2023;23(1):16.

35. Fujii, Sugiyama-Tamura, Sugisaki, Chujo, Honda, Kono, Chikazu. New assessment method of alveolar bone grafting using automatic registration and AI-based segmentation. J Craniofacial Surg. 2024. doi: 10.1097/SCS.0000000000010492

36. Liu, Luo, Sun, Wang, Yin, Tang, Song. Using a new deep learning method for 3D cephalometry in patients with cleft lip and palate. J Craniofacial Surg. 2023;34(5):1485–1488.

37. Wang, Liu, Diao, Liu, Zhang. Recognition of fetal facial ultrasound standard plane based on texture feature fusion. Comput Math Methods Med. 2021;2021(1):6656942.

38. Zhang, Pei, Chen, Guo, Zha. Volumetric registration-based cleft volume estimation of alveolar cleft grafting procedures. 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI). 2020:99–103. doi: 10.1109/ISBI45749.2020.9098407

39. Seo, Yang I-H, Choi J-Y, Lee J-H, Baek S-H. Three-dimensional facial soft tissue changes after orthognathic surgery in cleft patients using artificial intelligence-assisted landmark autodigitization. J Craniofacial Surg. 2021;32(8):2695–2700.

40. Agaronyan, Choo, Linguraru, Anwar. Geometric deep learning for automated landmarking of maxillary arches on 3D oral scans from newborns with cleft lip and palate. 2025 IEEE 22nd International Symposium on Biomedical Imaging (ISBI). 2025.

41. Woodsend, Koufoudaki, Lin, McIntyre, El-Angbawi, Aziz, Shaw, Semb, Reesu, Mossey. Development of intra-oral automated landmark recognition (ALR) for dental and occlusal outcome measurements. Eur J Orthod. 2022;44(1):43–50.

42. Santos, Mueller, Benitez, Lill, Nalabothu, Muniz FWMG. Smartphone-based scans of palate models of newborns with cleft lip and palate: outlooks for three-dimensional image capturing and machine learning plate tool. Orthod Craniofac Res. 2025;28(1):166–174.

43. Schnabel, Gözcü, Gotardo, Lingens, Dorda, Vetterli, Emhemmed, Nalabothu, Lill, Benitez, et al. Automated and data-driven plate computation for presurgical cleft lip and palate treatment. Int J Comput Assist Radiol Surg. 2023;18(6):1119–1125.

44. Heike, Birgfeld, Evans, Maga, Morrison, Saltzman, Shapiro, Tse. Measuring symmetry in children with unrepaired cleft lip: defining a standard for the three-dimensional midfacial reference plane. Cleft Palate Craniofacial J. 2016;53(6):695–704.

45. Nantha, Sathanarugsawait, Praneetpolgrang. Cleft lip and palate classification through vision transformers and Siamese neural networks. J Imaging. 2024;10(11):271.

46. Zhu, Han, Cao, Chen, Huang, Dou, Liang, Zhang, et al. CLP-Net: an advanced artificial intelligence technique for localizing standard planes of cleft lip and palate by three-dimensional ultrasound in the first trimester. BMC Pregnancy Childbirth. 2025;25(1):10.

47. Jeon, Jang, Chakraborty, Kim, Moon, Chung, Baek. Prediction of palatoplasty timing for infants with cleft lip and palate using machine learning algorithm. J Craniofacial Surg. 2025;36(3):947–951.

48. Dai, Itai, Pei, Yan, Chu, Jiang, Weinberg, Mukhopadhyay, Marazita, Simon, et al. DeepFace: deep-learning-based framework to contextualize orofacial-cleft-related variants during human embryonic craniofacial development. Human Genetics Genomics Adv. 2024;5(3):100312. doi: 10.1016/j.xhgg.2024.100312

49. Kang, Baek S-H, Kim D-H, Park. Genetic risk assessment of nonsyndromic cleft lip with or without cleft palate by linking genetic networks and deep learning models. Int J Mol Sci. 2023;24(5):4557.

50. Xiao, Jiao, Lin, Zuo, Han, Sun, Cao, Chen, Liu. Chromatin conformation of human oral epithelium can identify orofacial cleft missing functional variants. Int J Oral Sci. 2022;14(1):43.

51. Machado, de Oliveira Silva, Martelli-Junior, das Neves, Coletta. Machine learning in prediction of genetic risk of nonsyndromic oral clefts in the Brazilian population. Clin Oral Investig. 2021;25(3):1273–1280.

52. Machado, Nogueira, Martelli-Júnior, Reis, Persuhn, Coletta. 2p24.2 (rs7552) is a susceptibility locus for nonsyndromic cleft lip with or without cleft palate in the Brazilian population. Clin Genet. 2018;93(6):1199–1204.

53. Liu, Wang, Yuan, Schwender, Wang, Zhou, Zhu, et al. Gene–gene interaction among cell adhesion genes and risk of nonsyndromic cleft lip with or without cleft palate in Chinese case-parent trios. Mol Genet Genomic Med. 2019;7(10):e00872.

54. Zhang S-J, Meng, Zhang, Jia, Lin, Wang, Chen, Wei. Machine learning models for genetic risk assessment of infants with non-syndromic orofacial cleft. Genomics Proteomics Bioinformatics. 2018;16(5):354–364.

55. Kim, Suktitipat, Hetmanski, Marazita, Duggal, Beaty, Bailey-Wilson. Gene-gene interaction among WNT genes for oral cleft in trios. Genet Epidemiol. 2015;39(5):385–394.

56. Jia, Xie, Yang, Dong, Luo, Wei, Liu, Cao, et al. Combining lipidomics and machine learning to identify lipid biomarkers for nonsyndromic cleft lip with palate. JCI Insight. 2025;10(9):e186629.

57. Shafi, Bukhari, Iqbal, Almustafa, Asif, Nawaz. Cleft prediction before birth using deep neural network. Health Informatics J. 2020;26(4):2568–2585.

58. Cai, Huang, Liu. Deep learning based detection and classification of fetal lip in ultrasound images. J Perinat Med. 2024;52(7):769–777.

59. Wang, Pastewait, Lian, Tejera, Lee, Lin, Wang, Shen, et al. 3D morphometric quantification of maxillae and defects for patients with unilateral cleft palate via deep learning-based CBCT image auto-segmentation. Orthod Craniofac Res. 2021;24(Suppl 2):108–116.

60. Shehab, Shedd, Alamah, Mardini, Bite, Gibreel. Bridging gaps in health literacy for cleft lip and palate: the role of artificial intelligence and interactive educational materials. Cleft Palate Craniofacial J. 2024;63(1):65–71.

61. Mahedia, Rohrich, Sadiq KOS, Bailey, Harrison, Hallac. Exploring the utility of ChatGPT in cleft lip repair education. J Clin Med. 2025;14(3):993.

62. Rudy, Schreiber, Wake, Lesko, Gordon, Garfein, Tepper. Intraoperative navigation in plastic surgery with augmented reality: a preclinical validation study. Plast Reconstr Surg. 2022;149(3):573e–580e. doi: 10.1097/prs.0000000000008875

63. Shah, Sayadi, Kassani, Vyas. Projected augmented reality in surgery: history, validation, and future applications. J Clin Med. 2025;14(22):8246.

64. Wei, Yan, Wang. A novel portable augmented reality surgical navigation system for maxillofacial surgery: technique and accuracy study. Int J Oral Maxillofac Surg. 2024;53(11):961–967. doi: 10.1016/j.ijom.2024.02.007

65. Wadia. AI in orthodontics. Br Dent J. 2024;237(12):927. doi: 10.1038/s41415-024-8238-2

66. Nordblom, Buettner, Schwendicke. Artificial intelligence in orthodontics: critical review. J Dental Res. 2024;103(6):577–584. doi: 10.1177/00220345241235606

67. Friebe. AI in radiology and interventions: a structured narrative review of workflow automation, accuracy, and efficiency gains of today and what’s coming. Int J Comput Assist Radiol Surg. 2026;21(1):1–10. doi: 10.1007/s11548-025-03547-2

68. McCullough, Auslander, Yao, Campbell, Scherer, Magee. Convolutional neural network models for automatic preoperative severity assessment in unilateral cleft lip. Plast Reconstr Surg. 2021;148(1):162–169. doi: 10.1097/PRS.0000000000008063

69. Tse, Shapiro. Learning to rank the severity of unrepaired cleft lip nasal deformity on 3D mesh data. Proceedings of the ... IAPR International Conference on Pattern Recognition. International Conference on Pattern Recognition. 2014;2014:460–464. doi: 10.1109/ICPR.2014.88
