Abstract
Background:
The accuracy of artificial intelligence (AI) in treatment planning and outcome prediction in orthognathic treatment (OGT) has not been systematically reviewed.
Objectives:
To determine the accuracy of AI in treatment planning and soft tissue outcome prediction in OGT.
Design:
Systematic review.
Data sources:
Unrestricted search of indexed databases and reference lists of included studies.
Data selection:
Clinical studies that addressed the focused question ‘Is AI useful for treatment planning and soft tissue outcome prediction in OGT?’ were included.
Data extraction:
Study screening, selection and data extraction were performed independently by two authors. The risk of bias (RoB) was assessed using the Cochrane Collaboration’s RoB and ROBINS-I tools for randomised and non-randomised clinical studies, respectively.
Data synthesis:
Eight clinical studies (seven retrospective cohort studies and one randomised controlled study) were included. Four studies assessed the role of AI in treatment decision making, and four studies assessed the accuracy of AI in soft tissue outcome prediction after OGT. In the four decision-making studies, the level of agreement between AI and non-AI decision making was found to be clinically acceptable (at least 90%). The four prediction studies showed that AI can be used for soft tissue outcome prediction after OGT; however, predictions were not clinically acceptable for the lip and chin areas. All studies had a low to moderate RoB.
Limitations:
Owing to high methodological inconsistencies among the included studies, it was not possible to conduct a meta-analysis or an assessment of reporting biases.
Conclusion:
AI can be a useful aid to traditional treatment planning by facilitating clinical treatment decision making and providing a visualisation tool for soft tissue outcome prediction in OGT.
Registration:
PROSPERO CRD42022366864.
Keywords
Introduction
Artificial intelligence (AI) is a computer-based technology that is capable of mimicking human decision-making and problem-solving abilities. Machine learning incorporates algorithms whose performance improves by being exposed to more data over time. Deep learning is a subcategory of machine learning, in which convolutional neural networks learn from extensive data by estimating complex non-linear associations between input and output variables (Benke and Benke, 2018; Choi et al., 2019). This is similar to how humans learn, but these algorithms can accomplish tasks at a faster pace (Choi et al., 2019). Multiple applications of AI algorithms have been developed in the medical field. Such systems are trained on data collected from health centres and research institutions to facilitate data-driven decision making (Allareddy et al., 2019; Bouletreau et al., 2019). In recent years, specialties in the medical field have used AI in multiple ways, including diagnosis of diabetic retinopathy, atrial fibrillation, cerebral haemorrhage, strokes and classification of attractiveness in plastic surgery-related interventions (Murphy and Saleh, 2020; Pethani, 2021). In clinical dentistry, AI models have been implemented in many specialties, from interpretation of dental images to treatment recommendations and/or projection of future diseases. For instance, in endodontics, periodontics and prosthodontics, AI is currently utilised for the detection of periapical radiolucencies, detection of vertical root fractures, assessment of profile stresses on the mandible during the implant process and determination of whether teeth need to be restored, need root canal therapy or need to be extracted (Bernauer et al., 2021; Boreak, 2020; Hung et al., 2020; Roy et al., 2018; Thurzo et al., 2022; Van Staden et al., 2008).
Other AI models are able to computerise charting from radiographs, detect caries lesions and other oral lesions, such as dentigerous cysts and periapical cysts (Reyes et al., 2021). In addition, AI has been implemented in computational simulation via the finite element method to facilitate dental implant design optimisation and the prediction of success in implant dentistry (Revilla-León et al., 2021), as well as permit biomechanical simulations and tissue stress response evaluations in various medical and dental fields (Ammarullah et al., 2022; Jain et al., 2021; Phellan et al., 2021; Vurtur Badarinath et al., 2021; Xue et al., 2021).
In orthodontics, AI was initially utilised for radiographic landmark tracing, as well as for the determination of the patient’s skeletal classification from lateral cephalograms (Mohammad-Rahimi et al., 2021). Nowadays, new machine models have been used as a tool for treatment decision making, such as the determination of extraction/non-extraction decisions, extraction patterns and anchorage patterns (Albalawi and Alamoud, 2022; Etemad et al., 2021; Hung et al., 2020; Li et al., 2019; Khanagar et al., 2021b). Recently, promising applications of AI have emerged for patients receiving orthognathic treatment (OGT). It is well known that the main role of OGT is to correct dentofacial discrepancies that surpass the capacity of conventional orthodontic treatment (OT) (Bouletreau et al., 2019). To increase the success of the OGT, dental decompensations with the use of dental extractions might be required during the preoperative OT to achieve maximum correction of skeletal discrepancies (Larson, 2014). Simultaneously, these skeletal and dental changes might have a significant effect on the overlying soft tissues, changing patients’ appearance and aesthetics (Hung et al., 2020). In this respect, AI is potentially a valuable tool for the diagnosis and outcome simulation in patients requiring OGT as well as in the visual communication for orthodontists, surgeons and patients (Lee et al., 2022). In a study of 840 patients, Shin et al. (2021) found that a convolutional neural network predicted the need of OGT for the correction of skeletal malocclusion with a 90% accuracy when using posteroanterior (PA) and lateral cephalograms. When evaluating its utilisation for the diagnosis of surgery/non-surgery decision and extraction decision, a new AI model developed by Choi et al. (2019) was able to diagnose any need of surgery/non-surgery and extractions with a success rate of approximately 90%. 
Moreover, a case-series study using computed tomography (CT) scans and AI software showed clinically relevant results when virtually simulating soft tissue changes of the face before and after bimaxillary OGT (Cunha et al., 2021). In contrast, in a clinical study by Bengtsson et al. (2017), AI was not accurate in predicting most of the soft tissue measurements assessed. Based on these results, there appears to be controversy in the existing literature regarding the applicability of AI for treatment planning in OGT.
A thorough search of the indexed literature revealed that the implementation of AI for treatment planning and soft tissue outcome prediction in OGT cases is yet to be reviewed systematically. Therefore, there is a need to compile and critically appraise the quality of available information. In this regard, our goal was to undertake a systematic review to determine the accuracy of AI in OGT, and to evaluate the quality of the available evidence.
Materials and methods
Protocol and focused question
The present study is a systematic review of the literature; therefore, prior ethical approval was not required by the Institutional Review Board. Preferred Reporting Items for Systematic Reviews and Meta-analysis (PRISMA) guidelines (Page et al., 2021) were followed to answer the focused question: ‘Is AI useful for the treatment planning and soft tissue outcome prediction of OGT cases?’ The protocol of this systematic review was registered in PROSPERO (CRD42022366864).
Eligibility criteria
In this systematic review, randomised and non-randomised clinical studies including patients undergoing OGT were assessed. Case reports and case-series, letters to the editor, commentaries, reviews, experimental studies and cross-sectional studies were excluded. The PICOS format was as follows: Population (P = patients undergoing OGT); Intervention (utilisation of AI for treatment planning and/or soft tissue outcome prediction); Control (conventional doctor-based treatment planning and/or soft tissue outcome prediction without AI); Outcome (level of agreement between AI and non-AI in decision making and/or between AI-predicted and actual/obtained soft tissue treatment outcomes); and Study design (clinical studies).
Information sources, search strategy and study selection
Indexed databases (PubMed, Scopus, Web of Science, EMBASE, Dentistry & Oral Sciences Source) and Google Scholar were searched without language and time restrictions up to and including 9 September 2022. Keywords and medical/dental subject headings were combined with the use of Boolean operators (OR, AND) for the identification of relevant studies; a customised search strategy was developed by one author (DS) (Supplementary Table 1). Two authors (DS and FJ) screened the titles and abstracts of the retrieved studies, and full texts of the relevant studies were read independently. The reference lists of the relevant original studies and review articles were additionally hand-searched to avoid missing any relevant articles. Disagreements were resolved through consensus discussion and consultation with a third researcher (DM).
Data collection and data items
Data extraction from eligible studies was performed independently by two authors (DS and DM); disagreements were resolved through discussion with a third researcher (PER). The following information was recorded from the eligible studies: (1) authors; (2) study design; (3) power analysis; (4) sample size; (5) age; (6) sex of participants; (7) study groups; (8) malocclusion-skeletal problems; (9) study duration; (10) imaging methods; (11) AI models used; (12) AI usage; (13) treatment planning decision; (14) level of agreement between groups; (15) statistical significance testing; (16) measurements assessed; (17) mean absolute error; and (18) main study outcomes.
Risk of bias in individual studies
Two authors (DS and DM) assessed the risk of bias (RoB) of the included studies using the Cochrane Collaboration’s RoB tool and the ROBINS-I tool for randomised and non-randomised clinical studies, respectively. For the Cochrane Collaboration’s RoB tool (Higgins et al., 2011), the following parameters were assessed: (1) random sequence generation; (2) allocation concealment; (3) selective reporting; (4) blinding of investigators and participants; (5) blinding of outcome assessment; (6) incomplete outcome data; and (7) additional bias related to problems not discussed in the study. According to the above criteria, the RoB was considered as ‘low risk’, ‘high risk’ or ‘unclear risk’, with the last type indicating either uncertainty over the potential for bias or lack of information (Higgins et al., 2011). The following parameters were assessed with the ROBINS-I tool (Sterne et al., 2016): (1) bias due to confounding; (2) bias in selection of participants into the study; (3) bias in classification of interventions; (4) bias due to deviations from intended interventions; (5) bias due to missing data; (6) bias in measurement of outcomes; (7) bias in selection of the reported result; and (8) overall bias. According to the above criteria, the RoB was considered to be ‘low’, ‘moderate’, ‘serious’ or ‘critical’ (Sterne et al., 2016). Any disagreements in the RoB assessment were resolved as previously mentioned.
Summary measures and synthesis of results
The authors planned to use the random effects method for meta-analysis, provided that a sufficient number of studies with compatible methodologies were identified (DerSimonian and Laird, 1986). Nonetheless, due to the nature of the topic of interest, variability was expected in the AI models used, study designs and outcomes/parameters of interest.
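For reference, the random effects pooling that was planned (and ultimately not performed) can be sketched as follows. This is a minimal Python illustration of the DerSimonian and Laird (1986) estimator; the function name and the effect sizes and variances in the usage lines are hypothetical and are not taken from the included studies.

```python
import math


def dersimonian_laird(effects, variances):
    """Pool per-study effects with the DerSimonian-Laird random effects estimator.

    effects   : per-study effect estimates (hypothetical inputs)
    variances : per-study within-study variances (must be positive)
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))

    # Method-of-moments estimate of the between-study variance tau^2.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))

    # I^2: share of total variability attributed to heterogeneity (percent).
    i2 = max(0.0, (q - (k - 1)) / q) * 100.0 if q > 0 else 0.0
    return {"pooled": pooled, "se": se, "tau2": tau2, "Q": q, "I2": i2}


# Hypothetical usage: two studies with clearly different effects.
result = dersimonian_laird([0.0, 1.0], [0.01, 0.01])
ci_low = result["pooled"] - 1.96 * result["se"]
ci_high = result["pooled"] + 1.96 * result["se"]
```

When the studies disagree strongly, the estimated between-study variance dominates the weights and widens the confidence interval, which is one reason pooling methodologically inconsistent studies yields little usable information.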
RoB across studies and additional analyses
It was planned to assess the publication bias and quality of available evidence for meta-analyses if there were more than five studies (Higgins and Green, 2011).
Results
Study selection
The initial search identified 310 studies; after removing the duplicates, a total of 135 studies remained. Of them, 96 studies were excluded after evaluating the title and/or abstract. Of the remaining 39 studies, eight clinical studies were finally included in the present systematic review and processed for data extraction (Supplementary Table 2 and Figure 1).

PRISMA flow chart.
General characteristics of the included studies
AI for treatment planning
Four retrospective cohort studies were included (Choi et al., 2019; Kim et al., 2021; Lee et al., 2020; Shin et al., 2021). None of the studies performed a power analysis for sample size estimation. The sample size was in the range of 316–960 patients. In the three studies reporting ages (Kim et al., 2021; Lee et al., 2020; Shin et al., 2021), the mean age of the patients was in the range of 19–29 years. Two studies (Kim et al., 2021; Lee et al., 2020) reported the distribution between sexes, with a percentage distribution in the range of 46%–51% for male patients and 49%–54% for female patients. The country of origin of all these patients was Korea. In the included studies, a subset of patients’ records was allocated to the AI learning group, which was used to construct/train the AI model. After the model was trained, a test group was selected from the total sample to assess the AI model and compare it with non-AI (doctor-based) treatment decision making. Two studies specified that the included patients exhibited Class II and Class III dento-skeletal relationships (Choi et al., 2019; Shin et al., 2021). The results are shown in Table 1.
General characteristics of the included clinical studies.
Values are given as mean ± SD (range).
Learning groups were used to construct and evaluate the performance of the AI software models.
2D, two-dimensional; 3D, three-dimensional; AI, artificial intelligence; CBCT, cone beam computed tomography; DL, deep learning; F, female; M, male; N/A, not available; SD, standard deviation.
AI for soft tissue outcome prediction after OGT
One randomised controlled trial (RCT) and three retrospective cohort studies were included (Bengtsson et al., 2017; Lee et al., 2022; Shafi et al., 2013; Ter Horst et al., 2021). The sample size was in the range of 10–133 patients. A power analysis was performed in only one study (Shafi et al., 2013). All patients were aged 14 years or older. Three studies reported the distribution between sexes, with a percentage distribution in the range of 39%–53% for male patients and 47%–61% for female patients (Bengtsson et al., 2017; Shafi et al., 2013; Ter Horst et al., 2021). The countries of origin of these patients were Sweden, UK, the Netherlands and China. The AI-based soft tissue outcome prediction (AI test group) was compared to the actual soft tissue outcomes obtained (comparison of pre- and post-OGT records). Patients presented with dentofacial Class II or Class III deformities. Patients were evaluated 6–12 months after OGT was performed. The study characteristics are shown in Table 1.
Study characteristics related to AI
AI for treatment planning
In all retrospective cohort studies, craniofacial imaging was performed using lateral cephalometric (LC) imaging (Choi et al., 2019; Kim et al., 2021; Lee et al., 2020; Shin et al., 2021). In the study by Shin et al. (2021), PA cephalometric imaging was also performed. Multiple AI models were evaluated across these studies, including non-specified AI models, ResNet-18, ResNet-34, ResNet-50, ResNet-101, modified-Alexnet and MobileNet (Choi et al., 2019; Kim et al., 2021; Lee et al., 2020; Shin et al., 2021). All these studies utilised the AI models for surgery versus non-surgery decision making (Kim et al., 2021; Lee et al., 2020; Shin et al., 2021); Choi et al. (2019) additionally used AI to determine the need for dental extractions for preoperative dental decompensation. The study characteristics are shown in Table 2.
Study characteristics related to AI.
2D, two-dimensional; 3D, three-dimensional; 3D photos, facial stereophotogrammetric scans; AI, artificial intelligence; BSSO, bilateral sagittal split osteotomy; CBCT, cone beam computed tomography; CNN, convolutional neural networks; CT, computed tomography; DL, deep learning model; ext, extraction; Lat ceph, lateral cephalometric radiograph; LeFort I, LFI advancement; MTM, mass tensor model; non-ext, non-extraction; PA ceph, posteroanterior cephalometric; Proplan CMF, Synthes Proplan CMF software; VRO, vertical ramus osteotomy.
AI for soft tissue outcome prediction after OGT
In three studies, cone beam computed tomography (CBCT) was the preferred preoperative imaging method (Lee et al., 2022; Shafi et al., 2013; Ter Horst et al., 2021). One study used two different methods: lateral cephalograms and CT scans (Bengtsson et al., 2017). 3D photos were included in only two studies (Lee et al., 2022; Ter Horst et al., 2021). The following AI models were used by these authors: Facad; Simplant PRO 12.02 OMS; Maxilim; MTM IPS CaseDesigner; non-specified DL; and ProPlan CMF (Bengtsson et al., 2017; Lee et al., 2022; Shafi et al., 2013; Ter Horst et al., 2021). The purpose of all these AI models was to focus on predicting the soft tissue profile changes after one or two jaw OGT (Bengtsson et al., 2017; Lee et al., 2022; Shafi et al., 2013; Ter Horst et al., 2021). The study characteristics are shown in Table 2.
Results of individual studies
AI for treatment planning
ResNet-34 (Kim et al., 2021; Shin et al., 2021) and the new DL-based model developed by Choi et al. (2019) showed the highest level of agreement between the AI and non-AI groups among all the models evaluated. Overall, all the AI models tested were shown to be useful for decision making for OGT (Choi et al., 2019; Kim et al., 2021; Lee et al., 2020; Shin et al., 2021), except MobileNet, which showed the lowest accuracy of 83.8% (Lee et al., 2020). ResNet-50 showed different diagnostic accuracy in two studies (Kim et al., 2021; Lee et al., 2020). The results are summarised in Table 3.
Main study outcomes regarding the use of AI for treatment decision making.
Clinical usefulness of AI in the present systematic review was based on levels of agreement between the AI and non-AI groups of at least 90% (Tan et al., 2019).
AI, artificial intelligence.
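The 90% agreement criterion applied above can be illustrated with a short Python sketch. The function names and the decision labels in the usage lines are hypothetical; only the 90% threshold comes from the present review.

```python
def agreement_percentage(ai_decisions, clinician_decisions):
    """Percentage of cases in which the AI and the clinician made the same decision."""
    if not ai_decisions or len(ai_decisions) != len(clinician_decisions):
        raise ValueError("decision lists must be non-empty and of equal length")
    matches = sum(a == c for a, c in zip(ai_decisions, clinician_decisions))
    return 100.0 * matches / len(ai_decisions)


def is_clinically_acceptable(agreement, threshold=90.0):
    """Apply the review's cut-off: agreement of at least 90% counts as acceptable."""
    return agreement >= threshold


# Hypothetical test group of 20 patients; the AI disagrees with the clinician once.
clinician = ["surgery"] * 12 + ["non-surgery"] * 8
ai = ["surgery"] * 11 + ["non-surgery"] * 9
agreement = agreement_percentage(ai, clinician)   # 19 of 20 decisions match
acceptable = is_clinically_acceptable(agreement)
```

Note that simple percentage agreement treats the clinician's decision as the reference standard and does not correct for agreement expected by chance.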
AI for soft tissue outcome prediction after OGT
A variety of measurements were assessed in each study, as seen in Table 4. Two AI models had clinically acceptable accuracy in predicting most of the assessed soft tissue measurements after OGT, except for the upper lip (Shafi et al., 2013) and lower lip areas (Lee et al., 2022). Of the two AI models evaluated by Ter Horst et al. (2021), the AI-DL model was useful for predicting the actual values of the lower face and lower lip, while the AI-MTM model was only useful for predicting soft tissue outcomes of the lower face. Overall, the AI-3D and AI-2D models evaluated by Bengtsson et al. (2017) were not useful for predicting most of the soft tissue measurements assessed.
Main study outcomes regarding the use of AI for soft tissue outcome prediction.
Values are given as mean ± SD.
Clinical usefulness of AI in the present systematic review was based on the percentage of simulations within a 2-mm margin of error, a margin that has been reported to be within clinically acceptable limits. AI-based models for which 80%–100% of simulations fell within the 2-mm margin of error were considered by the authors as useful for the purposes of the present systematic review.
2D, two-dimensional; 3D, three-dimensional; 11/NSL, angle formed by upper incisor long axis and nasion/sella line plane; 31/ML, angle formed by lower incisor long axis and mandibular plane; A, point A; AI, artificial intelligence; B, point B; BSSO, bilateral sagittal split osteotomy; CNN, convolutional neural networks; DL, deep learning model; Gn, gnathion; LeFort I, LFI advancement; MTM, mass tensor model; N/A, not available; N-Gn, nasion-gnathion; NSL/ML, angle formed by the nasion/sella line and mandibular plane; Pog, pogonion; SD, standard deviation; SNA, sella/nasion/point-A; SNB, sella/nasion/point-B.
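The accuracy criteria described in the notes above (mean absolute error, a 2-mm margin and an 80% share of simulations within that margin) can be sketched in Python as follows. The function name and the measurement values are hypothetical and serve only to show how the three quantities relate.

```python
import statistics


def prediction_summary(predicted_mm, actual_mm, margin_mm=2.0, useful_share=80.0):
    """Summarise AI-predicted versus actual soft tissue measurements (in mm)."""
    if not predicted_mm or len(predicted_mm) != len(actual_mm):
        raise ValueError("measurement lists must be non-empty and of equal length")

    # Absolute error per patient for one soft tissue measurement.
    errors = [abs(p - a) for p, a in zip(predicted_mm, actual_mm)]
    mae = statistics.mean(errors)
    sd = statistics.stdev(errors) if len(errors) > 1 else 0.0

    # Share of simulations whose error falls within the accepted margin.
    within = 100.0 * sum(e <= margin_mm for e in errors) / len(errors)
    return {
        "mae": mae,
        "sd": sd,
        "percent_within_margin": within,
        "useful": within >= useful_share,
    }


# Hypothetical lower-lip positions for five patients (mm).
predicted = [10.0, 12.0, 15.0, 9.0, 11.0]
actual = [10.5, 15.0, 14.0, 9.0, 12.5]
summary = prediction_summary(predicted, actual)
```

In this toy example the mean absolute error is below 2 mm even though one of the five simulations misses by 3 mm, which illustrates why the percentage of simulations within the margin is reported alongside the mean error.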
RoB assessment
The RCT by Bengtsson et al. (2017) had a moderate RoB (Table 5). Five retrospective studies (Choi et al., 2019; Kim et al., 2021; Lee et al., 2020, 2022; Shin et al., 2021) had a low RoB and two studies (Shafi et al., 2013; Ter Horst et al., 2021) had a moderate RoB (Table 6).
Risk of bias assessment using the Cochrane risk of bias tool.
Risk of bias assessment* for non-randomised clinical trials (ROBINS-I).
Low, moderate, serious or critical risk of bias.
Synthesis of the results, risk of bias across studies and additional analyses
Due to the limited number of existing clinical studies with inconsistent methodologies and varying outcomes assessed, it was impractical to conduct meta-analysis and additional analyses such as an assessment for ‘small-study effects’ and publication bias (Higgins and Green, 2011; Sterne et al., 2016).
Discussion
Summary of available evidence
In the present systematic review, the authors aimed to summarise and critically appraise the available evidence regarding the potential role of AI in clinical treatment decision making and soft tissue outcome prediction in OGT. After applying strict eligibility criteria, the authors identified eight relevant clinical studies (Bengtsson et al., 2017; Choi et al., 2019; Kim et al., 2021; Lee et al., 2020, 2022; Shafi et al., 2013; Shin et al., 2021; Ter Horst et al., 2021). Four retrospective cohort studies (Choi et al., 2019; Kim et al., 2021; Lee et al., 2020; Shin et al., 2021) assessed the level of agreement between AI model-based and non-AI/doctor-based decision making mainly regarding the determination of the need for OGT versus non-OGT in orthodontic patients with dentoskeletal deformities. The authors of the present study considered a level of agreement of at least 90% as clinically acceptable based on previously published clinical guidelines (Tan et al., 2019). A qualitative assessment of included retrospective studies indicated that most examined AI-based models had a clinically acceptable agreement (of at least 90%) compared with non-AI-based decision making regarding the surgical treatment decision. Moreover, one RCT and three retrospective cohort studies (Bengtsson et al., 2017; Lee et al., 2022; Shafi et al., 2013; Ter Horst et al., 2021) assessed the role of AI in soft tissue outcome prediction in OGT by comparing AI-predicted versus actual (obtained) soft tissue profile measurements after OGT and reporting mean (± SD) absolute errors.
The authors of the present study perceive that it is challenging to determine clinically acceptable cut-off points for the mean absolute errors when comparing AI-predicted versus obtained treatment outcomes. In addition, from a clinical standpoint, acceptable margins of absolute errors (± SDs) may vary depending on the type of soft tissue measurements (linear vs. angular) and the locations assessed (such as the nose, upper lip, lower lip, lower face height and chin). Nonetheless, to produce a meaningful interpretation of the published results, the authors accepted a 2-mm margin of error as clinically acceptable based on previously published recommendations (Bengtsson et al., 2017; Lee et al., 2022). In addition, we sought to extract data related to the percentages of simulations within a 2-mm margin of error for the various soft tissue measurements assessed and arbitrarily selected a range of 80%–100% as clinically useful. Based on these cut-offs, we were able to interpret the clinical significance of the published data, which indicate that most AI-based models do not demonstrate clinically acceptable accuracy for most of the soft tissue measurements assessed, particularly those pertaining to the areas of the lips and chin. It is pertinent to mention that by no means do the authors of the present systematic review recommend dismissing the use of AI for soft tissue outcome prediction in patients with dentoskeletal deformities seeking OGT. Soft tissue outcome prediction can be a valuable tool to facilitate patient–doctor communication, manage patient expectations, and provide a visualisation of the expected surgical soft tissue profile changes and anticipated results (Bengtsson et al., 2017; Lee et al., 2022; Shafi et al., 2013; Ter Horst et al., 2021). However, based on these findings, it is crucial for practitioners, including orthodontists and orthognathic surgeons, to educate patients about the current limitations of AI-based models in fully predicting soft tissue changes after OGT. Furthermore, these findings highlight the need for further research to develop AI models with higher predictive accuracy for soft tissue changes after OGT.
In a systematic review, Khanagar et al. (2021a) assessed the application and performance of AI in clinical dentistry. Based on the findings of their review, the authors concluded that the performance of AI matched the precision and accuracy of dentists and/or dental specialists regarding various dental diagnostic procedures, such as caries diagnosis, cephalometric analysis and the diagnosis of alveolar bone loss as well as cancerous lesions. Another comprehensive review similarly concluded that AI can be successfully used in the field of conventional orthodontics to improve the diagnostic accuracy in radiographic anatomic landmark identification and in determination of the cervical vertebral maturation stage on lateral cephalometric radiographs (Monill-González et al., 2021). Furthermore, in a systematic review by Evangelista et al. (2022), it was concluded that AI presents promising accuracy on supporting orthodontic tooth extraction decision making; however, caution was recommended due to a very low quality of available evidence. Our results, similarly, highlight the potential benefits and current limitations regarding the application of AI in OGT.
Limitations and strengths
It is tempting to conclude that AI is useful, at least in supporting clinical treatment decision making in patients with dentoskeletal deformities considering OGT. However, numerous factors may have influenced the individual study results. First, there was high methodological heterogeneity among the included clinical studies (Choi et al., 2019; Kim et al., 2021; Lee et al., 2020; Shin et al., 2021). Regarding the study participants, the number of participants included varied (range = 10–413 patients). In addition, the clinical studies assessed included both adolescent and adult patients of varying ethnicities, such as Asian and Caucasian. Moreover, the types of skeletal deformities varied, including Class II and Class III malocclusion as well as facial asymmetry. The authors perceive that all these patient-related variables might affect treatment planning and outcome prediction of patients requiring OGT. Methodological inconsistencies were also noted in AI-related characteristics, such as the type of AI model used (non-reported AI, ResNet-18, ResNet-34, ResNet-50, ResNet-101, modified-Alexnet, MobileNet, Facad, Simplant PRO 12.02 OMS, MTM IPS CaseDesigner and ProPlan CMF) as well as the imaging methods (lateral cephalograms, PA cephalograms, CBCT, CT and 3D photos). Furthermore, the soft tissue measurements assessed were inconsistent among the included clinical studies (Table 4), which further complicates efforts to compile the available evidence. The authors initially aimed to pool the available data for quantitative assessment (meta-analysis); however, this was not feasible given the aforementioned limitations. Based on the currently available evidence, it is not possible to recommend the ideal AI-based model for the treatment planning of patients requiring OGT. Further research is needed in this regard.
It is pertinent to mention that AI-based treatment decision making in the included studies was mainly supported by information obtained via 3D and/or 2D radiographic imaging and facial photos. However, other factors, such as patients’ medical history (such as systemic diseases and/or medications), psychosocial characteristics and concerns/preferences (particularly chief complaint), should be taken into consideration when developing individual treatment plans pertaining to tooth extraction and surgical decisions. Moreover, the four studies that evaluated the accuracy of AI in treatment decision making included solely patients from Korea, which warrants caution in the generalisability of the results to patients from other racial groups. In the future, AI models should potentially integrate additional patient characteristics such as age, sex, ethnicity/race, health characteristics, and patient’s concerns and preferences.
In the present systematic review, a limited number of clinical studies with a low to moderate RoB were identified. Problematic domains mainly included lack of blinding of the participants and of the outcome assessment (which is anticipated given the type of intervention) as well as sample size limitations (lack of power analysis) in the majority of studies, implying a potential for type II error. Moreover, there was a considerable discrepancy between the total number of patients included (sample size) and the number of patients who were actually analysed, because the majority of patients were allocated to a learning group for the purposes of training the AI software. It is well known that AI algorithms utilise initial datasets of patient characteristics that serve as a ‘learning set’ to predict future outcomes (Gili et al., 2021). Therefore, the appropriateness and amount of data used to train AI models can significantly affect their performance and validity. It has been reported that a large amount of data is required for robust training and testing of AI algorithms (Elmore and Lee, 2021). In addition, biomedical data, including orthodontic data, are dynamic and complex in nature, posing challenges to the machine learning process due to factors such as high dimensionality, dynamicity, data incompleteness and statistical noise. Training AI algorithms on small samples of patients may result in over-adaptation (overfitting) of the algorithm to the training dataset, which may lead to poor performance on a new set of patients’ data (Gili et al., 2021). It is important to mention that the composition of datasets used for the training of AI algorithms was poorly defined in the included clinical studies.
Moreover, heterogeneity of patient and/or malocclusion-related characteristics, inconsistencies in the radiographic landmarks/measurements imported into the AI systems, and the amount of data used to train the AI algorithms in the included clinical studies might have affected the performance of the AI models. Furthermore, the AI models utilised in the included clinical studies were developed in single centres, which may limit the generalisability of individual study results. It is also relevant to note that examiners who performed the training of the AI software were also responsible for conducting some of the study measurements, which may further introduce biases and overestimate the level of agreement between the AI and non-AI groups. Thus, caution is warranted in the interpretation of individual study results.
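The over-adaptation problem described above can be illustrated with a deliberately small Python sketch. It assumes a toy one-dimensional measurement, a hypothetical true rule (label 1 when the measurement is at least 5) and a nearest-neighbour classifier that simply memorises its learning set; none of these correspond to the models or data of the included studies.

```python
def nearest_neighbour_predict(learning_set, x):
    """Return the label of the learning-set point closest to x (pure memorisation)."""
    return min(learning_set, key=lambda point: abs(point[0] - x))[1]


def accuracy(learning_set, samples):
    """Share of samples whose label matches the nearest-neighbour prediction."""
    hits = sum(nearest_neighbour_predict(learning_set, x) == label
               for x, label in samples)
    return hits / len(samples)


# Toy learning set of (measurement, label) pairs. Two labels, at 2 and 7,
# contradict the hypothetical true rule and stand in for statistical noise.
learning_set = [(1, 0), (2, 1), (4, 0), (6, 1), (7, 0), (9, 1)]

# Independent test set labelled by the true rule (no noise).
test_set = [(0.5, 0), (1.6, 0), (3.5, 0), (5.6, 1), (6.8, 1), (8.5, 1)]

learning_accuracy = accuracy(learning_set, learning_set)  # evaluated on seen data
test_accuracy = accuracy(learning_set, test_set)          # evaluated on unseen data
```

Because the classifier reproduces every label it has seen, including the noisy ones, its accuracy on the learning set is perfect, whereas two of the six unseen cases are misclassified. This gap is why performance should be reported on a test group that is independent of the learning group.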
Although our intent was to include the highest level of available evidence (RCTs), in the majority of studies, records of patients were retrospectively obtained to compare AI versus clinician-based treatment decision making and/or AI-predicted versus actual/obtained soft tissue changes. In this regard, randomisation was not possible, and the risk of selection bias cannot be excluded. Nonetheless, these studies provide useful information in the context of validating/testing the accuracy of AI in clinical decision making and outcome prediction of OGT. Furthermore, the authors recognise that it is ethically challenging to conduct RCTs including patients who are randomly allotted to AI-based treatment planning as today there is a lack of adequate evidence to support the sole use of AI for treatment planning and decision making.
The strengths of the present review include an exhaustive and unrestricted search of indexed literature, the pre-registration of the review protocol, the application of strict eligibility criteria, the use of recommended tools to assess the RoB, and the efforts of the authors to interpret the clinical relevance of individual study results by applying cut-off points for the acceptable levels of agreement and margins of absolute error.
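The application of cut-off points can be sketched as follows. The 90% agreement cut-off is the one reported in this review; the error margin (`ERROR_MARGIN_MM`), the function names and all example values are hypothetical and serve only to show the arithmetic.

```python
def percent_agreement(ai_decisions, clinician_decisions):
    """Percentage of cases in which the AI and the clinician chose the same option."""
    if not ai_decisions or len(ai_decisions) != len(clinician_decisions):
        raise ValueError("both lists must be non-empty and of equal length")
    matches = sum(a == c for a, c in zip(ai_decisions, clinician_decisions))
    return 100.0 * matches / len(ai_decisions)


def mean_absolute_error(predicted_mm, actual_mm):
    """Mean absolute difference between predicted and actual positions (mm)."""
    if not predicted_mm or len(predicted_mm) != len(actual_mm):
        raise ValueError("both lists must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted_mm, actual_mm)) / len(predicted_mm)


AGREEMENT_CUTOFF = 90.0  # 'at least 90%', as applied in this review
ERROR_MARGIN_MM = 2.0    # hypothetical margin, for illustration only

# Made-up decisions for ten patients (not data from any included study).
ai = ["surgery"] * 6 + ["camouflage"] * 4
clinician = ["surgery"] * 7 + ["camouflage"] * 3

agreement = percent_agreement(ai, clinician)
error = mean_absolute_error([1.0, 2.5, 4.0], [1.5, 2.0, 6.0])

print(f"agreement {agreement:.1f}% acceptable: {agreement >= AGREEMENT_CUTOFF}")
print(f"mean absolute error {error:.2f} mm acceptable: {error <= ERROR_MARGIN_MM}")
```

A mean absolute error can conceal large errors at individual landmarks (here one of the three made-up landmarks is off by 2.0 mm), so region-specific reporting, for example for the lip and chin areas, remains informative.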
Clinical recommendations and directions for future research
The AI models used in the included studies appear to have a clinically acceptable level of agreement with conventional diagnostics in treatment decision making regarding OGT, and, within limitations, can be used as a visualisation tool for soft tissue outcome prediction. In this regard, AI may be considered a useful aid to conventional diagnostics in OGT by automating diagnostic tasks, decreasing doctor uncertainty on clinical decisions, and facilitating patient communication regarding expected treatment outcomes. Nonetheless, based on the currently available studies, it remains unclear whether AI can improve the validity of conventional doctor-based treatment decision making, and whether it can serve as a substitute for conventional diagnostics. From a research standpoint, it is pertinent to develop formal recommendations regarding clinically acceptable levels of agreement and margins of error related to the use of AI for treatment planning and outcome prediction. This will help researchers further refine AI-based models as well as compare the clinical usefulness of different models. Future studies should be power-adjusted and utilise more uniform methodologies related to the use of AI and the outcomes evaluated to provide more standardised information. In addition, future studies should focus on the potential use of AI as a substitute/alternative to conventional diagnostics for clinical treatment decision making in OGT. Other factors that may help improve the methodological quality of future RCTs are the use of large-scale and publicly available datasets to ensure more robust training and testing of AI algorithms, and the planning of studies in which outcome examiners are independent of the researchers who perform the AI training.
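As an illustration of what a power-adjusted design involves, the sketch below applies the standard normal-approximation formula for comparing two group means, n = 2((z_{1-α/2} + z_{power})σ/δ)² per group. The function name and all input values (`delta`, `sigma`) are hypothetical; an actual study would derive them from pilot data or the literature and might use a different test.

```python
import math
from statistics import NormalDist


def patients_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate patients needed per group to detect a mean difference
    `delta` given standard deviation `sigma` (two-sided test, equal groups)."""
    if delta <= 0 or sigma <= 0:
        raise ValueError("delta and sigma must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_power = NormalDist().inv_cdf(power)          # quantile for desired power
    n = 2 * ((z_alpha + z_power) * sigma / delta) ** 2
    return math.ceil(n)  # round up to whole patients


# Hypothetical inputs: detect a 0.5 mm difference with a 1.0 mm standard deviation.
n_required = patients_per_group(delta=0.5, sigma=1.0)
print(f"patients required per group: {n_required}")
```

Smaller expected differences or noisier measurements increase the required sample size sharply, because the result scales with (σ/δ)².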
Conclusion
AI can be a useful aid to traditional treatment planning by facilitating treatment decision making and providing a visualisation tool for soft tissue outcome prediction in OGT. These results are based on a limited number of clinical studies. Further power-adjusted RCTs, with uniform methodologies related to the use of AI models and study outcomes, are needed to determine whether AI can be used as an alternative to traditional doctor-based treatment planning.
Supplemental Material
Supplemental material, sj-docx-1-joo-10.1177_14653125231203743 for Artificial intelligence for treatment planning and soft tissue outcome prediction of orthognathic treatment: A systematic review by Daisy Salazar, Paul Emile Rossouw, Fawad Javed and Dimitrios Michelogiannakis in Journal of Orthodontics
Footnotes
Acknowledgements
The authors thank Rachel Becker who assisted in the development of the search strategy for the electronic databases used in this systematic review.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.
References
