Abstract
Objective
Presurgical infant orthopedics (PSIO) is used to optimize anatomical outcomes in infants with cleft lip and palate to facilitate favorable surgical results. However, standardized, reliable tools to assess PSIO effectiveness are lacking due to phenotypic variability and diverse treatment protocols. This study aimed to develop and perform a preliminary reliability assessment of a novel phenotype-based clinical outcome assessment tool, the PSIO Assessment Tool (PAT), to assess PSIO-related morphological changes in unilateral and bilateral cleft lip with or without palate.
Design
Tool Development and Reliability Assessment Study.
Setting
Multicenter expert consensus involving craniofacial orthodontists from diverse global regions.
Participants
Standardized pretreatment and post-treatment clinical cases of unilateral and bilateral cleft lip and alveolus with or without palate were used for calibration and reliability assessment.
Intervention
A panel of 7 expert craniofacial orthodontists collaboratively developed the PAT through iterative calibration using clinical cases. The tool assesses cleft severity pretreatment and morphological correction post-treatment, grading cleft width, nasal symmetry, and alveolar alignment.
Main Outcome Measure(s)
Reliability was evaluated via inter-rater and intrarater agreement using Fleiss' κ and quadratic weighted Cohen's κ statistics.
Results
The PAT demonstrated good preliminary inter-rater and intrarater reliability, with inter-rater Fleiss’ κ of 0.83 for pretreatment grading and weighted Cohen's κ of 0.75 for post-treatment grading. Intrarater reliability was substantial to almost perfect (κ = 0.70-0.81).
Conclusions
The PAT demonstrated encouraging preliminary reliability among experienced craniofacial orthodontists evaluating standardized PSIO records. Further multicenter studies are needed to establish broader validity and clinical applicability.
Introduction
Comprehensive and interdisciplinary care is essential in managing cleft lip and palate (CLP), given its complex impact on feeding, breathing, speech, dentofacial growth, and psychosocial development.1–3
CLP presents a spectrum of phenotypes, ranging from microforms to complete clefts. 4 A unilateral cleft can be complete or incomplete, with the latter sometimes appearing as a Simonart's band.4,5 Associated deformities include distortion of the vermilion, rotation of Cupid's bow and philtrum toward the noncleft side, lateral and posterior displacement of the alar cartilage, a depressed and deviated nasal tip, and a shortened, deviated columella. The nostril on the cleft side often has a more horizontal orientation. The nasal cartilage may or may not be deficient.2,4,5 Bilateral cleft lip and palate, however, presents additional reconstructive challenges. The premaxilla is separated from the palatine processes, and the prolabium lacks defined philtral ridges, Cupid's bow, and orbicularis oris muscle fibers. The nasal cartilage is displaced laterally, resulting in a flattened nasal tip and a markedly shortened columella.2,4,5
Presurgical infant orthopedics (PSIO) has long been used to optimize cleft anatomy and facilitate surgical outcomes. McNeil 6 first introduced passive acrylic plates, followed by Latham et al's 7 active appliance utilizing surgically anchored pins. In 1993, Grayson developed nasoalveolar molding (NAM), integrating nasal stents with passive alveolar plates to mold both structures simultaneously. 8 PSIO techniques have continuously evolved, including the use of nasal elevators, such as DynaCleft or steri-strips, which use elastic tension to apply light pressure and mold the alveolar ridge and nasal cartilage preoperatively. 9 Mejia et al 10 introduced the Presurgical Orthopedic Appliance, and most recently, the 3-dimensional (3D)-printed Rhinoplasty Appliance System was developed for early nasal molding. 11 Advances in digital technology have enabled the development of digital NAM and OrthoAligner NAM, offering improved precision and efficiency. 12
Although over 100 PSIO protocols have been described, outcome measures used to evaluate their effectiveness remain highly variable and heterogeneous.13,14 The evaluation of clinical outcomes, through standardized protocols and auditing, is essential for improving care and enabling meaningful comparisons among providers and centers.13,14
Several approaches have been proposed for comprehensive CLP evaluation, ranging from standardized frameworks such as the International Consortium for Health Outcomes Measurement to broader outcome measures including patient-reported outcome measures and expert or layperson aesthetic ratings.15,16 However, many of these methods rely on subjective impressions, which may lack consistency and reproducibility. Reliable outcome assessment tools must be both valid and reproducible. Moreover, many existing cleft outcome frameworks are designed for broader interdisciplinary or long-term outcome assessment and do not specifically provide a simple, phenotype-based method for standardized evaluation of PSIO-related morphological changes during the presurgical infant stage. While reliability testing is relatively straightforward, validity requires consensus from expert panels, especially in the absence of a gold standard. An ideal tool should be practical, quick to administer, and not require specialized training or equipment. 13
Despite efforts toward international standardization of outcome measures in CLP, this goal is challenged by the wide phenotypic variability and diversity in treatment approaches. 14 These limitations hinder consistent and reproducible assessment of morphological changes associated with PSIO treatment. Given the absence of universally accepted objective outcome measures for PSIO and the ongoing variability in treatment approaches, there remains a need for practical and reproducible clinical assessment frameworks capable of documenting observable morphological changes in a standardized manner. Additionally, the long-term effectiveness and stability of PSIO-related outcomes remain subjects of ongoing debate within cleft care literature. Hence, this study aimed to develop and preliminarily assess the reliability of a novel tool to assess morphological outcomes associated with PSIO treatment based on cleft phenotype. We hypothesized that experienced cleft orthodontists would demonstrate acceptable inter-rater and intrarater agreement when applying a standardized phenotype-based assessment framework to evaluate cleft severity and PSIO-related morphological changes.
Materials and Method
Study Design
The present study was designed to develop and perform a preliminary reliability assessment of a clinical outcome assessment tool for PSIO treatment in infants born with unilateral and bilateral cleft lip and alveolus, with or without cleft palate.
Ethical Considerations
The data were collected from Smile Train Express Records. All patients' legal guardians had previously provided informed consent for clinical documentation and use of de-identified records for research and educational purposes according to institutional and Smile Train documentation protocols. Only anonymized and de-identified photographic records were shared with the expert panel for evaluation, without any patient identifiers or treatment center information. Data sharing across participating international collaborators was conducted using secure de-identified records in accordance with applicable institutional ethical standards and principles governing research involving human participants. The study was approved by the Institutional Ethical Committee (Ref No.: MRIIRS/MRDC/SDS/IEC/2024/129).
Data Source and Image Standardization
For each patient, standardized 2-dimensional (2D) clinical photographs were available both before and after PSIO treatment. These included extra-oral frontal and basal nasal views, as well as intraoral maxillary occlusal views, obtained as part of routine Smile Train documentation protocols. All images were captured following Smile Train's standardized clinical photography guidelines to ensure consistency in head position, lighting, magnification, and framing. Only cases with complete and adequate-quality pretreatment and post-treatment image sets were included for evaluation. To minimize assessment bias, all images were anonymized and de-identified prior to distribution. The images were presented in a standardized format and sequence, without any patient identifiers, treatment center information, or timing cues, allowing blinded evaluation by the expert panel.
Working Group and Sample
The working group comprised 7 craniofacial orthodontists (AF, MD, PB, MM, TC, JP, and RHL), members of a Smile Train Global Orthodontics Advisory Group, with experience in CLP treatment in different global regions (United States, India, Brazil, Mexico, and Philippines). All participating raters had substantial clinical experience in cleft orthodontics and PSIO management, with involvement in multidisciplinary cleft care programs and international cleft initiatives. Pretreatment records were obtained within the first month after birth, and post-treatment outcomes were evaluated immediately after completion of PSIO and prior to any primary lip repair, ensuring that all assessments reflected PSIO-related changes only, without surgical influence. The primary goal of the working group was to develop and perform a preliminary reliability assessment of an orthodontic treatment outcomes assessment tool for PSIO treatment that considers cleft type and severity. Although 7 experts participated in the development and calibration phases of the expert-based orthodontic assessment tool, only 6 evaluators completed all rating rounds and were therefore included in the final reliability analysis. The development and reliability datasets included both unilateral and bilateral cleft cases representing varying severities of deformity.
Case Selection
Cases were retrospectively selected from the Smile Train database. Inclusion criteria included: (1) infants with unilateral and bilateral cleft lip and alveolus with or without palate undergoing PSIO, (2) availability of complete standardized pretreatment and post-treatment photographic records, and (3) adequate image quality for evaluation. Exclusion criteria included syndromic clefts, incomplete records, prior surgical intervention, and poor-quality or nonstandardized photographs.
Development of the PSIO Assessment Tool
Conceptual Basis for Tool Development
The initial framework for the PSIO Assessment Tool (PAT) was informed by clinical experience, literature describing common cleft morphological characteristics evaluated during PSIO, and a previously developed unpublished internal grading framework created by AF. This unpublished framework served only as a conceptual starting reference and underwent substantial modification during expert consensus discussions.
Parameters were selected based on their clinical relevance, visibility on standardized 2D photographs, and frequent use in routine PSIO assessment. These included cleft width, nasal symmetry, columella deviation, alar cartilage displacement, premaxillary position, and alveolar alignment. Parameters that required advanced imaging or were difficult to reproduce in retrospective photographic records were not included. Angular and dimensional thresholds were intended as visual clinical reference guides to support ordinal grading rather than as precise photogrammetric measurements obtained from photographs, and no digital angular measurement software was used during scoring. Accordingly, formal measurement error analysis for photogrammetric landmark identification was not performed. Instead, reproducibility of the assessment framework was evaluated through inter-rater and intrarater reliability testing.
The process involved 2 main phases: development of the assessment tool and reliability analysis.
Phase 1: Development of the Assessment Tool
In the first phase, a panel of experts was convened to develop the tool. During the initial meeting, the group discussed and defined the key parameters to be included in the tool, using illustrative clinical cases as references. The parameters included in the tool were divided into a pretreatment grade (mild, moderate, or severe, depending on the severity of the deformity) and a post-treatment grade (Grades 1-3, depending on the quality of outcomes based on pretreatment to post-treatment changes), for both unilateral and bilateral cleft lip and alveolus with or without palate. The development process was planned and discussed through a series of in-person and online meetings. Following this, 15 cases were randomly selected from a data bank, and the experts were asked to rate the pretreatment and post-treatment grades based on the tool.
The finalized PAT comprised 2 components: a pretreatment grade assessing the initial cleft severity and a post-treatment grade evaluating morphological correction following PSIO. In unilateral cleft cases, the pretreatment grade included parameters such as cleft width, columella angle, and alar cartilage displacement, while in bilateral cases, it included cleft width, premaxillary deviation, nasolabial angle, alar cartilage displacement, and nasal tip projection. The post-treatment grade assessed improvements in alveolar alignment, nasal symmetry, and soft tissue contour. Each criterion was graded on a 3-point scale (1-3) corresponding to the percentage of correction from baseline. During the development phase, 15 unilateral and 15 bilateral cleft lip and alveolus cases with or without palate undergoing PSIO were included for expert calibration and assessment. Figure 1 illustrates the anatomical landmarks and visual reference parameters incorporated into the PAT framework for unilateral and bilateral cleft assessment. A 3-point ordinal grading system was intentionally selected to balance simplicity, reproducibility, and clinical applicability. More complex grading systems with additional categories were considered more susceptible to subjective variability, particularly when applied to retrospective 2D photographic records.

PAT parameters: (a) Width of the cleft
Calibration
Calibration was performed through structured review sessions involving representative unilateral and bilateral cleft cases with varying severities. During these sessions, the panel reviewed grading discrepancies, discussed interpretation of each parameter, and refined operational definitions to improve consistency. Consensus was achieved through iterative discussion and modification of grading descriptors.
Phase 2: Validation and Reliability Analysis
In the second phase, an additional set of 10 new clinical cases was distributed to the same panel of experts. The experts rated the pretreatment and post-treatment grades, and the ratings were analyzed to assess inter-rater agreement. To evaluate intrarater agreement, the experts evaluated the same cases in a different order, with an interval of 10 days (about 1 and a half weeks) between evaluations.
Statistical Analysis
To evaluate agreement among the experts, intrarater and inter-rater reliability were calculated using Fleiss' κ for pretreatment grades and quadratic weighted Cohen's κ for post-treatment grades, each with a 95% confidence interval, in jamovi version 2.3. Fleiss' κ was used to assess agreement among multiple raters, while quadratic weighted Cohen's κ was applied to ordinal pairwise data to account for partial agreement between categories. Landis and Koch 17 interpretation categories (Table 1) were used for consistency with prior reliability studies, although their limitations and potential optimism are acknowledged.
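To make the two agreement statistics concrete, the following is a minimal, dependency-free Python sketch of Fleiss' κ and quadratic weighted Cohen's κ for 3-point grades. It is not the jamovi implementation used in this study and does not compute confidence intervals; the function names and the toy ratings in the usage lines are hypothetical and are not study data.

```python
from collections import Counter
from typing import Sequence


def fleiss_kappa(ratings: Sequence[Sequence[int]], categories: Sequence[int]) -> float:
    """Fleiss' kappa. ratings[i] holds the grades given to case i by every rater."""
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    totals: Counter = Counter()
    agreement_sum = 0.0
    for row in ratings:
        counts = Counter(row)
        totals.update(counts)
        # Proportion of rater pairs that agree on this case.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        agreement_sum += agreeing_pairs / (n_raters * (n_raters - 1))
    p_observed = agreement_sum / n_cases
    # Chance agreement from the overall share of each grade.
    p_expected = sum((totals[c] / (n_cases * n_raters)) ** 2 for c in categories)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when every rating is identical")
    return (p_observed - p_expected) / (1 - p_expected)


def quadratic_weighted_kappa(a: Sequence[int], b: Sequence[int],
                             categories: Sequence[int]) -> float:
    """Cohen's kappa for two raters with quadratic disagreement weights."""
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(a)
    observed = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[index[x]][index[y]] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            # Penalty grows with the squared distance between the two grades.
            weight = ((i - j) / (k - 1)) ** 2
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row[i] * col[j]
    if weighted_expected == 0.0:
        raise ValueError("kappa is undefined when both raters use a single grade")
    return 1 - weighted_observed / weighted_expected


# Hypothetical toy ratings on the 3-point scale (not study data).
GRADES = (1, 2, 3)
multi_rater = fleiss_kappa([[1, 1, 1], [2, 2, 2], [3, 3, 3]], GRADES)
pairwise = quadratic_weighted_kappa([1, 2, 3, 3], [1, 2, 3, 2], GRADES)
```

In the second toy pair, the single disagreement is between adjacent grades, so the quadratic weights penalize it only partially; this is the "partial agreement" property described above.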
Qualitative Interpretation of κ According to Landis and Koch, 1977.
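For readers who want to apply the qualitative bands programmatically, the short Python sketch below maps a κ value to the Landis and Koch (1977) labels (poor, slight, fair, moderate, substantial, almost perfect). The function name is hypothetical, and treating each published upper bound (0.20, 0.40, 0.60, 0.80) as inclusive is an assumption made here for values that fall between the printed ranges.

```python
def landis_koch_label(kappa: float) -> str:
    """Return the Landis and Koch (1977) qualitative label for a kappa value."""
    if kappa < 0:
        return "poor"
    # Upper bounds are treated as inclusive (an assumption of this sketch).
    bands = (
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
    )
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


pretreatment_label = landis_koch_label(0.83)
post_treatment_label = landis_koch_label(0.75)
```

Applied to the mean inter-rater values reported in this study, 0.83 falls in the "almost perfect" band and 0.75 in the "substantial" band.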
Results
PSIO Treatment Outcome Assessment Tool
The finalized PAT was systematically organized into 2 sections, pretreatment and post-treatment grading, allowing comprehensive evaluation of PSIO outcomes in unilateral and bilateral cleft cases.
Table 2 summarizes the pretreatment grading parameters used to categorize the initial cleft severity as mild, moderate, or severe. Agreement was generally higher for pretreatment severity grading than for post-treatment correction grading, likely reflecting increased subjectivity in estimating treatment-related morphological improvement. Table 3 presents the post-treatment grading criteria reflecting the degree of morphological correction achieved following PSIO, based on nasal symmetry, alveolar alignment, and soft tissue improvement. The structured scoring framework demonstrated clear differentiation among varying severities and outcomes, supporting its potential utility as a standardized descriptive assessment framework.
Presurgical Infant Orthopedics (PSIO) Pretreatment Grade for Unilateral and Bilateral Cleft Lip and Alveolus with or Without Cleft Palate.
Presurgical Infant Orthopedics (PSIO) Post-treatment Grade for Unilateral and Bilateral Cleft Lip and Alveolus with or Without Cleft Palate.
*For unilateral cleft, cases that have collapsed with the treatment in the alveolus are considered: severe collapse = 1, moderate collapse = 2.
**For bilateral cleft, at least 3 out of the 5 to be considered in one category. If the premaxilla is deflected due to treatment, consider one point less.
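One possible reading of the bilateral footnote, expressed as a decision rule, is sketched below in Python. Several details are assumptions that the footnote does not state: the function and argument names are hypothetical, the "one point less" adjustment is floored at Grade 1, and the function returns None when no grade is shared by at least 3 of the 5 parameters, since the text does not say how such a case should be resolved.

```python
from collections import Counter
from typing import Optional, Sequence


def bilateral_post_grade(parameter_grades: Sequence[int],
                         premaxilla_deflected: bool = False) -> Optional[int]:
    """Overall bilateral post-treatment grade from 5 per-parameter grades (1-3)."""
    if len(parameter_grades) != 5:
        raise ValueError("expected exactly 5 parameter grades")
    if any(g not in (1, 2, 3) for g in parameter_grades):
        raise ValueError("grades must be 1, 2, or 3")
    grade, count = Counter(parameter_grades).most_common(1)[0]
    if count < 3:
        # No grade is shared by at least 3 of 5 parameters (unresolved here).
        return None
    if premaxilla_deflected:
        # "One point less", floored at Grade 1 (an assumption of this sketch).
        grade = max(1, grade - 1)
    return grade


# Hypothetical example: three parameters at Grade 3, premaxilla deflected.
example_grade = bilateral_post_grade([3, 3, 3, 2, 1], premaxilla_deflected=True)
```

A case that falls through to None would need panel discussion or an explicit tie-breaking rule, which is outside what the footnote specifies.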
Inter-rater Reliability
Pretreatment Grade Reliability
The inter-rater reliability for pretreatment grades showed a mean κ of 0.83 (95% CI: 0.63-1.00), indicating almost perfect agreement overall. Pairwise κ values ranged from 0.67 to 1.0, with percentage agreement between 81.8% and 100%. Agreement interpretations, according to Landis and Koch criteria, ranged from substantial to almost perfect (Table 4).
Inter-rater Reliability for Pretreatment and Post-treatment Grades.
Pairwise κ Values, 95% Confidence Intervals (Truncated at 1.0), Percentage Agreement, and Agreement Interpretation are Shown.
Post-treatment Grade Reliability
For post-treatment grades, the mean inter-rater κ was 0.75 (95% CI: 0.52-1.00). Pairwise κ values ranged from 0.65 to 1.0, with percentage agreement between 80% and 100%, indicating substantial to almost perfect agreement (Table 4).
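The pairwise percentage agreement reported alongside κ can be illustrated with a brief Python sketch. With 6 raters there are 15 distinct rater pairs. The function names, rater labels, and grades below are hypothetical placeholders, not study data.

```python
from itertools import combinations
from typing import Dict, Sequence, Tuple


def percent_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Percentage of cases on which two raters gave the same grade."""
    if len(a) != len(b) or not a:
        raise ValueError("both raters must grade the same non-empty case list")
    return 100 * sum(x == y for x, y in zip(a, b)) / len(a)


def pairwise_agreement(grades_by_rater: Dict[str, Sequence[int]]) -> Dict[Tuple[str, str], float]:
    """Percent agreement for every unordered pair of raters."""
    return {
        (r1, r2): percent_agreement(grades_by_rater[r1], grades_by_rater[r2])
        for r1, r2 in combinations(sorted(grades_by_rater), 2)
    }


# Hypothetical grades for 5 cases from 6 raters (not study data).
toy_grades = {
    "R1": [1, 2, 3, 1, 2],
    "R2": [1, 2, 3, 1, 3],
    "R3": [1, 2, 3, 1, 2],
    "R4": [1, 2, 2, 1, 2],
    "R5": [1, 2, 3, 1, 2],
    "R6": [1, 3, 3, 1, 2],
}
agreement = pairwise_agreement(toy_grades)
```

Percentage agreement does not correct for chance, which is why it is reported here only alongside κ rather than in place of it.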
Intrarater Reliability
Pretreatment Grade Reliability
Intrarater reliability for pretreatment grades showed a mean κ of 0.81 (95% CI: 0.57-1.00), with percentage agreement ranging from 75% to 100%. Agreement interpretations ranged from moderate to almost perfect (Table 5).
Intrarater Reliability for Pretreatment and Post-treatment Grades.
κ Values, 95% Confidence Intervals (Truncated at 1.0), Percentage Agreement, and Agreement Interpretation are Presented.
Post-treatment Grade Reliability
For post-treatment grades, the mean κ was 0.70 (95% CI: 0.60-0.80), with percentage agreement between 80% and 90.9%, indicating substantial to almost perfect agreement (Table 5).
Discussion
The absence of standardized PSIO appliances, treatment protocols and assessment tools has made it challenging to evaluate and measure treatment effectiveness in patients with unilateral and bilateral clefts of the lip with or without cleft palate. This lack of consistency has made it more challenging for cleft teams to systematically monitor outcomes and refine care strategies. The present study primarily demonstrates that experienced cleft orthodontists were able to apply the PAT with acceptable consistency when evaluating standardized retrospective photographic records. The observed inter-rater and intrarater agreement suggests that the tool may provide a reproducible framework for describing pretreatment cleft severity and post-treatment morphological changes following PSIO therapy. However, the present findings should be interpreted as preliminary reliability data rather than comprehensive validation of the tool.
The PAT was designed to provide a structured and standardized framework for documenting observable morphological changes associated with PSIO treatment in infants with CLP. The tool allows consistent assessment of features such as cleft width, alveolar alignment, nasal symmetry, columella position, and soft tissue morphology using standardized clinical photographs. Unlike advanced quantitative approaches such as 3D imaging, digital morphometric analysis, or stereophotogrammetry, the PAT was intentionally designed as a simple, clinically accessible, and low-resource assessment framework that can be readily implemented in routine cleft care settings. While the present study demonstrated encouraging reliability among experienced orthodontists, the study was not designed to evaluate the biological effectiveness or long-term clinical benefit of PSIO itself. Ongoing debate persists regarding the magnitude, stability, and durability of PSIO-related outcomes, particularly with respect to long-term facial growth, dental arch development, and functional outcomes. Accordingly, PAT should be viewed as a descriptive assessment framework rather than as evidence supporting any specific PSIO protocol or treatment approach. The PAT is therefore not intended to replace high-end quantitative analyses, but rather to complement them by providing a practical and reproducible clinical assessment tool applicable across diverse healthcare environments. Previous studies have suggested that although PSIO may improve early presurgical morphology and facilitate surgical approximation of tissues, evidence regarding sustained long-term craniofacial benefits remains inconclusive. Some authors have also questioned whether early orthopedic changes necessarily translate into meaningful long-term growth advantages, emphasizing the need for cautious interpretation of short-term treatment effects.18,19
At the same time, the ability to reliably document early morphological changes may still hold important clinical relevance. Standardized assessment of presurgical changes can assist multidisciplinary cleft teams in communication, treatment planning, longitudinal record keeping, and comparison of outcomes across institutions and treatment protocols. Furthermore, a reproducible tool such as PAT may facilitate future prospective multicenter studies investigating the relationship between early PSIO-induced morphological improvements and longer-term surgical, esthetic, functional, and growth-related outcomes. In this context, the PAT may contribute not only to short-term treatment assessment but also to the broader understanding of the long-term impact of PSIO within comprehensive cleft care.18,19
The PAT was designed to be intuitive and clinically applicable, requiring only minimal calibration for consistent use. As the included parameters (cleft width, nasal symmetry, columella angle, and alveolar alignment) are routinely evaluated by clinicians trained in PSIO, the learning curve for implementing the PAT is expected to be low. A brief calibration session involving review of reference cases and consensus on grading criteria was sufficient to achieve high inter-rater and intrarater agreement among the expert panel. It is important to note that this study was not designed to quantify the biological or clinical treatment effect of PSIO, but rather to assess the preliminary reliability of a standardized, expert-based tool for detecting and grading PSIO-related changes. The grading categories were not intended to dictate surgical technique or establish treatment thresholds, but rather to provide standardized descriptive assessment of presurgical morphology. Further studies are necessary to determine the applicability of the tool across broader clinical settings and multidisciplinary teams, including validation among nonorthodontic cleft care providers and clinicians with varying levels of PSIO experience.
Future developments in cleft outcome assessment are likely to incorporate artificial intelligence (AI), machine learning, and advanced digital imaging technologies. Recent advances in automated image analysis, 3D facial scanning, stereophotogrammetry, and AI-based morphometric assessment have demonstrated the potential to improve the objectivity, precision, and reproducibility of craniofacial evaluations.20–23 In this context, standardized frameworks such as the PAT may provide an important foundation for the development of automated or semi-automated assessment systems. The structured parameters included in the PAT could potentially be integrated with digital image analysis algorithms to facilitate objective quantification of cleft morphology, automated scoring of treatment-related changes, and large-scale multicenter outcome comparisons. Furthermore, machine learning models trained on standardized datasets may help identify predictors of treatment response and support more individualized treatment planning. 21 While such applications remain exploratory and require rigorous validation, the PAT may serve as a clinically relevant framework that complements future AI-driven approaches to cleft outcome assessment.
Limitation(s)
This study has several limitations. First, the reliability analysis was conducted on a relatively small retrospective sample without formal sample size calculation, limiting the precision and generalizability of the agreement estimates. Second, the study utilized retrospective standardized 2D photographic records, which may not fully capture the 3D complexity of cleft morphology and may introduce variability related to image quality, head positioning, and timing of image acquisition. Third, all evaluators were experienced orthodontists with cleft and PSIO expertise, and, therefore, the findings may not be generalizable to nonorthodontist clinicians or less experienced raters. Fourth, although the PAT demonstrated encouraging inter-rater and intrarater reliability, the present study did not evaluate construct validity, criterion validity, responsiveness, or correlation with objective morphometric measurements or long-term surgical and functional outcomes. In addition, several post-treatment grading criteria relied on subjective estimation of percentage correction from baseline photographs, which may introduce observer variability despite calibration efforts. The study was also limited to records obtained through a single documentation system and has not yet undergone external multicenter validation using independent datasets. Future studies with larger and more diverse samples, objective morphometric comparisons, and longitudinal follow-up are necessary to further establish the clinical utility and validity of the PAT. κ values interpreted using the Landis and Koch criteria should be viewed cautiously, particularly in small-sample reliability studies. Furthermore, the tool focuses primarily on morphological outcomes and does not capture functional or psychosocial impacts.
Conclusion
The PAT demonstrated encouraging preliminary inter-rater and intrarater reliability among experienced cleft orthodontists evaluating standardized retrospective photographic records. The tool may provide a structured framework for describing pretreatment cleft severity and post-treatment morphological changes following PSIO therapy in unilateral and bilateral cleft cases. However, the present findings represent an initial reliability assessment rather than comprehensive validation. Further studies involving larger and more diverse populations, external validation, objective morphometric comparisons, and correlation with surgical and functional outcomes are necessary before broader clinical implementation.
Footnotes
Ethical Statement
The ethical approval for the study was obtained from the Institutional Ethical Committee of Manav International Institute of Research and Studies (Ref No: MRIIRS/MRDC/SDS/IEC/2024/129).
Informed Consent
Not applicable.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability
Data related to this article are available from the corresponding author upon reasonable request.
