Abstract
Objective:
Early diagnosis of autism spectrum disorder (ASD) can facilitate timely intervention and improved developmental outcomes, but many children wait over a year for ASD evaluation. Telehealth assessments by pediatricians may address geographic and workforce barriers for children awaiting ASD evaluation. The goal of this study was to assess the reliability, accuracy, and timeliness of ASD telehealth evaluations by pediatricians. We hypothesized this approach would demonstrate reliability with specialist evaluation, concordance with in-person diagnosis, and wait-time reductions.
Methods:
Thirty-two pediatricians received standardized training, completed fidelity/reliability testing, and utilized a novel evaluation model employing validated screening, parent-interview, and child observation tools. Interrater reliability of 32 pediatricians who administered 200 telehealth cases was assessed relative to blinded clinical supervisors. The diagnostic accuracy of 27 pediatricians who watched 19 video-recorded telehealth cases (494 total assessments) was compared to in-person diagnosis for the 19 children. Wait time for a telehealth assessment was compared to wait time for an in-person assessment for 2,483 children.
Results:
There was 91.5% scoring agreement (95% CI: 0.893–0.932) between pediatricians and clinical supervisors (p < 0.001, K = 0.206) on 200 cases. On 19 video-recorded cases, there was 93.5% accuracy (PPV = 0.957; NPV = 0.811) between the in-person diagnosis and the 494 diagnoses provided by 27 pediatricians. The average wait time for 2,483 children who received telehealth evaluations (11.7 days) was shorter (W = 3.08 × 10⁶, p < 0.001) than the average in-person wait time (11.8 months).
Conclusions:
These results demonstrate that telehealth evaluations by general pediatricians offer a reliable, accurate, and timely approach to ASD assessment through a scalable, nationwide model.
Introduction
Autism spectrum disorder (ASD) is a complex neurodevelopmental diagnosis, characterized by difficulties with social interactions and repetitive behaviors or interests. 1 The Centers for Disease Control and Prevention (CDC) estimates that about 1 in 31 children in the United States (U.S.) will receive a diagnosis of ASD. 2 While specialists can reliably diagnose ASD as early as 14 months and pediatricians screen children for ASD early in development, the average age of ASD diagnosis in the U.S. remains 4.5 years. 3 Early intervention is crucial for improving outcomes in autistic individuals. Research has shown that children who start treatment before age 4 can experience improved cognition, language, and social abilities. 4 Despite this evidence, there is still a delay in diagnosis, with a report from 2023 indicating that patients across the nation are waiting over 2 years from screening to a final diagnosis.3,5
Several barriers contribute to delayed diagnosis, including a limited number of specialists, geographic barriers to accessing in-person appointments, increased prevalence and screening practices contributing to high demand for evaluations, a traditionally lengthy evaluation process, and long clinic waitlists for an appointment postpandemic. The U.S. faces a scarcity of developmental-behavioral pediatricians, with an estimated ratio of one specialist for every 100,000 children, leaving certain states devoid of such expertise. 6 Furthermore, approximately 84% of counties across the nation lack diagnostic facilities, forcing families to travel considerable distances to the nearest clinic. 7 Traditional evaluations involve multidisciplinary assessments that can last 4 h, placing significant burdens on families, particularly those traveling from remote areas.
One effective strategy to address the lengthy duration of traditional evaluations is the adoption of streamlined assessments performed through a telehealth platform by pediatricians. Recent research highlights the reliability of telehealth evaluations in detecting ASD, presenting a viable alternative to in-person diagnosis.8–10 This approach also addresses geographic disparities by connecting families from the 84% of counties without a diagnostic facility directly to a virtual specialty clinic. Studies show that with additional training, pediatricians can accurately identify children with ASD, helping improve early diagnosis and access to care, especially in underserved areas.11–14
The objective of this study was to assess the efficacy of a telehealth model utilizing pediatricians with specialized ASD diagnostic training to perform remote ASD assessments. We hypothesized that pediatricians would accurately diagnose ASD while reducing wait times for ASD evaluation on a national scale. These hypotheses were tested with a prospective study involving 14,175 children evaluated for ASD by 32 pediatricians over a 21-month period.
Methods
STUDY POPULATION
This prospective study included 14,175 children, ages 16 months to 10 years 11 months, evaluated for suspected ASD by 32 board-certified pediatricians from August 23, 2022, to May 20, 2024, using a novel telehealth paradigm. Participants were included if a parent consented at intake and spoke English or Spanish, as the measures were validated only in those languages.
PEDIATRICIAN TRAINING
Pediatricians received ∼120 h of training modules on ASD symptomatology and assessment. Each module consisted of (1) video lectures by clinical psychologists, developmental pediatricians, and ASD researchers; and (2) written content adapted from the CDC, Extension for Community Healthcare Outcomes (ECHO) Autism, Autism Speaks, and peer-reviewed literature. Before beginning clinical assessments on the telehealth platform, pediatricians were required to pass fidelity testing by successfully administering the TELE-ASD-PEDS (TAP), Childhood Autism Rating Scale, Second Edition Standard version (CARS-2-ST), and High Functioning version (CARS-2-HF) on three practice patients. They were also required to pass reliability testing on each observation tool by successfully scoring 3 video-recorded cases within 1.5 points of a clinical supervisor. Clinical supervisors included a developmental pediatrician, clinical psychologists, and a general pediatrician with extensive training and experience administering various ASD-specific assessments. Continuing education involved weekly 30-min lectures by clinical supervisors and national experts in ASD. Each pediatrician received feedback on clinical technique and diagnostic accuracy from clinical supervisors for approximately 10 cases in the first 3 months. Quarterly quality metrics, including visit length and diagnostic rate, monitored observation time and potential diagnostic bias.
DIAGNOSTIC PROCESS
Caregivers completed a Modified Checklist for Autism in Toddlers-Revised or Autism Symptom Dimensions Questionnaire (ASDQ) and provided information about their child’s medical, developmental, family, and social history during the intake process. Each child completed three recorded visits: (1) an initial visit involving a DSM-5 caregiver interview; (2) an observation visit involving the TAP (age 16–35 months) or CARS-2 (ages 3 years to 10 years, 11 months); and (3) a results visit, where the pediatrician explained the diagnosis and made recommendations for support services. Forty-five minutes were allotted for each visit. Differential diagnoses were supported by a virtual IPM, Autism Analytica, which collected and analyzed up to nine standardized caregiver-reported questionnaires targeting executive functioning, adaptive skills, motor skills, attention difficulties, anxiety, challenging behaviors, sleep difficulties, mood, and family quality of life. Autism Analytica was utilized to increase data collection efficiency and to gather additional objective data. Diagnostic support was provided by a machine-learning algorithm estimating ASD probability from demographic, screening, and observational data. Pediatricians had access to peer review for any case with unclear probability (40–60%). Cases with diagnostic disagreement between pediatricians were reviewed by a clinical supervisor. Inconclusive cases were referred for in-person assessment.
STUDY OUTCOMES AND PROCEDURES
The primary outcome for assessing reliability was the total score on the TAP or CARS-2. A subset of 200 children was randomly selected for reliability analysis (n = 200/14,175; 1.4%). Scores were assigned by 32 pediatricians who had completed training and were assessed for interrater reliability relative to blinded scores from a clinical supervisor: a developmental pediatrician (n = 78), clinical psychologist #1 (n = 18), clinical psychologist #2 (n = 23), or a general pediatrician with 8 years of diagnostic experience (n = 81).
The primary outcome for assessing diagnostic accuracy was assignment of ICD-10 ASD diagnosis (F84.0), based on DSM-5-TR criteria, for 494 assessments. Children whose parents consented and uploaded medical records of previous in-person evaluations by a developmental specialist using a standardized assessment (e.g., Autism Diagnostic Observation Schedule [ADOS], CARS-2, Autism Diagnostic Interview, Revised) were included. Telehealth visits were recorded. Clinical supervisors initially reviewed the telehealth evaluations and determined the presence or absence of a DSM-5-TR diagnosis while blinded to the in-person or telehealth diagnosis. Their ratings were consistent with the in-person evaluations’ diagnostic findings for 19 cases. Pediatricians (n = 27) reviewed video-recorded telehealth visits for 19 cases and determined the presence or absence of DSM-5-TR diagnostic criteria using child history, ASDQ screening scores, and DSM-5 interview results while blinded to both in-person and telehealth diagnoses. Diagnostic agreement (27 pediatricians × 19 cases) was calculated to evaluate telehealth diagnostic accuracy.
Finally, wait time reduction was evaluated by comparing telehealth appointment timing with caregiver-reported wait times for in-person evaluations (n = 2,483/5,191; 47.8%). Wait time from referral to telehealth assessment was abstracted from the medical record and compared to parent-reported wait time from referral to in-person evaluation.
DATA COLLECTION
Demographic and clinical data were extracted from the electronic medical record, including child age, sex, race, ethnicity, insurance status, screening scores, observational assessment scores, ASD (F84.0) diagnosis, ASD severity level, physician confidence score (0–100%), and dates of intake, initial visit, and results visit. Pediatricians self-reported their sex, age, race, ethnicity, clinical degree, years in practice, and months working on the telehealth platform at the time of the diagnostic accuracy portion of the study.
STATISTICS
Interrater reliability was assessed using Light’s Kappa. Agreement between pediatricians and clinical supervisors on total scores and test item scores for each observation tool (i.e., TAP, CARS-2-ST, CARS-2-HF) was reported using an intraclass correlation coefficient with 95% confidence interval, based on 100-fold bootstrapping. Two multivariable linear regressions were used to assess the effect on rater reliability of child factors (age, sex, race, ethnicity, insurance status, and ASD diagnosis) and of clinician factors (observation tool, supervising physician, and general pediatrician self-reported confidence). Wait time differences were assessed using within-subject nonparametric tests.
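As an illustration of the first statistic named above, Light's Kappa can be understood as the mean of Cohen's kappa taken over every pair of raters. The following is a minimal sketch under stated assumptions, not the authors' analysis code: it treats ratings as categorical labels (how continuous TAP or CARS-2 totals were categorized is not stated here), and the function names and toy ratings are hypothetical.

```python
from collections import Counter
from itertools import combinations
from statistics import mean


def cohen_kappa(a, b):
    """Cohen's kappa for two raters' categorical labels on the same cases."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    # Observed proportion of cases on which the two raters agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(a), Counter(b)
    labels = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in labels) / n**2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def light_kappa(ratings_by_rater):
    """Light's kappa: mean Cohen's kappa over all pairs of raters."""
    return mean(cohen_kappa(a, b) for a, b in combinations(ratings_by_rater, 2))


if __name__ == "__main__":
    # Hypothetical toy data: 1 = meets ASD cutoff, 0 = does not.
    toy = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0]]
    print(round(light_kappa(toy), 3))
```

Because kappa corrects for chance agreement using the raters' marginal frequencies, it can be much lower than raw percent agreement when one category dominates.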
Results
PARTICIPATING GENERAL PEDIATRICIANS
Pediatricians had a mean age of 43 (±7) years. Most were female (84.4%, 27/32), non-Hispanic (83.3%, 25/30), and had an allopathic medical degree (71.9%, 23/32). There were similar proportions of Asian (9/30, 30.0%), Black (7/30, 23.3%), and White (14/30, 46.7%) pediatricians. On average, pediatricians had 13 (±6) years of clinical experience and 8 (±5) months performing telehealth ASD evaluations.
INTERRATER RELIABILITY
There were 200 telehealth observation visits completed by a pediatrician, reviewed by a clinical supervisor, and assessed for rater reliability; 64 employed the TAP, 86 employed the CARS-2-ST, and 50 employed the CARS-2-HF. The 200 children in these cases had an average age of 47 months (IQR = 44), were mostly male (136/200, 68%), White (98/186, 52.7%), and non-Hispanic (126/187, 67.4%) with public insurance (151/200, 75.5%). The 200 children resided across 26 states, and 64% received a diagnosis of ASD (128/200). There was 91.5% agreement (95% CI: 0.893–0.932) between pediatricians and clinical supervisors (p < 0.001, K = 0.206) (Fig. 1). Agreement was similar for total scores on the TAP (0.917; 95% CI: 0.881–0.958; K = 0.257, p < 0.001), the CARS-2-ST (0.922; 95% CI: 0.861–0.983; K = 0.233, p < 0.001), and the CARS-2-HF (0.879; 95% CI: 0.802–0.973; K = 0.064, p = 0.023). There was significant agreement (p < 0.05) for all test items on the TAP, CARS-2-ST, and CARS-2-HF (Table 1). Agreement was highest on the test item “relating to people” on the CARS-2-ST (0.897, 95% CI: 0.841–0.952) and lowest for “object use” on the CARS-2-HF (0.701, 95% CI: 0.539–0.883). A multivariable linear regression showed that rater agreement on the total score was not associated with child factors (adjusted R² = 0.013, p = 0.313). There was no effect of ASD diagnosis (F = 1.53, p = 0.21), child age (F = 1.43, p = 0.23), child sex (F = 0.25, p = 0.61), race (F = 1.04, p = 0.39), ethnicity (F = 2.28, p = 0.081), or insurance (F = 0.56, p = 0.75). Similarly, rater agreement on the total score was not associated with clinician factors (adjusted R² = 0.013, p = 0.19). Observation tool (i.e., CARS-2 vs. TAP; F = 0.75, p = 0.52), supervising clinician type (i.e., developmental pediatrician vs. clinical psychologist; F = 2.61, p = 0.076), and pediatrician self-reported confidence level (F = 1.39, p = 0.23) had no effect on rater reliability.

Reliability of general pediatricians using autism observational tools via telehealth. The line-of-identity plot displays the relationship between scores provided by 32 general pediatricians and scores provided by a clinical supervisor using the TELE-ASD-PEDS or the Childhood Autism Rating Scale, Second Edition for 200 cases. Scores of general pediatricians displayed 91.5% agreement (95% CI: 0.893–0.932) with clinical supervisors (p < 0.001, K = 0.206). The red dotted line represents plot slope (Est. = 1.07; 95% CI: 0.98–1.16).
Subscale Rater Reliability
Interrater reliability between 32 general pediatricians and four clinical supervisors (2 child psychologists, 1 developmental pediatrician, 1 general pediatrician) on the TELE-ASD-PEDS (TAP; n = 64) and the Childhood Autism Rating Scale, Second Edition, based on patient observation (CARS-2), standard version (ST; n = 86) and high functioning version (HF; n = 50).
DIAGNOSTIC ACCURACY
Of the 19 children who received both an in-person and a telehealth assessment, 16 (84.2%) received an in-person ASD diagnosis. Most in-person evaluations were performed by a clinical psychologist (13/19; 68.4%), and most used the ADOS-2 (12/19; 63.2%). Eleven children received a severity level with the ASD diagnosis (two level 1, three level 2, and six level 3). The children had an average age of 5.5 (±2.3) years, and 14 (73.7%) were male. Telehealth assessments occurred, on average, 2.3 (±1.7) years after the in-person assessment. The 27 pediatricians who watched the 19 video-recorded cases provided 494 diagnostic assessments and made an ASD diagnosis 85.0% of the time (420/494). They displayed 96.6% sensitivity, 76.9% specificity, and 93.5% accuracy, with a positive predictive value of 95.7% and a negative predictive value of 81.1%. A multivariable logistic regression showed that diagnostic accuracy was associated with pediatrician factors (adjusted R² = 0.100, p < 0.001, AIC = 226). There was no effect of pediatrician degree (χ² = 5.05, p = 0.080), years of practice (χ² = 0.33, p = 0.56), months spent providing full-time ASD telehealth assessments (χ² = 3.06, p = 0.080), or observational assessment tool (χ² = 1.07, p = 0.30) on accuracy. However, pediatrician confidence in each case was inversely related to diagnostic accuracy (χ² = 19.6, p < 0.001).
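The relationship among the reported accuracy figures can be made concrete with a 2 × 2 confusion matrix. The sketch below is illustrative only: the cell counts are not reported in the text. They are back-calculated here on the assumption that each of the 19 cases contributed 26 assessments (494/19), giving 416 assessments of the 16 in-person ASD cases and 78 of the 3 non-ASD cases, and they were chosen because they reproduce the reported percentages. The function and variable names are hypothetical.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return standard 2x2 diagnostic accuracy metrics as proportions."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),   # ASD cases correctly ruled in
        "specificity": tn / (tn + fp),   # non-ASD cases correctly ruled out
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": (tp + tn) / total,
        "positive_call_rate": (tp + fp) / total,
    }


# ASSUMPTION: reconstructed counts, not taken from the study's data.
# 420 ASD calls (402 true, 18 false) and 74 non-ASD calls (60 true, 14 false).
RECONSTRUCTED = dict(tp=402, fp=18, fn=14, tn=60)

if __name__ == "__main__":
    for name, value in diagnostic_metrics(**RECONSTRUCTED).items():
        print(f"{name}: {value:.1%}")
```

With only 3 of 19 reference cases being non-ASD, specificity and negative predictive value rest on far fewer assessments than sensitivity does, which is worth keeping in mind when comparing the two.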
WAIT TIME REDUCTION
A total of 3,914 families indicated their child was on a waitlist for an in-person ASD evaluation at the time of telehealth registration. Four hundred and twenty-seven (10.9%) families provided subjective descriptions, such as “They never gave me a date for my appt, said the waitlist was over a year and they would call to schedule,” or “We are on a waitlist to be scheduled.” An additional 2,483 (63.4%) families provided a numerical wait time for their child’s in-person appointment. The 2,483 children had an average age of 55 (±28) months, were mostly male (1,697/2,470, 68.7%), White (1,491/2,483, 60.0%), and non-Hispanic (1,689/2,483, 68.0%) with public insurance (1,688/2,483, 67.9%). They resided across 47 states, and 61% (1,162/1,905) received a diagnosis of ASD via telehealth. The average wait for an in-person appointment was 11.8 months (SEM = 0.15). In comparison, those 2,483 children had their initial evaluation appointment through telehealth in 11.7 days (SEM = 0.15), a significant reduction (W = 3.08 × 10⁶, p < 0.001, Est = 1.0). The three-visit process was completed in a median time of 23 days. A linear regression analysis showed that wait time for in-person evaluation was associated with age (F = 30.59, p < 0.01) and state (F = 5.57, p < 0.01), but not sex (F = 0.11, p = 0.73), race (F = 0.65, p = 0.65), ethnicity (F = 0.59, p = 0.55), or insurance (F = 0.42, p = 0.86). The model accounted for 8.1% of variance in the data (F = 4.89, p < 0.01). There was a direct relationship between child age and length of in-person wait time (Est = 13.10, 95% CI: 8.4–17.7). States with the longest wait times included Illinois (758 days ± 120), New Mexico (633 days ± 153), Washington (570 days ± 49), South Carolina (562 days ± 54), Minnesota (527 days ± 153), and Vermont (521 days ± 91). Wait times for a telehealth evaluation varied by state (F = 5.6, p < 0.001) but not by any other factor (p > 0.05). The longest telehealth wait times were for patients in New Hampshire (21 days ± 6), Iowa (20 days ± 8), Kentucky (17 days ± 10), Vermont (16 days ± 8), Montana (16 days ± 9), and Maine (15 days ± 6).
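The reported test statistic can be sanity-checked with simple arithmetic. The sketch below assumes that W is a Wilcoxon signed-rank statistic (the text specifies only a within-subject nonparametric test) and that a month averages 365.25/12 days; both are assumptions, and the function names and toy data are hypothetical. Under these assumptions, the largest possible W for 2,483 pairs is n(n + 1)/2, which is approximately 3.08 × 10⁶ and would arise if the in-person wait exceeded the telehealth wait for every child.

```python
DAYS_PER_MONTH = 365.25 / 12  # ASSUMPTION: average month length in days


def months_to_days(months):
    """Convert months to days using the average month length."""
    return months * DAYS_PER_MONTH


def max_signed_rank_statistic(n_pairs):
    """Largest possible signed-rank sum: every difference has the same sign."""
    return n_pairs * (n_pairs + 1) // 2


def signed_rank_w_plus(first, second):
    """Sum of ranks of positive paired differences (first - second).

    Zero differences are dropped and tied absolute differences share
    their average rank.
    """
    diffs = [a - b for a, b in zip(first, second) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return sum(r for r, d in zip(ranks, diffs) if d > 0)


if __name__ == "__main__":
    print(max_signed_rank_statistic(2483))  # 3083886
    # Hypothetical toy waits in days: in-person vs. telehealth for 3 children.
    print(signed_rank_w_plus([300, 400, 200], [10, 12, 9]))
    print(round(months_to_days(11.8), 1), "days in-person vs. 11.7 days telehealth")
```

For a real analysis, an established routine such as `scipy.stats.wilcoxon` would normally be preferred over a hand-rolled rank sum, since it also supplies the p-value.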
Discussion
The results of this study demonstrate that pediatricians can conduct virtual ASD assessments with adequate reliability and diagnostic accuracy using structured telehealth tools. Consistent with prior research on in-person assessments, pediatricians were more accurate at ruling in ASD than ruling out ASD. 10 Notably, diagnostic accuracy was not associated with years of clinical experience or months spent performing full-time ASD assessments, suggesting that this training model may be scalable across a broad pediatric workforce. Particularly reassuring was the inverse relationship between diagnostic confidence and accuracy, indicating that pediatricians appropriately identified cases requiring referral for multidisciplinary evaluation.
Findings also support the premise that virtual ASD assessments may promote health care equity.15–16 Among the 2,483 children awaiting in-person evaluations, nearly half were from racial or ethnic groups that have historically experienced disparities in ASD care, and over two-thirds were publicly insured.17–18 Reliability and reductions in wait time were comparable across racial, ethnic, and insurance groups. A key finding of this study was that virtual assessment reduced wait times by nearly 1 year. On average, children were evaluated within 2 weeks and received a diagnosis within 1 month. These reductions were observed across states, despite substantial geographic variability in traditional in-person wait times. Importantly, wait times remained stable over the study period even as 32 pediatricians were trained and monthly evaluation volume expanded from 176 children per month to 1,792 children per month, supporting the scalability of this training and diagnostic model.
Although reliability was not broadly associated with child age, pediatrician agreement was slightly lower for older children assessed using the CARS-2-HF (ages 6–10). This may reflect the greater clinical complexity and nuanced developmental histories often present in older children. Accordingly, while virtual pediatrician-led assessments appear effective across a broad age range, this model may not be equally optimal for all developmental stages or clinical presentations.
Limitations
Several limitations warrant consideration. Participants were recruited through a self-selected, help-seeking sample, which may introduce volunteer bias and limit generalizability to broader populations. However, this approach enhances ecological validity by reflecting families actively seeking diagnostic services.
The diagnostic tools used in this study (TAP and CARS-2) are not universally recognized by payers, and low reimbursement rates for telehealth services may threaten the sustainability of this care model. 19 Certain states, such as Ohio, continue to question the validity of diagnoses by pediatricians, despite mounting evidence that pediatricians may accurately recognize ASD.11–13 Policy and reimbursement alignment will be essential for widespread implementation. Additionally, assessments are currently validated only in English and Spanish, limiting accessibility for families who speak other languages.
The feasibility of the training model is another consideration. Although the required 120 h of training represents a substantial commitment, it remains shorter than subspecialty training pathways and may be feasible for pediatricians seeking diagnostic specialization. Integration into clinical practice may require restructuring appointment scheduling, as evaluation length exceeds typical pediatric visits.
Methodologically, in-person wait times were self-reported and may underestimate true delays. Over 1,000 families who reported their child was on an in-person wait list were still awaiting an appointment date or were unaware of their appointment date. Therefore, it is likely that these results underestimate in-person wait times, as recent studies have estimated they may exceed 24 months. 3
Additional research is needed to evaluate long-term diagnostic outcomes, applicability across diverse linguistic and cultural populations, feasibility of large-scale training implementation, and sustainability within current reimbursement structures.
Conclusion
This study provides a scalable blueprint for reducing wait times for childhood ASD evaluation. A model that employs pediatricians and provides them with rigorous training and diagnostic support can yield reliable assessments and accurate ASD diagnoses while reducing wait times nationally. Such a model does not aim to replace developmental specialists but to complement their services while shortening wait times for many families seeking diagnostic clarification. By leveraging telehealth to reach rural and underserved communities, pediatricians can identify children with overt ASD symptoms, allowing developmental specialists to focus on complex patients who require multidisciplinary care.
Ethical Considerations
This study was approved by the WCG Institutional Review Board (WCG IRB) under the study ID: #20223713. The research involved pediatric participants, and written informed consent was obtained from each participant’s parent or legal guardian prior to the initiation of any study procedures and data collection. All study procedures were conducted in accordance with applicable ethical standards and regulations governing research involving human participants.
Authors’ Contributions
The authors each confirm and accept their status as authors based on the following criteria: (1) substantial contributions to the conception or design of the study; or the acquisition, analysis, or interpretation of data for the study; (2) drafting the article or revising it critically for important intellectual content; (3) final approval of the version to be published; and (4) agreement to be accountable for all aspects of the study in ensuring that questions related to the accuracy or integrity of any part of the study are appropriately investigated and resolved.
Footnotes
Acknowledgments
The authors thank Corinna Rea, MD, MPH for research support.
Author Disclosure Statement
All authors have completed and submitted the ICMJE disclosures form. The authors have made the following disclosures: K.M., N.T.N., and S.H. are former employees of As You Are. T.B. is currently the Chief Medical Officer at As You Are.
Funding Information
No funding was provided for this study. The study was performed as part of the authors’ regular employment duties. No competing financial interests exist.
