Gaze-Tracking-Based Tests for Autism in Children: A Diagnostic Test Accuracy Systematic Review and Meta-Analysis

Abstract

Atypical gaze patterns are consistently reported in autism, reflecting differences in social attention and interest. Gaze-tracking paradigms provide an objective way to quantify these differences and may serve as early indicators of autism. This diagnostic test accuracy systematic review and meta-analysis evaluated the performance of eye-tracking-based gaze measures in children. Following Preferred Reporting Items for Systematic Reviews and Meta-Analyses of Diagnostic Test Accuracy (PRISMA-DTA) guidance, studies published between 2015 and 2025 that compared gaze-tracking paradigms with standardized autism diagnoses were synthesized. Pooled diagnostic odds ratio (DOR), sensitivity, and specificity were estimated using random-effects and hierarchical summary receiver operating characteristic models. Risk of bias was assessed with QUADAS-2, and publication bias was examined with funnel plots. Seventeen studies (n = 4,256) from six countries met the inclusion criteria. Tasks included social-geometric preference, motherese-nonsocial speech, and visual-orienting paradigms analyzed with rule-based or machine-learning methods. The pooled area under the hierarchical summary receiver operating characteristic curve (HSROC AUC) was 0.845; DOR 15.03 (95% CI 8.00–28.50); sensitivity 0.77 (95% CI 0.65–0.85); and specificity 0.80 (95% CI 0.75–0.84). Although heterogeneity was high (I² = 87.78%), effect directions were consistent. Dynamic social stimuli and higher-frequency tracking systems achieved the best performance. Gaze-tracking tests distinguished autistic and nonautistic children across diverse settings, supporting their potential role as a quantitative, observer-independent adjunct for early identification and clinical decision support.

Lay abstract

Autism is a form of neurodiversity characterized by differences in social communication, sensory processing, and patterns of attention and interest, which often shape how autistic people look at and interpret the world around them. Eye-tracking technology records where a person looks on a screen and how long their gaze remains on elements such as people, faces, or objects. Because it is objective and does not rely on language or complex instructions, eye-tracking may support earlier identification of autism. This study reviewed 17 research papers published between 2015 and 2025 that explored how well eye-tracking distinguishes autistic and nonautistic children. Together, these studies included over 4,000 participants and compared attention to social scenes, like people talking or playing, with attention to nonsocial or geometric patterns. On average, eye-tracking correctly identified about 77% of autistic children and about 80% of nonautistic children, with the best results obtained using dynamic social videos and high-quality tracking cameras. These findings suggest that gaze-based measures capture meaningful differences in social attention and could complement existing diagnostic approaches through earlier, more objective assessment.

Keywords

autism, diagnostic test accuracy, gaze-tracking, meta-analysis, systematic review

Introduction

Autism is a neurodevelopmental condition, defined by persistent difficulties in social communication and interaction across multiple contexts, alongside restricted, repetitive patterns of behaviors, interests, or activities, and atypical sensory responsivity. These are present from early developmental stages, although functional impairment may only become evident if social demands surpass adaptive capacity (American Psychiatric Association, 2022).

Clinical presentation is highly heterogeneous, complicating early recognition and delaying diagnosis (Masi et al., 2017). Reliable identification is possible from 18 to 24 months (Dawson et al., 2023), yet most children are diagnosed near age 4, with wide cross-national and sociodemographic variation (Fombonne et al., 2016; Maenner et al., 2023). Such delays are associated with reduced access to early intervention and poorer developmental outcomes (Brian et al., 2019; Lord et al., 2018).

From a public health perspective, the global prevalence of autism is estimated at around 0.7% to 1% of children, though estimates vary by region and methodological approach (Baxter et al., 2015; Zeidan et al., 2022).

Efforts to reduce diagnostic delay have driven extensive biomarker research. A systematic review identified over 900 candidate biomarkers across molecular, neurophysiological, and behavioral domains; however, none have yet demonstrated adequate clinical validity for inclusion in diagnostic algorithms (Parellada et al., 2023; Zhuang et al., 2024). Among these, gaze behavior has emerged as a promising candidate because it is noninvasive and directly linked to social attention, a hallmark of autism.

Atypical visual attention is among the earliest and most consistent features of autism. Studies show reduced fixation on eyes and faces, diminished preference for biological motion, and altered attention to social scenes, alongside heightened interest in geometric or repetitive stimuli. These gaze differences emerge within the first year of life and can predict later social and language outcomes, supporting their potential as early behavioral markers of neurodevelopmental divergence (Elsabbagh et al., 2013, 2014; Tönsing et al., 2025).

Recent studies employing dual eye-tracking during live interactions have revealed atypical behavior in autism, characterized by less frequent initiation and greater avoidance of eye contact (Tönsing et al., 2025). Large-scale consortia have reported good diagnostic performance of gaze-based metrics for clinical trial applications, due to the feasibility of valid data acquisition, verification of construct performance, and stability over 6 weeks (Shic et al., 2022). Gaze-based measures offer practical advantages over molecular or neuroimaging biomarkers: eye tracking is noninvasive, relatively brief, and well tolerated by young children (Falck-Ytter et al., 2013).

Technological advances have increased the translational potential of gaze-tracking. Traditional research-grade systems provide high spatial and temporal precision but are costly and limited to laboratory settings (Falck-Ytter et al., 2013). Gaze-tracking paradigms record visual attention to structured social and nonsocial stimuli, such as faces, scenes, or biological motion, and analyze fixation duration, gaze preference, or scanpath patterns (Papagiannopoulou et al., 2014). Derived metrics, whether rule-based or machine-learned, capture atypical social attention patterns characteristic of autism (Chita-Tegmark, 2016).

This line of research is particularly valuable given that gaze-tracking is noninvasive, scalable (requiring minimal operator training), and potentially automatable (Klin et al., 2015). In contrast to conventional screening tools that rely on caregiver reports or clinical observation, gaze-tracking provides objective, quantifiable indices of social attention, thereby minimizing subjectivity and cultural bias. Understanding the classification accuracy of these tasks across contexts, technologies, and developmental stages is essential before they are integrated into primary care workflows or used as a triage instrument.

Despite its promise, the literature remains fragmented, with studies differing substantially in paradigm design, measurement approaches, and analytic strategies. Reported diagnostic performance varies considerably in primary (author-selected) classification approaches, and the relative merits of distinct paradigms (e.g., faces vs. geometric shapes, static vs. dynamic stimuli) remain unclear. This heterogeneity underscores the need for a systematic synthesis of diagnostic test accuracy evidence. The present review aims to evaluate the current available evidence on gaze-tracking paradigms for classifying autism, to clarify screening and diagnostic performance across tasks, devices, and settings.

This diagnostic test accuracy systematic review and meta-analysis followed PRISMA-DTA to address the review question: among individuals aged 12 months to 18 years, how accurately do gaze-tracking paradigms classify autism compared with validated diagnostic standards? The primary objective was to synthesize diagnostic performance across tasks and settings by summarizing paradigm characteristics and estimating pooled sensitivity and specificity using hierarchical models, while assessing risk of bias (ROB) with QUADAS-2.

Methodology

The protocol was approved by the Research, Ethics, and Biosafety Committees of our institution prior to study initiation. External registration (e.g., PROSPERO) was not pursued.

Eligible studies included children and adolescents (12 months to 18 years) who completed gaze-tracking paradigms and were evaluated for diagnostic classification of autism against a validated reference standard. Accordingly, all included studies enrolled both reference-standard positive (autism) and reference-standard negative (nonautistic) participants, permitting construction of 2 × 2 classification outcomes (Table 1). Index tests comprised gaze-tracking measures derived from either handcrafted metrics (e.g., fixation duration, gaze preference, scanpath measures) or algorithmic classifiers (e.g., machine-learning models). Reference standards included validated diagnostic assessments (e.g., Autism Diagnostic Observation Schedule–2nd Edition [ADOS-2], Autism Diagnostic Interview–Revised [ADI-R]) and clinical consensus diagnosis based on Diagnostic and Statistical Manual of Mental Disorders (5th edition; DSM-5) criteria. Studies were required to report diagnostic accuracy outcomes (e.g., sensitivity/specificity, area under the curve [AUC], likelihood ratios) or provide sufficient data to derive these metrics. We included peer-reviewed English-language articles published 2015–2025 and excluded case reports, reviews, editorials, conference abstracts, and other nonpeer-reviewed records. We restricted the search to studies published from January 2015 onward to align with the DSM-5 diagnostic criteria (American Psychiatric Association, 2013), the updated generation of autism diagnostic tests, and current hardware and analytical approaches, supporting a synthesis oriented toward contemporary and future clinical translation.

Table 1.

Study-level characteristics and primary diagnostic-accuracy metrics of the 17 included gaze-tracking studies for autism in children.

Authors	Location	Design	N autism	N nonautistic	Age range	Se/Sp/AUC	Device/Task Summary	Reference Standard	Group details
Frazier et al. (2016)	United States	Single-gate cohort	40	39	3–9 y	0.80/0.82/0.89	SMI RED-m 120 Hz/RED250 60 Hz; multiparadigm battery (e.g., faces, biological vs. nonbiological, dynamic scenes).	Best-estimate clinical diagnosis (multidisciplinary consensus).	Referred for multidisciplinary autism evaluation; comparators were referred children not meeting autism criteria.
Moore et al. (2018)	United States	Single-gate cohort	76	51	12–48 mo	0.35/0.94/0.75	Tobii T120; GeoPref preferential looking (dynamic social vs. geometric).	ADOS-2 plus clinical best estimate.	Autism center cohort from pediatrician screening and self-referral; comparators included delay, typical development, siblings, and other diagnoses.
Frazier et al. (2018)	United States	Single-gate cohort	91	110	1.6–17.6 y	0.87/0.83/0.86	Remote eye-tracking battery; composite indices (e.g., ARI/symptom index) derived from multiple gaze features.	ADOS-2 (within best-estimate autism evaluation).	Consecutive tertiary referrals for autism evaluation; comparators were referred youth not meeting autism criteria.
Kou et al. (2019)	China	Two-gate case-control	34	35	2–7 y	0.79/0.82/0.87	Multiple tasks including social vs. geometric, biological motion, and toy interaction paradigms.	ADOS-2.	Clinic-recruited autism; typical-development controls recruited via kindergartens/advertising, no neurodevelopmental diagnosis reported.
He et al. (2021)	China	Two-gate case-control	50	24	5 ± 1 y	0.96/0.88/0.93	Tobii X120 120 Hz; computerized task with static social images (visual orienting/gaze-following style).	DSM-5 clinical diagnosis.	Autism group recruited from a specialized preschool; TD children age-matched, recruitment source not reported.
Cilia et al. (2021)	France	Two-gate case-control	29	30	2–8 y	0.83/0.80/0.90	SMI RED250; social-cognition images/videos; scanpath-based features with ML classification.	Clinical diagnosis supported by ADI-R and ADOS (autism group).	Autism diagnosis confirmed by clinicians using standardized tools; typical-development controls defined by parent report (records not accessed).
Jensen et al. (2021)	Peru	Two-gate case-control	28	73	36–99 mo	0.93/0.63/0.78	Low-cost webcam; GeoPref (social vs. abstract), combined classifier with M-CHAT-R (GP + M-CHAT-R).	Clinical autism diagnosis consistent with DSM-5, supported by standardized diagnostic work-up; ADOS-2 used at enrollment screening.	Autism identified through child development centers; typical-development controls from public schools/daycare, ADOS-2 screen-positive excluded.
Wen et al. (2022)	United States	Single-gate cohort	725	1,138	12–48 mo	0.33/0.84/0.76	Tobii T120; GeoPref fixation and saccade metrics (social vs. geometric).	ADOS-2 within blinded diagnostic evaluation.	Primary-care universal screening referrals; nonautistic group included typical development plus language/global delay and other diagnoses.
Al-Shaban et al. (2023)	Qatar	Two-gate case-control	144	96	3–15 y	0.70/0.72/0.73	SMI RED250; Arabic “Autism Index” using social vs. nonsocial stimuli.	ADOS-2 (autism group) and clinical characterization.	Clinic-confirmed autism; controls from primary care/research contacts, including siblings, typical development, and developmental delay.
Pierce et al. (2023)	United States	Single-gate cohort	283	147	1–4 y	0.34/0.95/0.84	Tobii T120/Pro Spectrum; motherese vs. traffic/abstract (gaze-contingent preferential looking).	ADOS-2 (blinded), within best-estimate evaluation.	Community referrals plus population screening cohort; nonautistic groups included delay, autism features, typical development, and typical siblings.
Meng et al. (2023)	China	Two-gate case-control	117	44	12–60 mo	0.77/0.75/0.81	SMI RED500; face scanning to real vs. animated faces; ML classifier from gaze features.	ADOS-2.	Recruited in two Chinese cities; autism DSM-5 plus ADOS-confirmed; controls described as typical development, recruitment source not specified.
Sun et al. (2023)	China	Two-gate case-control	32	27	2–4 y	0.90/0.79/0.77	Eye-tracking plus EEG features in a restrictive interest paradigm; multimodal classifier.	ADOS-2.	Autism recruited from a maternal-child hospital; controls from the same area screened by a developmental pediatrician to exclude neuropsychiatric conditions.
Jones et al. (2023)	United States	Single-gate cohort	221	254	16–30 mo	0.71/0.81/0.82	Automated eye-tracking measurement of social visual engagement during naturalistic social scenes.	Expert best-estimate clinical diagnosis (blinded to index test).	Consecutive referrals to specialty clinics; comparators were referred toddlers without autism diagnosis after expert evaluation.
Wang et al. (2024)	United States	Single-gate cohort	22	17	1.5–7 y	0.88/0.88/—	Tobii X3-120; AOI-based composite metrics (ASC, FAS, AVC) from social scenes/faces/objects.	Expert DSM-5 diagnosis with ADOS-2 administered (severity scoring reported).	High-risk cohort (caregiver/clinician concern or familial risk); comparators were high-risk children not meeting autism criteria.
Keehn et al. (2024)	United States	Single-gate cohort	102	44	14–48 mo	0.91/0.87/0.90	EyeLink Portable Duo; multiparadigm battery (GeoPref, gap-overlap, PLR, resting gaze, visual exploration), composite biomarker.	Blinded expert reference-standard diagnosis following standardized evaluation.	Consecutive primary care referrals via EAE hubs; comparators were referred children not meeting autism criteria after expert evaluation.
de Belen et al. (2024)	Australia	Two-gate case-control	57	17	4.6 ± 0.47 y (0.8 in autism group)	1.00/0.76/0.96	Tobii X2-60; dynamic social videos; visual-attention features with RF/DT classifiers.	DSM-5 diagnosis supported by ADOS.	Autism recruited from autism-specific early learning and hospital child-development services; controls from preschool services, no developmental diagnoses.
Sun et al. (2024)	China	Two-gate case-control	32	27	2–4 y	0.54/0.58/0.53	Multimodal ET plus pupil plus EEG plus demographic features; classifier.	ADOS-2 and CARS.	Hospital-based recruitment; controls screened to exclude psychiatric/neurologic conditions including developmental delay, recruitment channel not fully specified.

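Note. n: number; Se: sensitivity; Sp: specificity; AUC: area under the receiver operating characteristic curve; TD: typical development; ML: machine learning; y: years; mo: months.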

Sensitivity and specificity inferred from Figure 4 at Youden’s J point.

A comprehensive search was conducted across the PubMed (MEDLINE), Scopus, Web of Science, and APA PsycINFO databases. The strategy combined controlled vocabulary and free-text terms related to autism, child populations, gaze or eye tracking, diagnostic standards, and accuracy metrics. Search filters were restricted to English-language studies published since January 2015. The complete list of search queries is provided in Supplement 1.

All retrieved records were imported into ASReview (version 2.1.1), an open-source machine-learning framework for systematic reviews (Van De Schoot et al., 2021). The software was used as a collaborative screening interface and to randomize the initial screening order. After duplicate entries were removed, all remaining records were screened manually through a sequential review of titles and abstracts. Records retained at this stage then underwent full-text eligibility assessment. Discrepancies between reviewers were resolved through discussion and consensus.

Data extraction was independently conducted by three reviewers using a standardized form, with verification by a fourth investigator to ensure accuracy and consistency. Extracted variables included study characteristics (design, sample size, age range, and setting), index test details (task type, stimulus, device model, sampling rate, calibration method, and analysis software), algorithmic approach (rule-based or machine-learning), and diagnostic performance measures (AUC, sensitivity, specificity, predictive values, and likelihood ratios). When studies reported multiple thresholds or subgroup-specific results (e.g., by sex or stimulus type), the metric representing the study’s primary or overall classification performance, typically the one designated or optimized by the authors, was selected. Each study was therefore treated as a single data point reflecting its most representative diagnostic accuracy estimate.

ROB and applicability were independently evaluated by two reviewers using the QUADAS-2 tool (Whiting et al., 2011), adapted to the context of gaze-tracking tasks for autism screening. Each study was assessed across the four domains of the tool: patient selection, index test, reference standard, and flow and timing. Discrepancies were resolved by consensus or, when necessary, through arbitration by a third reviewer. To minimize potential bias, the two reviewers responsible for this assessment did not participate in the data extraction process. Visualization of QUADAS-2 results was performed using the RobVis tool (McGuinness & Higgins, 2021).

Statistical Analysis

Data processing and descriptive statistics were carried out in Python 3.13 with pandas (v2.3.3) (The Pandas Development Team, 2025) and SciPy (v1.16.1) (Virtanen et al., 2020). When reconstructing the confusion matrices, the modified Haldane-Anscombe correction was applied by adding 0.5 to every cell of any matrix that contained a zero cell, following the recommendation of Weber et al. (2020). For the random-effects meta-analysis of log diagnostic odds ratios (logDOR), study-level standard errors were derived from the reconstructed 2 × 2 tables and inverse-variance weights were applied (weights shown in Figure 4).
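The reconstruction step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in the review: the function names `reconstruct_2x2` and `log_dor` are hypothetical, cell counts are recovered by rounding the reported sensitivity and specificity against the group sizes, and the standard error uses the usual large-sample formula for a log odds ratio.

```python
import math

def reconstruct_2x2(se, sp, n_pos, n_neg):
    """Rebuild (TP, FN, FP, TN) from reported sensitivity, specificity,
    and reference-standard group sizes (assumes simple rounding)."""
    tp = round(se * n_pos)
    fn = n_pos - tp
    tn = round(sp * n_neg)
    fp = n_neg - tn
    return tp, fn, fp, tn

def log_dor(tp, fn, fp, tn):
    """Return (logDOR, standard error, inverse-variance weight).
    Adds 0.5 to every cell only when the table contains a zero cell."""
    cells = [tp, fn, fp, tn]
    if any(c == 0 for c in cells):
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells  # a=TP, b=FN, c=FP, d=TN
    estimate = math.log((a * d) / (b * c))
    std_err = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return estimate, std_err, 1 / std_err ** 2

# Example inputs taken from the Frazier et al. (2016) row of Table 1.
table = reconstruct_2x2(0.80, 0.82, 40, 39)
print(table, log_dor(*table))
```

Applying the correction only to tables with a zero cell leaves the remaining studies untouched, so their estimates are not shrunk toward the null unnecessarily.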

Heterogeneity statistics, forest plots, and summary estimates were produced in JASP (version 0.95.3). The hierarchical summary receiver operating characteristic (HSROC) model was estimated in R (version 4.4.1) using the mada package (version 0.5.12) (Doebler, 2012), which employs the Reitsma bivariate random-effects framework to model sensitivity and specificity jointly. Between-study variability was summarized using τ and I² statistics, and 95% confidence and prediction intervals were calculated to describe uncertainty around pooled estimates.
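For readers less familiar with random-effects pooling, the sketch below shows how inverse-variance weights, Cochran's Q, τ², and I² relate to one another. It uses the DerSimonian-Laird moment estimator purely as an assumption for illustration; the estimator applied in JASP is not stated in the text and may differ (for example, restricted maximum likelihood), so I² values computed this way need not match the reported ones. The function name `dersimonian_laird` and the input values are hypothetical.

```python
import math

def dersimonian_laird(effects, std_errs):
    """Random-effects pooling of study-level effects (e.g., logDOR).
    Assumes the DerSimonian-Laird estimator of between-study variance."""
    w = [1 / s ** 2 for s in std_errs]                     # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                          # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0    # percent heterogeneity
    w_re = [1 / (s ** 2 + tau2) for s in std_errs]         # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
    return {"pooled": pooled, "ci": ci, "q": q, "tau2": tau2, "i2": i2}

# Hypothetical logDOR values and standard errors, for illustration only.
print(dersimonian_laird([1.0, 3.0], [0.5, 0.5]))
```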

Results

A total of 146 records were identified through database searches. After removal of duplicates, 105 records were screened, and 21 studies were selected for full-text assessment (84/105 [80%] were excluded at title and abstract screening). Following full-text assessment, four studies were excluded because they neither reported diagnostic accuracy metrics (sensitivity, specificity, or AUC) nor provided enough data to derive them. Consequently, 17 peer-reviewed studies were included in the final qualitative and quantitative synthesis (Figure 1). We did not code mutually exclusive exclusion reasons at title/abstract screening; reasons are therefore summarized qualitatively in Figure 1.

Figure 1.

Study Selection. PRISMA 2020 flow diagram summarizing study selection (Page et al., 2021).

The combined sample across included studies comprised 2,083 autistic children and 2,173 nonautistic comparators, yielding a total of 4,256 participants and a near-balanced case:control ratio of 0.96:1, across six countries (United States, China, Qatar, Peru, Australia, and France). Participant age varied substantially across cohorts: most studies enrolled toddlers or preschoolers (mean age < 6 years), whereas Frazier et al. (2016, 2018) and Al-Shaban et al. (2023) recruited broader age ranges extending into school age and early adolescence. Most studies were published between 2021 and 2024.
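The sample totals above can be checked directly against the group sizes listed in Table 1. The short sketch below does this arithmetic; the variable and function names (`GROUP_SIZES`, `summarize`) are hypothetical, and the tuples simply transcribe the "N autism" and "N nonautistic" columns in row order.

```python
# (n_autism, n_nonautistic) per study, transcribed from Table 1 in row order.
GROUP_SIZES = [
    (40, 39), (76, 51), (91, 110), (34, 35), (50, 24), (29, 30),
    (28, 73), (725, 1138), (144, 96), (283, 147), (117, 44), (32, 27),
    (221, 254), (22, 17), (102, 44), (57, 17), (32, 27),
]

def summarize(group_sizes):
    """Return pooled group sizes, the overall total, and the case:control ratio."""
    n_autism = sum(a for a, _ in group_sizes)
    n_nonautistic = sum(b for _, b in group_sizes)
    return {
        "n_autism": n_autism,
        "n_nonautistic": n_nonautistic,
        "total": n_autism + n_nonautistic,
        "ratio": n_autism / n_nonautistic,
    }

print(summarize(GROUP_SIZES))
```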

Included index tests spanned recurrent gaze-tracking task families, including social-versus-nonsocial preferential-looking paradigms (including GeoPref-type designs), motherese versus nonsocial speech preference, social-scene allocation/orienting tasks, and broader or multimodal batteries (e.g., gap-overlap, pupillary responses, EEG-integrated approaches). Across paradigms, gaze-derived measures included fixation proportion/dwell-time preference indices, area-of-interest transitions, saccade features, disengagement latency, and related composite metrics. Most studies used commercial remote eye trackers (predominantly Tobii, SMI/RED, or EyeLink systems; typically 60–300 Hz) with standardized five- or nine-point calibration, although lower-cost/community-oriented implementations have also been explored; Jensen et al. (2021) was the only study without explicit child-by-child calibration. Analytic strategies ranged from rule-based thresholds and ROC-derived cutoffs (including thresholds selected to maximize overall accuracy or minimize false positives) to multivariable and machine-learning classifiers (e.g., logistic regression, random forests, CNN-based scanpath/saliency models), with some studies using hold-out or cross-validation and multimodal pipelines integrating EEG or broader clinical measures (e.g., ADOS-2; Modified Checklist for Autism in Toddlers, Revised [M-CHAT-R]; Vineland Adaptive Behavior Scales [VABS]). Detailed task characteristics, dependent-variable operationalization, hardware specifications, calibration procedures, and analytic features are provided in Supplementary Tables 2 and 4.

Table 1 presents the study-level characteristics and primary diagnostic-accuracy metrics for the 17 included studies, including location, design, sample sizes (autistic/nonautistic), age range, device/task summary, reference standard, and group details. Primary (author-selected) diagnostic performance varied widely across studies, with sensitivity ranging from 0.33 to 1.00, specificity from 0.58 to 0.95, and AUC from 0.53 to 0.96.

ROB and Applicability

As shown in Figure 2 and summarized in Figure 3, across the 17 included studies, the ROB was most frequently rated as high in the domains of patient selection and index test, reflecting the widespread use of two-gate case-control designs and nonblinded interpretation of index test results. Fewer than half of the studies (8/17) employed single-gate cohort or population-based sampling strategies (Frazier et al., 2016, 2018; Jones et al., 2023; Keehn et al., 2024; Moore et al., 2018; Pierce et al., 2023; Wang et al., 2024; Wen et al., 2022).

Figure 2.

Risk of bias visualization. Judgments across QUADAS-2 domains: D1, patient selection; D2, index test; D3, reference standard; D4, flow and timing. Green (+) = low, yellow (−) = some concerns, red (x) = high.

Figure 3.

Risk of bias summary. Summary of risk of bias across included studies according to QUADAS-2 domains. Green (first) indicates low risk, yellow (second) some concerns, and red (third) high risk.

Flow and timing was the second most frequent source of high ROB. Seven of the 17 studies were rated at high ROB in this domain, primarily due to differential verification or nonuniform application of the reference standard, most often in two-gate case-control designs or where diagnostic status was established outside the study procedures (Al-Shaban et al., 2023; Cilia et al., 2021; de Belen et al., 2024; He et al., 2021; Meng et al., 2023; Sun et al., 2023; Wen et al., 2022).

Three cohort studies were rated as having some concerns in this domain, due to incomplete reporting of participant exclusions and/or test sequencing, including pooled sequential recruitment periods (Frazier et al., 2016), unclear exclusions (Pierce et al., 2023), and limited reporting on the interval and ordering of index and reference assessments (Wang et al., 2024). The remaining 7/17 studies were judged at low risk for this domain.

Wen et al. (2022) was assigned a high ROB in the flow and timing domain because 444 participants in the total sample came from previous studies rather than from the population-based sampling (Pierce et al., 2011, 2016); moreover, a substantial portion came from a 2011 study conducted before the ADOS-2 was published (Lord et al., 2012).

Overall applicability concerns were low, as most paradigms directly addressed early detection of autism through eye-tracking–based visual engagement tasks, using clinically confirmed diagnoses (ADOS-2, DSM-5) as reference standards.

Meta-Analysis

Figure 4 presents the results of the random-effects meta-analysis, yielding a pooled log diagnostic odds ratio (logDOR) of 2.71 (95% CI: 2.08–3.35, p < 0.001), corresponding to a diagnostic odds ratio (DOR) of 15.03 (95% CI 8.00–28.50). Thus, the odds of a positive gaze-tracking result were, on average, about 15 times higher in autistic participants than in nonautistic comparators, indicating strong discriminative ability across paradigms.
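The relationship between the pooled logDOR and the reported DOR is a simple exponential back-transformation, sketched below. The function name `back_transform` is hypothetical; the inputs are the pooled estimate and confidence limits quoted above. For a single 2 × 2 table, the DOR itself is (TP/FN)/(FP/TN), that is, the odds of a positive test among reference-standard positives divided by the odds of a positive test among reference-standard negatives.

```python
import math

def back_transform(log_estimate, log_lower, log_upper):
    """Convert a log-scale estimate and its confidence limits to the ratio scale."""
    return tuple(round(math.exp(x), 2) for x in (log_estimate, log_lower, log_upper))

dor, lower, upper = back_transform(2.71, 2.08, 3.35)
print(f"DOR = {dor} (95% CI {lower} to {upper})")
```

Because the interval is computed on the log scale and then exponentiated, it is asymmetric around the DOR on the ratio scale.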

Figure 4.

Forest plot of the meta-analysis of gaze-tracking tasks for autism identification, grouped by overall risk of bias (high, orange; low, blue).

Studies were grouped according to their ROB assessment. Despite wider confidence intervals in the higher ROB subgroup, the pooled subgroup logDORs were broadly similar, suggesting limited differences in summary accuracy by ROB in this analysis.

High heterogeneity was observed, Q(16) = 147.05, p < 0.001; I² = 87.78%, reflecting variability in sample size, stimulus type (social vs. geometric; dynamic vs. static), and analytic strategies (rule-based vs. machine learning). Despite this variability, all estimated effects pointed in the same direction.

Figure 5 displays the funnel plot evaluating publication bias and small-study effects. The distribution was largely symmetrical around the pooled effect, suggesting no substantial asymmetry or selective reporting. One study (Sun et al., 2024) appeared as a minor outlier, consistent with its visual deviation in the forest plot and its comparatively small sample size and unclear sampling strategy (Figure 4).

Figure 5.

Residual funnel plot.

A hierarchical summary receiver operating characteristic (HSROC, Figure 6) meta-analysis was conducted using the Reitsma bivariate random-effects model. As seen in Figure 6, the pooled sensitivity was 0.77 (95% CI 0.65–0.85) and the pooled specificity was 0.80 (95% CI 0.75–0.84), indicating balanced diagnostic accuracy across studies. The summary AUC was 0.845, and the normalized partial AUC was 0.717, reflecting good overall discrimination between autistic and nonautistic children. Between-study variability was substantial (τ = 1.116 for sensitivity and 0.492 for specificity), suggesting methodological and paradigm-related heterogeneity. The positive correlation between sensitivity and false-positive rate (ρ = 0.519) denotes threshold variability among studies.
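To illustrate what the pooled sensitivity and specificity imply for an individual result, the sketch below converts them into likelihood ratios and a post-test probability. This is an illustrative calculation on the point estimates only: it ignores their confidence intervals and the between-study correlation, the function names are hypothetical, and the pretest probability of 0.20 is an arbitrary assumption rather than a value from the review.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from a sensitivity/specificity pair."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg

def post_test_probability(pretest_probability, likelihood_ratio):
    """Update a pretest probability with a likelihood ratio via odds."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

lr_pos, lr_neg = likelihood_ratios(0.77, 0.80)   # pooled point estimates
pretest = 0.20                                   # hypothetical referral-setting value
print(round(lr_pos, 2), round(lr_neg, 2))
print(round(post_test_probability(pretest, lr_pos), 2),
      round(post_test_probability(pretest, lr_neg), 2))
```

Under these assumptions, the same test result shifts probability by very different amounts depending on the pretest probability, which is one reason accuracy estimates from case-control samples do not transfer directly to screening settings.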

Figure 6.

Hierarchical summary receiver operating characteristic (HSROC) curve.

Finally, a sensitivity analysis (Supplement 3) excluding Wen et al. (2022), which accounted for approximately 44% of the total sample and was identified as having potential risks of bias, yielded a pooled logDOR of 2.83 (95% CI: 2.22–3.44; p < 0.001), corresponding to a DOR of 16.95. This represents a minor increase in the DOR relative to the primary analysis, with largely overlapping confidence intervals. Pooled sensitivity and specificity also remained highly consistent, at 0.79 and 0.80, respectively, supporting the robustness of the primary meta-analysis finding.

Among the 17 included studies, only two (11.8%) reported preregistration or trial registration (Jones et al., 2023, NCT03469986; Kou et al., 2019, NCT03286621). Only Jones et al. (2023) reported a prespecified diagnostic decision threshold, corresponding to 5.9% of the included studies.

Discussion

Gaze behavior has emerged as a promising and objective biomarker for autism. Over the past two decades, advances in eye-tracking technology have enabled precise, real-time measurement of gaze direction, pupillary responses, saccades, and scanpath dynamics. When combined with carefully designed visual or auditory paradigms, these technologies generate quantitative indices of social attention and information processing that can assist clinicians in screening and diagnostic assessment (Chita-Tegmark, 2016; Papagiannopoulou et al., 2014).

This diagnostic test accuracy review included a large and culturally diverse sample (n = 4,256), encompassing studies from North and South America, Europe, Oceania, and Asia, and found that gaze-tracking paradigms discriminated autistic from nonautistic children across diverse tasks, devices, and settings. Despite substantial methodological heterogeneity, the hierarchical summary ROC indicated strong overall performance (pooled HSROC AUC = 0.845), supporting gaze behavior as a promising objective marker for autism. Consistency of findings across cohorts suggests that gaze-based indices capture attentional features relevant to autism across multiple languages and cultural contexts, although the available evidence remains uneven across regions and settings.

Nine of the 17 included studies were rated as high ROB. The most frequent source was study design, particularly two-gate case-control sampling, which can inflate apparent diagnostic performance through spectrum and selection effects (Reitsma et al., 2023; Whiting et al., 2011). Additional concerns included incomplete or unclear blinding, partial or delayed application of reference standards, and insufficient reporting of index–reference timing, with risk of differential verification bias (Reitsma et al., 2023; Whiting et al., 2011).

Because only two studies reported preregistration and only one used a prespecified threshold, the large analytic degrees of freedom typical of eye-tracking pipelines may have contributed to optimistic accuracy estimates through post hoc operating-point selection.

These limitations do not negate the observed signal, but they reduce direct transportability of reported accuracy to routine clinical pathways; accordingly, pooled performance should be interpreted with greater weight on single-gate cohorts with uniform reference-standard application and prespecified thresholds (Reitsma et al., 2023).

Wen et al. (2022) reported a prospective single-gate cohort of 1,863 toddlers (12–48 months) identified through universal primary-care screening and assessed with the GeoPref paradigm. Using the final model proposed by the authors, classification performance was AUC = 0.76, specificity = 96%, sensitivity = 33%, and overall accuracy = 71%. The authors selected a threshold that prioritized a low false-positive rate, which explains the low sensitivity relative to the achieved AUC. Approximately 444 participants (~24%) were drawn from prior datasets by the same group, which may increase selection bias and raise potential duplication concerns for pooled analyses. A sensitivity analysis excluding this study confirmed consistent pooled results (Supplement 3).
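
To make this operating point concrete, the sensitivity and specificity reported above can be converted into likelihood ratios and a diagnostic odds ratio. The following is a minimal sketch using the standard definitions; the function name `likelihood_ratios` is hypothetical, and the calculation is a worked illustration rather than part of the pooled analysis.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> dict:
    """Return LR+, LR-, and the diagnostic odds ratio for a single operating point."""
    if not (0 < sensitivity < 1 and 0 < specificity < 1):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    lr_pos = sensitivity / (1 - specificity)   # how much a positive result raises the odds
    lr_neg = (1 - sensitivity) / specificity   # how much a negative result lowers the odds
    return {"lr_pos": lr_pos, "lr_neg": lr_neg, "dor": lr_pos / lr_neg}


# Operating point reported for the GeoPref final model (sensitivity 33%, specificity 96%).
geopref = likelihood_ratios(0.33, 0.96)
print(f"LR+ = {geopref['lr_pos']:.2f}, LR- = {geopref['lr_neg']:.2f}, DOR = {geopref['dor']:.2f}")
# LR+ = 8.25, LR- = 0.70, DOR = 11.82
```

Read this way, a positive result at this threshold is fairly informative (LR+ above 8), whereas a negative result changes the odds only modestly (LR− near 0.7), which is the expected profile of a specificity-weighted threshold.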

Beyond ROB, heterogeneity reflected genuine variation in index-test architecture, so gaze-tracking should be interpreted as a measurement modality applied to multiple partially distinct constructs rather than a single ‘social attention’ test. Paradigms varied by stimulus domain and format (static vs. dynamic), feature construction (single handcrafted metrics vs. composite or algorithmic classifiers), device and data-quality constraints, and threshold strategy. Despite this diversity, index tests clustered into recurring families: paired-stream preferential-looking competition (including social vs. geometric preference and biological-motion contrasts), gaze allocation within socially informative scenes (ROI dwell time, transitions, anticipatory looking), speech-directed attention (percent fixation preference for motherese vs. nonsocial controls), nonsocial interest capture, and domain-general oculomotor or arousal measures (for example disengagement latency, saccade dynamics, pupil reactivity) used alone or in multifeature batteries. Eligibility was defined by diagnostic test accuracy design with extractable 2 × 2 outcomes rather than restricting paradigms a priori; paradigm-level tasks and dependent-variable operationalizations are summarized in Supplementary Table 4 (see also Table 1).

Across these architectures, reported accuracy was sensitive to operating-point selection. GeoPref-style tasks prioritize specificity over sensitivity in unselected cohorts, with sensitivity recovering when saccade features are added (Wen et al., 2022). Motherese paradigms can be parameterized similarly by applying low fixation thresholds, producing very high specificity at the expense of sensitivity (Pierce et al., 2023). In elevated-likelihood clinical cohorts, multifactorial indices that integrate dwell time, switching, and vacancy patterns can recover sensitivity without substantial loss of specificity (Wang et al., 2024). Machine-learning approaches based on scanpaths or saliency-derived features often perform well within-study, but frequent reliance on enriched sampling and internal validation motivates external validation in single-gate cohorts to establish generalizability (Cilia et al., 2021; de Belen et al., 2024).

One approach that appears closest to clinical viability is the use of gaze-based paradigms as adjuncts to established screening workflows rather than as stand-alone diagnostics. The clearest translational example is the combination of gaze preference with M-CHAT-R in community settings, which supports an incremental-value model in which questionnaire-based pretest risk is refined with an objective behavioral signal (Jensen et al., 2021). This positioning is more realistic for routine care than replacing validated screeners and is consistent with threshold strategies that prioritize specificity in constrained clinical systems (Moore et al., 2018; Pierce et al., 2023; Wen et al., 2022).

A key interpretive limitation is incomplete control for developmental and language confounding. Across studies, IQ/DQ and language were inconsistently measured and were typically reported as cohort descriptors rather than incorporated into the diagnostic decision rule, which was often a fixed gaze threshold or an eye-tracking–only classifier (Al-Shaban et al., 2023; Jensen et al., 2021; Keehn et al., 2024; Moore et al., 2018; Pierce et al., 2023; Wang et al., 2024; Wen et al., 2022; Supplement 5). Consequently, part of the observed discrimination may reflect developmental-level effects on attention and task engagement rather than an autism-specific signal.

Real-world differential diagnosis remains incompletely established. Most studies compared autism with typically developing controls or heterogeneous nonautistic clinical samples, and only some included developmental-delay comparators (Al-Shaban et al., 2023; Frazier et al., 2016, 2018; Moore et al., 2018; Pierce et al., 2023; Wang et al., 2024). The included studies were not designed around prespecified, clinically relevant nonautistic neurodevelopmental comparator cohorts, despite evidence that gaze differences during socioemotional processing, including reduced attention to the eye region, can be transdiagnostic and vary across conditions (Martinez-Cedillo et al., 2026).

In addition, applicability to early pediatric screening should be interpreted cautiously because the evidence base is heterogeneous across pediatric age groups, with most cohorts restricted to toddlers or preschoolers, whereas Frazier et al. (2016, 2018) and Al-Shaban et al. (2023) extended recruitment into school-age and early adolescence. Developmental context and baseline gaze ecology differ substantially across these age bands. Although this heterogeneity supports robustness of the construct across settings, it weakens direct transportability to a single clinical pathway unless age-specific models and thresholds are externally validated.

Rather than pursuing a single universal paradigm, the most plausible route to clinical readiness is a workflow-integrated marker strategy for screening. In practice, this means embedding gaze-based outputs into existing pathways (e.g., M-CHAT-centered triage) to provide incremental risk stratification, not replacing standardized instruments (Jensen et al., 2021). Within that framework, operating-point selection is central: the high-specificity/low-sensitivity configurations used in GeoPref-style models are not merely a limitation but a deliberate rule-in strategy that can reduce false positives, unnecessary referrals, and downstream diagnostic burden in low-prevalence screening settings (Moore et al., 2018; Wen et al., 2022). A second requirement for equitable deployment is hardware translation. Much of the field still depends on laboratory-grade eye trackers and tightly controlled acquisition conditions, whereas broader implementation will require robust pipelines on consumer-grade platforms (tablet or webcam-based systems) with explicit quality-control, calibration, and failure-rate reporting (Jensen et al., 2021; Vargas-Cuentas et al., 2017). Under this model, high-cost physiological extensions (e.g., pupillometry-derived indices) remain valuable for mechanistic enrichment, while scalable low-cost gaze tools drive real-world screening utility. Importantly, these tools should be regarded as adjunctive to, not replacements for, standardized instruments such as the ADOS-2 or the Modified Checklist for Autism in Toddlers, Revised (M-CHAT-R).
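
The rule-in trade-off can be illustrated with expected counts per 1,000 children screened. In the sketch below, the 2% prevalence is a hypothetical value chosen only for illustration, and the function name `screening_outcomes` is likewise an assumption; the two operating points are the pooled estimates from this review (sensitivity 0.77, specificity 0.80) and the specificity-weighted point reported by Wen et al. (2022) (sensitivity 0.33, specificity 0.96).

```python
def screening_outcomes(prevalence, sensitivity, specificity, n_screened=1000):
    """Expected confusion-matrix counts and predictive values for a screened cohort."""
    affected = n_screened * prevalence
    unaffected = n_screened - affected
    tp = affected * sensitivity            # true positives
    fp = unaffected * (1 - specificity)    # false positives (unnecessary referrals)
    fn = affected - tp                     # missed cases
    tn = unaffected - fp                   # true negatives
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


ASSUMED_PREVALENCE = 0.02  # hypothetical value, for illustration only

operating_points = [
    ("pooled estimate", 0.77, 0.80),
    ("specificity-weighted point", 0.33, 0.96),
]
for label, sens, spec in operating_points:
    out = screening_outcomes(ASSUMED_PREVALENCE, sens, spec)
    print(f"{label}: {out['fp']:.0f} false positives, "
          f"{out['fn']:.1f} missed cases, PPV = {out['ppv']:.2f}")
```

Under these assumed inputs, the specificity-weighted point yields roughly 39 rather than 196 expected false positives per 1,000 children and about doubles the positive predictive value (approximately 0.14 versus 0.07), at the cost of missing about 13 rather than 5 of the 20 expected cases. Different prevalence assumptions would shift these figures.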

Beyond binary classification, several studies report clinically interpretable phenotype correlations. Reduced fixation to social-affective regions was associated with greater Autism Diagnostic Observation Schedule social-affect impairment and lower adaptive functioning (Frazier et al., 2016, 2018; Kou et al., 2019; Moore et al., 2018). In contrast, higher geometric preference and restricted-interest orienting were associated with greater restricted and repetitive behavior burden (Sun et al., 2023, 2024), while child-directed speech paradigms were influenced by language level and joint-attention development (Pierce et al., 2023; Wang et al., 2024). Together with the broad distribution of percent fixation to dynamic geometric images reported by Wen et al. (2022), these findings are compatible with dimensional heterogeneity and possible behavioral subgroups within autism. Clinically, this supports using gaze measures as phenotypic enrichers within screening and assessment workflows; however, claims about treatment-response prediction remain preliminary and require prospective longitudinal validation with adjustment for IQ/DQ and language.

One important limitation of this work is the restrictiveness of the queries used for retrieving relevant studies, which may have missed essential studies emphasizing biomarker validation or subtype identification. One noteworthy case is the study by Wen et al. (2022), which was not initially captured by automated database queries but was later identified through manual search and included in the final analysis.

However, the most important limitation of this diagnostic test accuracy meta-analysis is a potential model-selection and outcome-extraction bias: several studies reported multiple models or operating points, and we extracted the primary or final model emphasized by authors, which may over-represent optimized configurations and contribute to the predominance of AUC values above 0.70. Despite this, extracting the study-defined primary model provides a pragmatic summary of intended use, and the stability of pooled estimates in sensitivity analyses supports a reproducible discriminative signal, while absolute accuracy should be interpreted cautiously pending prospective single-gate validation.

The present findings support gaze behavior as a clinically relevant, objective marker of attentional allocation in autism, with plausible utility as an adjunct within tiered screening and diagnostic pathways rather than a stand-alone diagnostic. Translation to routine care will depend on prospective single-gate validation in representative clinical populations, prespecified thresholds and analysis plans, transparent reporting of failure rates and data quality, and demonstration of incremental value over established screening instruments. Parallel work is needed to adapt paradigms across languages and cultural contexts and to validate performance on scalable, low-cost eye-tracking platforms, to support equitable implementation for early autism identification and timely intervention.

Conclusion

Gaze-tracking paradigms demonstrated consistent diagnostic accuracy for autism across a wide range of experimental tasks, analytic methods, and populations, including infants under 2 years of age. Translation to clinical practice, however, requires explicit measurement of, and adjustment for, developmental level (DQ/IQ), language, and other clinically relevant nonautistic neurodevelopmental conditions that can influence gaze behavior. Quantitative gaze metrics can capture atypical social attention and restricted-interest patterns, supporting their promise as objective behavioral biomarkers when key methodological and clinical-translation requirements are met. Despite ongoing methodological heterogeneity, the pooled evidence indicates robust discriminative performance and substantial translational potential for early detection and clinical decision support, particularly as an adjunct within established screening and assessment workflows rather than as a stand-alone diagnostic test. Future research should emphasize prospective, single-gate validation studies with prespecified and preregistered thresholds and analytic pipelines, protocol standardization, explicit reporting of data-quality constraints and failure rates on scalable hardware, and multimodal integration to advance reproducible, scalable, and clinically implementable applications of gaze-tracking in autism assessment.

Supplemental Material

sj-docx-1-aut-10.1177_13623613261451896 – Supplemental material for Gaze-Tracking-Based Tests for Autism in Children: A Diagnostic Test Accuracy Systematic Review and Meta-Analysis

Supplemental material, sj-docx-1-aut-10.1177_13623613261451896 for Gaze-Tracking-Based Tests for Autism in Children: A Diagnostic Test Accuracy Systematic Review and Meta-Analysis by Delaflor-Wagner Christian Alejandro, Suárez-Cuenca Juan Antonio, Alcaraz-Estrada Sofía Lizeth, Téllez-González Mario Antonio, Coral-Vázquez Ramón Mauricio, Toledo-Lozano Christian Gabriel and García Silvia in Autism

Footnotes

Acknowledgements

No professional writing or editorial assistance was received, and the submission was prepared and submitted by the authors themselves.

ORCID iDs

Delaflor-Wagner Christian Alejandro

Suárez-Cuenca Juan Antonio

Alcaraz-Estrada Sofía Lizeth

Téllez-González Mario Antonio

Coral-Vázquez Ramón Mauricio

Toledo-Lozano Christian Gabriel

García Silvia

Ethical Considerations

The study was reviewed and approved by the Ethics and Research Committees of the National Medical Center “20 de Noviembre” (RPI code: RPI.CMN.138.2025).

Consent Publication

As this work is a systematic review and meta-analysis of previously published studies, informed consent was not required.

Author Contributions

Delaflor-Wagner Christian Alejandro: Conceptualization; Data curation; Formal analysis; Investigation; Methodology; Project administration; Software; Supervision; Writing – original draft; Writing – review & editing.

Suárez-Cuenca Juan Antonio: Investigation; Writing – review & editing.

Alcaraz-Estrada Sofía Lizeth: Investigation; Writing – review & editing.

Téllez-González Mario Antonio: Investigation; Writing – review & editing.

Coral-Vázquez Ramón Mauricio: Methodology; Supervision; Writing – original draft; Writing – review & editing.

Toledo-Lozano Christian Gabriel: Investigation; Writing – review & editing.

García Silvia: Investigation; Methodology; Supervision; Writing – review & editing.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

Study-level data underlying the meta-analysis are presented in Table 1, Figure 2, and . Additional information may be made available from the corresponding author upon reasonable request.

Participatory Statement

This study was a systematic review and meta-analysis of published diagnostic test accuracy studies and did not involve recruitment, data collection, or direct interaction with participants. We did not formally involve autistic people, family members, advocates, clinicians, or other autism community representatives in formulating the research question, selecting outcomes, designing eligibility criteria, extracting data, conducting analyses, interpreting results, or drafting the manuscript. Accordingly, methodological choices, analytic decisions, and reporting were made by the author team, guided by established standards for diagnostic accuracy reviews and by the information available in the included publications. We present this transparently to clarify the scope of input informing the review and to support appropriate interpretation of the findings.

Supplemental Material

Supplemental material for this article is available online.

References

*Al-Shaban, F. A., Ghazal, Thompson, I. R., Klingemier, E. W., Aldosari, Al-Shammari, Al-Faraj, El-Hag, Tolefat, Ali, Nasir, Frazier, T. W. (2023). Development and validation of an Arabic language eye-tracking paradigm for the early screening and diagnosis of autism spectrum disorders in Qatar. Autism Research, 16(12), 2291–2301. https://doi.org/10.1002/aur.3046

American Psychiatric Association (Ed.). (2013). Diagnostic and statistical manual of mental disorders: DSM-5 (5th ed.).

American Psychiatric Association (Ed.). (2022). Diagnostic and statistical manual of mental disorders: DSM-5-TR (5th ed., text revision).

Baxter, A. J., Brugha, T. S., Erskine, H. E., Scheurer, R. W., Vos, Scott, J. G. (2015). The epidemiology and global burden of autism spectrum disorders. Psychological Medicine, 45(3), 601–613. https://doi.org/10.1017/S003329171400172X

Brian, J. A., Zwaigenbaum (2019). Standards of diagnostic assessment for autism spectrum disorder. Paediatrics & Child Health, 24(7), 444–451. https://doi.org/10.1093/pch/pxz117

Chita-Tegmark (2016). Social attention in ASD: A review and meta-analysis of eye-tracking studies. Research in Developmental Disabilities, 48, 79–93. https://doi.org/10.1016/j.ridd.2015.10.011

*Cilia, Carette, Elbattah, Dequen, Guerin, J.-L., Bosche, Vandromme, Le Driant (2021). Computer-aided screening of autism spectrum disorder: Eye-tracking study using data visualization and deep learning. JMIR Human Factors, 8(4), 1–11. https://doi.org/10.2196/27706

Dawson, Rieder, A. D., Johnson, M. H. (2023). Prediction of autism in infants: Progress and challenges. The Lancet Neurology, 22(3), 244–254. https://doi.org/10.1016/S1474-4422(22)00407-0

*de Belen, R. A. J., Eapen, Bednarz, Sowmya (2024). Using visual attention estimation on videos for automated prediction of autism spectrum disorder and symptom severity in preschool children. PLOS ONE, 19, Article 0282818. https://doi.org/10.1371/journal.pone.0282818

Doebler (2012). mada: Meta-analysis of diagnostic accuracy (p. 0.5.12) [Dataset]. https://doi.org/10.32614/CRAN.package.mada

Elsabbagh, Bedford, Senju, Charman, Pickles, Johnson, M. H., & The BASIS Team. (2014). What you see is what you get: Contextual modulation of face scanning in typical and atypical development. Social Cognitive and Affective Neuroscience, 9(4), 538–543. https://doi.org/10.1093/scan/nst012

Elsabbagh, Gliga, Pickles, Hudry, Charman, Johnson, M. H. (2013). The development of face orienting mechanisms in infants at-risk for autism. Behavioural Brain Research, 251, 147–154. https://doi.org/10.1016/j.bbr.2012.07.030

Falck-Ytter, Bölte, Gredebäck (2013). Eye tracking in early autism research. Journal of Neurodevelopmental Disorders, 5(1), Article 28. https://doi.org/10.1186/1866-1955-5-28

Fombonne, Marcin, Manero, A. C., Bruno, Diaz, Villalobos, Ramsay, Nealy (2016). Prevalence of autism spectrum disorders in Guanajuato, Mexico: The Leon survey. Journal of Autism and Developmental Disorders, 46(5), 1669–1685. https://doi.org/10.1007/s10803-016-2696-6

*Frazier, T. W., Klingemier, E. W., Beukemann, Speer, Markowitz, Parikh, Wexberg, Giuliano, Schulte, Delahunty, Ahuja, Eng, Manos, M. J., Hardan, A. Y., Youngstrom, E. A., Strauss, M. S. (2016). Development of an objective autism risk index using remote eye tracking. Journal of the American Academy of Child & Adolescent Psychiatry, 55(4), 301–309. https://doi.org/10.1016/j.jaac.2016.01.011

*Frazier, T. W., Klingemier, E. W., Parikh, Speer, Strauss, M. S., Eng, Hardan, A. Y., Youngstrom, E. A. (2018). Development and validation of objective and quantitative eye tracking-based measures of autism risk and symptom levels. Journal of the American Academy of Child & Adolescent Psychiatry, 57(11), 858–866. https://doi.org/10.1016/j.jaac.2018.06.023

*He, Wang, Wei (2021). Automatic classification of children with autism spectrum disorder by using a computerized visual-orienting task. PsyCh Journal, 10(4), 550–565. https://doi.org/10.1002/pchj.447

*Jensen, Noazin, Bitterfeld, Carcelen, Vargas-Cuentas, N. I., Hidalgo, Valenzuela, Roman-Gonzalez, Krebs, Clement, Nolan, Barrientos, Mendoza, A. K., Noriega-Donis, Palacios, Ramirez, Vittet, Hafeez, Torres-Viso, Zimic (2021). Autism detection in children by combined use of gaze preference and the M-CHAT-R in a resource-scarce setting. Journal of Autism and Developmental Disorders, 51(3), 994–1006. https://doi.org/10.1007/s10803-021-04878-0

*Jones, Klaiman, Richardson, Aoki, Smith, Minjarez, Bernier, Pedapati, Bishop, Ence, Wainer, Moriuchi, Tay, S.-W., Klin (2023). Eye-tracking-based measurement of social visual engagement compared with expert clinical diagnosis of autism. JAMA: Journal of the American Medical Association, 330(9), 854–865. https://doi.org/10.1001/jama.2023.13295

*Keehn, Monahan, Enneking, Ryan, Swigonski, McNally Keehn (2024). Eye-tracking biomarkers and autism diagnosis in primary care. JAMA Network Open, 7, Article 11190. https://doi.org/10.1001/jamanetworkopen.2024.11190

Klin, Shultz, Jones (2015). Social visual engagement in infants and toddlers with autism: Early developmental transitions and a model of pathogenesis. Neuroscience & Biobehavioral Reviews, 50, 189–203. https://doi.org/10.1016/j.neubiorev.2014.10.006

*Kou, Lan, Chen, Zhao, Becker, Kendrick, K. M. (2019). Comparison of three different eye-tracking tasks for distinguishing autistic from typically developing children and autistic symptom severity. Autism Research, 12(10), 1529–1540. https://doi.org/10.1002/aur.2174

Lord, Elsabbagh, Baird, Veenstra-Vanderweele (2018). Autism spectrum disorder. The Lancet, 392(10146), 508–520. https://doi.org/10.1016/S0140-6736(18)31129-2

Lord, Rutter, DiLavore, P. C., Risi, Gotham, Bishop, S. L. (2012). Autism Diagnostic Observation Schedule, Second Edition (ADOS-2). Western Psychological Services.

Maenner, M. J., Warren, Williams, A. R., Amoakohene, Bakian, A. V., Bilder, D. A., Durkin, M. S., Fitzgerald, R. T., Furnier, S. M., Hughes, M. M., Ladd-Acosta, C. M., McArthur, Pas, E. T., Salinas, Vehorn, Williams, Esler, Grzybowski, Hall-Lande, Shaw, K. A. (2023). Prevalence and characteristics of autism spectrum disorder among children aged 8 years – Autism and Developmental Disabilities Monitoring Network, 11 Sites, United States, 2020. MMWR. Surveillance Summaries, 72(2), 1–14. https://doi.org/10.15585/mmwr.ss7202a1

Martinez-Cedillo, A. P., Delaflor Wagner, C. A., Albores-Gallo, Foulsham (2026). Callous–unemotional traits and their association with neurodevelopmental disorders: Insights from gaze behaviour during emotion recognition. Children, 13(2), Article 303. https://doi.org/10.3390/children13020303

Masi, DeMayo, M. M., Glozier, Guastella, A. J. (2017). An overview of autism spectrum disorder, heterogeneity and treatment options. Neuroscience Bulletin, 33(2), 183–193. https://doi.org/10.1007/s12264-017-0100-y

McGuinness, L. A., Higgins, J. P. T. (2021). Risk-of-bias VISualization (robvis): An R package and Shiny web app for visualizing risk-of-bias assessments. Research Synthesis Methods, 12(1), 55–61. https://doi.org/10.1002/jrsm.1411

*Meng, Yang, Xiao, Zhang, Liu, Luo (2023). Machine learning-based early diagnosis of autism according to eye movements of real and artificial faces scanning. Frontiers in Neuroscience, 17, Article 1170951. https://doi.org/10.3389/fnins.2023.1170951

*Moore, Wozniak, Yousef, Barnes, C. C., Cha, Courchesne, Pierce (2018). The geometric preference subtype in ASD: Identifying a consistent, early-emerging phenomenon through eye tracking. Molecular Autism, 9, Article 19. https://doi.org/10.1186/s13229-018-0202-z

The Pandas Development Team. (2025). pandas-dev/pandas: Pandas (Version v2.3.3) [Computer software]. https://doi.org/10.5281/ZENODO.3509134

Papagiannopoulou, E. A., Chitty, K. M., Hermens, D. F., Hickie, I. B., Lagopoulos (2014). A systematic review and meta-analysis of eye-tracking studies in children with autism spectrum disorders. Social Neuroscience, 9, 610–632. https://doi.org/10.1080/17470919.2014.934966

Parellada, Andreu-Bernabeu, Á., Burdeus, San José Cáceres, Urbiola, Carpenter, L. L., Kraguljac, N. V., McDonald, W. M., Nemeroff, C. B., Rodriguez, C. I., Widge, A. S., State, M. W., Sanders, S. J. (2023). In search of biomarkers to guide interventions in autism spectrum disorder: A systematic review. American Journal of Psychiatry, 180(1), 23–40. https://doi.org/10.1176/appi.ajp.21100992

*Pierce, Wen, T. H., Zahiri, Andreason, Courchesne, Barnes, C. C., Lopez, Arias, S. J., Esquivel, Cheng (2023). Level of attention to motherese speech as an early marker of autism spectrum disorder. JAMA Network Open, 6, Article 55125. https://doi.org/10.1001/jamanetworkopen.2022.55125

Pierce, Conant, Hazin, Stoner, Desmond (2011). Preference for geometric patterns early in life as a risk factor for autism. Archives of General Psychiatry, 68(1), 101–109. https://doi.org/10.1001/archgenpsychiatry.2010.113

Pierce, Marinero, Hazin, McKenna, Barnes, C. C., Malige (2016). Eye tracking reveals abnormal visual preference for geometric images as an early biomarker of an autism spectrum disorder subtype associated with increased symptom severity. Biological Psychiatry, 79(8), 657–666. https://doi.org/10.1016/j.biopsych.2015.03.032

Reitsma, J. B., Rutjes, A. W., Whiting, Yang, Leeflang, M. M., Bossuyt, P. M., Deeks, J. J. (2023). Assessing risk of bias and applicability. In Deeks, J. J., Bossuyt, P. M., Leeflang, M. M., Takwoingi (Eds.), Cochrane handbook for systematic reviews of diagnostic test accuracy (1st ed., pp. 169–201). Wiley. https://doi.org/10.1002/9781119756194.ch8

Shic, Naples, A. J., Barney, E. C., Chang, S. A., McAllister, Kim, Dommer, K. J., Hasselmo, Atyabi, Wang, Helleman, Levin, A. R., Seow, Bernier, Chawarska, Dawson, Dziura, Faja, McPartland, J. C. (2022). The autism biomarkers consortium for clinical trials: Evaluation of a battery of candidate eye-tracking biomarkers for use in autism clinical trials. Molecular Autism, 13(1), Article 15. https://doi.org/10.1186/s13229-021-00482-2

*Sun, Calvert, E. I., Mao, Liu, Wang, R. K., Wang, X.-Y., Z.-L., Wei, Kong, X.-J. (2024). Interest paradigm for early identification of autism spectrum disorder: An analysis from electroencephalography combined with eye tracking. Frontiers in Neuroscience, 18, Article 1502045. https://doi.org/10.3389/fnins.2024.1502045

*Sun, Wang, Wei, Feng, Z.-L., Yassin, Stone, W. S., Lin, Kong, X.-J. (2023). Identification of diagnostic markers for ASD: A restrictive interest analysis based on EEG combined with eye tracking. Frontiers in Neuroscience, 17, Article 1236637. https://doi.org/10.3389/fnins.2023.1236637

Tönsing, Schiller, Vehlen, Nickel, Van Elst, L. T., Domes, Heinrichs (2025). Altered interactive dynamics of gaze behavior during face-to-face interaction in autistic individuals: A dual eye-tracking study. Molecular Autism, 16(1), Article 12. https://doi.org/10.1186/s13229-025-00645-5

Van De Schoot, De Bruin, Schram, Zahedi, De Boer, Weijdema, Kramer, Huijts, Hoogerwerf, Ferdinands, Harkema, Willemsen, Fang, Hindriks, Tummers, Oberski, D. L. (2021). An open source machine learning framework for efficient and transparent systematic reviews. Nature Machine Intelligence, 3(2), 125–133. https://doi.org/10.1038/s42256-020-00287-7

Vargas-Cuentas, N. I., Roman-Gonzalez, Gilman, R. H., Barrientos, Ting, Hidalgo, Jensen, Zimic (2017). Developing an eye-tracking algorithm as a potential tool for early diagnosis of autism spectrum disorder in children. PLOS ONE, 12(11), Article e0188826. https://doi.org/10.1371/journal.pone.0188826

Virtanen, Gommers, Oliphant, T. E., Haberland, Reddy, Cournapeau, Burovski, Peterson, Weckesser, Bright, Van Der Walt, S. J., Brett, Wilson, Millman, K. J., Mayorov, Nelson, A. R. J., Jones, Kern, Larson, Vázquez-Baeza (2020). SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nature Methods, 17(3), 261–272. https://doi.org/10.1038/s41592-019-0686-2

*Wang, R. K., Kwong, Liu, Kong, X.-J. (2024). New eye tracking metrics system: The value in early diagnosis of autism spectrum disorder. Frontiers in Psychiatry, 15, Article 1518180. https://doi.org/10.3389/fpsyt.2024.1518180

Weber, Knapp, Ickstadt, Kundt, Glass (2020). Zero-cell corrections in random-effects meta-analyses. Research Synthesis Methods, 11(6), 913–919. https://doi.org/10.1002/jrsm.1460

*Wen, T. H., Cheng, Andreason, Zahiri, Xiao, Bao, Courchesne, Barnes, C. C., Arias, S. J., Pierce (2022). Large scale validation of an early-age eye-tracking biomarker of an autism spectrum disorder subtype. Scientific Reports, 12(1), Article 4253. https://doi.org/10.1038/s41598-022-08102-6

Whiting, P. F., Rutjes, A. W. S., Westwood, M. E., Mallett, Deeks, J. J., Reitsma, J. B., Leeflang, M. M. G., Sterne, J. A. C., Bossuyt, P. M. M., & The QUADAS-2 Group. (2011). QUADAS-2: A revised tool for the quality assessment of diagnostic accuracy studies. Annals of Internal Medicine, 155(8), 529–536. https://doi.org/10.7326/0003-4819-155-8-201110180-00009

Zeidan, Fombonne, Scorah, Ibrahim, Durkin, M. S., Saxena, Yusuf, Shih, Elsabbagh (2022). Global prevalence of autism: A systematic review update. Autism Research, 15(5), 778–790. https://doi.org/10.1002/aur.2696

Zhuang, Liang, Qureshi, Ran, Feng, Liu, Yan, Shen (2024). Autism spectrum disorder: Pathogenesis, biomarker, and intervention therapy. MedComm, 5(3), Article e497. https://doi.org/10.1002/mco2.497
