Abstract
Objective:
To evaluate the effectiveness of MazeOut, an adaptive serious game for motor rehabilitation, in individuals with autism spectrum disorder (ASD) by comparing their performance and usability ratings with those of individuals with typical development (TD) and by assessing the impact of adaptive gameplay (AG) versus nonadaptive gameplay on task performance.
Materials and Methods:
A mixed-design study with 30 participants (15 ASD, 15 TD), aged 8 to 40 years, had each participant experience both adaptive and nonadaptive interventions in randomized order, allowing within- and between-subject comparisons. Performance was measured using overall scores (based on maze navigation speed and coin collection), and usability was assessed with the System Usability Scale (SUS). Data analysis was conducted using R software, with performance trends evaluated through segmented regression and the Kruskal–Wallis test.
Results:
The TD group outperformed the ASD group across all conditions (TD median score: 27.54; ASD median score: 23.79, P < 0.001). Notably, participants in both groups achieved significantly better performance when AG was introduced first (ASD: 24.04 vs. 19.16, P < 0.001; TD: 30.20 vs. 24.31, P = 0.005), suggesting that the adaptation facilitates initial task learning. ASD participants reported slightly higher usability (mean SUS = 77.2) than TD participants (74.6), with the highest scores among younger users (81.9).
Conclusions:
Adaptive serious games can enhance motor performance, particularly for individuals with ASD. The findings suggest that early exposure to AG may improve task performance. Future studies with larger samples and longer interventions are needed to assess long-term benefits.
Introduction
Technology has revolutionized rehabilitation, with serious games emerging as engaging tools that combine therapy and gameplay.1,2 Traditional methods, often repetitive, struggle to maintain patient adherence, particularly for long-term treatments.3,4 Serious games address these limitations by gamifying rehabilitation tasks, fostering motivation, and improving therapeutic outcomes.2,5 These games create interactive environments tailored to patient needs, providing both motor6–8 and cognitive9–11 benefits while enabling healthcare professionals to monitor progress.12–15
Adaptive serious games dynamically adjust to diverse rehabilitation needs, 3 leveraging metrics such as facial expressions, motor trajectories, and biosignals to align challenges with patients’ capabilities.4,15–17 In a previous work, 18 a framework for adaptive games was proposed, favoring adjustments in difficulty level and task complexity based on user performance.
Among the populations that can benefit most from these advances are neurodivergent individuals, who often face unique motor, cognitive, and sensory challenges that traditional therapies struggle to address.19,20 Individuals with autism spectrum disorder (ASD), in particular, represent a key population for whom adaptive serious games offer significant promise. ASD is characterized by persistent difficulties in social communication, restricted and repetitive behaviors, and notable motor impairments, including poor coordination, balance, and fine motor control.21,22 These challenges, when coupled with heightened sensory sensitivities, can create barriers to engagement and success in traditional therapeutic activities, underscoring the need for more personalized, adaptive interventions.23–25 Additionally, the considerable heterogeneity within the ASD population demands flexible tools that can be tailored to each individual’s unique needs. 26
For this population, gamified environments can progressively challenge motor skills while reinforcing positive behaviors, creating a balance between task performance and sustained engagement.27,28 Adaptive technological interventions can be customized to accommodate sensory preferences, offering structured and predictable environments with visual and auditory support to enhance comprehension and participation. 29 By reinforcing desired motor behaviors and providing real-time, performance-based feedback, these systems promote both motor and cognitive development. This dual focus highlights the potential of adaptive serious games as a transformative tool for improving quality of life for individuals with ASD. 30
This study extends the MazeOut serious game 18 to individuals with ASD by refining the platform with features such as enhanced scoring systems, dynamic difficulty adjustments, and personalized feedback. Additionally, it compares the performance and usability of individuals with ASD to those with typical development (TD), aiming to better understand the unique therapeutic needs of the ASD population.
The study hypothesizes that: (1) adaptive gameplay (AG) will lead to significant improvements in motor and task performance compared with nonadaptive gameplay (NAG) in both groups; (2) due to the motor-sensory difficulties associated with ASD, TD individuals will demonstrate better overall performance in the game; and (3) the personalized nature of the game will enhance usability and motivation among participants, with individuals with ASD demonstrating higher usability scores due to the tailored design, as evidenced by gameplay metrics and qualitative feedback.
Materials and Methods
Study design and ethical approval
This cross-sectional study was approved by the Research Ethics Committee of the University of São Paulo (CAAE: 55773822500005390) and adhered to Resolution 466/2012 of the National Health Council and the Declaration of Helsinki (1964). All participants and their legal guardians provided written consent, and data were stored securely in restricted-access databases. The study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines. 31
Settings
The study was conducted using a custom-designed serious game, providing a controlled and standardized virtual environment. Participants were recruited and participated in the study at two locations: ESPAÇO CEL, a neurorehabilitation clinic in Brazil (for ASD participants), and the University of São Paulo (for TD participants).
Participants
Participants aged 8–42 years were invited to participate in this study. Individuals with ASD had a formal diagnosis from a neurologist or psychiatrist, with symptom severity confirmed using the Childhood Autism Rating Scale; 32 only those scoring 30–36 (mild-moderate range) were included. The TD control group, recruited via convenience sampling, consisted of individuals without neurodevelopmental disorders. Inclusion criteria required participants to understand instructions and provide informed consent (or assent for minors under 18 years). Exclusion criteria included: (1) inability to perform the game task after explanation (failure to move in the maze within two minutes during familiarization, based on the protocol by Almeida et al. 8 ) or (2) motor impairments preventing task execution.
Outcomes and instruments
The primary outcomes of this study were task performance and usability, assessed using data from the MazeOut serious game and the System Usability Scale (SUS), 33 respectively. MazeOut, developed by Kira et al., 18 is a motor rehabilitation game where players navigate a maze to collect coins using upper limb movements detected via a device camera.
In this study, the game was improved with a performance-driven scoring system (10 divided by time-per-coin, defined as the time taken to collect each evenly spaced coin) coupled with real-time score displays and personal best records (see Fig. 1). This design aligns with flow theory principles, 34 where dynamic feedback and reward systems optimize challenge-skill balance. Following a clinical protocol designed by physiotherapists, players begin with a five-level tutorial before progressing to AG and NAG modes (10 levels each).
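The scoring rule described above can be sketched as follows. This is a minimal illustration, not the MazeOut implementation: the text specifies only that the score is 10 divided by time-per-coin, so the aggregation used here (level time divided by coins collected) and all function names are assumptions.

```python
from typing import Optional


def time_per_coin(level_time_s: float, coins_collected: int) -> float:
    """Average seconds spent per collected coin (assumed aggregation)."""
    if level_time_s <= 0:
        raise ValueError("level time must be positive")
    if coins_collected <= 0:
        raise ValueError("at least one coin must be collected")
    return level_time_s / coins_collected


def level_score(level_time_s: float, coins_collected: int) -> float:
    """Score = 10 / time-per-coin, as stated in the text."""
    return 10.0 / time_per_coin(level_time_s, coins_collected)


def update_personal_best(best: Optional[float], score: float) -> float:
    """Keep the highest score seen so far (hypothetical helper)."""
    return score if best is None or score > best else best
```

Under these assumptions, collecting 20 coins in 8 s gives a time-per-coin of 0.4 s and a score of about 25, which is on the same scale as the median scores reported in the Results.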

Fig. 1. MazeOut game interface during calibration, showing the maze, coins, and score.
The adaptive mechanism operates as follows:
Calibration: The initial level records the player’s movement speed and amplitude (capturing their full range of motion within the game’s interactive area) to establish baseline performance metrics.
Boundary adjustment: Subsequent AG levels dynamically scale maze boundaries to 50%–80% of the player’s maximum reach from their previous performance, ensuring appropriate challenge levels.
Path adaptation: Using time-per-coin ratios from the prior level for each movement direction (up, down, left, right), the system adjusts subsequent paths. Slower performance triggers shorter paths in that direction, while faster completion increases path length.
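The boundary and path adaptation steps can be sketched as below. The text does not state how the fraction within the 50%–80% range is chosen, what counts as "slower", or how much a path changes, so this sketch makes explicit assumptions: the fraction is passed in and clamped, a direction is treated as slow when its time-per-coin exceeds the mean across the four directions, and path length changes by one step within hypothetical limits.

```python
from statistics import mean

DIRECTIONS = ("up", "down", "left", "right")


def scaled_boundary(max_reach: float, fraction: float) -> float:
    """Boundary as a share of maximum reach, clamped to the 50%-80% range."""
    clamped = min(0.8, max(0.5, fraction))
    return max_reach * clamped


def adapt_path_lengths(path_len, time_per_coin, step=1, min_len=1, max_len=10):
    """Shorten paths in slow directions and lengthen them in fast ones.

    'Slow' and 'fast' are judged against the mean time-per-coin over all
    directions; this reference, the step, and the limits are assumptions.
    """
    reference = mean(time_per_coin[d] for d in DIRECTIONS)
    updated = {}
    for d in DIRECTIONS:
        if time_per_coin[d] > reference:
            new_len = path_len[d] - step  # slower than reference: shorter path
        elif time_per_coin[d] < reference:
            new_len = path_len[d] + step  # faster than reference: longer path
        else:
            new_len = path_len[d]
        updated[d] = min(max_len, max(min_len, new_len))
    return updated
```

For example, a player who is markedly slower when moving left than in the other directions would receive shorter leftward paths on the next level, while the remaining directions would be extended.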
In NAG mode, difficulty increases linearly based on clinician-defined parameters, providing a controlled benchmark for comparison.
Procedure
The study followed a structured experimental protocol (Fig. 2). First, all individuals were briefed on the experiment and provided with detailed instructions. Participants then signed the informed consent form, followed by the completion of a demographic questionnaire to collect background information.

Fig. 2. Study design flowchart. SUS, System Usability Scale.
Participants were randomly assigned to one of two sequences as follows:
Sequence A: AG intervention first, followed by NAG intervention.
Sequence B: NAG intervention first, followed by AG intervention.
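One way to produce such an assignment is sketched below. The text states only that participants were randomly assigned to a sequence; the balanced split within a group, the seed parameter, and the function name are assumptions made for illustration.

```python
import random


def assign_sequences(participant_ids, seed=None):
    """Shuffle the participants, then give the first half Sequence A
    and the rest Sequence B (balanced assignment is an assumption)."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    half = (len(ids) + 1) // 2
    return {pid: ("A" if i < half else "B") for i, pid in enumerate(ids)}
```

Calling the function once per group (ASD and TD) would keep the two sequences approximately balanced within each group.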
All participants completed the tutorial to familiarize themselves with the game mechanics. After completing the two gameplay sessions, participants completed the SUS questionnaire to assess their experience with the game.
Data analysis
Data analysis was conducted using R software (version 4.3.3) and consisted of two main components: performance evaluation and usability assessment.
Performance Analysis: performance was measured using a standardized score based on maze navigation speed and coins collected, accounting for maze size and individual variability. Segmented regression with z-scores 35 assessed performance trends across levels, enabling comparisons between groups (ASD and TD) and within-subjects factors, including interventions (AG and NAG) and sequences (A: AG first; B: NAG first). Due to asymmetric score distributions (with median-based analysis being most appropriate) and multiple group comparisons, the Kruskal–Wallis test (α = 0.05) was selected to determine the influence of intervention type and sequence on performance. Effect sizes were calculated using eta-squared (η2), interpreted as small (η2 ≥ 0.01), medium (η2 ≥ 0.06), or large (η2 ≥ 0.14). 36
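The analysis itself was run in R; the Python sketch below only illustrates how a Kruskal–Wallis H statistic and an eta-squared effect size relate to each other. The text does not give the η2 formula that was used, so the common rank-based estimate η2 = (H − k + 1)/(n − k), with k groups and n observations, is assumed here. The input values in the usage note are toy numbers, not study data.

```python
from collections import Counter


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def kruskal_h(groups):
    """Kruskal-Wallis H statistic with the standard tie correction."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


def eta_squared(h, k, n):
    """Assumed estimate: (H - k + 1) / (n - k)."""
    return (h - k + 1) / (n - k)
```

For two toy groups `[1, 2, 3]` and `[4, 5, 6]`, `kruskal_h` gives H = 27/7 and `eta_squared(H, 2, 6)` gives 5/7. A P value would additionally require the chi-square distribution with k − 1 degrees of freedom, which a statistics library such as SciPy (`scipy.stats.kruskal`) provides.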
Usability Assessment: the SUS questionnaire, adapted for the maze game, included 10 questions rated on a Likert scale (1 = strongly disagree; 5 = strongly agree). SUS scores were calculated for each participant to evaluate their perception of the game’s usability (see Results: Usability assessment for detailed analysis).
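For reference, the standard SUS scoring procedure can be sketched as follows: odd-numbered items contribute (response − 1), even-numbered items contribute (5 − response), and the sum is multiplied by 2.5 to give a 0–100 score. The sketch assumes the adapted questionnaire kept the standard alternation of positively and negatively worded items, which the text does not state explicitly.

```python
def sus_score(responses):
    """Standard SUS score (0-100) from ten Likert responses (1-5)."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("responses must be integers from 1 to 5")
    total = 0
    for item, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5
```

A participant answering 3 to every item would score 50, and the most favorable pattern (5 on odd items, 1 on even items) would score 100.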
Results
In total, 30 participants were included (none were excluded due to inability to perform the task during the familiarization procedure), evenly split between TD and ASD individuals. Demographic information is presented in Table 1.
Table 1. Demographic Characteristics of the Sample
SD, standard deviation; TD, typical development; ASD, autism spectrum disorder; CARS, Childhood Autism Rating Scale.
Performance analysis
The analysis revealed significant differences in performance between the ASD and TD groups, as well as effects related to the type of intervention and the sequence in which they were administered (Fig. 3). Overall, the TD group outperformed the ASD group across all conditions. When comparing the groups as a whole, the TD group demonstrated a significantly higher median performance (27.54) compared with the ASD group (23.79, P < 0.001, η2 = 0.028). This pattern persisted across intervention types: the TD group performed better in both AG (TD: 29.91, ASD: 24.60, P = 0.003, η2 = 0.026) and NAG interventions (TD: 26.59, ASD: 23.53, P = 0.002, η2 = 0.030). Similarly, the TD group showed superior performance in both sequences: Sequence A (TD: 30.65, ASD: 25.45, P = 0.006, η2 = 0.024) and Sequence B (TD: 26.55, ASD: 20.95, P < 0.001, η2 = 0.056). Notably, when examining the intervention type for each group separately, without considering sequence, neither ASD (P = 0.181) nor TD participants (P = 0.131) showed significant differences between AG and NAG.

Fig. 3. Performance of the study groups according to the sequence (A or B). Red bars correspond to the ASD group, while blue bars represent the TD group. The hatched bars indicate AG intervention, whereas solid bars represent NAG intervention. Error bars show the standard error of the median. ASD, autism spectrum disorder; TD, typical development; AG, adaptive gameplay; NAG, nonadaptive gameplay.
Regarding the effect of sequence, both groups performed better when starting with the AG intervention. For the ASD group, performance was higher in Sequence A (25.45) compared with Sequence B (20.95, P < 0.001, η2 = 0.052); the same occurred for the TD group (Sequence A: 30.65, Sequence B: 26.55, P = 0.002, η2 = 0.030). When comparing NAG interventions, performance improved when the AG intervention was administered first. For the ASD group, performance in the NAG intervention was higher in Sequence A (26.58) than in Sequence B (19.16, P < 0.001, η2 = 0.226), and a similar pattern was observed for the TD group (Sequence A: 31.42, Sequence B: 24.31, P < 0.001, η2 = 0.086).
When comparing the interventions directly, no significant overall difference was found between AG (27.52) and NAG interventions (24.54, P = 0.053, η2 = 0.005). However, a closer examination of performance improvement between the first and second interventions revealed nuanced effects. For Sequence A, no significant difference was observed between AG and NAG conditions in the TD group (AG: 30.20, NAG: 31.42, P = 0.735), but a significant difference was found in the ASD group (AG: 24.04, NAG: 26.58, P = 0.047, η2 = 0.019). For Sequence B, significant differences were observed in both the TD (NAG: 24.31, AG: 28.75, P = 0.014, η2 = 0.028) and ASD groups (NAG: 19.16, AG: 25.60, P < 0.001, η2 = 0.103), supporting the hypothesis that adaptation positively influenced performance.
In terms of the first intervention, AG yielded statistically greater performance than NAG for both groups (ASD: 24.04 vs. 19.16, P < 0.001, η2 = 0.106; TD: 30.20 vs. 24.31, P = 0.005, η2 = 0.046). However, in the second intervention, no significant differences were observed for either group (TD: P = 0.136; ASD: P = 0.102). Temporal analysis (Fig. 4) indicated that the rate of improvement was less pronounced in the second intervention, with the only statistically significant difference observed in ASD participants from Sequence B (P = 0.004), whose learning rate declined in the second segment.

Fig. 4. Performance trends across levels: segmented regression analysis for Sequence A (upper subplot) and Sequence B (lower subplot), grouped by ASD (green) and TD (purple) participants. ASD, autism spectrum disorder; TD, typical development.
Usability assessment
Preliminary usability data, with an overall SUS score of 76.3, suggest that the system can be classified as “Good” according to the SUS scale. 37 The results indicate a slight variation between ASD and TD groups, with mean scores of 77.2 and 74.6, respectively, suggesting that individuals with ASD perceived the system as more accessible or useful. By age group, children and adolescents scored the highest (81.9), while young adults and adults gave a lower average (70.8), indicating greater ease of use or interest among younger users.
To further explore these differences, Figure 5 details the distribution of SUS responses on a Likert scale, providing insights into usability perceptions between the ASD and TD groups. The analysis reveals that ASD participants tended to provide more positive responses across most questions, indicating a generally favorable perception of the system’s usability. In contrast, the TD group showed an overall greater variability in their responses, with a more balanced distribution across the Likert scale options. This difference suggests that individuals with ASD may have perceived the system as more intuitive and accessible, while the TD group demonstrated a wider range of experiences and opinions.

Fig. 5. Distribution of SUS responses on a Likert scale for ASD and TD groups. SUS, System Usability Scale; ASD, autism spectrum disorder; TD, typical development.
Discussion
Our findings partially supported the initial hypotheses, showing improvement in task performance across all groups, with the TD group outperforming the ASD group as anticipated. However, a key finding was the superior performance observed when the first session was conducted in AG, particularly for the ASD group. Additionally, the serious game demonstrated high levels of usability, especially among ASD individuals.
Comparison between TD and ASD groups
Significant differences in task performance were observed between groups, with the TD group outperforming the ASD group across both AG and NAG interventions, regardless of the sequence. Although performance differences might be partially attributed to age, this is a point of controversy in the literature. While some studies report that adults demonstrate superior motor planning and task comprehension,38,39 others show that children exhibit greater motor variability and no significant differences in adaptation compared with adults.40,41 Together, these findings suggest that the performance gap is more strongly linked to ASD-related deficits in sensory processing, motor planning difficulties, and challenges in interpreting complex multimodal stimuli, such as visual and spatial cues within the maze environment.25,42
Improvement in task performance: Adaptive vs. nonadaptive interventions
While no significant overall difference was found between AG and NAG interventions, a detailed comparison revealed important benefits of adaptation.
Isolated comparison of adaptive and nonadaptive interventions
Participants who began with AG demonstrated significantly better performance in their first session compared with those who started with the NAG, particularly among individuals with ASD. Adaptive intervention provided immediate, personalized support, helping participants gradually adapt to task demands without becoming overwhelmed.4,15–17 In contrast, the fixed-difficulty structure of NAG may have posed a greater initial challenge, especially for ASD participants, whose motor planning and sensory integration difficulties could hinder performance.25,42
Effect of sequence and intervention order on performance
Our findings highlight that the order of interventions significantly influenced performance outcomes, particularly for participants with ASD. Starting with AG (Sequence A) not only improved initial performance but also facilitated better carryover when transitioning to nonadaptive tasks, helping ASD participants reach performance levels comparable to their TD peers. In contrast, starting with NAG (Sequence B) led to slower improvement rates and less steady gains, suggesting that early adaptive support is central for refining motor strategies and building a strong performance baseline.43,44
These results reinforce the importance of sequencing in rehabilitation strategies, aligning with research showing that progressive, tailored adaptations help individuals increase motor performance and reduce cognitive overload.45,46 Unlike systems that rely solely on predefined difficulty levels or delayed adjustments, adaptive interventions can optimize motor coordination and task execution, particularly for neurodivergent populations such as individuals with ASD.25,45,47
Usability during task performance
To assess usability, we analyzed participants’ responses using the SUS. Interestingly, ASD participants rated the game’s usability slightly higher (mean SUS score = 77.2) than the TD group (74.6). A closer analysis revealed that ASD participants found the system easier to use and less complex, suggesting that the structured and predictable interface helped them navigate the game more comfortably (Q1 and Q2 from Fig. 5). They also reported greater confidence in using the system (Q5), likely due to the clear instructions and feedback, which provided a sense of control. Additionally, their higher willingness to use the game regularly (Q6, Q7, and Q9) suggests that they found it engaging and motivating, which are critical factors for maintaining participation in rehabilitation.
The positive results may stem from two main factors: (1) the game’s simple, tailored, and structured design, which likely provided an intuitive, engaging, and cognitively supportive experience for ASD participants;44–46 and (2) its dynamically adaptive mechanics, which adjusted difficulty to match individual performance, reducing frustration and enhancing a sense of competence, both of which are key for maintaining engagement and improving usability perceptions.8,27
Limitations and future studies
This study has several limitations that should be acknowledged. First, the small sample size (15 ASD, 15 TD) limits generalizability, and the age difference between groups (TD older than ASD) may have influenced performance outcomes. Second, the brief intervention duration may have restricted motor skill development and environmental habituation. Third, we did not account for cognitive profiles, sensory sensitivities, or co-occurring conditions (e.g., ADHD, anxiety) that might affect performance. Finally, while participants experienced both AG and NAG, we did not directly compare their usability and engagement. The SUS may also lack sensitivity to younger or neurodivergent users’ unique interaction patterns.
Future research should employ larger, more diverse samples across ASD severity levels to examine long-term effects on performance and engagement. Studies should assign separate AG and NAG groups and incorporate child-specific usability measures to systematically compare modes and identify which better promotes engagement in neurodivergent populations. Addressing these gaps could further refine adaptive serious games as effective tools for supporting individuals with ASD in their rehabilitation process.
Conclusion
The findings of this study highlight the potential of adaptive serious games for motor rehabilitation in individuals with ASD. The results showed that while the TD group outperformed the ASD group, the AG intervention significantly improved task performance, particularly when introduced first. ASD participants also rated the game’s usability higher than TD participants, suggesting that this rehabilitation tool was particularly effective for this population. Moving forward, this work underscores the value of tailored therapeutic gaming approaches that can flexibly accommodate individual needs while maintaining engagement. Building on these results, we plan to investigate more sophisticated adaptation systems using machine learning, expand participant diversity to better reflect ASD heterogeneity, and examine how sustained gameplay influences long-term motor skill development.
Footnotes
Acknowledgments
Parts of this work were previously presented at: IEEE Conference on Serious Games and Applications for Health (SeGAH), 2024.
Author Disclosure Statement
The authors declare no conflict of interest. All authors were responsible for the content and writing of this article.
Funding Information
This work was partially funded by the São Paulo Research Foundation (FAPESP grants #2023/05884-6, #2023/12760-1, and #2014/50889-7 – for the National Institute of Science and Technology Medicine Assisted by Scientific Computing, INCT-MACC), the Brazilian National Council for Scientific and Technological Development (CNPq grants #305663/2021-6, #307710/2022–0, and #406029/2023-7), the Brazilian Federal Agency for Support and Evaluation of Graduate Education (CAPES Finance Code 001), and the Natural Sciences and Engineering Research Council of Canada (NSERC Discovery Grant RGPIN-2018-05917).
