Abstract
In recent years, several scholars have argued that the influence of deliberate practice on expertise has been overstated. Others have contended that these critiques conflate deliberate practice with less effective forms of training. We analyzed a large, longitudinal cohort of Chess.com players (N = 44,213) using objective, time-stamped measures of both practice activity and performance. We tested whether deliberate practice-aligned activities predict greater rating improvement than playing games. Multilevel models revealed that, despite more than 90% of player time being spent on games, deliberate practice was substantially more efficient for learning. Although not all deliberate practice-aligned activities were equally effective, the category as a whole was associated with a 3.61× advantage in learning efficiency relative to gameplay (ps < .001). These findings offer rare real-world evidence in a long-standing theoretical debate about learning efficiency. How individuals train, not just how much, fundamentally shapes the trajectory of skill development.
For decades, researchers have debated the relative importance of talent versus training in the acquisition of expertise. Why do some people improve dramatically while others plateau, despite committing to a similar number of hours of practice? One explanation is that some individuals are simply more talented than others. Another is that how they train is just as important as how much. Although the role of practice in skill development has been studied extensively, the field has yet to resolve the degree to which different types of practice lead to systematically different outcomes.
This distinction has major implications for how we measure and understand expertise. Several studies have estimated practice effects using general engagement in domain-relevant activities, such as number of chess games played (e.g., Howard, 2014; Vaci et al., 2019), or broadly defined “structured practice” (e.g., Macnamara et al., 2014, 2016). But as Ericsson (2016, 2021) noted, the associations between broadly defined operationalizations and expertise may not reflect the types of improvements that are possible under more specified conditions. The failure to differentiate among practice types may help explain inconsistent findings across studies (see Debatin et al., 2023).
The study of expertise has long focused on how individuals acquire and apply domain-specific knowledge—and, often enough, this work has investigated how this happens in chess. Early work (Chase & Simon, 1973; de Groot, 1965) suggested that expert chess performance was rooted in pattern recognition and sophisticated knowledge structures rather than raw memory capacity or processing speed. These insights laid the groundwork for theories emphasizing the role of practice in acquiring expertise. Building on these findings, Ericsson and colleagues (1993) introduced the expert-performance framework, arguing that skill develops most efficiently through deliberate practice when training tasks (a) are well defined with a clear goal for improvement, (b) are able to be performed solitarily, (c) involve immediate, informative and actionable feedback, (d) provide opportunities to repeatedly perform the same or similar tasks, and (e) are individually designed under the guidance of a capable teacher (see Ericsson, 2018; Ericsson & Harwell, 2019). Later theory and empirical work provided support for the deliberate-practice framework within the domain of chess (e.g., Charness et al., 2005; Ericsson & Charness, 1994) and reinforced the idea that expert performance is cultivated through deliberate practice more so than mere accumulation of experience.
However, critics have argued that many deliberate practice studies rely on retrospective self-reports of practice, small sample sizes, and correlational designs that limit causal inference (e.g., Hambrick et al., 2016; Macnamara et al., 2014). Others have suggested that innate cognitive abilities may play a larger role in explaining individual differences in expertise attainment than Ericsson acknowledged (e.g., de Bruin et al., 2014; Grabner, 2014; Ullén et al., 2016), citing positive relationships between general-ability measures and chess performance. Together, these critiques have renewed debate not only about how deliberate practice should be defined, but also about whether it is truly central to the development of expert performance.
Although researchers continue to debate which activities qualify as deliberate practice (see Ericsson, 2021; Hambrick et al., 2020), a more fundamental question remains—are some forms of practice more effective at improving performance than others? If proponents of deliberate-practice theory are right, training that follows deliberate-practice principles should be more effective than general engagement. Yet to date, no large-scale longitudinal study has tested this hypothesis using objective, behavioral measures of both engagement and improvement. Although some archival data sets (e.g., Howard, 2006; Roring & Charness, 2007; Vaci & Bilalić, 2017) include tournament play and rating, none have captured detailed, time-stamped records of practice activity across a developmental window. In this investigation, we analyzed data from over 40,000 Chess.com players to test whether different activities—gameplay, reviews, lessons, and puzzles—differentially predict rating improvement over a 6-month period. Chess.com is a popular online platform for playing against human or computer opponents, as well as providing interactive resources for studying chess. Notably, several practice features provide players with individual feedback (during or immediately after completing the activity), allow for repeated attempts, and provide activity recommendations on the basis of the player’s estimated skill level. In our preregistration, we categorized reviews, lessons, and puzzles as “deliberate-practice aligned” because it is possible to meet the five deliberate practice criteria described above while engaging in these activities. In contrast, games do not provide opportunities to repeatedly target areas of weakness in a systematic way and to receive immediate feedback about how to improve in these areas (see Charness et al., 2005). 
Unlike prior work based on retrospective reports, our design tracked verified platform behavior and performance in real time, providing an ecologically valid test of whether different practice types influence learning rate.
Method
Participants
We analyzed archival data from 44,213 Chess.com users who registered between March 1 and March 6, 2024. To be included, players had to establish a stable rating and remain active for 6 months. Players varied widely in initial skill level (M = 647.60; SD = 373.89; range = 100–2,505). No demographic data were available. This study used archival, deidentified user data and was determined exempt from Institutional Review Board review under institutional ethical guidelines. Additional data cleaning and inclusion criteria are described in the Supplemental Material available online.
Procedure
Practice activity and performance data were recorded at three 2-month intervals following the establishment of a stable “rapid rating”—which is Chess.com’s estimate of a player’s skill in longer time-control games. To evaluate the impact of different practice activities on chess skill development, we tracked four types of engagement: playing rapid games, reviewing games, watching video lessons, and solving puzzles. Improvement was assessed using each player’s rapid rating on Chess.com.
Activity engagement was initially recorded as the number of events (e.g., games played, puzzles attempted), but to enhance interpretability and align with prior research, these counts were later converted into estimated hours using a secondary data set of Chess.com session times (see the Supplemental Material for conversion procedures). Each measure is described in detail below.
Measures
Deliberate-practice-aligned activities
Game reviews
After each completed game (or at a later time), players can review their performance using Chess.com’s game-review tool. This feature displays engine-generated evaluations of every move and position, along with suggested best moves. If a player spots an error—or if the engine identifies a mistake—they can retry the position to search for a better move. Alternatively, players can click a “best move” icon to reveal the ideal sequence of moves for both sides, helping them explore potential lines and learn from the position. The tool also provides annotated feedback, including positional themes and instructional commentary relevant to each moment in the game.
For analysis, game-review activity was initially recorded as total reviews initiated and then converted to estimated time spent using an average duration of 2.38 min per review (SE = 0.05), based on the same secondary data set of Chess.com session logs.
Video lessons
The Chess.com website offers a library of recorded video lessons covering key areas of strategy and gameplay, including chess basics, openings, end games, and positional concepts. Some lessons include embedded questions (“challenges”) to test player understanding and reinforce the material. Lessons can vary in length and depth but are generally designed to support knowledge acquisition through direct instruction.
For analysis, lesson activity was initially recorded as total lessons started and then converted to estimated time spent using an average duration of 2.12 min (SE = 0.07) per lesson, derived from the same secondary data set of Chess.com session logs.
Tactics puzzles
Chess puzzles are a common way for players to test their tactical abilities. Each puzzle presents a position in which one side (black or white) can force a win by checkmate, gain a decisive material advantage, or salvage a draw in a seemingly lost position. The Chess.com website presents a daily puzzle to all users, and also offers various formats including Puzzle Rush, Puzzle Battle, and custom puzzles grouped by tactical theme. After each attempt, players receive immediate feedback on the optimal move sequence, as determined by Chess.com’s integrated chess engine (a machine-learning model trained to analyze positions).
For analysis, puzzle activity was initially recorded as total attempts and then converted to estimated time spent using an average duration of 0.73 min per attempt (SE = 0.01), relying on the same secondary data set.
Nondeliberate practice activities
Chess.com offers several timed game formats, including bullet, blitz, rapid, and daily games. We focused on rapid games, which are commonly used for tournament-style play and are officially defined by the International Chess Federation (FIDE) as games in which each player has more than 10 min on his or her clock (FIDE Handbook, 2023). Although labeled “rapid,” actual game durations vary on the basis of user behavior.
For analysis, game activity was initially recorded as total rated rapid games played and then converted to estimated time spent using an average duration of 5.88 min per game (SE = 0.03), relying on a large secondary data set of Chess.com session logs.
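The count-to-time conversion used for all four measures can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the authors' actual procedure (which is documented in the Supplemental Material): each event count is multiplied by the average duration reported above. The helper names and the example player are hypothetical.

```python
# Average minutes per event, as reported in the Measures section.
MINUTES_PER_EVENT = {
    "games": 5.88,
    "reviews": 2.38,
    "lessons": 2.12,
    "puzzles": 0.73,
}


def counts_to_hours(counts):
    """Convert event counts to estimated hours (hypothetical helper)."""
    for activity, n in counts.items():
        if activity not in MINUTES_PER_EVENT:
            raise KeyError(f"unknown activity: {activity}")
        if n < 0:
            raise ValueError("event counts cannot be negative")
    # Activities missing from `counts` are treated as zero events.
    return {
        activity: counts.get(activity, 0) * minutes / 60.0
        for activity, minutes in MINUTES_PER_EVENT.items()
    }


def time_shares(hours):
    """Return each activity's share of total estimated time."""
    total = sum(hours.values())
    if total == 0:
        return {activity: 0.0 for activity in hours}
    return {activity: h / total for activity, h in hours.items()}


# Hypothetical player: the counts below are made up for illustration.
example_counts = {"games": 500, "reviews": 40, "lessons": 10, "puzzles": 200}
example_hours = counts_to_hours(example_counts)
example_shares = time_shares(example_hours)

for activity in MINUTES_PER_EVENT:
    print(f"{activity}: {example_hours[activity]:.2f} h, "
          f"{100 * example_shares[activity]:.1f}% of time")
```

Because every event of a given type is assigned the same average duration, the resulting hours are estimates of typical time on task rather than measured session lengths.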
Performance measure
We used rapid rating as our primary outcome variable. Rapid games typically allow more time for strategic decision-making and are less influenced by mouse speed, reaction time, or Internet lag than faster formats. (Chess.com also calculates ratings for bullet, blitz, daily games, and chess variants. These were not included in the study.) The site’s rating system is broadly comparable to other Elo-derived systems used by FIDE and the United States Chess Federation (USCF); however, Chess.com ratings are not interchangeable with, and should not be considered as proxies for, off-platform rating systems. To provide context when discussing Chess.com skill-rating differences, we group them using USCF rating categories and descriptions.
Statistical analysis plan
Our preregistered primary hypothesis predicted that engaging in chess practice activities more closely aligned with the elements of deliberate or purposeful practice (i.e., game reviews, puzzles, and lessons) would be associated with increased rate of improvement in skill rating compared with the effect of playing rated games. To examine the relative contribution of practice activities and gameplay on chess rating, we conducted a random intercept general linear mixed-effects model analysis across the three bimonthly intervals (Time). Our analysis controlled for the number of days until participants achieved a stable player rating (Days)—which corresponded with initial engagement on the site (i.e., the more games played, the faster a stable rating was established). Adjusting for this variable helps to ensure that improvement estimates are anchored to comparably stable baseline ratings and that they avoid residual volatility during the rating-stabilization period.
To address the large number of zeros in the data (and potential systematic differences between users who did or did not engage in a given activity), we modeled each activity as a two-part predictor, including a binary exposure component and a linear count component. A final model included interactions between time and the binary and linear components of each activity (see the Supplemental Material available online for additional details on our rationale for selecting this approach over the originally planned structural equation model).
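The two-part coding described above can be sketched as follows. This is a minimal illustration under assumed conventions, not the authors' specification: the exact coding, any centering or scaling, and the random-effects structure are described in the Supplemental Material. The function and column names are hypothetical.

```python
def two_part(amount):
    """Split an engagement amount into (any_use, amount).

    any_use is 1 when the player engaged at all in the interval, else 0.
    """
    if amount < 0:
        raise ValueError("engagement cannot be negative")
    return (1 if amount > 0 else 0, float(amount))


def design_row(time, amounts):
    """Build illustrative fixed-effect terms for one player-interval.

    For each activity the row holds a binary exposure term, a linear
    amount term, and the interaction of each with time.
    """
    row = {"time": float(time)}
    for activity, amount in amounts.items():
        any_use, amt = two_part(amount)
        row[f"{activity}_any"] = float(any_use)
        row[f"{activity}_amount"] = amt
        row[f"time_x_{activity}_any"] = float(time * any_use)
        row[f"time_x_{activity}_amount"] = float(time * amt)
    return row


# Hypothetical player-interval: 12.5 hours of games, no puzzles, interval 2.
row = design_row(2, {"games": 12.5, "puzzles": 0.0})
for name, value in row.items():
    print(f"{name} = {value}")
```

The binary term lets players with zero engagement differ from engaged players by a constant, so the linear term describes how outcomes change with the amount of engagement rather than absorbing the contrast between users and nonusers.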
After testing our primary research question and exploring the unique contribution of deliberate-practice-aligned activities versus gameplay, we conducted exploratory follow-up analyses to investigate whether effects varied by players’ initial skill level. Specifically, we grouped players into three rating classes: Classes J and I (ratings from 0 to 399); Classes H, G, and F (ratings from 400 to 999); and Class E and above (ratings > 999; see Fig. 1 for distribution across rankings). These analyses were not preregistered but were motivated by prior work suggesting differences in practice patterns by skill level (see the Supplemental Material for additional details).

Fig. 1. Percentage of participants in each chess-rating classification. USCF = United States Chess Federation.
Results
Consistent with our preregistered hypothesis, engaging in deliberate practice-aligned activities was associated with markedly greater improvements than gameplay. Specifically, 1 hour spent on deliberate practice-aligned activities was associated with a 3.11-point gain in rapid rating (b = 3.11, 95% confidence interval, or CI = [2.93, 3.29], p < .001), compared with only 0.86 points gained per hour of gameplay (b = 0.86, 95% CI = [0.83, 0.89], p < .001)—a 3.61× efficiency gap. This difference was significant, χ2(1) = 561.38, p < .001, suggesting meaningful learning costs for players who disproportionately allocated time toward gameplay.
Despite these differences, players spent an average of 90.39% of their on-platform time on gameplay (SD = 12.58), leaving less than 10% for deliberate practice-aligned activities—with 4.61% devoted to reviews (SD = 6.50), 0.99% to lessons (SD = 2.93), and 4.01% to puzzles (SD = 7.94). Contrary to the notion that gameplay is a valid proxy for practice (see Vaci et al., 2019), time spent on gameplay was only weakly associated with time on deliberate practice-aligned activities (reviews: r = .16, p < .001; lessons: r = .07, p < .001; puzzles: r = .08, p < .001).
In addition to our primary analysis, we explored whether more advanced players allocated a larger share of their time to deliberate practice-aligned activities (see Fig. 2). For example, expert-level players (ratings ≥ 2,000)—a small, elite subset of the Class E+ group—spent 14.31% of their time on reviews, compared with just 3.77% among Class J players (ratings < 200), Welch’s t(112.8) = 7.59, p < .001. Lesson usage remained low across groups, with slightly higher percentages among Class J players (1.82%) than expert players (0.86%), t(135.85) = −4.06, p < .001. Puzzle time also increased from 2.94% among Class J players to 7.58% among expert players, t(113.6) = 4.00, p < .001.

Fig. 2. Percentage of time spent on chess activities by chess rating class. USCF = United States Chess Federation.
We also explored the extent to which each specific deliberate practice-aligned activity differed from gameplay in terms of learning efficiency. As shown in Table 1, both game reviews and video lessons yielded substantially greater returns than gameplay. Specifically, the estimated rating gain per hour was 4.41 points for reviews (b = 4.41, 95% CI = [4.06, 4.77], p < .001) and 5.21 points for lessons (b = 5.21, 95% CI = [2.99, 7.44], p < .001). In contrast, completing puzzles yielded only a 0.73-point rating gain per hour (b = 0.73, 95% CI = [0.43, 1.03], p < .001). Chi-square tests confirmed that reviews, χ2(1) = 389.09, p < .001, and lessons, χ2(1) = 15.14, p < .001, substantially outperformed gameplay. However, there was no significant difference between puzzles and gameplay, χ2(1) = 0.18, p = .67. A full description of all model estimates in the metric of the counts for each practice activity, including binary predictors and linear main effects, is reported in the Supplemental Material, available at https://osf.io/vkryf.
Table 1. Per-Hour Rating Improvement by Practice Activity and Comparative Tests Against Gameplay
Note: CI = confidence interval. When reviews, lessons, and puzzles are aggregated, the gameplay estimate differs slightly from the estimate obtained when these activities are modeled separately (0.86 vs. 0.79).
Finally, we explored whether efficiency patterns varied by player ability. To do so, we grouped players into three tiers on the basis of their initial Chess.com rapid rating: Classes I and J (beginners, ratings below 400), Classes F, G, and H (intermediate, ratings 400–999), and Class E and above (advanced, ratings 1,000+).
Patterns were generally consistent across skill levels, though the magnitude of effects varied (see Fig. 3). Among beginners, gameplay yielded an estimated 1.40 points per hour (95% CI = [1.33, 1.47], p < .001), reviews 6.46 points per hour (95% CI = [5.68, 7.24], p < .001), lessons 17.81 points per hour (95% CI = [11.97, 23.65], p < .001), and puzzles 0.73 points per hour (95% CI = [0.11, 1.35], p = .022). Reviews, χ2(1) = 154.26, p < .001, and lessons, χ2(1) = 30.28, p < .001, were more efficient than gameplay. Puzzles were not: their per-hour estimate was significantly lower than that of gameplay, χ2(1) = 4.41, p = .04.

Fig. 3. Rating gain per hour by starting rating level across practice activities, with 95% confidence intervals (CIs).
Among intermediate players, training effects were somewhat smaller than those observed among beginners. Gameplay yielded 0.91 points per hour (95% CI = [0.87, 0.95], p < .001), reviews 4.87 points per hour (95% CI = [4.40, 5.35], p < .001), lessons 5.21 points per hour (95% CI = [2.11, 8.31], p < .001), and puzzles 0.61 points per hour (95% CI = [0.25, 0.97], p = .001). Time spent on reviews, χ2(1) = 258.75, p < .001, and lessons, χ2(1) = 7.36, p = .007, was more predictive of improvement than time spent on games. Improvement associated with puzzles was not different from games, χ2(1) = 2.78, p = .10.
Among advanced players, training effects were the most modest overall, corroborating the general trend of diminishing returns per hour of practice as player skill increases, which is indicative of logarithmic growth in chess ratings over time (Charness et al., 2005; Howard, 2012). Gameplay yielded 0.38 points per hour (95% CI = [0.33, 0.43], p < .001), reviews 2.25 points per hour (95% CI = [1.58, 2.92], p < .001), lessons 3.42 points per hour (95% CI = [−1.32, 8.16], p = .157), and puzzles 0.63 points per hour (95% CI = [−0.39, 1.66], p = .226). Reviews were more efficient than gameplay, χ2(1) = 29.54, p < .001, whereas lessons, χ2(1) = 1.58, p = .21, and puzzles, χ2(1) = 0.23, p = .63, were not. Although these patterns provide clues about which activities may matter most at different stages of development, the stratified analyses were not preregistered and should be interpreted as exploratory.
Collectively, these findings suggest that not all practice activities contribute equally to skill development. Although gameplay accounted for the vast majority of player time, it yielded a relatively low return on investment. Reviews and lessons were significantly more efficient—by a factor of 5.6 and 6.6, respectively—whereas puzzles showed no consistent advantage over gameplay. Game reviews, in particular, predicted improvement more reliably than any other activity. These patterns were broadly consistent across skill levels, though the advantages of deliberate practice-aligned activities were most pronounced for beginners and intermediate players. Among advanced players, only game reviews remained significantly more efficient than gameplay.
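The efficiency factors quoted in this section follow directly from the per-hour estimates. The short Python sketch below reproduces that arithmetic from the rounded coefficients reported in the text; the variable and function names are hypothetical, and the rounding caveat in the comments is an assumption on our part.

```python
# Rounded per-hour estimates reported in the Results (activities modeled separately).
PER_HOUR = {"gameplay": 0.79, "reviews": 4.41, "lessons": 5.21, "puzzles": 0.73}

# Rounded per-hour estimates from the aggregated model.
AGGREGATED = {"gameplay": 0.86, "deliberate_practice_aligned": 3.11}


def efficiency_ratio(activity_rate, gameplay_rate):
    """Rating points per hour of an activity relative to gameplay."""
    if gameplay_rate <= 0:
        raise ValueError("gameplay rate must be positive")
    return activity_rate / gameplay_rate


ratios = {
    activity: efficiency_ratio(rate, PER_HOUR["gameplay"])
    for activity, rate in PER_HOUR.items()
    if activity != "gameplay"
}

# Assumption: the reported 3.61x presumably reflects unrounded estimates;
# the rounded coefficients used here give a value within 0.01 of it.
aggregate_ratio = efficiency_ratio(
    AGGREGATED["deliberate_practice_aligned"], AGGREGATED["gameplay"]
)

for activity, ratio in ratios.items():
    print(f"{activity}: {ratio:.1f}x gameplay")
print(f"aggregate: {aggregate_ratio:.2f}x gameplay")
```

The per-activity factors use the gameplay estimate from the model in which activities were entered separately (0.79), whereas the aggregate factor uses the gameplay estimate from the aggregated model (0.86).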
Discussion
This study provides a large-scale, real-world test of a central claim in the expert-performance framework: that practice that more closely aligns with deliberate practice principles leads to greater improvements. Across more than 40,000 players, we found that, as a whole, deliberate practice-aligned activities were associated with a 3.61× relative advantage in learning efficiency compared with gameplay—although there were meaningful differences in learning efficiencies across these activities. Despite this large overall advantage for deliberate practice-aligned activities, players nevertheless spent roughly 90% of their time playing games.
Although some prior research has suggested that gameplay may account for as much or more variance in performance as deliberate practice (e.g., de Bruin et al., 2007; Howard, 2012), our findings provide strong evidence to the contrary. Unlike earlier studies that relied on retrospective self-reports, our analysis used objective, time-stamped data to track both engagement and improvement over a 6-month period. These results offer a high-resolution, ecologically valid estimate of how practice type shapes skill development.
Although our results broadly supported our preregistered prediction that deliberate practice-aligned activities are more effective than gameplay, exploratory analyses suggest that these activities may not be equally beneficial in practice. Lessons showed the largest average improvement, followed closely by reviews—which showed the most consistent benefits across skill levels. Puzzles were less effective than both lessons and reviews, producing roughly the same benefit as gameplay. All three activities were preregistered as deliberate practice-aligned activities because each theoretically can accommodate the principles outlined by Ericsson et al. (1993).
One may ask, if all three can accommodate deliberate practice, why were improvement rates so different? It is possible that user behavior may have led to differing degrees of adherence to deliberate practice principles. For example, game reviews contain built-in features that may promote closer adherence to deliberate practice (e.g., individualized learning that directly addresses a user’s own mistakes, and opportunities to repeatedly practice those situations). By contrast, some puzzle formats (such as speed-based modes) may incentivize quickly solving puzzles one already understands rather than improvement goals. Notably, our objective is not to modify our operationalization of deliberate practice post hoc; doing so risks unfalsifiable interpretations. Distinctions about the degree to which users engaged in reviews, lessons, or puzzles in a manner consistent with deliberate practice principles remain speculative and untested within our data.
Our findings offer an alternative explanation for the well-documented “fan-spread effect” in skill development (see Ackerman, 2007; Gagné, 2005), in which initially higher-performing individuals improve more quickly over time. Rather than reflecting innate ability differences, this pattern may emerge from systematic differences in practice strategy. To illustrate the potential cumulative impact of different approaches, consider three hypothetical players with the same initial rating of 629.87, our sample mean (see Fig. 4): one who follows the practice protocol of the average Class J player (complete novice), one who mirrors the protocol of the average Class A player (very skilled amateur), and one who adopts the protocol of the average expert (elite performer). Using our model results as a reference, the player who emphasizes the practice mix observed among expert players should show a substantially steeper improvement slope even though all three log the same number of hours engaging in chess activities, widening the gap over time not because of talent but because of training choices. It is important to note that this illustration is intended to highlight relative differences in learning trajectories within the observed range of our data and should not be interpreted as implying continued linear growth, asymptotic outcomes, or the inevitable attainment of master-level or elite performance.

Fig. 4. Fan spread as a function of practice protocol and hours of chess engagement. Growth trajectories reflect estimated rating gains per hour for three hypothetical players with a sample mean initial chess rating of 629.87. Improvement rate was calculated on the basis of average time allocation across the four chess activities, as observed in our sample, combined with the estimated rating gains per hour of each activity: gameplay (0.79), game reviews (4.41), video lessons (5.21), and puzzles (0.73).
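The calculation behind the fan-spread illustration can be sketched as a time-weighted average of the per-hour estimates. The Python example below is a rough reconstruction under stated assumptions, not the authors' code: the review, lesson, and puzzle shares for Class J and expert players are taken from the Results, the gameplay share is assumed to be the remainder, and Class A is omitted because its time allocation is not reported in the text. All names are hypothetical.

```python
# Estimated rating points gained per hour of each activity (from the caption).
RATE = {"gameplay": 0.79, "reviews": 4.41, "lessons": 5.21, "puzzles": 0.73}

# Shares of time reported in the Results for two groups.
REPORTED_MIX = {
    "class_j": {"reviews": 0.0377, "lessons": 0.0182, "puzzles": 0.0294},
    "expert": {"reviews": 0.1431, "lessons": 0.0086, "puzzles": 0.0758},
}

START_RATING = 629.87  # sample mean used in the illustration


def with_gameplay_remainder(mix):
    """ASSUMPTION: gameplay takes whatever share the other activities leave."""
    full = dict(mix)
    full["gameplay"] = 1.0 - sum(mix.values())
    return full


def blended_rate(mix):
    """Time-weighted rating gain per hour for a given practice mix."""
    return sum(share * RATE[activity] for activity, share in mix.items())


def projected_rating(start, hours, mix):
    """Linear projection; meaningful only within the observed data range."""
    return start + hours * blended_rate(mix)


mixes = {name: with_gameplay_remainder(m) for name, m in REPORTED_MIX.items()}
for name, mix in mixes.items():
    rate = blended_rate(mix)
    rating = projected_rating(START_RATING, 100, mix)
    print(f"{name}: {rate:.2f} points/hour, about {rating:.0f} after 100 hours")
```

Under these assumptions the expert mix yields a steeper slope than the Class J mix for the same number of hours, which is the qualitative pattern the figure is meant to convey.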
One practical implication of the present investigation is that greater improvement did not require a human coach. As AI-based learning tools become more widely available, they offer a rare opportunity to deliver the key elements of deliberate practice at scale—potentially reducing the cost and logistical barriers typically associated with expert coaching. Although further research is needed to compare the effectiveness of human- versus model-based instruction, our findings suggest that AI systems may help democratize high-quality skill development.
Limitations and future directions
This investigation has a few noteworthy limitations that point to important directions for future research. First, we were unable to track off-platform training—such as over-the-board games, private coaching, or other nondigital forms of practice—which likely introduced noise and may have biased effect sizes. These activities may have been more common among advanced players, potentially obscuring the true magnitude of differences in training efficiency across skill levels.
Second, we lacked direct measures of cognitive ability, motivation, or psychological traits, limiting our ability to model how learner characteristics interact with practice type and our ability to rule out potential confounding from differences in initial motivation (e.g., learning-focused vs. recreational use). Although some studies (e.g., Vaci et al., 2019) have explored ability-practice interactions, they relied solely on gameplay counts and did not account for other practice activities. Future research should investigate the extent to which individual differences in intelligence, personality, and motivation moderate the effects of deliberate practice on improvement.
Third, our observational design falls short of the empirical ideal proposed by Ericsson and Williams (2007), who described the ultimate validation of practice effects as experimental assignment of expert performers to different training conditions. However, our study approximates their recommendation to conduct “longitudinal studies of skilled individuals where the performance and practice activities are recorded objectively” to better understand “who develops expertise and who does not” (p. 120).
Fourth, the sample is composed primarily of recreational players and developing learners; by FIDE standards, the vast majority would be classified as novices rather than experts. As a result, our findings speak most directly to learning trajectories in the early and intermediate stages of skill acquisition and may not generalize to master-level or elite international competitors. Nevertheless, the highest-rated players in our sample devoted a substantially greater share of their time to deliberate practice-aligned activities than did novices. This pattern is consistent with the learning-efficiency differences observed in the present investigation and suggests that these differences extend into the upper extremes of the observed skill range.
Finally, our analyses necessarily apply to users who remained engaged during the observation window, as the archival data set did not include users who discontinued activity on Chess.com prior to the end of the study period. Despite its limitations, the structure and granularity of these data provide strong, causally suggestive evidence about how different forms of practice shape learning trajectories in the wild.
Conclusion
Although much work remains to disentangle the relative contributions of practice, general abilities, and other personal attributes to skill development, one conclusion from this study is clear: Not all practice is created equal. The assumption that differences in learning rate primarily reflect stable cognitive traits should be reevaluated. Indeed, what has traditionally been labeled a fan-spread effect—where those who start ahead improve faster over time—may be, at least in part, a consequence of differences in practice strategy rather than innate talent. Our findings underscore the importance of examining not just how much people practice, but how deliberately they practice, and with what tools. In doing so, we move one step closer to understanding how expert performance is built—and how it can be made more accessible.
Supplemental Material
Supplemental material, sj-pdf-1-pss-10.1177_09567976261452568 for Not All Practice Is Created Equal: Longitudinal Evidence From Over 40,000 Chess Players by Daniel A. Southwick, Kyle W. Harwell, Garrett Wright, Joseph A. Olsen and Benjamin M. Ogles in Psychological Science
Footnotes
Acknowledgements
The views expressed in this article are those of the authors and do not reflect the official policy or position of the Department of Defense, the U.S. Army, or the U.S. Army Research Institute for the Behavioral and Social Sciences.
Transparency
Action Editor: Julia Stern
Editor: Simine Vazire
