Abstract
Play the ball (PTB) is an important tactical performance indicator in elite rugby league, referring to the action whereby a tackled ball carrier restarts play by rolling the ball underfoot to a teammate following a completed tackle. Despite its growing use in applied performance analysis, the accuracy of commercially coded PTB duration data has not been verified. This study aimed to assess the agreement between commercial and independently coded play the ball event durations from National Rugby League matches. A cross-sectional methodological design was employed using data from 1185 matches over six seasons (2018–2023). A stratified random sample of 372 PTB events was selected within eight standard deviation (SD) categories. Commercial coding of PTB duration was compared against independent coding with stopwatch during playback of official broadcast. Agreement was evaluated using intraclass correlation coefficients (ICCs), typical error of measurement (TEM), and Wilcoxon Signed Rank Tests for systematic bias. Agreement was poor for short-duration PTBs (<2.44 s; ICC = −0.03 to 0.32), with systematic underestimation by the commercial provider (e.g., < –3 SD: mean difference = −3.15 s; TEM = 2.51 s; p < 0.001). Agreement improved for moderate-to-long PTBs (≥2.48 s; ICC = 0.75–0.90), with strongest agreement observed in the 1–2 SD category (ICC = 0.90; TEM = 0.09 s; p = 0.60). Commercial PTB coding demonstrated acceptable agreement for moderate and longer durations but may misrepresent very rapid PTBs. Ongoing evaluation to ensure accurate PTB event coding is warranted given the importance of ruck speed in professional rugby league matchplay.
Introduction
Rugby League is a team sport characterized by intermittent, high-intensity efforts (e.g., sprinting and high-speed running) interspersed with lower-intensity activity (e.g., walking and jogging), 1 along with physically demanding collisions and wrestling contests. 2 In the attacking phase of the game, teams have six tackles per set. During each tackle, the ball carrier is typically brought to a stop (or to the ground) through a physical contest, determined by the referee's subjective assessment of being ‘held’.3,4 The tackled player must then perform a play-the-ball (PTB), which restarts play by rolling the ball underfoot to a teammate. Apart from the players directly involved in the tackle, the defending team is required to retreat 10 meters from the PTB. This phase of play has important tactical complexities, whereby the attacking team aims to quicken the PTB, thereby reducing the time available for the defense to establish optimal structures. Conversely, the team without possession aims to slow the PTB by using various defensive techniques aimed at manipulating the ball carrier's position. 5 Wrestling and grappling techniques are often employed to slow the PTB, allowing the defensive team to organize for the subsequent tackle. As such, the PTB is a pivotal aspect of modern rugby league, especially at elite levels including the National Rugby League (NRL), and is of increasing interest to coaches and players for team performance benefits.
Under NRL match regulations, no fixed time limit is prescribed for the PTB; the referee subjectively determines when the tackle is complete and play restarts, hence PTBs can vary based on the context of play. Attacking teams aim to complete the PTB as quickly as possible to maximize defensive disorganization, while the defending team seeks to delay it through legal wrestling and grappling techniques, producing a broad spectrum of durations that carry distinct tactical implications. Existing studies have highlighted the PTB's influence on team performance, distinguishing higher-ranked teams from lower-ranked ones. 6 The PTB has been described as a factor contributing to game tempo, enabling attacking teams to exploit defensive disorganization. 5 In theory, a faster PTB can disrupt defensive structures, providing the attacking side with more opportunities for line breaks and scoring. 7 Thus, the PTB functions not only as a technical skill but also as a tactical lever that can potentially shape match outcomes. For example, the duration of the PTB can affect the speed of attacking play, the ability to capitalize on defensive weaknesses, 8 and the likelihood of creating scoring opportunities. Although the PTB is of technical and tactical importance, evidence regarding the accuracy of PTB duration measurement to inform the above considerations remains lacking.
Notational analysis has been widely used in sports such as rugby league 8 and rugby union 9 to quantify the frequency and duration of technical actions (e.g., passes, offloads) and physical activities (e.g., sprints, tackles). Advances in time-motion analysis and video-based systems have enhanced the objectivity and precision of match analysis. 10 Understanding match events provides insights into tactical patterns, player workloads, and performance benchmarks, enabling coaches to tailor training programs and optimize match strategies.1,6 In the case of the PTB, understanding its duration directly informs tactical decision-making (e.g., ruck speed targets), player selection based on technical proficiency, and real-time adjustments to exploit defensive disorganization. However, few studies have reported PTB durations, 5 and none have assessed the accuracy of PTB coding to inform duration outcomes. Without published evidence verifying the accuracy and variability of available coded PTB data, further analysis and interpretation of PTB trends and their implications is constrained. This verification gap is critical because unreliable or unvalidated performance metrics can lead to inaccurate interpretations of player ability and tactical performance. As emphasized by Koo and Li, 11 the selection and transparent reporting of reliability coefficients, such as intraclass correlations, are fundamental to ensuring that measurement tools produce consistent and credible data in applied sports research. They further stress that measurement tools demonstrating poor or unreported reliability values can lead to questionable findings and undermine the validity of performance evaluation, making rigorous reliability assessment essential. In the context of rugby league, failing to validate the precision of PTB duration metrics may lead coaches to overemphasize or undervalue player performance, thereby influencing tactical planning and selection on flawed evidence. 
Furthermore, cross-study comparisons are severely compromised without evidence of consistent coding standards, as discrepancies in the definition or detection of PTB events can yield incomparable or misleading datasets. Establishing standardized and verifiable event-coding procedures is therefore indispensable for maintaining data integrity and supporting accurate benchmarking across different analyses and competitions.
Given that there are reported performance outcomes associated with PTB duration (i.e., higher-ranking teams perform PTB more effectively than lower-ranking teams), and coaches may make player selection and/or training decisions around PTB duration, a distinct need exists to assess the accuracy of commercially coded data on the PTB. As such, the purpose of this study was to investigate the agreement between commercially coded PTB event durations and an independent coder in NRL matches.
Methods
Participants
Data for 825 professional male rugby league players from 17 NRL franchises were provided to the research team by the governing body under formal agreement, with information collected from NRL premiership matches between 2018 and 2023. Individual consent was not required under the Collective Bargaining Agreement between the players union and the NRL. Organizational consent and ethics approval were obtained prior to data collection (Human Research Ethics Committee approval: ETH-24-9402).
Design
A cross-sectional methodological design was employed to compare measurement agreement between commercial and independent coding of PTB durations. The dataset comprised all regular season and finals matches from six NRL premiership seasons (2018–2023), excluding trial and representative fixtures. The 2018–2023 period was selected to encompass recent rule modifications introduced to the game, thereby allowing analysis of PTB events under standard revised regulations. In total, 1185 matches and 324,603 PTB events were available. A subset of PTB durations coded by the official provider was compared with independently coded durations obtained for the same events by the research team using official broadcast video footage. To obtain the subset, PTB durations of all events were categorized into eight standard deviation (SD) categories. A categorical sampling approach, stratified by PTB duration, was deliberately chosen over simple random selection from the full dataset of 324,603 events. This decision was made to ensure balanced representation of PTB events across the full spectrum of durations, including rare short- and long-duration events that would be underrepresented in a purely random sample given the concentration of events in the central duration categories. This approach allowed agreement to be evaluated independently at each duration stratum, which is of greater applied utility than an overall estimate that would be dominated by the most common (moderate duration) events. Subsequently, a sample of PTB events was randomly selected from each of the eight SD categories. The research team was blinded to commercial coding outcomes during independent coding.
PTB event coding
PTB event durations were obtained from the official commercial service provider for the NRL (Stats Perform, Chicago, IL, USA). PTB events were operationally defined as commencing when the referee signaled a completed tackle (“held” call) or when the ball carrier contacted the ground and concluding when the ball was placed under the attacker's foot. This definition, as used by the commercial coding provider, was adopted for the present study and aligns with the ‘Ruck Time’ definition proposed by Eaves and Evers. 5 The choice of definition can influence the timing and interpretation of PTB events, which highlights the importance of consistency and transparency in coding methodologies. Commercial coding is completed live during each NRL match. Following each match, PTB events are reviewed to identify and rectify any coding errors from the initial live coding process. This coding process typically takes 2–4 hours post-match.
For independent coding, matches were reviewed using broadcast footage. Each PTB event was initially located within the video file from official broadcast partners, after which footage was rewound to the onset of the tackle sequence. When the primary broadcast angle did not provide sufficient visual clarity to accurately identify the timing of the PTB, alternate broadcast camera angles were consulted to confirm event start and end points. Coding was performed by a single trained analyst with two years of professional experience in rugby league. The analyst received instruction and undertook several months of familiarization and practice with the coding protocol to ensure procedural consistency prior to formal data collection. Coding was completed in real-time replay using the iPhone stopwatch application (Apple Inc., Cupertino, CA, USA). Video was played at normal (1x) speed for all independent coding, consistent with the live real-time conditions under which commercial coding is performed. This methodological choice was deliberate as the aim of the study was to evaluate the agreement between commercial coding outcomes and independent coding under comparable temporal conditions, rather than to establish a definitive ground-truth duration through frame-by-frame analysis. While slow-motion or frame-by-frame coding using dedicated video editing software may yield greater temporal precision, this approach would not reflect the operational context of commercial coding and would introduce a different set of perceptual demands (e.g., identifying exact event boundaries at reduced speed), thereby evaluating a different question to the one posed. The focus of this study is therefore on the agreement and process of capturing the PTB speed metric, rather than establishing an absolute ground-truth measure of PTB duration. 
Each PTB event was measured multiple times in a coding session to ensure accuracy, with the stopwatch started at the commencement of the PTB and stopped when the ball was placed underfoot. The total duration for each PTB event was recorded in seconds. Event timings and groupings were entered into a spreadsheet (Microsoft Excel, Microsoft Corporation, Redmond, WA) alongside unique event identifiers to enable matching with the commercial dataset. No missing data were identified in the sampled PTB events, as all selected events were successfully located and coded using full-match broadcast footage.
Sampling and accuracy
All PTB events from the 2018–2023 seasons were classified into eight categories according to their standard deviations (SD) from the mean PTB duration: < −3, −2 to −3, −1 to −2, −1 to 0, 0 to 1, 1 to 2, 2 to 3, and >3 SD. Systematic random sampling was implemented within each category following the approach described by Makwana et al. 12 Specifically, events were ordered chronologically, a random starting point was selected, and every Kth event was subsequently chosen to ensure representative sampling. Based on an anticipated intraclass correlation coefficient (ICC) of 0.95, an acceptable threshold of 0.85, targeted power of 80%, and α = 0.05, a minimum of 25 PTB events per category was calculated to achieve adequate power for interrater agreement analysis. Sample size determination employed the online tool 13 and was guided by methodologies outlined in Walter et al. 14 and Monti et al. 15 To further enhance estimate precision and category representation, deliberate oversampling was performed, resulting in a final sample of 372 PTB events for analysis (50 per group, except for the < −3 SD group, which included 22 events), corresponding to a power of 98%.
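The categorization and systematic sampling steps described above can be sketched as follows. This is a minimal illustration rather than the procedure actually run for the study: the function names, the handling of values that fall exactly on a category boundary, and the mean and SD used in the demonstration are all assumptions.

```python
import random

# Category boundaries in SD units for the eight groups described above.
BOUNDS = [-3, -2, -1, 0, 1, 2, 3]


def sd_category(duration, mu, sigma):
    """Return category 1..8 from the z-score of a PTB duration.

    Assumption: a z-score exactly on a boundary goes to the higher category.
    """
    z = (duration - mu) / sigma
    return 1 + sum(z >= b for b in BOUNDS)


def systematic_sample(events, n, rng):
    """Pick up to n events: random start, then every k-th event.

    `events` is assumed to be in chronological order already.
    """
    if n <= 0 or not events:
        return []
    k = max(1, len(events) // n)  # sampling interval K
    start = rng.randrange(k)      # random starting point in [0, k)
    return events[start::k][:n]


# Hypothetical mean and SD in seconds (placeholders, not the study values).
MU, SIGMA = 3.5, 1.0

rng = random.Random(2023)
category_events = list(range(1000))   # stand-in for one category's event IDs
chosen = systematic_sample(category_events, 50, rng)
print(len(chosen), sd_category(5.0, MU, SIGMA))
```

When a category holds fewer events than the target, the interval collapses to 1 and every event is returned, which matches the situation in the < −3 SD group, where all 22 available events were used.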
Statistical analysis
Data were processed using a customized Microsoft Excel spreadsheet (Microsoft Excel, Microsoft, Redmond, WA, USA). Descriptive statistics (mean ± standard deviation [SD], 95% confidence intervals [CI]) were calculated for all categories. Agreement between commercially and independently coded PTB durations was assessed in R (version 4.5.0), using the irr package (v 0.84.1), 16 via ICCs as per Koo and Li. 11 Two-way mixed effects models were used, specified for consistency, with single raters/measurements.17,18 ICC thresholds were classified as: very high (0.90–1.00), high (0.70–0.89), moderate (0.50–0.69), and low (0.26–0.49). Non-parametric tests were selected due to the presence of outliers and non-normally distributed differences between the data series. Systematic bias was evaluated using Wilcoxon Signed Rank Tests and effect sizes were calculated as r = Z/√N and interpreted as trivial (<0.10), small (0.10–0.30), medium (0.31–0.50), or large (>0.50). Statistical significance was set at p ≤ 0.05 for all analyses.
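For readers wishing to reproduce the agreement statistics, the sketch below computes a consistency-type, single-measurement ICC for two coders from the two-way ANOVA mean squares, together with a typical error of measurement. It is written in Python for illustration and is not the R code (irr package) used in the analysis. The TEM formula, SD of the paired differences divided by √2 and expressed as a percentage of the grand mean, is an assumption, as is every function name; the paired durations are invented.

```python
from math import sqrt
from statistics import mean, stdev


def icc_consistency(commercial, independent):
    """ICC(C,1): two-way model, consistency, single measurement, two coders."""
    n, k = len(commercial), 2
    values = list(commercial) + list(independent)
    grand = mean(values)
    event_means = [(c + i) / 2 for c, i in zip(commercial, independent)]
    coder_means = [mean(commercial), mean(independent)]
    ss_events = k * sum((m - grand) ** 2 for m in event_means)
    ss_coders = n * sum((m - grand) ** 2 for m in coder_means)
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_error = ss_total - ss_events - ss_coders
    ms_events = ss_events / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_events - ms_error) / (ms_events + (k - 1) * ms_error)


def typical_error(commercial, independent):
    """Return (TEM in seconds, TEM as a percentage of the grand mean).

    Assumption: TEM = SD of paired differences / sqrt(2).
    """
    diffs = [c - i for c, i in zip(commercial, independent)]
    tem = stdev(diffs) / sqrt(2)
    grand = mean(list(commercial) + list(independent))
    return tem, 100 * tem / grand


# Hypothetical paired durations in seconds (not data from the study).
commercial = [4.6, 4.9, 5.1, 5.3, 5.4]
independent = [4.7, 4.8, 5.2, 5.3, 5.5]
print(round(icc_consistency(commercial, independent), 3))
print(typical_error(commercial, independent))
```

Because the consistency form removes the between-coder mean offset, a constant bias between coders does not lower this ICC, which is why systematic bias is tested separately.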
Results
Examination of PTB event frequencies revealed a concentration in Categories 4 (37.1%) and 5 (33.9%), with smaller proportions in Categories 3 (13.8%) and 6 (11.0%). Events in Categories 2 (1.1%), 7 (2.4%), 8 (0.8%), and 1 (0%; 22 total events) were comparatively rare. Intra-tester reliability was addressed during data collection, whereby each PTB event was independently coded 5 times within the same coding session. Agreement across repeated measures was confirmed prior to recording the final value, ensuring consistency in the independent coding process. Agreement between commercial and independent coding methods for PTB durations varied across the eight categories (Table 1). Figure 1 shows the distribution of PTB durations for Categories 1–8 (< −3 SDs to >3 SDs), highlighting discrepancies between coding methods.

Figure 1. Visual representation of data distribution. Panels: Category 1 (< −3 SDs), Category 2 (−2 to −3 SDs), Category 3 (−1 to −2 SDs), Category 4 (−1 to 0 SDs), Category 5 (0 to 1 SDs), Category 6 (1 to 2 SDs), Category 7 (2 to 3 SDs), Category 8 (>3 SDs).
Table 1. Descriptive information and agreement statistics between coders for play the ball durations in each standard deviation group.
Key: SD = Standard Deviation; ICC = Intraclass Correlation Coefficient; CI = Confidence Interval; TEM = Typical Error of Measurement
As shown in Table 1, agreement between commercial and independent coding methods for PTB durations varied considerably across categories. For PTB durations shorter than 2.44 s (Categories 1–3; < −1 SD), agreement was poor. ICCs ranged from −0.03 to 0.32, with large typical errors of measurement (TEMs: 0.36–2.51 s) and significant mean differences. Category 1 (< −3 SDs) showed the largest discrepancy (mean difference = −3.15 s; TEM = 2.51 s, 142.1%). Figures 1 and 2 show the distribution of PTB durations for Categories 1–8 (< −3 SDs to >3 SDs), highlighting discrepancies between methods, particularly in the shortest duration categories.

Figure 2. Visual representation of data distribution.
Agreement improved as PTB duration increased. For durations ≥2.48 s (Categories 4–8; −1 SD to >3 SDs), ICCs ranged from 0.74 to 0.90, reflecting high to very high agreement. PTBs within 4.56–5.48 s (Category 6; 1–2 SDs) exhibited the highest agreement (ICC = 0.90, TEM = 0.09 s, 1.8%) and no significant mean difference between methods (p = 0.60). Small but statistically significant differences persisted in other groups (p < 0.05), with effect sizes ranging from r = –0.15 to 0.44.
Notably, Category 8 (>6.64 s; >3 SDs) showed strong agreement (ICC = 0.87) but a higher TEM (0.47 s, 6.4%) than Category 6 (1–2 SDs). Overall, these findings demonstrate that agreement between methods was lowest for very short PTB durations, whereas agreement improved for moderate-to-long durations (see Figures 1 and 2).
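The effect sizes reported above follow r = Z/√N, as defined in the statistical analysis. A minimal sketch of that calculation is given below, under stated assumptions: Z comes from the normal approximation to the Wilcoxon signed-rank statistic without tie or continuity correction, zero differences are dropped, and N is taken as the number of non-zero pairs. Statistical packages may differ on each of these choices, so values need not match the reported ones exactly. The function names are illustrative.

```python
from math import sqrt


def wilcoxon_effect_size(commercial, independent):
    """Return (Z, r) for paired durations using the normal approximation."""
    diffs = [c - i for c, i in zip(commercial, independent) if c != i]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank absolute differences; tied values share their average rank.
    order = sorted(range(n), key=lambda j: abs(diffs[j]))
    ranks = [0.0] * n
    pos = 0
    while pos < n:
        end = pos
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[pos]]):
            end += 1
        avg_rank = (pos + end) / 2 + 1
        for j in order[pos:end + 1]:
            ranks[j] = avg_rank
        pos = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mu = n * (n + 1) / 4
    sigma = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mu) / sigma
    return z, z / sqrt(n)


def interpret_r(r):
    """Label |r| with the thresholds given in the statistical analysis."""
    a = abs(r)
    if a < 0.10:
        return "trivial"
    if a <= 0.30:
        return "small"
    if a <= 0.50:
        return "medium"
    return "large"


# Hypothetical paired durations in seconds (not data from the study).
z, r = wilcoxon_effect_size([1.0, 1.2, 0.9, 1.1], [1.6, 1.5, 1.4, 1.9])
print(round(z, 2), round(r, 2), interpret_r(r))
```

With differences taken as commercial minus independent, a negative r indicates shorter commercial durations, consistent with the sign convention of the mean differences reported in the text.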
Discussion
This study provides a comprehensive evaluation of the agreement between commercially coded PTB event durations and independent video analysis in elite rugby league, utilizing a large and representative sample from six NRL seasons. By directly comparing commercial coding with independent analysis, the findings highlight important distinctions in PTB data agreement, with significant implications for both research and applied performance analysis in rugby league.
Notably, the present results demonstrate that agreement between commercial and independent coding of PTB durations is highly dependent on the duration of the PTB. It is emphasized that this study evaluated the agreement between two real-time coding methods conducted under comparable conditions, rather than establishing an absolute ground-truth PTB duration; as such, the findings speak to the process of capturing the PTB speed metric rather than definitively adjudicating which value is “correct”. Agreement was lowest for very short PTB events (<1.40 s), with ICCs ranging from −0.03 to −0.2 and typical errors of measurement (TEMs) reaching up to 2.51 s (142.1%). The findings suggest a systematic underestimation of PTB duration by the commercial provider in these short-duration events, with Category 1 (< −3 SDs) showing a mean difference of −3.15 s compared to independent coding. A plausible explanation for the large discrepancies observed in faster PTBs (Categories 1 and 2) is the role of human reaction time. When the PTB event is very brief, the window between the ‘held’ call by the referee and the ball being placed underfoot is correspondingly short. Both commercial and independent coders must react to these auditory and visual cues in real time. Individual differences in reaction latency to the ‘held’ signal, and in the subjective identification of the ball placement endpoint, may therefore account for a disproportionate share of the total event duration in short PTBs, inflating variability and reducing agreement. By contrast, for longer-duration PTBs (Category 4 and above), reaction time represents a much smaller proportion of the total event, resulting in smaller relative errors and higher agreement between methods.
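The reaction-time argument can be expressed as a simple proportion. Suppose, purely for illustration, that start and stop latency together contribute a fixed timing error δ to an event of true duration T; the value δ = 0.3 s used here is a hypothetical figure, not one measured in this study.

```latex
\[
\text{relative error} = \frac{\delta}{T} \times 100\%,
\qquad
\frac{0.3\ \text{s}}{1.0\ \text{s}} \times 100\% = 30\%,
\qquad
\frac{0.3\ \text{s}}{5.0\ \text{s}} \times 100\% = 6\%.
\]
```

Under this assumption, the same absolute error is five times larger in relative terms for a 1 s PTB than for a 5 s PTB.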
These discrepancies likely reflect the inherent difficulty of capturing rapid, technically complex actions using commercial systems, a challenge previously reported in time-motion analysis of team sports.1,19 Systematic bias was evident and points to the potential for misclassification by the commercial coding provider. Such instances predominantly arose when PTB events coincided with broadcast replays of the preceding play (e.g., missed scoring opportunities or major collisions). In these cases, the research team utilized alternative camera angles without broadcast overlays to ensure accurate analysis of the PTB event. Another scenario affecting PTB duration was when the ball carrier was deemed held, initiating the PTB event, prior to falling to the ground. In certain instances, the point at which the player contacted the ground was used to mark the start of the PTB, rather than the initial ‘held’ call from the referee.
Within the complete database of 324,603 events, Categories 1–3 (< −1 SD) comprised a total of 48,131 events, representing 14.8% of all observations. Discrepancies observed for short-duration PTBs may therefore have practical implications for performance analysis. Fast PTBs are widely recognized as a tactical advantage, enabling attacking teams to exploit defensive disorganization and increase scoring opportunities.5,6 Systematic underestimation or misrepresentation of the speed of these events could mask important differences between teams or players, particularly when distinguishing high-performing teams known for rapid ruck speed. 5 This limitation reinforces the need for ongoing assessment of agreement with commercial data sources, especially as PTB duration is increasingly used as a key performance indicator in both research and coaching contexts.
In contrast, agreement improved markedly for moderate-to-long PTB durations (≥2.48 s), with ICCs ranging from 0.75 to 0.90. The highest agreement was observed in Category 6 (1–2 SDs; 4.56–5.48 s; ICC = 0.90, TEM = 0.09 s, 1.8%), with minimal systematic bias. These findings are consistent with previous research indicating that commercial coding systems tend to perform more reliably when measuring longer-duration or less complex events, where the temporal resolution of the system is less likely to be a limiting factor.20,21 For very long PTB durations (>6.64 s), agreement remained strong (ICC = 0.87), though typical errors were higher (TEM = 0.47 s, 6.4%), suggesting that both very short and very long events present unique challenges for coding. The evolution of coding systems in rugby league and other sports has seen a progression from manual, paper-based notational analysis to sophisticated semi-automated systems integrating video technology. While these advancements have improved efficiency and enabled the collection of large datasets, challenges persist in ensuring the accuracy of technical event coding, particularly for brief or complex actions. 21 Despite considerable technological advancements, coding systems may still encounter challenges in accurately detecting events at the temporal extremes, underscoring the importance of regular benchmarking and ongoing refinement of event definitions and coding protocols. 21
For practitioners and researchers, these results highlight the importance of verifying the agreement of commercially sourced technical data, particularly for events at the lower and upper duration limits. Where possible, independent checks with video-based coding should be considered, especially when PTB speed is used for player evaluation, tactical planning, or talent identification. The findings of this study indicate that PTB events in Categories 1 and 2 (0.04–1.40 s) warrant further examination due to potential classification uncertainty. In contrast, PTB events classified in Category 3 (1.48–2.44 s) demonstrated an ICC of 0.32, with a TEM of 0.36 s (16%). Based on these metrics, PTB events occurring near the midpoint of this category (approximately 2 s) may be considered sufficiently reliable for use without the need for additional verification. Independent evaluation can be feasibly achieved by routinely selecting and manually coding a representative subsample of PTB events, allowing teams to systematically monitor and assess the reliability of commercial data. Moreover, the findings suggest that improvements in commercial coding workflows, including enhanced training for human operators, refined event definitions, or the integration of machine learning tools, may be needed to address systematic biases in short-duration events. As automated tracking and notational analysis technologies continue to evolve, regular benchmarking methods will be crucial to ensure data accuracy and agreement. 21
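One way a team might put the routine subsample check into practice is sketched below. It flags every event at or below the upper bound of Category 2 (1.40 s) for manual recoding and adds a random share of the remaining events. The policy itself, the field names `id` and `duration_s`, and the 20% fraction in the demonstration are all hypothetical choices made for illustration; they are not recommendations drawn from the data.

```python
import random

# Upper bound of Category 2 reported in the text (0.04-1.40 s).
SHORT_PTB_MAX_S = 1.40


def select_for_audit(events, sample_fraction, rng):
    """Return event IDs to recode manually.

    Every short PTB is selected, plus a simple random sample of the rest.
    `events` is a list of dicts with hypothetical keys 'id' and 'duration_s'.
    """
    if not 0.0 <= sample_fraction <= 1.0:
        raise ValueError("sample_fraction must be between 0 and 1")
    short = [e["id"] for e in events if e["duration_s"] <= SHORT_PTB_MAX_S]
    rest = [e["id"] for e in events if e["duration_s"] > SHORT_PTB_MAX_S]
    n_rest = round(len(rest) * sample_fraction)
    return short + rng.sample(rest, n_rest)


# Hypothetical match extract (not data from the study).
match_events = [
    {"id": "ptb-001", "duration_s": 0.9},
    {"id": "ptb-002", "duration_s": 3.2},
    {"id": "ptb-003", "duration_s": 4.1},
    {"id": "ptb-004", "duration_s": 1.3},
    {"id": "ptb-005", "duration_s": 2.8},
    {"id": "ptb-006", "duration_s": 5.0},
    {"id": "ptb-007", "duration_s": 3.7},
]
print(select_for_audit(match_events, 0.2, random.Random(7)))
```

The manually recoded durations for the selected IDs could then be compared with the commercial values using the same agreement statistics applied in this study.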
Several limitations must be acknowledged. While the sample was large and representative, only a subset of PTB events was independently coded due to resource constraints, which may limit the generalisability of the findings to all PTB events within the sampled seasons. Additionally, the study compared commercial coding with independent video-based coding, but neither method represents a gold standard; hence, both are subject to human error, definitional ambiguity, and technological limitations. The observed discrepancies highlight differences, but do not definitively establish which method is “correct”. It is important to acknowledge that the primary intent of this study was to evaluate the process and agreement of capturing the PTB speed metric under real-time conditions comparable to those used in practice, rather than to determine absolute PTB duration accuracy through gold-standard frame-by-frame analysis. This distinction is relevant to the interpretation of the findings, particularly for very short PTBs where human reaction time and perceptual demands are most likely to affect coding agreement across both commercial and independent approaches. Future studies could incorporate frame-by-frame analysis, multiple independent coders, or consensus coding to further clarify accuracy; however, the trade-off against feasibility needs to be considered, as this would be a laborious process for large datasets. The analysis focused solely on PTB duration, but other technical actions (e.g., offloads, passes, tackles) coded over the same period may also be susceptible to coding discrepancies and warrant similar evaluation.
Conclusion
This study demonstrates that commercially coded PTB durations in the NRL generally show strong agreement with independent assessment for moderate-to-long events but exhibit systematically lower values for very short PTBs. Given the tactical importance of rapid PTBs, practitioners and researchers should interpret such data with caution and consider independent validation where feasible. Ongoing efforts to standardize and assess agreement in technical event coding will enhance the utility of performance analysis in rugby league and support evidence-based decision-making at all levels of the sport.
Footnotes
Acknowledgements
The authors acknowledge the NRL for providing access to match statistics and for their ongoing support of rugby league performance research.
Ethical considerations
Organisational consent and ethics approval were obtained prior to data collection (Human Research Ethics Committee approval: ETH-24-9402).
Consent to participate
Individual consent was not required under the Collective Bargaining Agreement between the players union and the NRL.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability
The data that support the findings of this study were obtained from official National Rugby League (NRL) match statistics, and processed datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.
