Improving Clinical Prognostic Model Methodology: Response

Authors’ Response:

We have read with interest the letter to the editor concerning the development of our prognostic prediction model for shoulder pain in competitive swimmers. We thank the authors for their knowledgeable comments and hereby wish to respond to their statements. Overall, we share the concerns of the authors about certain methodological considerations. However, we would like to stress that this study is the first to investigate and develop a prognostic prediction model for shoulder pain in competitive swimmers, and that, as we acknowledged in the discussion of our paper, this should be seen as a starting point. We agree that more work (eg, external validation) needs to be done before this model should be used in practice and wish to express our interest in further improving this model.

It is suggested that we would require a much larger sample size to limit the risk of overfitting while developing our prognostic model. Binary logistic regression modeling is among the most frequently used approaches for developing multivariable clinical prediction models,² and an adequate sample size is a key contributor to robust predictive performance of these models. The authors of the letter refer to Riley et al (2020).⁶ However, because this method of sample size estimation was introduced only recently, we did not have access to it at the time of planning our study. Furthermore, for logistic regression analysis, the sample size has typically been estimated in terms of events per variable (EPV), defined as the ratio of the number of observations in the smaller of the 2 outcome groups (most often the cases) to the number of regression coefficients estimated in the prediction model.¹⁰ In the medical literature, an EPV of 10 has been widely advocated as the rule of thumb (the lower limit) for developing prediction models with binary outcomes.³,⁴ We therefore used the “10 EPV” rule, meaning that we would need 10 athletes with shoulder pain per variable included in our model. Although it has become clear more recently (from 2019) that this rule is often not suitable, especially when the disease prevalence is much higher than 10% or lower than 3%, it was the best available method at the time of designing the study.
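As an illustration of the EPV definition above, the following is a minimal sketch in Python. The function names and the cohort numbers (40 cases, 160 non-cases, 5 coefficients) are hypothetical and are not taken from the study.

```python
import math


def events_per_variable(n_events: int, n_nonevents: int, n_parameters: int) -> float:
    """EPV: size of the smaller outcome group divided by the number of
    regression coefficients estimated in the model."""
    if n_parameters <= 0:
        raise ValueError("n_parameters must be positive")
    return min(n_events, n_nonevents) / n_parameters


def max_parameters(n_events: int, n_nonevents: int, min_epv: float = 10.0) -> int:
    """Largest number of coefficients that still satisfies a minimum EPV."""
    return math.floor(min(n_events, n_nonevents) / min_epv)


# Hypothetical cohort: 40 athletes with shoulder pain, 160 without.
print(events_per_variable(n_events=40, n_nonevents=160, n_parameters=5))  # 8.0
print(max_parameters(n_events=40, n_nonevents=160))                       # 4
```

In this hypothetical cohort, a model with 5 coefficients would fall below the "10 EPV" threshold, and at most 4 coefficients could be estimated while respecting it.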

We agree that an important component of developing a prediction model is to identify and select the appropriate predictors to include in the model. To develop this model, we had determined the initial set of predictors that were monitored during the study a priori, based on previous literature and clinical expert consensus. Although not clearly described within the Methods section of our paper, univariate analysis was not used to reduce the list of candidate predictors, but rather to develop a comprehensive understanding of our data set. That is the reason why we have used a very liberal P value of .20. Ultimately, our decision on which variables to include in the final model did not depend solely on the univariate analysis, but was a balanced decision based on clinical consensus and the statistical relevance of the predictors during model building.

Next, missing data are ubiquitous in clinical epidemiological research, and these can bring about considerable challenges in the analysis and potentially weaken the validity of the conclusions.⁵ Multiple imputation (MI) is widely advocated as the standard method to deal with missing data.⁹ However, MI assumes that the data are at least missing at random (MAR),⁸ but the only true way of distinguishing between missing not at random and MAR is to measure some of the missing data. The MAR assumption is nevertheless a suitable starting point in many practical cases whenever the missing data show a systematic relationship with the observed data. For instance, coaches at higher competitive levels were more likely to track and share the training load than coaches at lower levels. In addition, Little’s test has also been used to test for “missingness completely at random,” but like most tests of assumptions, its result is not definitive. Ultimately, we created 1000 imputed data sets through predictive mean modeling for numerical variables and logistic regression modeling for categorical variables. All relevant variables that appeared in the complete data set model were used for imputation, including the outcome of shoulder pain. This method has been suggested to make the MAR assumption more plausible and reduces the need for specific adjustments.⁷
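An analysis of multiply imputed data produces one estimate per imputed data set, and these estimates then have to be combined. The sketch below shows Rubin's rules, a common way of doing so. It is an assumption that this rule applies here, since the text above does not state which pooling rule was used, and the coefficient values are hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed data sets with Rubin's rules.

    estimates: per-data-set coefficient estimates
    variances: per-data-set squared standard errors
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least 2 estimates and matching variances")
    pooled = mean(estimates)               # pooled point estimate
    within = mean(variances)               # average within-imputation variance
    between = variance(estimates)          # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


# Hypothetical log-odds ratios from 3 imputed data sets (illustration only).
estimate, se = pool_rubin([0.50, 0.60, 0.40], [0.04, 0.04, 0.04])
print(round(estimate, 3), round(se, 3))
```

The between-imputation term inflates the standard error to reflect the uncertainty caused by the missing values.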

We are aware of potential nonlinear relationships between our continuous predictors and our outcome, and we acknowledge that future work is needed to assess and account for these in the modeling, as this may increase the predictive accuracy of our model. Alternatively, these continuous variables could be considered from a categorical point of view, which may even improve the clinical utility of the model.

Unfortunately, due to the space limitations of our paper, we did not have the opportunity to elaborate on several of our methodological aspects, and we opted not to expand on our methods of bootstrapping or evaluation of shrinkage. Both were performed using the “psfmi” package in R (https://github.com/mwheymans/psfmi),¹ which provides functions to apply pooling of logistic prediction models in multiply imputed data sets. Internal validation was performed by drawing bootstrap samples from each imputed data set, after which the results were combined.
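To make the idea of bootstrap internal validation concrete, the sketch below applies a bootstrap optimism correction to the c statistic (AUC) on a single data set. Several things here are assumptions for illustration: the optimism-correction scheme itself, the toy mean-difference "model" standing in for logistic regression, and the simulated data. It does not reproduce the psfmi implementation, and in a multiply imputed analysis the procedure would be repeated per imputed data set and the results combined.

```python
import random


def auc(scores, labels):
    """c statistic: probability that a random case outranks a random non-case."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_mean_difference(X, y):
    """Toy stand-in for a regression fit: weight = mean(cases) - mean(non-cases)."""
    cases = [row for row, label in zip(X, y) if label == 1]
    controls = [row for row, label in zip(X, y) if label == 0]
    return [
        sum(r[j] for r in cases) / len(cases)
        - sum(r[j] for r in controls) / len(controls)
        for j in range(len(X[0]))
    ]


def predict(weights, X):
    """Linear score for each row."""
    return [sum(w * x for w, x in zip(weights, row)) for row in X]


def optimism_corrected_auc(X, y, n_boot=200, seed=1):
    """Return (apparent AUC, optimism-corrected AUC)."""
    rng = random.Random(seed)
    n = len(y)
    apparent = auc(predict(fit_mean_difference(X, y), X), y)
    optimism = []
    while len(optimism) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # resample lacks one outcome class; draw again
        Xb = [X[i] for i in idx]
        w = fit_mean_difference(Xb, yb)
        # Optimism: performance in the bootstrap sample minus in the original data.
        optimism.append(auc(predict(w, Xb), yb) - auc(predict(w, X), y))
    return apparent, apparent - sum(optimism) / n_boot


# Simulated data (illustration only): 20 cases, 60 non-cases, 2 predictors.
rng = random.Random(0)
y = [1 if i % 4 == 0 else 0 for i in range(80)]
X = [[rng.gauss(0.8 * label, 1.0), rng.gauss(0.0, 1.0)] for label in y]
apparent, corrected = optimism_corrected_auc(X, y)
print(round(apparent, 3), round(corrected, 3))
```

The corrected value estimates how the model would perform in new individuals from the same population, which is the quantity internal validation aims at.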

Finally, we wish to thank the authors of the letter for their interest in our work and for pointing out these important methodological considerations. Although we acknowledge the need for future work and the benefits of external validation, we would like to emphasize that this study is only the first to investigate and develop a prognostic prediction model for shoulder pain in competitive swimmers. In addition, it was our primary concern when writing the paper to translate our findings to the end users of our study, that is, clinicians, coaches, and swimmers. We believe that in doing so our results have increased the understanding of shoulder pain in competitive swimming and may ultimately help this population. Nevertheless, we do not intend to minimize the importance of these methodological considerations in the development of prediction models, and we acknowledge the benefit of collaborating for future development. Consequently, any methodological fine-tuning that may improve the prognostic model and its performance should be considered.

Stef Feijen, PhD, PT, Antwerp, Belgium
Thomas Struyf, MSc, Leuven, Belgium
Kevin Kuppens, PhD, PT, Antwerp, Belgium
Angela Tate, PhD, PT, Glenside, Pennsylvania, USA
Filip Struyf, PhD, PT, Antwerp, Belgium

Footnotes

Submitted January 29, 2021; accepted February 4, 2021.

The authors declared that they have no conflicts of interest in the authorship and publication of this contribution. AOSSM checks author disclosures against the Open Payments Database (OPD). AOSSM has not conducted an independent investigation on the OPD and disclaims any liability or responsibility relating thereto.

References

1. Heymans MW, Eekhout I. Prediction model selection and performance evaluation in multiple imputed datasets. R package version 0.2.0. Accessed March 14, 2021. https://github.com/mwheymans/psfmi

2. Moons KGM, Altman DG, Reitsma JB, et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med. 2015;162(1):W1-W73.

3. Moons KGM, de Groot JAH, Bouwmeester W, et al. Critical appraisal and data extraction for systematic reviews of prediction modelling studies: the CHARMS checklist. PLoS Med. 2014;11(10):e1001744.

4. Pavlou M, Ambler G, Seaman S, De Iorio M, Omar RZ. Review and evaluation of penalised regression methods for risk prediction in low-dimensional data with few events. Stat Med. 2016;35(7):1159-1177.

5. Pedersen AB, Mikkelsen EM, Cronin-Fenton D, et al. Missing data and multiple imputation in clinical epidemiological research. Clin Epidemiol. 2017;9:157-166.

6. Riley RD, Ensor J, Snell KIE, et al. Calculating the sample size required for developing a clinical prediction model. BMJ. 2020;368:m441.

7. Schafer JL. Analysis of Incomplete Multivariate Data. Chapman & Hall; 1997.

8. Schafer JL, Graham JW. Missing data: our view of the state of the art. Psychol Methods. 2002;7(2):147-177.

9. van Buuren S, Groothuis-Oudshoorn K. mice: multivariate imputation by chained equations in R. J Stat Softw. 2011;45(3).

10. van Smeden M, Moons KGM, de Groot JAH, et al. Sample size for binary logistic prediction models: beyond events per variable criteria. Stat Methods Med Res. 2018;28(8):2455-2474.