Abstract
Background
The mismanagement of missing data in large clinical databases may lead to inaccurate findings. The purpose of this study was to demonstrate the effects of missing data on hand surgery research findings using an analysis of postoperative morbidity in patients undergoing hospital-based hand surgery.
Methods
The National Surgical Quality Improvement Program database was queried for patients undergoing common hand and upper extremity surgery between 2011 and 2016. Major and minor postoperative complications were identified. Demographics, comorbidities, and preoperative laboratory values were identified, and the percentage of missing values for each was tabulated. To demonstrate how missing data can alter analysis results, these variables were evaluated for an association with major complications using multivariable regression on 3 separate cohorts: (1) all patients; (2) all patients after exclusion of any patient entry with >10% of missing data; and (3) all patients after exclusion of any patient entry with any missing data.
Results
Groups 1, 2, and 3 had 48 370, 23 118, and 6280 patients, respectively. There were 14 variables associated with increased odds of major complications in group 1, yet only 10 and 9 variables for groups 2 and 3, respectively. Six variables were associated with increased major complications across all 3 groups, whereas only 1 was associated with decreased odds of major complications across all groups.
Conclusions
Filtering patient cohorts according to the amount of missing patient information affected analyses of predictors for major complications associated with hospital-based hand surgery. These findings highlight the importance of considering and addressing missing data in large database studies.
Introduction
With the growing availability of national databases to track perioperative data, there have been many studies using these databases to assess patient outcomes and complications in hand and upper extremity surgery.1-10 Clinical databases such as the American College of Surgeons (ACS) National Surgical Quality Improvement Program (NSQIP) allow researchers to benefit from large sample sizes, and because they are built via chart review and clinical data entry, they contain care details that give them several advantages over purely administrative databases. 11 Databases such as NSQIP enable researchers to evaluate how perioperative outcomes are related to demographics, surgical and laboratory characteristics, and certain procedures. Criticisms of database studies have highlighted issues with data quality, 12 patient selection, 13 and the potential for database structure to impact conclusions and associations. 14
The NSQIP database has steadily increased the number of participating centers over time and includes both large and small institutions. In an attempt to include smaller hospitals, NSQIP offers the “Small and Rural” option for hospitals that perform fewer than 1680 “ACS NSQIP-eligible” cases per year. Smaller institutions often have fewer research resources, and in recognition of this, the NSQIP database allows these institutions to collect fewer variables. As the number of participating small institutions has increased, the presence of missing variables has likely also increased; however, to benefit from these expanded patient cohorts, missing data must be addressed. Studies using databases in hand and upper extremity surgery have not routinely reported the amount of missing data for each variable used, nor have they described how missing data were handled in analyses.1,2,4-9 There are several approaches to missing data,1-9,15 and while prior literature has shown how missing data may alter the results of database studies for patients undergoing spine surgery, 16 it is unclear whether a similar effect is observed in hand and upper extremity surgery. Because database research is already fraught with potential confounding and spurious results if analyses are mismanaged, 17 a better understanding of the impact of missing data is critical. Our primary aim is to evaluate the frequency and type of missing data in NSQIP and to determine how inclusion of patients with varying amounts of missing data may alter the findings from an analysis of 30-day major and minor postoperative complications after hand and upper extremity surgery.
Methods
Patients who underwent hand and upper extremity surgery between 2011 and 2016 were identified from NSQIP using Current Procedural Terminology (CPT) codes. The CPT codes were chosen based on Centers for Medicare and Medicaid data to identify the most common procedures performed by hand surgeons. All isolated, Accreditation Council for Graduate Medical Education–tracked CPT procedure codes routinely performed by hand surgeons with volumes greater than 1000 cases per year (facility, nonoffice setting) were included, yielding a total of 83 codes. 18 Any patient who underwent hand surgery as determined by the primary CPT code listed in NSQIP between 2011 and 2016 was included in the analysis. Because NSQIP does not collect perioperative data on routine procedures with low complication rates, such as carpal tunnel or trigger finger release, only 53 CPT codes were present in the database (see Supplemental Appendix 1).
Patient demographics included age, sex, body mass index (BMI), and race. Comorbidities assessed included diabetes, current smoking status, dyspnea on exertion, dependent functional status, chronic obstructive pulmonary disease, congestive heart failure, hypertension requiring medication (HTN), renal insufficiency, dialysis, chronic steroid use, metastatic oncologic disease, recent weight loss, bleeding disorder, and presence of an open wound. Preoperative laboratory values included creatinine, international normalized ratio (INR), platelet count, hematocrit, white blood cell count, sodium level, and albumin. The American Society of Anesthesiologists classification system for each patient was also included.
The NSQIP database identifies and records the occurrence of postoperative complications for a 30-day period. For this study, postoperative complications were classified as major if they required reoperation or were likely to have a profound and lasting impact on health status as described in similar studies. 19 Major complications include cardiac arrest, renal failure, stroke, pulmonary embolus, sepsis, septic shock, deep infection, wound dehiscence, pneumonia, unplanned reintubation, and death. Complications were classified as minor if further treatment was required without having a lasting impact on health status, including superficial infection, cellulitis, deep vein thrombosis, and urinary tract infection.
The percentage of missing data was calculated for each variable included in the analysis. Multivariable logistic regressions were performed to identify any association between patient variables and both major and minor complications. Because patients with missing data points are dropped from regression models, missing data were imputed as part of each regression using simple regression imputation. The regression analyses were then performed on 3 successively smaller groups to simulate what would occur if missing data were not managed. The first group included all patients. The second group included only patients with at least 90% of the variables present. The third group excluded all patients with any missing data. The odds ratio and 95% confidence interval were calculated for each variable, with significance set at P < .05.
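As a minimal illustration of the cohort construction described above, the following Python sketch tabulates the percentage missing for each variable and splits patient records into the 3 groups. The variable names, the toy records, and the helper functions are hypothetical; they are not taken from the NSQIP data dictionary or from the analysis code used in this study, and missing values are simply represented as None.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical variable names, loosely based on the variables listed in the Methods.
VARIABLES = ["age", "bmi", "creatinine", "inr", "platelets",
             "hematocrit", "wbc", "sodium", "albumin", "asa_class"]

Record = Dict[str, Optional[float]]  # None marks a missing value


def fraction_missing(record: Record, variables: List[str]) -> float:
    """Share of the analysed variables that are missing for one patient."""
    missing = sum(1 for v in variables if record.get(v) is None)
    return missing / len(variables)


def percent_missing_by_variable(records: List[Record],
                                variables: List[str]) -> Dict[str, float]:
    """Percentage of patients with a missing value, per variable."""
    n = len(records)
    return {v: 100.0 * sum(1 for r in records if r.get(v) is None) / n
            for v in variables}


def build_cohorts(records: List[Record], variables: List[str],
                  threshold: float = 0.10
                  ) -> Tuple[List[Record], List[Record], List[Record]]:
    """Group 1: everyone; group 2: at most `threshold` missing; group 3: complete cases."""
    group1 = list(records)
    group2 = [r for r in records if fraction_missing(r, variables) <= threshold]
    group3 = [r for r in records if fraction_missing(r, variables) == 0.0]
    return group1, group2, group3


def make_record(missing: set) -> Record:
    """Toy patient: every variable is 1.0 unless listed as missing."""
    return {v: (None if v in missing else 1.0) for v in VARIABLES}


patients = [
    make_record(set()),                         # complete entry
    make_record({"inr"}),                       # 10% missing
    make_record({"inr", "albumin"}),            # 20% missing
    make_record({"inr", "albumin", "sodium"}),  # 30% missing
]

g1, g2, g3 = build_cohorts(patients, VARIABLES)
pct = percent_missing_by_variable(patients, VARIABLES)
print(len(g1), len(g2), len(g3))  # 4 2 1
print(pct["inr"])                 # 75.0
```

In this toy example the groups shrink from 4 to 2 to 1 patients, mirroring on a small scale how the cohorts in this study shrink as the tolerance for missing data tightens.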
Results
A total of 48 370 patients who underwent surgery from 2011 to 2016 were identified. The mean age was 50.94 ± 17.62 years. Fifty-five percent (26 600) were women, and 64.9% (31 382) were overweight or obese by BMI. The most common comorbidities included HTN (30.0%, n = 14 494), active smoker (21.9%, n = 10 577), and diabetes mellitus (9.4%, n = 4564). Overall, 2.3% (1107) experienced a complication, with 1.2% (567) experiencing at least 1 major complication and 1.3% (620) experiencing at least 1 minor complication. The most common major complication was a deep space infection (0.46%, n = 221). The most common minor complication was superficial wound infection (0.66%, n = 320).
Missing data were identified in 42 090 (87%) patients. The most commonly missing variable was INR, missing in 82.40% of patients (Table 1). In patients with less than 10% of missing data (group 2), 1.8% (420) had at least 1 minor complication and 2.0% (468) had at least 1 major complication. In patients with no missing data (group 3), 2.3% (143) had at least 1 major complication and 1.8% (110) had at least 1 minor complication.
Table 1. Number and Percentage of Missing Data for Each Variable.
Note. COPD = chronic obstructive pulmonary disease; CHF = congestive heart failure; HTN = hypertension requiring medication; INR = international normalized ratio; BMI = body mass index; ASA = American Society of Anesthesiologists.
Analysis of all patients (group 1) showed increased odds of major complications associated with 14 different variables, whereas analysis of patients with <10% missing data (group 2) demonstrated association with 10 variables (Table 2). There were 9 variables identified in patients without any missing data (group 3). The 4 variables no longer significant after moving between groups 1 and 2 were Pacific Islander/Hawaiian race, increased platelet count and INR, and dependent functional status. Smoking status was no longer significant when moving between groups 2 and 3. Six variables were associated with major complications across all 3 groups. Female sex was associated with decreased odds of major complications across all groups.
Table 2. Multivariable Logistic Regressions for Any Major Complication with Different Treatments of Missing Data.
Note. Factors with significantly (P < .05) increased odds of major complications are in bold. Factors with significantly (P < .05) decreased odds of major complications are in italics. CI = confidence interval; ASA = American Society of Anesthesiologists; INR = international normalized ratio; COPD = chronic obstructive pulmonary disease; CHF = congestive heart failure; HTN = hypertension requiring medication.
In all, 11, 8, and 6 variables were associated with minor complications for groups 1, 2, and 3, respectively (Table 3). The 3 variables that were no longer significant between groups 1 and 2 were increased INR and albumin, and dependent functional status. History of dialysis and age were no longer significant after moving between groups 2 and 3. There were decreased odds of minor complications with 4, 2, and 1 variables for each group, respectively.
Table 3. Multivariable Logistic Regressions for Any Minor Complication with Different Treatments of Missing Data.
Note. Factors with significantly (P < .05) increased odds of minor complications are in bold. Factors with significantly (P < .05) decreased odds of minor complications are in italics. CI = confidence interval; ASA = American Society of Anesthesiologists; INR = international normalized ratio; COPD = chronic obstructive pulmonary disease; CHF = congestive heart failure; HTN = hypertension requiring medication.
Discussion
There has been an increase in the number of hand and upper extremity studies using national databases.20-24 Although missing data are a challenge even in high-quality randomized controlled trials, it is important to be aware of the prevalence of missing data in any database used in upper extremity surgery research. The NSQIP has been used with increasing frequency in hand and upper extremity studies, especially to identify risk factors for complications and readmission.1-3,5,8,9 Our study demonstrated that fewer variables were associated with both major and minor complications after excluding patient entries with missing data. In addition, even when a variable continued to be associated with a complication, the magnitude of the odds ratio varied after each level of exclusion. For example, the effect of recent weight loss on major complications increased with progressive exclusion. Overall, our findings show how failure to appropriately screen patient data and manage missingness across a large data set may lead to false conclusions and may exacerbate the already substantial risk of type I errors in large database studies. The relationships identified, and the magnitude of those relationships, may be inaccurate or misinterpreted if gaps in the data are not appropriately addressed.
In our analysis of all patients (group 1), 14 different variables were associated with increased risk of major complications. Five of these variables were not associated with increased risk of major complications when all cases with missing data were excluded (group 3). Although the remaining 9 variables remained associated with major complications, the odds ratio changed by greater than 10% in 4 of these variables (Table 2). Interestingly, Pacific Islander/Hawaiian or Asian race was associated with the greatest odds of a major complication in group 3, suggesting these patients may be under-recorded in terms of demographics and comorbidities. Missing data had less of an impact on variables associated with decreased rates of major complication.
A similar pattern was observed with minor complications when analyzing the effects of progressive exclusion of patients with missing data. Among all patients, 11 variables were associated with increased rates of minor complications (Table 3). Five of these variables were not associated with increased rates of minor complications when all cases with missing data were excluded. Although the remaining 6 variables remained associated with minor complications in group 3, the odds ratio changed by more than 10% in 4 of these variables. Four variables were associated with decreased rates of minor complications in group 1. Only 1 variable (increased hematocrit) continued to be associated with decreased rates of minor complications in group 3 (Table 3).
Although some of these factors were significant in each group, the magnitude of the odds ratio and clinical relevance should be carefully interpreted. For example, increased albumin remained a significant predictor of major complication; however, the odds ratio in each group was close to 1.00, which suggests little clinical impact. In contrast, the odds ratio for major complications increased nearly 7 times for Pacific Islander/Hawaiians after excluding all patients with missing data, which could have strong clinical implications for a potentially underserved demographic. Because smaller institutions are allowed less stringent recording standards compared with larger centers with more resources, the factors with the strongest odds ratios can be prioritized for recording.
Our findings illustrate how missing data can lead to inaccurate conclusions. One simulation study showed that missing data in more than 30% to 50% of cases can introduce bias. 25 The proportion of missing data in our study is similar to what was seen in a study assessing the effect of missing data on anterior cervical diskectomy and fusion complication rates. 25 In that study, the authors used multiple imputation to replace missing data values of preoperative albumin and hematocrit, which constituted 63.5% of their study population. They then used their entire study cohort to show predictors of morbidity. Similar advantages of multiple imputation have been seen in unicompartmental arthroplasty for patients with missing data. 26 Although the goals of our study were neither to examine how different approaches to managing missing data may affect results nor to review the techniques for addressing missing data, we believe that multiple imputation and other robust methods should be considered in large database studies.
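For readers unfamiliar with multiple imputation, the sketch below shows one commonly used way of combining the estimates obtained from several imputed data sets, usually called Rubin's rules: the pooled estimate is the mean of the per-data-set estimates, and its variance adds the average within-imputation variance to an inflated between-imputation variance. This is not the method used in this study, which relied on a single regression imputation. The log odds ratios and variances are invented numbers, and the function name is hypothetical.

```python
from statistics import mean, variance
from typing import List, Tuple


def pool_rubin(estimates: List[float], variances: List[float]) -> Tuple[float, float]:
    """Pool m per-imputation estimates and their variances with Rubin's rules."""
    m = len(estimates)
    pooled = mean(estimates)          # pooled point estimate
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (divides by m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical log odds ratios and squared standard errors from m = 5 imputed data sets.
log_odds_ratios = [0.80, 0.85, 0.78, 0.90, 0.82]
squared_ses = [0.04, 0.05, 0.045, 0.04, 0.05]

pooled_log_or, total_variance = pool_rubin(log_odds_ratios, squared_ses)
print(round(pooled_log_or, 4), round(total_variance, 5))  # 0.83 0.04764
```

The between-imputation term is what distinguishes this from treating imputed values as if they had been observed: it widens the confidence interval to reflect uncertainty about the values that were filled in.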
Another element highlighted by this study is the effect of power and sample size on these large database studies. Although the magnitude of certain odds ratios increased following progressive patient exclusion, such as recent weight loss for major complications, certain variables were no longer found to be significantly associated with the primary outcome. For example, in patients with a history of renal insufficiency, the odds ratio progressively decreased and, after excluding all patients with missing data, had a confidence interval of 0.99 to 5.40 and P > .05. With a larger sample, it is possible that this variable would still have been associated with major complications even after excluding all patients with missing data, as it is possible that renal insufficiency does increase risk of certain complications. This not only shows the potential advantages of larger samples available in database studies 17 but also highlights the analytic benefits of the previously mentioned statistical techniques, 25 such as multiple imputation, to replace missing values and preserve more robust sample sizes.
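The renal insufficiency example can be made concrete with a short, hedged calculation. The sketch below assumes that the reported interval of 0.99 to 5.40 is a standard Wald 95% confidence interval (symmetric on the log odds scale) and that the standard error shrinks in proportion to 1/sqrt(n) while the point estimate stays fixed. Neither assumption is stated in the article, and the function names are hypothetical.

```python
import math
from typing import Tuple

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_summary(ci_low: float, ci_high: float) -> Tuple[float, float]:
    """Odds ratio and log-scale standard error implied by a Wald 95% interval."""
    log_low, log_high = math.log(ci_low), math.log(ci_high)
    log_or = (log_low + log_high) / 2          # midpoint on the log scale
    se = (log_high - log_low) / (2 * Z_95)     # half-width divided by z
    return math.exp(log_or), se


def sample_multiplier_for_significance(ci_low: float, ci_high: float) -> float:
    """Factor by which n would need to grow for the lower bound to reach 1.

    Assumes an unchanged point estimate above 1 and SE proportional to 1/sqrt(n).
    """
    log_low, log_high = math.log(ci_low), math.log(ci_high)
    log_or = (log_low + log_high) / 2
    half_width = (log_high - log_low) / 2
    return (half_width / log_or) ** 2


odds_ratio, se = wald_summary(0.99, 5.40)
k = sample_multiplier_for_significance(0.99, 5.40)
print(round(odds_ratio, 2), round(se, 2), round(k, 2))  # 2.31 0.43 1.02
```

Under these assumptions, the interval implies an odds ratio of roughly 2.3, and a sample only a few percent larger would move the lower bound above 1. This is purely an arithmetic illustration of how sensitive the significance call is to the number of patients retained, not a re-analysis of the study data.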
There are limitations to this study. The primary aim was not to characterize predictors of postoperative morbidity; therefore, the data and findings regarding these predictors should be carefully interpreted. In addition, any of our findings regarding factors that predict complications are limited because NSQIP does not represent a stratified sample that can be scaled or is representative of the US healthcare system as a whole. The NSQIP database only tracks complications for a 30-day period postoperatively, and any complications that are recognized after this period would not be included. Although comorbidities are closely tracked, there is no way to stratify the severity of comorbidities, which can impact perioperative outcomes. Furthermore, the CPT codes used to generate these patient groups reflect a heterogeneous group of surgeries but exclude many of the most commonly performed hand surgery procedures; however, the 53 types of surgeries analyzed carry similar perioperative risk and comorbidity, because many of the lowest-risk procedures common in a hand surgery practice are not tracked in NSQIP. One potential outlier is the “muscle flap to the upper extremity” CPT code 15736, depending on the flap used, although it accounted for less than 5% of the cases in our series. It might be of future interest to specifically characterize perioperative risks in this patient subpopulation, but we do not address those issues here.
Finally, this study does not offer a solution as to how missing data should be properly treated, but rather highlights the importance of including the management of missing data, and reporting on that management, in any large database study. This includes not only methods to fill in the gaps (ie, imputations, as discussed above) but also analytic modalities that help to “de-noise” large data sets, such as the least absolute shrinkage and selection operator (LASSO). How these different techniques can be used in hand surgery database research is the focus of future work. As with any database study, the impact of findings is based on much more than statistical analyses and P values. 17 Any lasting conclusions regarding patient factors that predict complication risk in upper extremity surgery require more rigorous study and analyses than those provided here.
Appropriate methods to handle missing data depend largely on whether the data are systematically or randomly missing, and in this study, it is not clear whether incomplete NSQIP entries are missing at random. Our goals in this study are not to invalidate all prior database studies, but to bring awareness of the effects of incomplete data and to encourage future studies to consider and address this problem. The NSQIP remains a high-quality clinical surgical database and confers a number of advantages to researchers seeking to improve care. Our findings highlight the importance of recognizing database limitations and the need for appropriate handling of missing data.
Supplemental Material
Supplemental material, sj-pdf-1-han-10.1177_15589447211023867 for Impact of Missing Data on Identifying Risk Factors for Postoperative Complications in Hand Surgery by Keith T. Aziz, Suresh K. Nayar, Dawn M. LaPorte, John V. Ingari and Aviram M. Giladi in HAND
Footnotes
Supplemental material is available in the online version of the article.
Ethical Approval
This article does not contain studies with human or animal subjects.
Statement of Human and Animal Rights
This article does not contain any studies with human or animal subjects.
Statement of Informed Consent
Informed consent was obtained when necessary.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
