Abstract
Background
The use of administrative data for pharmacoepidemiology research on chronic rhinosinusitis (CRS) has become increasingly popular. Although large sample sizes and ease of accessibility have made electronic health data an attractive data option, the risk of inaccurate cohort identification can lead to biased outcomes.
Objectives
The objectives of this systematic review were to (1) report current case definitions for CRS used in administrative data base research, and (2) define the various administrative data bases used for CRS research.
Methods
Medical literature data bases were searched from the date of their inception to February 1, 2015. Included studies were publications that obtained CRS-specific data from a health records data base. Studies were excluded if they evaluated a non-CRS cohort, failed to use or report an International Classification of Diseases (ICD) code in the case definition, or were published in a non-peer-reviewed journal.
Results
The 27 studies that met inclusion criteria used 8 different CRS case definitions and 13 administrative data bases. Only one of the 8 case definitions was validated. The most commonly used CRS case definition was the ICD-9 473.x code alone.
Conclusion
To optimize the accuracy of pharmacoepidemiologic research for CRS that uses administrative data, it is important to apply appropriate case definitions for CRS. Various nonvalidated CRS case definitions are currently being used in administrative data base research. There is a need to develop a generalizable and validated ICD-based CRS case definition to increase the accuracy of future pharmacoepidemiologic research.
The International Classification of Diseases (ICD) system is a standardized international diagnostic coding tool for epidemiology, health management, and clinical purposes. 9 The ninth revision of the ICD codes (ICD-9) was developed in 1975 and has become the focus of cohort identification within health care administrative data bases.5,10–13 The U.S. National Center for Health Statistics created ICD-9 Clinical Modification (ICD-9-CM), which improved morbidity coding for hospital and ambulatory care settings. 9 Nonetheless, the accuracy of ICD-9 codes for identifying disease cohorts has come under increasing scrutiny because inaccurate cohort definition is known to risk faulty conclusions.6,7,9 Potential sources of bias include upcoding or gaming and variation in coder (physician or administrator) knowledge of the appropriate diagnostic codes. These factors can have tremendous influence on data validity. 20 To improve the quality of pharmacoepidemiologic research that uses administrative data, there has been a growing effort to develop validated case definitions for specific diseases.8,11–14
The primary objective of this systematic review was to identify and report the current case definitions of CRS used in health care administrative data base research. The secondary objective was to define the various administrative data bases used for currently published CRS research. The purpose of this study was to identify current gaps in administrative data base research, which may be used to help guide the development of a validated ICD-based case definition for CRS.
Methods
A systematic review was performed in February 2015 in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Data bases included Ovid Medline (Ovid Technologies, Salt Lake City, UT; 1946 to 2015 week 2), Embase (Elsevier B.V., Amsterdam, The Netherlands; 1980 to 2015 week 2), and Evidence-Based Medicine Reviews (EBM Reviews; Ovid Technologies, Salt Lake City, UT), Cochrane Methodology Register (third quarter 2012). 15 The search terms were as follows: “(ICD-9 or ICD 9 or ICD-10 or ICD 10 or data base or administrative data or medical record or physician claims or claims or cod$ or discharge data) and (chronic and $sinusitis).” The $ symbol was used to provide an unlimited truncation strategy. Inclusion criteria were publications that explicitly evaluated a CRS cohort and obtained data from an administrative data base. The search was restricted to human studies published in English. Studies were excluded if they evaluated non-CRS cohorts of sinusitis, were reported in non–peer-reviewed formats, or failed to incorporate an ICD-based code. Bibliographies of included articles were subsequently screened for additional eligible articles.
Two reviewers (J.L., L.R.) independently screened all abstracts to identify studies that fulfilled the predetermined eligibility criteria. Any disagreement between the reviewers was resolved by consensus. Data from each included study were extracted by using standardized data forms and included the following variables: authors, year, country, population demographics, type of administrative data base, ICD codes applied, and other non-ICD codes applied.
Results
A total of 528 abstracts were identified with our search strategy, and 7 additional articles were identified after bibliography reviews. After de-duplication, 448 abstracts were screened, and 42 articles were selected for full-text review. A total of 15 articles did not meet inclusion criteria. Articles were excluded because they used non-ICD coding mechanisms, identified other disease cohorts, or were not peer reviewed (posters, conference presentations). Meta-analysis was not pursued because of the heterogeneous and observational nature of the included studies. A total of 27 articles were assessed for qualitative synthesis (Fig. 1).

Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram of the identification of studies.
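The study flow reported above can be checked with simple arithmetic. The following is a minimal sketch, not part of the original analysis: the input counts are taken from the text, whereas the derived counts (duplicates removed, records excluded at abstract screening) are not reported in this review and assume that the bibliography articles were pooled with the search results before de-duplication.

```python
# Counts reported in the Results section of this review.
identified_by_search = 528
identified_from_bibliographies = 7
screened_after_deduplication = 448
selected_for_full_text = 42
excluded_at_full_text = 15

# Derived counts. ASSUMPTION: the 7 bibliography articles were pooled with
# the 528 search results before de-duplication; the review does not say so.
total_identified = identified_by_search + identified_from_bibliographies
duplicates_removed = total_identified - screened_after_deduplication
excluded_at_screening = screened_after_deduplication - selected_for_full_text
included_in_synthesis = selected_for_full_text - excluded_at_full_text

print("Total records identified:", total_identified)
print("Duplicates removed (derived):", duplicates_removed)
print("Excluded at abstract screening (derived):", excluded_at_screening)
print("Included in qualitative synthesis:", included_in_synthesis)
```

The final count agrees with the 27 articles reported for qualitative synthesis.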
A total of eight different case definitions for CRS used in administrative data base research were identified (Table 1).16,26–51 The definition of each individual ICD-9 code used in this review is outlined in Table 2. Thirteen of the 27 studies (48%) applied a single ICD-9-CM 473.x case definition for CRS. The second most common CRS case definition, which applied both ICD-9-CM 471.x and 473.x codes, was used in 5 of 27 studies (19%). Only one of the eight CRS case definitions was validated. It was defined as two or more office visits, no cystic fibrosis diagnosis, a 473.x or sinus surgery Current Procedural Terminology (CPT) code, and one or more otolaryngology-head and neck surgery visits with at least one of the following conditions: (a) 471.0, 471.8, 471.9 nasal polyposis surgery CPT codes; or (b) 471.x, 473.x plus sinus surgery CPT codes; or (c) 473.x at two or more otolaryngology-head and neck surgery visits.
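To illustrate how the most common case definition (ICD-9-CM 473.x alone) might be applied to coded records, the following is a minimal sketch. The function names, the patient records, and the treatment of ".x" as "any code within the three-digit category" are assumptions made for illustration; they are not taken from any of the included studies.

```python
def matches_pattern(code: str, pattern: str) -> bool:
    """Return True if an ICD-9-CM code falls under a pattern such as '473.x'.

    A pattern ending in '.x' matches the bare category ('473') and any
    subcode ('473.0', '473.9'); any other pattern must match exactly.
    """
    code = code.strip()
    if pattern.endswith(".x"):
        category = pattern[:-2]
        return code == category or code.startswith(category + ".")
    return code == pattern


def meets_definition(patient_codes, patterns):
    """True if any recorded code matches any pattern in the case definition."""
    return any(matches_pattern(c, p) for c in patient_codes for p in patterns)


# Hypothetical patient records (illustrative only, not study data).
patients = {
    "A": ["473.9", "477.9"],
    "B": ["471.0"],
    "C": ["461.9"],
}

# Case definition: ICD-9-CM 473.x alone.
cohort = sorted(
    pid for pid, codes in patients.items() if meets_definition(codes, ["473.x"])
)
print(cohort)
```

A definition this simple is easy to apply in any ICD-9-CM-coded data base, but it makes no use of visit counts, specialty, or procedure codes, which the validated definition relies on.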
CRS case definitions used in administrative data base research
470.x = Deviated nasal septum; 471.x = nasal polyposis; 472.0 = chronic rhinitis; 473.x = chronic sinusitis; Oto-HNS = Otolaryngology-Head & Neck Surgery; ESS = Endoscopic Sinus Surgery; CT = computed tomography; CF = cystic fibrosis; NP = nasal polyposis; PPV = positive predictive value; CRS = chronic rhinosinusitis; ICD = international classification of diseases.
31237, 31240, 31254–31256, 31267, 31276, and 31287–31288.
31000, 31002, 31020, 31084–31087, 31090, 31200–31201, and 31205.
30110, 30115, and 31032.
31000, 31030, 31050–31051, 31090, 31205, 31237, 31254–31256, 31267, 31276, 31287–31288, and 31295–31297.
31254–31295, and 31299.
ICD CRS-related codes
Thirteen different administrative data bases were used to evaluate a CRS cohort (Table 3). The two most used data bases were Taiwan's Longitudinal Health Insurance Database and the U.S.-based National Ambulatory Medical Care Survey. Five studies used Truven Health's MarketScan Commercial Claims and Encounter Database (Ann Arbor, MI), which is a private medical and drug insurance claims data base. The National Hospital Ambulatory Medical Care Survey was identified as the data set in two instances. Two other national data sets, the National Disease Treatment Index and the Medical Expenditure Panel Survey, were cited once each. The remaining seven articles used regional health record systems for their data.
CRS case definitions used in administrative data base research
Discussion
In this systematic review, we evaluated the current case definitions for CRS used in administrative data base research. Overall, 27 studies applied 8 different ICD-9-CM-based CRS definitions, with the most common being 473.x alone, followed by 471.x plus 473.x codes. Of the eight CRS case definitions identified, only one was validated. The validation study used two independent reviewers of health records to confirm guideline-specific CRS diagnoses; however, the definition included CPT codes, which are specific to the United States. 16 Thus, this definition cannot be applied directly in other health care systems. To reduce the risk of biased outcomes from administrative data base research for CRS, there is a need to create a generalizable ICD-9-CM-based case definition of CRS capable of being used for international administrative data base research.
Health care administrative data bases allow population-based cohort studies to generate large sample sizes at little financial and time expense.5,6,10 In 1978, the World Health Organization published ICD-9, which was the ninth revision of the mortality reporting classification system. To complement this, the U.S. National Center for Health Statistics developed the ICD-9-CM to improve morbidity coding for hospital and ambulatory care settings. 9 The addition of a fifth digit allowed for greater precision of diagnosis when applicable.9,17 The ICD-9-CM, which contains >12,000 diagnostic and 3500 procedural codes, consists of three volumes: the Diseases Tabular List, the Diseases Alphabetic Index, and the Procedures Tabular List and Alphabetic Index. 18 To complement ICD-9-CM, country-specific procedural classification systems have been developed, such as the CPT codes in the United States and the Canadian Classification of Interventions in Canada. 19
In 1999, ICD-10 was introduced to replace ICD-9 for coding mortality data in the United States. Since then, 153 countries have adopted this iteration, and several have developed nation-specific ICD-10 versions to meet regional needs, such as Australia (ICD-10-AM), Canada (ICD-10-CA), and Germany (ICD-10-GM). This widespread adoption and the universality of its coding promote global exchange of valuable health record data. 20 In the United States, however, ICD-10 is used only for mortality data; its morbidity coding counterpart, ICD-10-CM, will likely be introduced in October 2015, along with the now separate ICD-10 Procedural Coding System (ICD-10-PCS). 21 The U.S. National Center for Health Statistics-led adaptation to ICD-10-CM/PCS has received significant input from various stakeholders, including physicians and expert coders, to improve clinical precision. 21
Unlike its predecessor, ICD-10 boasts 87,000 diagnostic codes; can specify anatomic sites; and includes ambulatory care, home health, and skilled nursing. 20 Moreover, ICD-10 has twice the number of disease categories as ICD-9, has the capacity to include new diseases, and introduces new subcategories that capture risk factors and psychosocial factors. 19 As a result, ICD-10 has been touted to provide a more accurate reflection of medical terminology.19,20 However, in a recent study by Quan et al., 22 current ICD-10 codes failed to provide a significant improvement in sensitivity for identifying the 32 diagnoses assessed.
The accuracy of ICD codes has been called into question across multiple disciplines, and a concerted effort to generate validated case definitions has been initiated.5,8,11,12 Coding knowledge and expertise vary greatly because some data bases employ trained health technologists for indexing, whereas others rely on physician input.9,16 Furthermore, the continuous changeover of clinical guidelines, definitions, and coding systems makes accurately identifying patient cohorts in administrative data bases very challenging. The validation of cohort definitions is achieved by measuring the sensitivity and specificity of numerous ICD-coded algorithms within a given data base. These case definitions have been altered to serve particular purposes, including surveillance (high sensitivity) or diagnostic certainty (high specificity). 14
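The accuracy measures named above follow directly from a two-by-two comparison of an ICD-coded algorithm against a reference standard such as chart review. The following is a minimal sketch with hypothetical counts; the class name and the numbers are illustrative assumptions and do not come from any study in this review.

```python
from dataclasses import dataclass


@dataclass
class ValidationCounts:
    """Two-by-two counts of an algorithm versus a reference standard."""
    true_positive: int   # algorithm positive, reference standard positive
    false_positive: int  # algorithm positive, reference standard negative
    false_negative: int  # algorithm negative, reference standard positive
    true_negative: int   # algorithm negative, reference standard negative

    def sensitivity(self) -> float:
        # Proportion of true cases that the algorithm captures.
        return self.true_positive / (self.true_positive + self.false_negative)

    def specificity(self) -> float:
        # Proportion of non-cases that the algorithm correctly leaves out.
        return self.true_negative / (self.true_negative + self.false_positive)

    def positive_predictive_value(self) -> float:
        # Proportion of algorithm-positive records that are true cases.
        return self.true_positive / (self.true_positive + self.false_positive)


# Hypothetical counts (illustrative only, not study data).
counts = ValidationCounts(
    true_positive=80, false_positive=20, false_negative=20, true_negative=880
)
print(f"Sensitivity: {counts.sensitivity():.2f}")
print(f"Specificity: {counts.specificity():.2f}")
print(f"Positive predictive value: {counts.positive_predictive_value():.2f}")
```

Positive predictive value requires reference-standard review of algorithm-positive records only, whereas sensitivity and specificity also require the true status of algorithm-negative records.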
To our knowledge, a recent study by Hsu et al. 16 is the only study to evaluate the accuracy of various ICD-9-based case definitions for CRS. They demonstrated that the use of 473.x alone (chronic sinusitis) resulted in a positive predictive value of 34%. 16 Unfortunately, the sensitivity and specificity with 473.x were not reported. Although the reasons why the majority of studies used the CRS case definition of 473.x alone are unknown, the primary advantages of using a single ICD-9 code are ease of use and high sensitivity, because the exclusion of patients with true CRS is minimized. The primary disadvantage of using 473.x alone is the low specificity because the cohort would potentially include a large number of patients without true CRS.
To improve on the accuracy of the ICD-9-CM 473.x code alone, Hsu et al. 16 evaluated various combinations of ICD-9-CM codes and incorporated CPT codes for endoscopic sinus surgery. By using a data base of >2 million entries and a time frame of 23 years, the investigators limited the study population to those who had more than two office visits to ensure sufficient clinical information. 16 Moreover, they randomly selected 996 charts for review. Multiple iterations of search queries were used, and the results of two iterations were published. The positive predictive value increased from 54% for the initial algorithm to 91% after several changes were introduced: the removal of 471.1, an ICD-9-CM code often associated with non-CRS–related diagnoses; the requirement of a specialty evaluation consultation (otolaryngology-head and neck surgery or allergy-immunology); and, finally, the adjustment of CPT codes to more accurately reflect CRS-related procedures. The sensitivity and specificity of the algorithm were not calculated because of resource constraints. 16
A disadvantage of the CRS case definition generated by Hsu et al. 16 is the incorporation of U.S.-based CPT codes, which are owned and managed by the American Medical Association. 23 Although treatment codes have been shown to increase the positive predictive value of a case definition, the use of the CPT coding system limits the applicability of this validated CRS case definition in other countries. 24 For example, procedural coding in Canada adheres to the Canadian Classification of Health Interventions, and, in Australia, the Australian Classification of Health Interventions is used. Therefore, for cross-border applicability of a validated CRS case definition, a more universal ICD-based algorithm should be generated. To confirm the validity of such a generalizable case definition, its accuracy should be evaluated in various administrative data bases from different countries.11,25
The primary limitation of this systematic review is that the literature search was limited to English-language, peer-reviewed publications, thus potentially missing some publications. To ameliorate this limitation, two reference data bases with dissimilar search interests (EMBASE and Medline) were searched. Second, we excluded studies that failed to use an ICD-9 code as part of their CRS definition. We intended to identify case definitions for CRS that were internationally generalizable; thus, we excluded definitions that used country-specific case definitions. Developing a generalizable case definition for CRS will allow for appropriate international comparisons and validation of study outcomes.
Conclusion
The use of health care administrative data bases to perform population-based pharmacoepidemiologic research has increased because these data bases provide large sample sizes, longitudinal outcomes, and ease of access. However, to promote accurate and valid outcomes, it is imperative to develop validated disease-specific case definitions to minimize the risk of information bias. Outcomes from this systematic review demonstrated that various nonvalidated CRS case definitions are currently being used in administrative data base research. There is a need to develop a validated and generalizable ICD-based case definition for CRS to increase the accuracy of future pharmacoepidemiologic research.
