Abstract
Aims:
To evaluate the validity of recorded chronic disease diagnoses in Icelandic healthcare registries.
Methods:
Eight different chronic diseases from multiple sub-specialties of medicine were validated with respect to accuracy, but not to timeliness. For each disease, 30 patients with a recorded diagnosis and 30 patients without the same diagnosis were randomly selected from >80,000 participants in the iStopMM trial, which includes 54% of the Icelandic population born before 1976. Each case was validated by chart review by physicians using predefined criteria.
Results:
The overall accuracy of the chronic disease diagnoses was 96% (95% CI 94–97%), ranging from 92 to 98% for individual diseases. After weighting for disease prevalence, the accuracy was estimated to be 98.5%. The overall positive predictive value (PPV) of chronic disease diagnosis was 93% (95% CI 89–96%) and the overall negative predictive value (NPV) was 99% (95% CI 96–100%). There were disease-specific differences in validity, most notably multiple sclerosis, where the PPV was 83%. Other disorders had PPVs between 93 and 97%. The NPV of most disorders was 100%, except for hypertension and heart failure, where it was 97 and 93%, respectively. Those who had the registered chronic disease had objective findings of disease in 96% of cases.
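Conclusions:
The validity of chronic disease diagnoses in Icelandic healthcare registries is high, and these data can be used with confidence in epidemiological and clinical research.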
Introduction
The incidence of chronic disease has increased in the last few decades as a result of the ageing population and improvements in the management of chronic disorders. In particular, multimorbidity (i.e. the presence of multiple concomitant chronic diseases) has increased dramatically. In a 2011 meta-analysis, Marengoni et al. [1] estimated the prevalence of multimorbidity to be 55–98% in older people. Chronic diseases are a large burden on healthcare systems, leading to an increased need and complexity of care and therefore increased costs [2]. Multimorbidity poses a challenge for clinical and epidemiological research because concomitantly present chronic diseases, referred to as comorbidities to one another, have additive and synergistic effects on most healthcare outcomes, including survival, especially in older people [3]. For example, our group has shown that comorbidities affect outcomes in multiple myeloma, a cancer of the bone marrow [4]. Importantly, chronic diseases are often causally related and not randomly distributed, leading to confounding in epidemiological and clinical studies. For example, diabetes mellitus increases the likelihood of cardiovascular disease [5] and smoking increases the risk of both lung cancer [6] and cardiovascular disease [7].
In order to manage healthcare systems and public health policy, data on chronic diseases are needed to assess their prevalence and outcomes and to adjust for comorbidities in epidemiological and clinical studies. However, this requires that accurate and complete data are available because incorrect classification may introduce new and unpredictable information bias and insufficient data may lead to an underestimation of disease prevalence and residual confounding.
To acquire data on chronic disease, many researchers use questionnaires on self-reported previous medical history. However, such questionnaires are time-consuming, have been shown to be inaccurate [8] and may lead to recall bias [9]. An alternative and more convenient method is to use administrative databases or central healthcare registries where medical diagnoses are prospectively recorded. These registries are widely available, but their quality varies. In North America, large administrative databases are available that are usually coded by non-physician coders. The sensitivity in these databases can be as low as 12% for cirrhotic liver disease and around 50 and 56% for myocardial infarction and malignancy, respectively, both important and common chronic diseases [10]. By contrast, high-quality healthcare registries with diagnoses recorded by diagnosing physicians have been available in the Nordic countries for decades and have been used in large epidemiological studies [11]. This is especially true in Sweden and Denmark, where large, high-quality, population-based registries are available with overall positive predictive values (PPVs) of correct diagnosis of around 90% [12,13].
A cancer registry was established in Iceland in 1955 and has since included high-quality data on cancer diagnosis that have been shown to have high validity, completeness and timeliness [14]. Registries of other medical diagnoses have been available since 1999 with the establishment of the Hospital Discharge Registry, which includes diagnoses made during inpatient care in the universal Icelandic healthcare system as International Classification of Diseases version 10 (ICD-10) codes [15]. The registry has also included outpatient contacts in hospitals since 2010 [16]. The Icelandic Register of Primary Health Care Contacts was established in 2004 and has since included ICD-10 coded diagnoses made in primary healthcare centres around Iceland [17]. Both registries are kept by the Icelandic Directorate of Health. We are not aware of any studies evaluating the validity of diagnoses made in these two registries, hereafter referred to as the Icelandic healthcare registries (IHR). We were therefore motivated to evaluate the validity of chronic disease diagnoses in the IHR that are often comorbidities.
Methods
We are currently conducting the Iceland Screens, Treats or Prevents Multiple Myeloma study (iStopMM), a population-based screening study for multiple myeloma and its precursors. A total of 80,759 participants born in 1975 or earlier (54.4% of the population in that age group) provided informed consent for study participation. This includes cross-linking of study data to data in the IHR using a national identification number provided to all residents of Iceland by Registers Iceland. Participants in iStopMM with at least one diagnosis in the IHR on 1 April 2019 were included in the study. The iStopMM study and its recruitment have been described in greater detail elsewhere [18].
We evaluated the validity of eight chronic diseases from different medical specialties. The diseases chosen are long-term chronic disorders that are very common, associated with objective findings that are detectable in chart review, or both. The diseases were defined by ICD-10 diagnostic codes as detailed in Table I and were chronic kidney disease (CKD), chronic obstructive pulmonary disease (COPD), diabetes mellitus, heart failure, hypertension, hypothyroidism, ischaemic heart disease (IHD) and multiple sclerosis. For each disease, 30 participants with registered codes for the disease (positive cases) and 30 participants without a registered code for the disease (negative cases) were randomly selected for review from among the 80,759 registered participants of the iStopMM study.
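The sampling step described above can be illustrated with a minimal Python sketch. It is not the authors' code (the study's analysis was done in R). The function name `sample_cases`, the fixed seed and the toy cohort of 1,000 participants are all assumptions made for illustration only.

```python
import random


def sample_cases(coded_ids, all_ids, n=30, seed=0):
    """Draw n positive and n negative cases for one disease.

    coded_ids: participants with a registered code for the disease.
    all_ids:   every participant in the cohort.
    Both arguments and the seed are hypothetical inputs for this sketch.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    coded = set(coded_ids)
    positives = sorted(coded)                 # registered code present
    negatives = sorted(set(all_ids) - coded)  # no registered code
    # random.sample draws without replacement, so no participant repeats
    return rng.sample(positives, n), rng.sample(negatives, n)


# Hypothetical cohort: 1,000 participants, 120 of whom carry the code.
cohort = list(range(1000))
with_code = list(range(120))
pos, neg = sample_cases(with_code, cohort)
print(len(pos), len(neg))
```

Sampling positives and negatives separately gives equal group sizes regardless of how common the disease is, which is why the pooled accuracy later needs to be re-weighted by prevalence.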
Table I. International Classification of Diseases 10th Revision (ICD-10) codes used to define the chronic diseases assessed in this study.
To estimate the validity of the presence of these chronic diseases rather than the timeliness of the diagnosis registration, we evaluated whether participants had the disease in question on 1 April 2019 or at death, whichever occurred first. The medical records of each disease group were reviewed by one of three physicians (SR, ÞEL or ST), who were blinded to the disease status of the participants. The presence of the chronic disease in each case was determined using pre-specified criteria in steps ranging from the highest degree of certainty (i.e. objective diagnostic testing available) to the lowest degree of certainty (i.e. diagnosis recorded in physician notes). The criteria are detailed in Table II. When a reviewer was uncertain, the case was discussed at a meeting of the medical record reviewers. False positive and false negative cases were also re-reviewed at such a meeting in case there were factors not captured by the pre-specified criteria or the pre-specified criteria captured participants who did not have the disease in question. Data were collected and managed using the REDCap electronic data capture tools hosted at the University of Iceland [19,20].
Table II. Pre-specified criteria used to determine whether disease was present in the study participants. Cases were reviewed in a stepwise fashion with steps ranging from most certainty (step 1) to least certainty.
eGFR: estimated glomerular filtration rate; HbA1c: haemoglobin A1c; BNP: brain natriuretic peptide; free-T4: free thyroxine 4.
The PPVs and negative predictive values (NPVs) were calculated for each of the disease diagnoses before and after re-evaluation of false positives and false negatives. The total PPV and NPV for all the diseases were then calculated and combined into total accuracy. The 95% confidence intervals (95% CI) were calculated using the Wilson score interval. The average of the PPVs and NPVs, weighted by the disease prevalence in the IHR, was calculated to estimate the accuracy of the IHR. R statistical software [21], including the tidyverse [22] and Hmisc [23] packages, was used for data wrangling and statistical analysis.
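As an illustration of these calculations, the following Python sketch computes the PPV, NPV and accuracy from confusion matrix counts and attaches Wilson score intervals. It is a sketch only: the study used R, and the function names `wilson_interval` and `predictive_values` are introduced here for illustration. The counts are the pooled counts reported in the Results (223 true positives, 17 false positives, 3 false negatives, 237 true negatives), and z = 1.96 is assumed for the 95% level.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def predictive_values(tp, fp, fn, tn):
    """Return (PPV, NPV, accuracy) from confusion matrix counts."""
    ppv = tp / (tp + fp)        # share of registered diagnoses that are true
    npv = tn / (tn + fn)        # share of absent diagnoses that are true
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return ppv, npv, accuracy


# Pooled counts across the eight diseases, as reported in the Results.
tp, fp, fn, tn = 223, 17, 3, 237
ppv, npv, acc = predictive_values(tp, fp, fn, tn)

ppv_ci = wilson_interval(tp, tp + fp)
npv_ci = wilson_interval(tn, tn + fn)
acc_ci = wilson_interval(tp + tn, tp + fp + fn + tn)

for name, est, ci in [("PPV", ppv, ppv_ci), ("NPV", npv, npv_ci),
                      ("Accuracy", acc, acc_ci)]:
    print(f"{name}: {est:.1%} (95% CI {ci[0]:.1%} to {ci[1]:.1%})")
```

Rounded to whole percentages, these intervals agree with the reported 89–96% for the PPV, 96–100% for the NPV and 94–97% for the total accuracy.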
Results
After medical chart review, 460 of the 480 participants in the random sample were found to be correctly identified as having or not having a chronic disease, providing a total accuracy of 96% (95% CI 94–97%). After weighting the PPVs and NPVs for each disease by their prevalence, we estimated the total accuracy of the IHR for chronic disease diagnoses to be 98.5%. Of the 240 participants with registered disease diagnosis, 223 were found to have the disease in chart review (i.e. were true positives), providing a PPV of 93% (95% CI 89–96%). Of the 240 participants without registered chronic disease diagnosis, only three were found to have that disease (i.e. false negatives), leading to a total NPV of 99% (95% CI 96–100%; Table III). Of the 223 true positive cases, 214 (96%) had a priori defined objective findings of disease, as did all three false negative cases (100%). After review of the false negative and false positive cases, the results of four original chart reviews were overturned (Supplemental table).
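The prevalence weighting can be read as follows: in the full registry, a fraction of people carry the code and are correctly classified with probability equal to the PPV, while the remainder are correctly classified with probability equal to the NPV. The Python sketch below shows this reading. It is one plausible interpretation rather than the authors' exact calculation, the function name `weighted_accuracy` is introduced here, and the 10% coded fraction is a made-up value because the registry prevalences are not given in the text.

```python
def weighted_accuracy(ppv, npv, coded_fraction):
    """Expected share of correctly classified people in the full registry.

    coded_fraction: share of the population with the code registered.
    It weights the PPV; the remaining share weights the NPV.
    """
    if not 0 <= coded_fraction <= 1:
        raise ValueError("coded_fraction must lie between 0 and 1")
    return coded_fraction * ppv + (1 - coded_fraction) * npv


pooled_ppv = 223 / 240  # pooled PPV from the chart review
pooled_npv = 237 / 240  # pooled NPV from the chart review

# Hypothetical: 10% of the population carries the code.
example = weighted_accuracy(pooled_ppv, pooled_npv, coded_fraction=0.10)

# With equal weights the result equals the unweighted sample accuracy,
# because the sample held as many positive as negative cases.
balanced = weighted_accuracy(pooled_ppv, pooled_npv, coded_fraction=0.50)
print(f"{example:.3f} {balanced:.3f}")
```

Because most people do not carry any given code, the weighted figure is pulled towards the NPV, which is why the weighted accuracy exceeds the 96% observed in the balanced sample.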
Table III. Confusion matrix of the results of the chart review.
The PPV was generally high for all the different diseases (Table IV), but was highest in cases of heart failure and diabetes mellitus (97%). The only disease with a PPV <90% was multiple sclerosis (83%). In most of the false positive cases found in multiple sclerosis, the codes were found to be registered as part of neurological work-up that did not reveal multiple sclerosis. We also noted that in CKD, where the PPV was 90%, false positive cases were associated with the ICD-10 code N19 (unspecified kidney failure). However, we found that 10 of the 30 participants with CKD codes only had code N19, of whom seven (23% of the total) were found to have CKD.
The NPV was high for all the different diseases, with most diseases having an NPV of 100% (Table IV). Only heart failure and hypertension had an NPV <100% (93 and 97%, respectively).
Table IV. Results of the chart review for each disease and calculated positive and negative predictive values and total accuracy for each disease. Grey shading marks objective findings of disease.
PPV: positive predictive value; NPV: negative predictive value.
Discussion
In this first study of the validity of chronic disease diagnoses in the IHR, we found that 96% of participants were correctly characterised as having or not having a chronic disease. This is similar to, or higher than, other high-quality healthcare registries in other Nordic countries. For example, the PPV in the Swedish inpatient register has been estimated to be between 85 and 95% [12]. After weighting the PPV and NPV by the prevalence of the disorders evaluated, we estimate that the overall accuracy of the IHR with regard to the presence or absence of chronic disease is 98.5%. These findings indicate that the accuracy of diagnoses in the IHR is very high. Furthermore, we found objective evidence of disease, beyond the recorded opinion of the treating physician (e.g. blood tests and medical treatment), in 96% of true positives. This indicates that chronic disease diagnoses in the IHR are not only recorded correctly, but are also medically accurate. Taken together, the results show that chronic disease diagnoses in the IHR are highly valid and that the data can be used to accurately determine the presence of chronic disease in research or public health policy-making. Compared with similar registries in the Nordic countries and internationally, the data in the IHR are of high quality. These findings provide confidence in studies that utilise chronic disease data from the IHR and have broader international implications. Although the study findings do not identify any particular factors that increase data quality, we speculate that the high data quality in the IHR can be attributed to the diagnoses being physician-recorded and to the fact that members of the small Icelandic medical community were all trained, at least in part, at the same institutions, leading to more uniform registration.
There were some disease-specific differences in the validity of the IHR, but these differences were minimal. All of the diagnoses included had an overall accuracy of at least 95%, except multiple sclerosis (92%), where the PPV was 83%. This was caused by participants being incorrectly assigned a diagnostic code during work-up that did not lead to a diagnosis of multiple sclerosis. We also noted that one of the codes for CKD, N19 (unspecified kidney failure), led to a few cases of acute kidney failure being registered as CKD. However, a considerable number of participants who had CKD were only registered with N19. We therefore recommend including N19 when defining CKD in the registry because the gain in specificity from excluding it would not justify the loss in sensitivity. Given the consistently high quality of data across the included disorders, it is likely that the same applies to other chronic diseases not included in the analysis. However, there may be some individual exceptions to this, as seen here in multiple sclerosis.
This study has some strengths. First, by including a broad range of chronic diseases diagnosed and treated by different sub-specialties, the study is, to some degree, generalisable to diagnoses not included in this study. Furthermore, the study included a random sample of participants from a cohort including a large proportion of the Icelandic population, which supports the generalisability of the results and decreases the possibility of selection bias. Second, by using step-based criteria of validation certainty, we have also gained insight into the level of diagnostic certainty associated with chronic disease diagnoses in the IHR. Most importantly, because the chronic diseases included (e.g. as comorbidities) in epidemiological studies are those that have been diagnosed before a certain date, as in this study, the study findings provide a measure of the validity of chronic disease diagnoses as they are included as comorbidities in research. They can therefore be used when interpreting studies incorporating chronic disease data from the IHR.
This study also has some limitations. First, we only included eight chronic diseases and there may be other chronic diseases that have poorer registration. Most notably, we did not include rheumatological disease due to the degree of diagnostic uncertainty in rheumatological syndromes and because rheumatology medical records are mostly found in private practices in Iceland. Second, we did not assess the timeliness of disease diagnoses or acute illness, such as infection or thrombosis. Although the total number of patients reviewed was high, relatively few patients were included in the PPV and NPV calculation for specific diseases. Therefore the disease-specific PPVs and NPVs should be interpreted with caution.
In summary, we evaluated the validity of chronic disease diagnoses in the IHR and found it to be very high, both in terms of registration and objective validity. These findings show that chronic disease data from the IHR are highly valid and can be used with confidence when evaluating chronic diseases in Iceland and in epidemiological and clinical research.
Supplemental Material
Supplemental material, sj-docx-1-sjp-10.1177_14034948211059974 for Validity of chronic disease diagnoses in Icelandic healthcare registries by Sæmundur Rögnvaldsson, Thorir Einarsson Long, Sigrun Thorsteinsdottir, Thorvardur Jon Love and Sigurður Yngvi Kristinsson in Scandinavian Journal of Public Health
Footnotes
Acknowledgements
Screening tests were performed by The Binding Site Ltd, Birmingham, UK. Cross-linking of study data to national registries was performed by the Icelandic Directorate of Health and the Icelandic Cancer Society. This study was made possible by the hundreds of nurses, laboratory technicians and physicians around Iceland who collect blood samples from participants for screening or during follow-up and provide clinical care that is not part of the study. Icelandic and international myeloma patient organisations, including Perluvinir, the Icelandic myeloma patient organisation, and the International Myeloma Foundation provided the project with important perspectives and networks. The information technology department and clinical laboratory staff at The National University Hospital of Iceland made the complex sampling processes required for the study possible. The staff at Loftfar, Aton, Miðlun and Hvíta Húsið played an integral part in the development of recruiting strategies and media campaigns. Decode genetics have generously provided the study team with important insights and access to their state-of-the-art facilities. Presidents Vigdís Finnbogadóttir and Guðni Th Jóhannesson are thanked for their public support of the study. The iStopMM team are particularly acknowledged for their hard and dedicated work. Special thanks go to the residents of Akranes, Iceland who participated in the pilot and proof-of-concept of the study’s recruitment. Most importantly, acknowledgements go to the thousands of Icelanders who have generously provided their informed consent for the study, answered questionnaires, provided blood samples, and undergone diagnostic testing and follow-up.
Author contributions
The study was designed by all the co-authors and overseen by SYK. Data acquisition and data analysis were performed by SR. Medical chart review was performed by SR, ÞEL and ST. The paper was written by SR and edited by all the co-authors.
Declaration of conflicting interests
The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: SYK has received research funding from Amgen and Celgene. Other co-authors have nothing to disclose.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article. The iStopMM study is funded by the Black Swan Research Initiative by the International Myeloma Foundation and the Icelandic Centre for Research (grant agreement No. 173857). This project has also received funding from the European Research Council under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 716677). Additional funding was provided by the University of Iceland, Landspítali University Hospital and the Icelandic Cancer Society.
Supplemental material
Supplemental material for this article is available online.
References
