Abstract
The objective of this study was to facilitate future investigations by assessing the external validity and generalizability of the Centricity Electronic Medical Record (EMR) database and its analytical results to the US population, using National Ambulatory Medical Care Survey (NAMCS) data and results as the validation resource. Demographic and diagnostic data from the NAMCS were compared with similar data from the Centricity EMR database, and the impact of the different methods of data collection was analyzed.
Compared with NAMCS survey data on visits, Centricity EMR data show higher proportions of visits by younger patients and by females. Other comparisons suggest relatively more visits for acute conditions in Centricity and more visits for chronic conditions in NAMCS. The key finding is that the Centricity EMR shows more visits for the 13 chronic conditions highlighted in the NAMCS survey, with all but 1 of the 13 comparisons showing higher proportions in Centricity.
Although data and results from Centricity and NAMCS are not perfectly comparable, once techniques are employed to address these limitations, Centricity data appear more sensitive in capturing diagnoses, especially chronic diagnoses. Likely explanations involve differences in data collection between the EMR and the survey: the Centricity EMR has more comprehensive medical documentation requirements and incorporates laboratory results and medication data collected over time, whereas the survey focused on the primary reason for the sampled visit. It is therefore likely that Centricity data reflect medical problems more accurately and provide a more accurate estimate of the distribution of diagnoses in ambulatory visits in the United States. Further research should address methodological approaches to maximize the validity and utility of EMR databases. (Population Health Management 2010;13:139–150)
The purpose of this investigation is to facilitate future investigations by assessing the external validity and generalizability of Centricity data and analytic results to the US population as a whole. Our literature review identified the National Ambulatory Medical Care Survey (NAMCS) as an appropriate validation resource for this study. Supported by the federal Centers for Disease Control and Prevention (CDC), NAMCS collects data on patient visits from a sample of physicians. Like Centricity, NAMCS provides a de-identified data set that includes patients' demographic characteristics, physicians' diagnoses, and medications ordered and/or provided. NAMCS also publishes reports with descriptive statistics on patients' demographic characteristics and conditions, including diagnoses and treatments. However, the 2 data sets employ different methods of data collection. NAMCS collects data from physicians using a survey instrument completed at the time of the patient visit, which poses a modest threat to internal validity. The Centricity data set, by contrast, is based on an EMR, in which high data quality stems from professional responsibility, because neglecting documentation could compromise quality of care and endanger patient safety. NAMCS data include sample weights that support statistical inference from its national probability sample, providing a basis for interpreting possible deviations between nationally representative NAMCS results and the corresponding results derived from the Centricity EMR database. We evaluated a number of demographic, diagnostic, and disease state variables for external validity and generalizability.
Background
A review of the peer-reviewed literature provides a basis for the present effort to assess the external validity and generalizability of the GE Centricity EMR data set and its capacity to support population-level research on chronic conditions in the United States. Although the methods employed in this literature review were not formal, we believe that the results are valid, especially given our ability to query Centricity staff about all publications to date based on Centricity EMR data.
Our review includes several examples of recently published research that employed secondary analyses of federal data sets and/or data derived from widely used EMR systems.
In a study by Burt and associates, 1 data from NAMCS and its counterpart in hospitals, the National Hospital Ambulatory Medical Care Survey (NHAMCS), were used to investigate physician treatment and prescribing patterns. Specifically, using a multiplicity estimator, NAMCS and NHAMCS data on the number of past visits the patient made to the sample provider during the 1-year period prior to the sampled visit were used to estimate the number of patients. The resulting distribution of patients by annual number of visits is similar to the estimated distribution of persons in the United States making ambulatory care visits based on a population-based survey.
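The idea behind a multiplicity estimator can be illustrated with a short sketch. A patient who made m visits during the reference year had m chances of being sampled, so each sampled visit contributes its survey weight divided by m to the estimated number of patients. This is a minimal illustration under stated assumptions, not Burt and associates' implementation: `SampledVisit`, `estimate_patients`, and `estimate_visits` are hypothetical names, and m is assumed to equal the reported number of past visits plus the sampled visit itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SampledVisit:
    weight: float     # survey weight: number of population visits this sampled visit represents
    past_visits: int  # visits to the same provider in the 12 months before the sampled visit


def estimate_patients(visits: List[SampledVisit]) -> float:
    """Multiplicity estimate of the number of distinct patients behind the sampled visits.

    Assumption: a patient's visit count for the year is past_visits + 1
    (prior visits plus the sampled visit), so each visit contributes
    weight / (past_visits + 1).
    """
    return sum(v.weight / (v.past_visits + 1) for v in visits)


def estimate_visits(visits: List[SampledVisit]) -> float:
    """Weighted number of visits, for comparison with the patient estimate."""
    return sum(v.weight for v in visits)


if __name__ == "__main__":
    # Hypothetical sampled visits with made-up weights.
    sample = [SampledVisit(1000.0, 0), SampledVisit(1000.0, 3), SampledVisit(500.0, 1)]
    print(estimate_visits(sample), estimate_patients(sample))
```

With these made-up inputs, 2,500 weighted visits correspond to an estimated 1,500 patients, because patients who visit frequently are down-weighted in proportion to their number of visits.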
Another recent report, by Hing and associates, 2 describes ambulatory visits to physician offices in the United States and presents statistics on selected characteristics of physician practices, patients, office visits, and trends in visits. The data were collected in the 2004 NAMCS and were weighted to produce annual national estimates by employing an estimator using revised non-response adjustment. The authors found that in 2004 an estimated 910.9 million visits were made to physician offices in the United States, an overall rate of 315.9 visits per 100 persons, with 58.9% of visits to physicians in the specialties of general and family practice, internal medicine, pediatrics, and obstetrics and gynecology. Essential hypertension, malignant neoplasms, acute upper respiratory infection, and diabetes mellitus were the leading illness-related primary diagnoses.
An earlier paper by Machlin and colleagues 3 compared 1996 survey estimates using NAMCS, NHAMCS, the Medical Expenditure Panel Survey, and the National Health Interview Survey and described methodological issues to consider when using these data sets to measure ambulatory utilization.
There has been considerable research in the last few years based on analyses of Centricity EMR data. In fact, focusing on just US national research using Medical Quality Improvement Consortium (MQIC) data in particular, there were 12 journal publications during 2006–2007, and 31 poster/podium presentations during 2004–2008. 4 For example, Brixner and colleagues 5 used clinical (biometric), diagnosis (ICD-9-CM codes), and treatment (prescription) information in the Centricity EMR database to examine the prevalence of cardiometabolic risk (CMR) factors that contribute to metabolic syndrome in the primary care setting. In the study population of 475,651 patients with information on indicators of CMR, 15.3% and 11.8% were found to have metabolic syndrome according to National Cholesterol Education Program and International Diabetes Federation criteria, respectively; 34.2% had a body mass index (BMI) ≥ 27 kg/m² as a risk factor, 56.0% had high blood pressure, 10.7% had high triglycerides, 16.0% had low high-density lipoprotein cholesterol (HDL-C), 8.8% had impaired fasting glucose, and 7.2% had diabetes. Brixner and colleagues conclude that “the distribution of CMR factors in a primary care database is similar to that established by prospective national health surveys such as NAMCS. A key source of identification of risk factors are clinical outcomes including BMI and lab values. Future studies on metabolic syndrome need to link clinically based information with more readily available treatment and diagnosis information.”
In a 2008 article, Brixner and colleagues 6 evaluated the relationships between CMR factors and BMI as recorded in the Centricity EMR. Patients with a BMI ≥ 18 kg/m² recorded in the EMR at any time during the 10-year period from 1996–2005 were stratified by number of CMR factors and, for those with only 1 factor, by individual risk factor. The authors identified a total of 499,593 patients with a BMI ≥ 18 kg/m²; 56.4% had a BMI > 27 kg/m², while 43.6% had a BMI of 18–27 kg/m². Compared with patients with no risk factors, patients with 1–4 risk factors were significantly more likely to have a BMI > 27 kg/m²: 48.4% of patients without CMR factors had a BMI > 27 kg/m², compared with 63.3%, 79.8%, 84.6%, and 88.5% of patients with 1, 2, 3, and 4 cardiometabolic risk factors, respectively. Adjusted odds ratios (ORs) for having a BMI > 27 kg/m² were 2.64 for type 2 diabetes, 2.21 for elevated triglycerides, 1.91 for hypertension, and 1.45 for low HDL-C. Relative to patients with no CMR factors, adjusted ORs for having a BMI > 27 kg/m² were 3.58 for patients with any 2 risk factors, 4.24 for patients with any 3 risk factors, and 5.07 for patients with any 4 risk factors. Brixner and colleagues conclude that “For patients with cardiometabolic risk factors, compared with patients with no risk factors, the odds of having a BMI > 27 kg/m² were multiplied by 1.45–5.07, depending on the type and number of risk factors. Diagnoses and treatment indicators for cardiometabolic risk factors are potential indicators of obesity.”
In another recent use of secondary data from the Centricity EMR database, Wang and colleagues 7 compared the risk of incident hypertension associated with the use of celecoxib and nonselective non-steroidal anti-inflammatory drugs (NSAIDs).
One of the most relevant and useful analyses of Centricity EMR data, especially for the present analyses, is Gill and Chen's 2008 evaluation of lipid management, 8 which includes adequate lipid testing, achievement of lipid goals, and appropriate use of lipid-lowering medication. Lipid testing was adequate for 62% of high-risk, 67% of moderate-risk, and 36% of low-risk patients. Lipid goals were achieved in 65% of high-risk, 66% of moderate-risk, and 90% of low-risk patients; 35% of high-risk, 45% of moderate-risk, and 32% of low-risk patients achieved adequate testing and optimal goals; and medications were appropriately prescribed for 70% of high-risk, 47% of moderate-risk, and 48% of low-risk patients. Gill and Chen note that “National EHR networks are excellent vehicles for large outpatient quality of care studies, particularly for measuring clinical outcomes such as lipid levels.”
In summary, these studies show that national EMR databases such as Centricity are valuable tools for health services research, including epidemiologic and outcomes research and studies of provider behavior. EMR data sets have several attributes that provide significant opportunities for such research. First, identification of a patient's primary reason for a visit not only provides richer information than the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) diagnosis codes available in claims databases, but also allows cross-tabulation with disease state and prescribing information. Second, a data set derived from algorithm-based extraction of EMR data may be inherently more comprehensive and accurate than a data set derived from patient- or visit-level survey records, whose validity depends on the cooperation and attention of responding physicians and office staff. The factorial analysis of metabolic syndrome by Brixner and colleagues 5 is an early indicator of the potential for multivariate analyses of EMR data, specifically investigations of comorbidities and disease staging. Currently, however, the results of such studies may be of limited generalizability, to the degree that the patient demographic and clinical characteristics represented in EMR data sets have not been validated as representative of the US population. By comparing patient attributes represented in the NAMCS data with those in the data set derived from the GE Centricity EMR, the present study provides such a validation of this EMR-based data set. Specifically, the analysis explores factors (eg, differences in data collection methods) that pose potential analytical limitations. The results are intended to provide insight into the strengths and weaknesses of the GE Centricity EMR as a tool for population health research, informing future studies and stimulating promising lines of research.
Methods
National Ambulatory Medical Care Survey
As described on the CDC Web site, 9 “The National Ambulatory Medical Care Survey (NAMCS) is a national survey designed to meet the need for objective, reliable information about the provision and use of ambulatory medical care services in the United States. Findings are based on a sample of visits to non-federally employed office-based physicians who are primarily engaged in direct patient care. Physicians in the specialties of anesthesiology, pathology, and radiology are excluded from the survey. The survey [has been] conducted … annually since 1989.”
“Specially trained interviewers visit the physicians prior to their participation in the survey in order to provide them with survey materials and instruct them on how to complete the forms. Data collection from the physician, rather than from the patient, provides an analytic base that expands information on ambulatory care collected through other NCHS surveys. Each physician is randomly assigned to a 1-week reporting period. During this period, data for a systematic random sample of visits are recorded by the physician or office staff on an encounter form provided for that purpose. Data are obtained on patients' symptoms, physicians' diagnoses, and medications ordered or provided. The survey also provides statistics on the demographic characteristics of patients and services provided, including information on diagnostic procedures, patient management, and planned future treatment.” 9
An enhancement implemented in the 2005 NAMCS allowed chronic condition-specific comparisons: “The emphasis for the 2005 survey year was chronic conditions. Additions to the routine encounter data that related to chronic conditions included: a chronic disease checklist, including conditions affecting the respiratory, cardiovascular, renal, and endocrine systems; arthritis; cancer; depression; obesity; and osteoporosis.” 9
GE Centricity database
The GE Centricity EMR database captures patient-level clinical data elements obtained from the Centricity Physician Office EMR (formerly Logician) for Clinical Data Services (CDS) reporting. The Centricity ambulatory care EMR and its predecessors have been used for over 20 years, are certified by the Certification Commission for Healthcare Information Technology (CCHIT), and are currently used by over 30,000 clinicians in the United States. Centricity CDS includes data provided by 7259 clinicians (approximately 60% primary care providers and 40% specialists) at 98 installations with 133 unique provider members. CDS includes de-identified, standardized data on more than 8,900,000 patients, and the data on at least half of these patients span more than 985 days, a median of approximately 2.7 years of continuous care.
While the present study uses all data in CDS, some research cited in our literature review analyzes data generated by MQIC, a national network of outpatient practices. All practices that use the Centricity EMR are invited to join MQIC, but membership in MQIC is completely voluntary. In 2008 MQIC included over 4,000,000 patients cared for by over 5000 physicians and other providers from over 90 institutions in 35 states throughout the United States. These practices ranged from solo practices to large multipractice institutions with over 1000 providers; roughly 63% of MQIC providers are primary care physicians (including family medicine, general internal medicine, general pediatrics, and geriatrics).
Exclusion criteria applied to the Centricity EMR data
Exclusion criteria and algorithms were developed to eliminate potential distortions in aggregated Centricity EMR data resulting from user interface design, database structure, the de-identification process, or methods of recording procedures. Problem/complaint text extracted from records was filtered to ensure that patients merely evaluated for a condition (without confirmation of diagnosis) were not counted as “positives” along with patients explicitly diagnosed with that condition (included in this process of elimination are diagnoses containing “family history of,” “rule out,” “risk of,” “screening of,” “symptoms of,” and “question of”). Exclusion criteria were also employed to reduce the likelihood of overestimation resulting from “backfilling” of retrospective visit data for physician practices when they first adopt the Centricity EMR.
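The qualifier-based filtering described above can be sketched as a case-insensitive phrase match. This is a minimal illustration, assuming the problem/complaint text is available as plain strings; the actual Centricity algorithm is not described in detail here, and `is_confirmed_diagnosis` and `confirmed_only` are hypothetical names. A production filter would likely also need to handle abbreviations and other phrasings that this sketch ignores, and the backfilling exclusion is not modeled.

```python
import re
from typing import Iterable, List

# Qualifier phrases named in the text: entries containing any of them describe an
# evaluation or a risk rather than a confirmed diagnosis, so they are not counted as positives.
EXCLUDED_QUALIFIERS = (
    "family history of",
    "rule out",
    "risk of",
    "screening of",
    "symptoms of",
    "question of",
)

# A single case-insensitive pattern that matches any qualifier phrase anywhere in the text.
_QUALIFIER_PATTERN = re.compile(
    "|".join(re.escape(q) for q in EXCLUDED_QUALIFIERS), re.IGNORECASE
)


def is_confirmed_diagnosis(problem_text: str) -> bool:
    """Return False when the problem/complaint text carries an exclusion qualifier."""
    return _QUALIFIER_PATTERN.search(problem_text) is None


def confirmed_only(entries: Iterable[str]) -> List[str]:
    """Keep only entries that can be counted as explicit diagnoses."""
    return [e for e in entries if is_confirmed_diagnosis(e)]


if __name__ == "__main__":
    problems = [
        "Type 2 diabetes mellitus",
        "Rule out pneumonia",
        "Family history of colon cancer",
        "Essential hypertension",
    ]
    print(confirmed_only(problems))
```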
Office visits and study population
This analysis uses 2005 GE Centricity EMR data on “activities” and patient demographics. Several sets of inclusion/exclusion criteria were applied prior to analysis (Figure 1). First, all 2005 activity data were extracted from the database (Step 1); the index date range was January 1, 2005 to December 31, 2005. Then, patients who had an activity record in 2005 and at least 1 documented activity in both 2004 and 2006 were selected to ensure that they were active and remained in the same health care system throughout 2005 (Step 2). According to the NAMCS 2005 summary report, 10 NAMCS excludes office visits to physicians in the specialties of anesthesiology, pathology, and radiology. In the GE Centricity EMR database, however, specialty information was listed as “unknown” for 56% of patients. Among the 3.9 million patients with specialty information, only 872 had specialty data indicating anesthesiology, pathology, or radiology. Moreover, specialty information from the responsible service provider was recorded at the patient level, and activity data did not include specialty information. Therefore, this selection criterion was not applied to the data. NAMCS also excludes certain types of contacts, including those made by telephone, those made outside the physician's office (eg, house calls), visits made in hospital settings (unless the physician has a private office in a hospital and that office meets the NAMCS definition of an “office”), visits made in institutional settings by patients for whom the institution has primary responsibility over time (eg, nursing homes), and visits to doctors' offices made for administrative purposes only (eg, to leave a specimen, pay a bill, or pick up insurance forms). Therefore, among the various activity types, only activities meeting these “office visit” criteria were included in the study (Step 3).
In addition to the criteria mentioned earlier, a small number of patients had missing sex and/or age information and were excluded from the final analysis data set (Step 4).
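Steps 1 through 4 can be expressed as a small filtering routine. The sketch below is illustrative only: the `Activity` and `Patient` records, their field names, and the activity-type label `"office visit"` are hypothetical stand-ins for the Centricity activity and demographic tables, whose actual schema is not described here. It keeps 2005 office-visit activities for patients who also have at least 1 documented activity in both 2004 and 2006 and who have recorded sex and age.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical label for activities meeting the NAMCS-style "office visit" definition
# (ie, excluding telephone contacts, house calls, administrative-only visits, and so on).
OFFICE_VISIT_TYPES = {"office visit"}


@dataclass
class Activity:
    patient_id: str
    activity_date: date
    activity_type: str


@dataclass
class Patient:
    patient_id: str
    sex: Optional[str]
    age: Optional[int]


def select_office_visits(activities: Iterable[Activity], patients: Iterable[Patient]) -> List[Activity]:
    activities = list(activities)

    # Calendar years in which each patient has any documented activity of any type.
    years_active: Dict[str, Set[int]] = {}
    for a in activities:
        years_active.setdefault(a.patient_id, set()).add(a.activity_date.year)

    # Step 2: patients with activity in 2004, 2005, and 2006.
    continuous = {pid for pid, years in years_active.items() if {2004, 2005, 2006} <= years}

    # Step 4: patients with both sex and age recorded.
    complete = {p.patient_id for p in patients if p.sex is not None and p.age is not None}

    eligible = continuous & complete
    return [
        a
        for a in activities
        if a.activity_date.year == 2005            # Step 1: index year 2005
        and a.activity_type in OFFICE_VISIT_TYPES  # Step 3: office visits only
        and a.patient_id in eligible               # Steps 2 and 4
    ]
```

Note that Step 2 looks at all activity types when checking 2004 and 2006, while Step 3 restricts only the 2005 records that are counted as visits.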

Figure 1. Steps in the selection of office visits.
In the 2005 NAMCS survey data, physicians recorded only a single “primary” ICD-9-CM diagnosis code per visit (see Tables 2 and 3), and additional chronic conditions were recorded through checkbox responses regardless of visit diagnosis. In the GE Centricity EMR database, however, activity data were available from multiple sources: problem, complaint, medication, prescription, observation, and order data. To maximize the validity of the comparison of the 2 data sets, only office visits associated with the problem table (ie, visits that resulted in 1 or more ICD-9-CM diagnosis codes) were included in the analyses reported in Table 2 (Step 5). When more than 1 ICD-9-CM diagnosis code was listed for a single office visit, each code was assigned a fraction of the visit equal to 1 divided by the number of ICD-9-CM diagnosis codes recorded at that visit, and these fractions were summed to calculate the annual number of visits with specified ICD-9-CM codes. A small number of ICD-9-CM “E” (external cause of injury) codes were present in the database; these were included in the “other” category in Table 2.
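The fractional counting rule can be illustrated as follows: a visit with n diagnosis codes contributes 1/n of a visit to each code, so the fractions for every visit sum to 1 and the total across all codes equals the number of visits counted. This is a minimal sketch; representing each visit as a list of ICD-9-CM code strings, and counting each listed code once, are assumptions, and `fractional_visit_counts` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def fractional_visit_counts(visits: Iterable[List[str]]) -> Dict[str, float]:
    """Allocate each office visit across its ICD-9-CM codes.

    Each element of `visits` is the list of codes recorded at one visit.
    A visit with n codes adds 1/n to each of those codes; visits with no
    codes are skipped, mirroring the restriction to visits with at least
    1 diagnosis code (Step 5).
    """
    totals: Dict[str, float] = defaultdict(float)
    for codes in visits:
        if not codes:
            continue
        share = 1.0 / len(codes)
        for code in codes:
            totals[code] += share
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical visits: 2 codes, 1 code, 3 codes, and a visit with no diagnosis code.
    visits = [["401.9", "250.00"], ["401.9"], ["465.9", "401.9", "272.4"], []]
    counts = fractional_visit_counts(visits)
    print(counts)
    # Sums to the number of visits with at least 1 code, up to floating-point rounding.
    print(sum(counts.values()))
```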
For the chronic disease comparisons shown in Table 3, we first identified the corresponding ICD-9-CM diagnosis codes and searched the problem data set for these codes through the end of 2005. The diagnosis codes for these chronic diseases are listed in the Notes for Table 3. All chronic diseases were considered separately, and the fractional approach described above was not applied in this part of the analysis.
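Matching problem-list codes to the chronic conditions in Table 3 amounts to range and prefix tests on ICD-9-CM codes. The sketch below covers only a subset of the conditions, using the code ranges given in the Notes for Table 3. Treating a listed code such as 296.2 as a prefix that also matches its subcodes (eg, 296.23), and assuming codes are stored as dotted strings, are our assumptions rather than documented features of the Centricity data; `ConditionSpec` and `conditions_for` are hypothetical names. Consistent with the text, each condition is evaluated separately, so one patient may match several conditions.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set, Tuple


def _category(code: str) -> Optional[int]:
    """3-digit category of a numeric ICD-9-CM code ('401.9' -> 401); None for V and E codes."""
    head = code.split(".")[0]
    return int(head) if head.isdigit() else None


@dataclass(frozen=True)
class ConditionSpec:
    category_ranges: Tuple[Tuple[int, int], ...] = ()  # inclusive 3-digit category ranges
    prefixes: Tuple[str, ...] = ()                      # specific codes, matched together with subcodes
    excluded_prefixes: Tuple[str, ...] = ()             # codes carved out of the ranges above

    def matches(self, code: str) -> bool:
        if any(code.startswith(p) for p in self.excluded_prefixes):
            return False
        category = _category(code)
        if category is not None and any(lo <= category <= hi for lo, hi in self.category_ranges):
            return True
        return any(code.startswith(p) for p in self.prefixes)


# Subset of the Table 3 definitions (the Notes for Table 3 list all conditions).
CHRONIC_CONDITIONS: Dict[str, ConditionSpec] = {
    "hypertension": ConditionSpec(category_ranges=((401, 405),)),
    "arthritis": ConditionSpec(category_ranges=((710, 716),), excluded_prefixes=("711.0",)),
    "diabetes": ConditionSpec(category_ranges=((250, 250),)),
    "depression": ConditionSpec(category_ranges=((311, 311),), prefixes=("300.4", "296.2", "296.3")),
    "asthma": ConditionSpec(category_ranges=((493, 493),)),
    "COPD": ConditionSpec(category_ranges=((490, 492), (494, 496))),
}


def conditions_for(codes: Iterable[str]) -> Set[str]:
    """Chronic conditions matched by any code on a patient's problem list."""
    codes = list(codes)
    return {name for name, spec in CHRONIC_CONDITIONS.items() if any(spec.matches(c) for c in codes)}
```

For example, `conditions_for(["401.9", "250.00"])` yields hypertension and diabetes, while a code of 711.05 is excluded from arthritis by the carve-out.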
Analytic methods
Published standard errors of the NAMCS percentages were used to construct 99% confidence intervals, and the point estimates generated from the Centricity EMR data were compared with these intervals to identify statistically significant differences between the 2 sets of results at the P < .01 level. Given the very large numbers involved, almost all of the differences examined were statistically significant. The actual percentage differences were then reexamined to assess the substantive significance of each finding, using a difference of at least 2 percentage points as the criterion for substantive significance.
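The 2-stage screen can be sketched as follows. Because the Centricity figures are treated as fixed point estimates, only the NAMCS standard error enters the interval, and the two-sided 99% critical value of the standard normal distribution (approximately 2.576) sets its width. The class name and the numbers in the example are hypothetical; they are chosen only to show that a difference can be statistically significant without meeting the 2% substantive criterion (read here as 2 percentage points), and vice versa.

```python
from dataclasses import dataclass
from typing import Tuple

Z_99 = 2.576  # two-sided 99% critical value of the standard normal distribution


@dataclass
class PercentComparison:
    namcs_pct: float       # NAMCS published percentage
    namcs_se: float        # NAMCS published standard error of that percentage
    centricity_pct: float  # Centricity point estimate, treated as fixed

    def namcs_ci99(self) -> Tuple[float, float]:
        half_width = Z_99 * self.namcs_se
        return self.namcs_pct - half_width, self.namcs_pct + half_width

    def statistically_significant(self) -> bool:
        """True when the Centricity estimate falls outside the NAMCS 99% interval (P < .01)."""
        low, high = self.namcs_ci99()
        return not (low <= self.centricity_pct <= high)

    def difference(self) -> float:
        """Centricity minus NAMCS, in percentage points."""
        return self.centricity_pct - self.namcs_pct

    def substantively_significant(self, threshold: float = 2.0) -> bool:
        """True when the absolute difference is at least `threshold` percentage points."""
        return abs(self.difference()) >= threshold


if __name__ == "__main__":
    # Hypothetical inputs, not values from Tables 1-3.
    narrow = PercentComparison(namcs_pct=20.0, namcs_se=0.5, centricity_pct=21.5)
    wide = PercentComparison(namcs_pct=20.0, namcs_se=1.0, centricity_pct=22.0)
    for c in (narrow, wide):
        print(c.namcs_ci99(), c.statistically_significant(), c.substantively_significant())
```

With these inputs, the first comparison is statistically significant but not substantive (a 1.5-point gap), while the second meets the 2-point criterion but falls inside the 99% interval.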
Results
Table 1 compares NAMCS and GE Centricity EMR results on the number and percent distribution of office visits, with corresponding standard errors, by patient age and sex for the United States for 2005. Compared with NAMCS, the GE Centricity EMR data include higher proportions of patient visits for all age groups younger than 65 years and lower proportions for ages 65 and older, as follows (Centricity vs. NAMCS): 18.0% vs. 16.7% for younger than 15 years, 8.6% vs. 7.3% for ages 15–24 years, 22.0% vs. 20.9% for ages 25–44 years, 29.9% vs. 29.4% for ages 45–64 years, 10.8% vs. 12.4% for ages 65–74 years, and 10.7% vs. 13.3% for ages 75 years and older. While all 6 of these differences are statistically significant (P < .01), they are relatively modest. On the other hand, the GE Centricity EMR data include a significantly and substantially higher percentage of visits by female patients (63.3% vs. 58.2%, P < .01).
Visit rates for age, sex, race, and ethnicity are based on the July 1, 2005, set of estimates of the civilian noninstitutional population of the United States as developed by the Population Division, US Census Bureau.
Numbers may not add to totals because of rounding.
Advance Data No. 387, June 29, 2007, p. 14.
…, category not applicable.
P < .01.
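The percentage-point gaps behind the Table 1 comparison can be recomputed directly from the age and sex distributions quoted in the Results text. This small worked check uses only those published percentages; rounding each difference to 1 decimal place removes floating-point noise.

```python
# Percent distribution of 2005 office visits as quoted in the Results text:
# (GE Centricity EMR, NAMCS)
AGE_GROUPS = {
    "<15": (18.0, 16.7),
    "15-24": (8.6, 7.3),
    "25-44": (22.0, 20.9),
    "45-64": (29.9, 29.4),
    "65-74": (10.8, 12.4),
    "75+": (10.7, 13.3),
}
FEMALE = (63.3, 58.2)


def point_difference(centricity: float, namcs: float) -> float:
    """Centricity minus NAMCS, in percentage points, rounded to 1 decimal."""
    return round(centricity - namcs, 1)


age_diffs = {group: point_difference(*pair) for group, pair in AGE_GROUPS.items()}
female_diff = point_difference(*FEMALE)

if __name__ == "__main__":
    for group, diff in age_diffs.items():
        print(f"{group:>6}: {diff:+.1f}")
    sizes = [abs(d) for d in age_diffs.values()]
    print("age range:", min(sizes), "to", max(sizes), "points; female:", female_diff)
```

The age-group gaps run from 0.5 points (ages 45–64) to 2.6 points (ages 75 and older), the gap for visits by females is 5.1 points, and each column sums to 100%.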
Table 2 compares NAMCS and GE Centricity EMR results on the number and percent distribution of office visits by physician's primary diagnosis. The comparison is problematic because the GE Centricity EMR data include a significantly higher proportion of visits with diagnoses in the symptoms, signs, and ill-defined conditions category (16.0% vs. 6.3%, P < .01) and significantly lower proportions in the supplementary classification, all other diagnoses, and unknown categories (11.9% vs. 18.6%, 1.5% vs. 2.7%, and 0.0% vs. 0.9%, respectively; all P < .01), for a combined proportion across these 3 categories of 13.3% vs. 22.2%. Because the overall proportions in all 4 residual categories combined correspond closely (29.3% in the GE Centricity EMR vs. 28.5% in NAMCS), we compare differences in the more substantive diagnostic categories, but with appropriate caution. All 11 such comparisons are statistically significant (P < .01); however, only 3 seem noteworthy: the lower proportions of neoplasm and circulatory condition diagnoses in Centricity (1.2% vs. 4.1% and 3.6% vs. 8.5%, respectively) and the higher proportion of respiratory condition diagnoses (17.9% vs. 11.5%).
… category not applicable.
Based on the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM).
Includes diseases of the blood and blood-forming organs (280–289); complications of pregnancy, childbirth, and the puerperium (630–677); congenital anomalies (740–759); certain conditions originating in the perinatal period (760–779); and entries not codable to the ICD-9-CM (eg, illegible entries, left against medical advice, transferred, entries of “none,” or “no diagnoses”).
Includes blank diagnoses.
Numbers may not add to totals because of rounding.
Advance Data No. 387, June 29, 2007, p. 23.
P < .01.
Table 3 compares NAMCS and GE Centricity results for the number and percent distribution of office visits by selected chronic conditions for the United States for 2005. The Centricity database is substantially more likely than the NAMCS data to show at least 1 condition (63.4% vs. 52.7%) and substantially less likely to show none (36.6% vs. 43.8%) or a blank entry (0.0% vs. 3.5%). Of 13 possible comparisons regarding selected chronic diagnoses, all showed statistically significant differences between the 2 data sets, and all but 1 showed a higher proportion in the Centricity data: hypertension (24.4% vs. 22.8%), hyperlipidemia (25.7% vs. 13.5%), diabetes (10.2% vs. 9.8%), depression (16.0% vs. 8.8%), obesity (8.9% vs. 7.1%), cancer (14.4% vs. 5.9%), asthma (10.7% vs. 5.7%), chronic obstructive pulmonary disease (COPD; 10.3% vs. 4.2%), ischemic heart disease (6.1% vs. 4.1%), cerebrovascular disease (4.2% vs. 1.9%), congestive heart failure (3.5% vs. 1.6%), and chronic renal failure (2.9% vs. 1.2%). The one exception to this pattern was the higher percentage of visits with a diagnosis of arthritis in NAMCS (13.2% in Centricity vs. 14.3% in NAMCS). Five of the differences are especially sharp (5 percentage points or more): those for hyperlipidemia, depression, cancer, asthma, and COPD.
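Sorting the 13 chronic-condition differences into tiers makes the pattern easier to see. The sketch below uses only the percentages quoted in the text and 2 thresholds: the 2% substantive criterion from the Analytic methods section (read as 2 percentage points) and a 5-point cutoff for the sharpest differences. Because the asthma gap is exactly 5.0 points, the sharp tier uses a "≥ 5" test, and each difference is rounded to 1 decimal place so that floating-point error cannot move a value across a threshold. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Percent of office visits with each condition, 2005, as quoted in the text:
# (GE Centricity EMR, NAMCS)
CHRONIC = {
    "hypertension": (24.4, 22.8),
    "hyperlipidemia": (25.7, 13.5),
    "diabetes": (10.2, 9.8),
    "depression": (16.0, 8.8),
    "obesity": (8.9, 7.1),
    "cancer": (14.4, 5.9),
    "asthma": (10.7, 5.7),
    "COPD": (10.3, 4.2),
    "ischemic heart disease": (6.1, 4.1),
    "cerebrovascular disease": (4.2, 1.9),
    "congestive heart failure": (3.5, 1.6),
    "chronic renal failure": (2.9, 1.2),
    "arthritis": (13.2, 14.3),
}


def tier(diff: float, substantive: float = 2.0, sharp: float = 5.0) -> str:
    """Classify a percentage-point difference by its absolute size."""
    size = abs(diff)
    if size >= sharp:
        return "sharp"
    if size >= substantive:
        return "substantive"
    return "modest"


def tiers(data: Dict[str, Tuple[float, float]]) -> Dict[str, List[str]]:
    grouped: Dict[str, List[str]] = {"sharp": [], "substantive": [], "modest": []}
    for condition, (centricity, namcs) in data.items():
        grouped[tier(round(centricity - namcs, 1))].append(condition)
    return grouped


if __name__ == "__main__":
    for label, conditions in tiers(CHRONIC).items():
        print(label, conditions)
    higher = [c for c, (x, y) in CHRONIC.items() if x > y]
    print(len(higher), "of", len(CHRONIC), "higher in Centricity")
```

The sharp tier reproduces the 5 conditions named in the text, ischemic heart disease and cerebrovascular disease fall in the substantive tier, and 12 of the 13 conditions are higher in Centricity.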
Figure does not meet standards of reliability or precision.
Presence of chronic conditions, regardless of visit diagnosis, was based on checkbox responses. ICD-9-CM diagnosis codes analyzed in the Centricity data are as follows: hypertension (401–405), arthritis (710–716, excluding 711.0), hyperlipidemia (272, 272.0, 272.1, 272.2, 272.3, 272.4), diabetes (250), depression (300.4, 311, 296.2, 296.3), obesity (278.0), cancer (140–239), asthma (493), chronic obstructive pulmonary disease (490–492, 494–496), ischemic heart disease (410–414), osteoporosis (733.0), cerebrovascular disease (430–438), congestive heart failure (429.1, 429.2, 428 excluding 428.21, 428.31, and 428.31), and chronic renal failure (585).
COPD, chronic obstructive pulmonary disease; CHF, congestive heart failure.
Numbers may not add to totals because more than 1 condition may be reported per visit.
Advance Data No. 387, June 29, 2007, p. 28.
P < .01.
… category not applicable.
Discussion
Regarding demographic data, comparisons of NAMCS and Centricity EMR patient visits by age and sex show that Centricity visits are somewhat more likely to involve younger patients (the 6 differences by age group ranged from 0.5 to 2.6 percentage points) and substantially more likely to involve females (a difference of 5.1 percentage points). All of these differences are statistically significant, as expected given the large sample size; however, we believe that sex may have more potential for confounding than age. This issue should be addressed by additional research.
Comparisons of NAMCS and Centricity EMR results for physician's primary diagnosis show lower proportions of neoplasm and circulatory condition diagnoses in Centricity and a higher proportion of respiratory condition diagnoses, suggesting relatively more visits for chronic conditions in NAMCS and relatively more visits for acute conditions in Centricity.
Turning to the selected chronic conditions highlighted in the NAMCS results, the Centricity data include a substantially higher proportion of visits with at least 1 condition and a substantially lower proportion with none of the selected chronic conditions. Of the 13 possible chronic diagnosis comparisons, all but 1 (ie, arthritis) showed a significantly higher proportion in the Centricity data.
Among the potential explanations for the apparent greater sensitivity of Centricity data to chronic condition diagnoses are (1) methodological differences between the NAMCS and Centricity data collection approaches and data sets, (2) differential characteristics of providers who use EMRs like Centricity, and (3) in particular, differential characteristics of providers who not only use an EMR but also participate in a data collection and analysis program like MQIC.
Regarding the first potential explanation, the key methodological difference in data collection is that the NAMCS focuses on the “primary reason for the visit,” while Centricity accumulates all problems and diagnoses and thus has greater capacity to identify all relevant diagnoses. Moreover, data extracted from the Centricity EMR using algorithms to maximize validity may be inherently more comprehensive and accurate than patient- or visit-level survey data, the validity of which depends on the motivation of patients, physicians, other clinicians, and office staff. Additional strengths of the Centricity EMR database are (1) its incorporation of documentation of a wide range of diagnostic and therapeutic services, specifically laboratory test results and medications ordered, and (2) its incorporation of all such data over time, allowing greater sensitivity in capturing diagnoses. As a result, it is likely that the Centricity data reflect more of the medical problems that motivated patient visits than the NAMCS data.
At least 2 specific interpretations of our major findings relate to the differential characteristics of EMRs and the providers who adopt them. First, EMRs facilitate more complete and accurate documentation of diagnoses, laboratory results, and medication orders; specifically, when diagnoses are listed in progress notes, the EMR automatically uses them to populate the problem list. Second, as noted by Gill and Chen 8 regarding MQIC in particular, “there may be differences between EHR users in general and those who participate in MQIC … while MQIC includes a large and diverse group of providers, participation is voluntary, so there may be some self-selection. For example, one reason that practices join MQIC is to have access to quality reporting. It may be that practices that are more interested in measuring (and improving) quality of care are more likely to join MQIC.” It is a reasonable inference that providers who use the Centricity EMR and participate in MQIC document diagnoses more completely than other providers. If this is the case, the key finding of the present analysis, substantially higher proportions of visits for key chronic conditions in Centricity than in NAMCS, represents not an overestimate but an accurate estimate of the distribution of diagnoses in ambulatory visits in the United States.
Limitations
The limitations of this study generally stem from the methodological differences between the NAMCS and Centricity EMR data sets. Although NAMCS data collection is based on probability sampling techniques that support estimates for the United States as a whole, there is no such methodology and capability in the Centricity database. Thus, differences in results derived from the 2 data sets may reflect a variety of factors including, among others, the age, sex, and other demographic characteristics of the patients whose visits were captured; the clinical characteristics of those patients; and the characteristics of the physicians who use the Centricity EMR, which may, in turn, be associated with patients' demographic and clinical characteristics.
As already described, several approaches were employed to reduce the effects of such methodological artifacts. Specifically, exclusion criteria were employed to reduce potential overestimation of patients' diagnoses in Centricity: (1) filtering problem/complaint text extracted from records so that patients merely evaluated for a condition (without confirmation of diagnosis) were not counted as “positives” along with patients explicitly diagnosed with that condition; and (2) excluding retrospective visit data that were “backfilled” when providers first adopted the Centricity EMR. However, available data do not allow assessment of the success of these strategies at eliminating all duplication and overestimation.
More generally, analyses of EMR data are subject to the same limitations as any use of paper or electronic medical records: data will not be in the medical record if they were not available to the physician. Such gaps arise whenever patients receive care from other providers whose documentation is not sent to the original physician. Gaps may also arise to the extent that paper-based documentation is not transferred to the EMR, and to the extent that free-text data in the EMR are unavailable for analysis because of the complexity and cost of text mining.
One specific potential limitation of the Centricity EMR database is that data on a single patient may be recorded under more than 1 patient identifier, allowing duplication and overestimation when aggregate data are analyzed. Additionally, the data record problems or complaints rather than the ICD-9-CM diagnosis codes used in NAMCS; as a result, analyses must begin by searching the entire patient record, including multiple codes and text strings, to ensure a comprehensive search for and valid identification of diagnoses. A final limitation is the rate of missing data on race/ethnicity in the Centricity data, which was too high to allow comparisons with the NAMCS data.
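A comprehensive search of codes and text strings amounts to applying a case definition to every problem-list entry in a patient's record. The Python sketch below shows one way such a definition might combine ICD-9-CM code prefixes (250 for diabetes mellitus, 493 for asthma) with inclusion and exclusion text patterns. The `ProblemEntry` layout, the `has_condition` helper, and the patterns are hypothetical assumptions for illustration, not the case definitions used in this study.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ProblemEntry:
    code: Optional[str]  # code attached to the problem-list entry, if any (hypothetical field)
    text: str            # free-text problem or complaint description


# Hypothetical case definitions combining ICD-9-CM code prefixes with text patterns.
CASE_DEFINITIONS = {
    "diabetes": {
        "code_prefixes": ("250",),  # ICD-9-CM 250.xx: diabetes mellitus
        "include": re.compile(r"\bdiabet", re.IGNORECASE),
        "exclude": re.compile(r"gestational|pre-?diabet", re.IGNORECASE),
    },
    "asthma": {
        "code_prefixes": ("493",),  # ICD-9-CM 493.xx: asthma
        "include": re.compile(r"\basthma", re.IGNORECASE),
        "exclude": None,
    },
}


def has_condition(entries: Iterable[ProblemEntry], condition: str) -> bool:
    """True if any entry matches the condition by code or by text pattern."""
    rule = CASE_DEFINITIONS[condition]
    for entry in entries:
        # A matching code prefix counts on its own.
        if entry.code and entry.code.strip().startswith(rule["code_prefixes"]):
            return True
        # Otherwise fall back to text, unless an exclusion pattern also matches.
        text_hit = rule["include"].search(entry.text) is not None
        excluded = rule["exclude"] is not None and rule["exclude"].search(entry.text) is not None
        if text_hit and not excluded:
            return True
    return False


record = [ProblemEntry(None, "Cough"), ProblemEntry(None, "Type 2 diabetes, controlled")]
print(has_condition(record, "diabetes"))  # True
print(has_condition(record, "asthma"))    # False
print(has_condition([ProblemEntry(None, "Gestational diabetes")], "diabetes"))  # False
```

A sketch of this kind identifies conditions within a single record; it does not address the separate problem of one patient appearing under more than 1 identifier.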
On the other hand, the strengths of the Centricity EMR database include its documentation of a wide range of diagnostic and therapeutic services, specifically laboratory test results (with exact values and units of measurement) and medications ordered, and, perhaps most importantly, its capture of all of these data at multiple points in time, allowing longitudinal analyses not possible with most federal data sets, including NAMCS.
It is expected that use of ambulatory EMR systems like GE Centricity will expand substantially during the next decade. Many general trends and, specifically, increased federal incentives will combine to increase EMR utilization, increase the number of patients whose records will be incorporated in EMRs and EMR-based research databases such as GE CDS and MQIC, and increase the number of years of continuous patient data available for longitudinal research including outcomes research, quality measurement, and research on quality.
A broad range of clinical research will be facilitated by increasing EMR utilization and by the growing size, scope, and span of EMR-based databases. One way to project the potential for such studies is to examine the range of studies conducted to date using GE Centricity data. A recent list of such publications shows 12 articles published in peer-reviewed journals from 2006 to 2008 and 31 presentations at professional society meetings from 2004 to 2008. Topics of the journal articles included quality of care for patients with diabetes; oral antidiabetic medication use and outcomes (2 articles); cardiometabolic risk factors; the effectiveness of statins to lower low-density lipoprotein cholesterol (2 articles); the effects of a randomized controlled trial on antihypertensive prescribing; gastrointestinal complications of over-the-counter NSAIDs; effects of second-generation antipsychotics on weight gain; diagnosis, treatment, and/or outcomes for COPD (2 articles); and antibiotic use for adult upper respiratory infections. The range of conditions and outcomes studied is likely to grow. In particular, studies of rare conditions and outcomes should become more common, given the large numbers of cases available for longitudinal analyses and the ever-widening time span of the data.
Conclusions
This study compared key demographic characteristics (ie, age, sex) and clinical characteristics (ie, diagnoses) in the GE Centricity EMR database and the federal NAMCS. Compared to NAMCS patient visits, Centricity EMR patient visits are somewhat more likely to involve younger patients and considerably more likely to involve females. Centricity visits include lower percentages of visits for neoplasms and circulatory conditions, which are typically the more severe chronic conditions, and higher percentages of visits for respiratory conditions, suggesting a higher proportion of visits for acute conditions in the Centricity database and a higher proportion of visits for chronic conditions in NAMCS. However, for the 13 chronic conditions highlighted by NAMCS, Centricity data show a substantially higher proportion of visits for such conditions overall and for 12 of the 13 specific conditions.
While Centricity EMR data and results are not perfectly comparable to NAMCS data and results, Centricity data may actually be more sensitive in capturing diagnoses, especially chronic condition diagnoses. More research is needed to assess the validity and utility of Centricity and other EMR databases, and, specifically, to assess the potential for evolving methodologies to maximize such validity and utility.
Footnotes
Author Disclosure Statement
Ethicon Endo-Surgery, Inc., sponsored this research. Ethicon Endo-Surgery, Inc., employs Dr. Haas and contracted with all other authors of this manuscript. Drs. Crawford, Cote, Couto, Daskiran, Gunnarsson, Haas, Nigam, and Schuette, and Mr. Yaskin and Ms. Haas disclosed no other financial ties or conflicts of interest.
a
Centricity Physician Office is a registered trademark of GE Medical Systems Information Technologies.
