Quality-Score (Q-Score) Can Be More Sensitive than %Time in Range and Several Other CGM Metrics in Detecting Responses to Therapeutic Interventions

Abstract

Purpose:

To evaluate the relative sensitivity of several available CGM metrics for the detection of the effects of clinical interventions in people with type 1 diabetes (T1D) and type 2 diabetes (T2D).

Methods:

Real-world data were analyzed for people with poor glycemic control (hemoglobin A1c 8.2 ± 1.3%), 120 with T1D and 92 with T2D, using the Libre 2 CGM. CGM data from the 3 days before admission were compared with those from the 2 days immediately before discharge, after ∼8 days of in-hospital care with changes in therapy as prescribed by hospital-based diabetes specialists. CGM metrics included quality-score (Q-Score), Time in Range (TIR) (3.9–10 mmol/L), Time Above Range (>10 mmol/L), Time Below Range (<3.9 mmol/L), Mean Sensor Glucose, Glucose Management Indicator, Glycemia Risk Index, Glucose Daily Range, and Mean of Absolute Daily Differences (MODD). We evaluated the within-subject paired differences in all metrics pre- and postintervention using classical paired Student’s t tests.

Results:

The Q-Score showed the largest effects, in terms of Student’s t-values, for T1D, for T2D, and for all (T1D and T2D) subjects after pooling, indicating better sensitivity for detecting an effect than TIR or seven other metrics. One of the five components of the Q-Score, MODD, a classical measure of the stability of glucose patterns from day to day, was essentially tied with the Glycemia Risk Index (t = 7.38 vs. 7.39) for the second-best sensitivity in evaluating changes within subjects with T1D.

Conclusion:

We observed consistent differences in sensitivity for the detection of the effects of therapeutic interventions, with Q-Score being superior to eight alternatives. This study needs replication using additional patient populations and multiple types of interventions to evaluate its generalizability and applicability to both randomized controlled clinical trials and real-world clinical data.

Keywords

CGM; type 1 diabetes; type 2 diabetes; Q-Score; time in ranges (TIR); clinical trials

Introduction

What are the best criteria for monitoring responses to therapeutic interventions in people with diabetes? Multiple metrics and criteria have been proposed for the evaluation of therapeutic interventions in people with diabetes.^1–3 The current standard used by regulatory bodies is hemoglobin A1c (HbA1c).^4–6 Many parties have recommended that regulatory bodies also include one or more CGM-derived metrics, for example, %time in range (%TIR), that is, the percentage of time that glucose is in the range 70–180 mg/dL (3.9–10 mmol/L). Sometimes HbA1c is not appropriate or not available, and it may not be the best estimator of Mean Sensor Glucose (MSG).⁷ Many clinical investigators would prefer to use CGM-derived metrics rather than HbA1c. These include TIR, Time Above Range (TAR), and Time Below Range (TBR),^5,6 time in tight range,⁸ MSG,⁷ Glucose Management Indicator (GMI),⁹ and other estimates of HbA1c based on mean glucose.¹⁰ Several other metrics had been proposed previously that were calculated directly from all of the individual glucose values collected by the CGM, including Schlichtkrull’s M_R index,¹¹ Blood Glucose Risk Index (BGRI),¹² High Blood Glucose Index (HBGI),¹² Low Blood Glucose Index (LBGI),¹² Index of Glycemic Control (IGC),¹³ Glycemic Risk Assessment Diabetes Equation (GRADE),¹⁴ and its subcomponents for hyper-, hypo-, and euglycemic ranges.¹⁴ In addition, there are several composite scores that combine two or more CGM metrics, including Quality-Score (Q-Score),^15–17 Comprehensive Glucose Pentagon (CGP),¹⁸ Personal Glycemia Score (PGS),¹⁹ Composite CGM Glucose Index (COGI),²⁰ and Glycemia Risk Index (GRI).²¹

With such a plethora of metrics, the question naturally arises: which metrics, or which combinations of metrics, are most sensitive, that is, best able to detect a significant effect of a therapeutic intervention earlier in its course, within an individual patient or a group of patients, or with a smaller number of subjects? If a metric offers greater sensitivity, it may enable researchers to conduct clinical studies with smaller sample sizes and enable clinicians to evaluate their patients more rapidly and adjust therapy more frequently.

The present study examines the apparent sensitivity of several available metrics for detecting changes in a nonrandomized, single-center study of patients who were hospitalized because their clinic physician judged their glycemic control at baseline to be inadequate. Changes were evaluated in response to customized interventions prescribed by highly trained and experienced hospital-based diabetes specialists.

Methods

This study involved 212 subjects, 120 with T1D and 92 with T2D, at the Klinikum Karlsburg, Heart and Diabetes Center, Karlsburg, Germany. The participants provided written informed consent and were hospitalized with the goal of “improvement of diabetes control.” The study was approved by the Regional Ethics Review Board of the University of Greifswald, Greifswald, Germany. All subjects had considerable prior experience using the Abbott FreeStyle Libre 2 CGM device in intermittent scan mode and were instructed to scan the sensor at least once every 8 h. All glucose data were collected directly from the Abbott Reader device; data transmitted to the Abbott “cloud” were not used because of the Regional Ethics Review Board’s concerns regarding patient data privacy. Only subjects with >70% completeness of CGM data were included, and all data were anonymized. Glucose monitoring data were analyzed for the 3-day period immediately before hospital admission and for the 2 days immediately before discharge. Metrics calculated included Q-Score; %TIR (3.9–10 mmol/L, 70–180 mg/dL); MSG; GMI; within-day variability, measured as the average daily glucose range;^15,16 Time Above Range (%TAR), using 10 mmol/L (180 mg/dL) as the upper limit of the target range;¹⁶ %TBR, using 3.9 mmol/L (70 mg/dL) as the lower limit of the target range;^15,16 between-day variability, measured as the Mean of Absolute Daily Differences (MODD);^15–17 CGP;¹⁸ and GRI.²¹
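Several of the simpler metrics listed above can be computed from raw sensor readings in a few lines. The following is a minimal sketch, not the software used in this study. It assumes evenly spaced readings in mmol/L grouped by day, so that each reading represents equal time. It uses the article’s 3.9 and 10 mmol/L limits and applies the published linear GMI equation (ref. 9), with the slope converted to mmol/L. It also assumes that the daily range is the daily maximum minus the daily minimum, averaged over days. All function names are hypothetical. Q-Score, CGP, and GRI are omitted; the Q-Score equations are given in the Supplementary Data.

```python
from statistics import mean

# Target-range limits used in this article (mmol/L).
LOW, HIGH = 3.9, 10.0


def percent_of_readings(readings, predicate):
    """Percentage of readings for which predicate(g) is true."""
    return 100.0 * sum(1 for g in readings if predicate(g)) / len(readings)


def gmi_percent(msg_mmol):
    """GMI (%) from mean sensor glucose in mmol/L.

    Uses GMI = 3.31 + 0.02392 * MSG[mg/dL] (ref. 9); 0.43056 is that slope
    expressed per mmol/L (0.02392 * 18).
    """
    return 3.31 + 0.43056 * msg_mmol


def percent_to_hours_per_day(pct):
    """Convert % of the 24-h day to hours per day (% = 100 * h / 24)."""
    return pct * 24.0 / 100.0


def cgm_metrics(days):
    """Basic CGM metrics; days is a list of per-day lists of glucose (mmol/L).

    Assumes evenly spaced readings, so each reading represents equal time.
    """
    readings = [g for day in days for g in day]
    msg = mean(readings)
    return {
        "%TIR": percent_of_readings(readings, lambda g: LOW <= g <= HIGH),
        "%TAR": percent_of_readings(readings, lambda g: g > HIGH),
        "%TBR": percent_of_readings(readings, lambda g: g < LOW),
        "MSG": msg,
        "GMI": gmi_percent(msg),
        # Within-day variability: average over days of (daily max - daily min).
        "Range": mean(max(day) - min(day) for day in days),
    }


# Hypothetical two days of 6-hourly readings.
m = cgm_metrics([[5.0, 8.0, 12.0, 3.5], [6.0, 9.0, 11.0, 7.0]])
print(m)
print(round(gmi_percent(10.87), 2))  # T1D admission MSG; matches 7.99 in Table 2
```

Applied to the admission and discharge MSG values in Table 2, this GMI equation reproduces the reported GMI values. This is a first indication that GMI carries the same information as MSG.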

Reference data for the calculation of Q-Score were based on a previously reported reference population of 1562 patients at Karlsburg Hospital.^15,16 The Q-Score has previously been validated by comparison with ratings by experienced clinicians.¹⁵ The inputs for calculating the Q-Score include CGM glucose levels, which can be expressed in mg/dL or mmol/L, and TAR and TBR, which can be expressed as hours per day or as a percentage of the 24-h day (% = 100 × h/24).^15,16 We used the definition of Q-Score in which %TAR is the percentage of the 24-h day during which glucose exceeds 10 mmol/L (180 mg/dL).¹⁶ Equations for calculating the Q-Score with various combinations of units for glucose (mg/dL or mmol/L) and for duration (hours or percentage of the 24-h day) are provided in the Supplementary Data. The primary statistical analysis used a one-sided paired Student’s t test.²² We also examined unpaired Student’s t tests²² and two nonparametric methods, the Wilcoxon signed rank test²³ and the sign test.²⁴ Analyses were performed using PASW Statistics for Windows, version 18.0 (SPSS, Inc., Chicago, IL, USA).
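To make the computations concrete, here is a minimal standard-library sketch of two of the tests named above: the one-sided paired t-statistic and the exact sign test. It is not the PASW/SPSS procedure used in the study, and the function names and five-subject data are hypothetical. The direction of “improvement” must be chosen per metric: Q-Score, %TAR, MSG, GMI, GRI, range, and MODD improve by decreasing, whereas %TIR improves by increasing. The one-sided P value for the t-statistic would then be read from the t distribution with df = N − 1 (for example, with SciPy), which the standard library does not provide.

```python
from math import comb, sqrt
from statistics import mean, stdev


def paired_t(pre, post):
    """Paired t-statistic and df for the within-subject change post - pre."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    semd = stdev(diffs) / sqrt(n)  # standard error of the mean difference
    return mean(diffs) / semd, n - 1


def sign_test_one_sided(pre, post, improvement="decrease"):
    """Exact one-sided sign-test P value.

    Counts subjects whose change is in the improving direction and returns
    P(X >= k) for X ~ Binomial(n, 1/2). Zero changes are discarded.
    """
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    n = len(diffs)
    if improvement == "decrease":
        k = sum(1 for d in diffs if d < 0)
    else:
        k = sum(1 for d in diffs if d > 0)
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


# Hypothetical Q-Score values for five subjects (lower = better control).
pre = [15.0, 14.0, 17.0, 12.0, 16.0]
post = [11.0, 12.0, 13.0, 12.5, 12.0]
t, df = paired_t(pre, post)
print(f"t = {t:.2f} with df = {df}")
print(sign_test_one_sided(pre, post))
```

The sign test uses only the direction of each change. This is one reason it is usually less sensitive than the paired t test when the changes are roughly symmetric about their mean.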

Interventions

Clinical status of patients was reviewed daily by the diabetes specialist physicians. CGM results were continuously available to the patients and, daily, to their physicians. Physicians could change the subjects’ prescribed medications and provide advice and recommendations regarding diet. Patients were encouraged to engage in moderate physical activity for 45 min daily. The overall goals were to reduce MSG, reduce the risk of hypoglycemia, reduce the magnitude of postprandial glucose excursions, and/or reduce glucose variability. The median duration of the intervention (hospitalization) was 8 days (Table 1).

Table 1.

Characteristics of Subjects at Initiation of Intervention

	Type 1 Diabetes (n = 120)	Type 2 Diabetes (n = 92)	All subjects (n = 212)
Sex (M/F)	59/61	44/48	103/109
HbA1c (%)	8.25 ± 1.35	8.17 ± 1.15	8.22 ± 1.26
HbA1c (mmol/mol)	66.7 ± 14.7	65.8 ± 12.5	66.3 ± 13.8
BMI (kg/m²)	28.3 ± 5.7	35.2 ± 11.6	31.3 ± 9.4

	Type 1 diabetes					Type 2 diabetes					All subjects
	Min	25th	50th	75th	Max	Min	25th	50th	75th	Max	Min	25th	50th	75th	Max
Age (yr)	18	43	57	67	85	38	60	66	71	87	18	52	62	69	87
Duration of diabetes (yr)	1	10	24	38	69	1	11	20	28	52	1	11	22	34	69
Duration of hospitalization (days)	3	6	8	9	16	3	6	8	9	21	3	6	8	9	21

	Type 1 diabetes				Type 2 diabetes				All subjects
	Oral	Oral + Insulin	MDI	CSII	Oral	Oral + Insulin	MDI	CSII	Oral	Oral + Insulin	MDI	CSII
Prior Therapy (n)	0	4	94	22	9	65	17	1	9	69	111	23

Values shown are mean ± standard deviation; minimum, 25th, 50th (median), and 75th percentiles, and maximum; or the number of subjects receiving each prior treatment modality.

BMI, body mass index; CSII, continuous subcutaneous insulin infusion; Max, maximum; MDI, multiple daily injections; Min, minimum.

Results

Patient population

Table 1 shows the characteristics of the patient population at the onset of the intervention (hospitalization).

Responses to intervention to improve quality of glycemic control (Table 2, Fig. 1): type 1 diabetes

Using a conventional paired Student’s t test,²² the mean within-subject change (preadmission vs. discharge CGM) in Q-Score showed the largest magnitude relative to its standard error of the mean difference (semd) for subjects with T1D, for subjects with T2D, and for the entire dataset (Table 2, Fig. 1A). For T1D, the t-value for the Q-Score was the largest, and it was substantially larger than the t-value for %TIR or any of the other seven metrics. The t-values for %TIR and %TAR were similar. The t-values for MSG and for the Glucose Management Indicator (GMI) were also larger than that for %TIR. The t-value for GRI (t = 7.39) was larger than those for %TIR, %TAR, MSG, and GMI, but it remained smaller than that for Q-Score (t = 8.43) (Table 2, Fig. 1A).

FIG. 1.

Magnitude of effect size (paired Student’s t-statistic [one-sided]) for the differences in CGM metrics between the onset of the intervention (hospitalization) and repeat CGM immediately before discharge from the hospital. Results are shown for people with T1D, for people with T2D, and for pooled T1D and T2D data. %TAR, Time Above Range (>180 mg/dL or >10 mmol/L); %TBR, Time Below Range (<70 mg/dL or <3.9 mmol/L); %TIR, %time in range 70–180 mg/dL (3.9–10 mmol/L); GMI, Glucose Management Indicator, calculated from mean sensor glucose using a linear relationship⁹; GRI, Glycemia Risk Index²¹; MODD, Mean of Absolute Daily Differences (mg/dL or mmol/L)¹⁷; MSG, mean sensor glucose; Range, average daily glucose range (mg/dL or mmol/L).

(A) T1 diabetes: In addition to Q-Score, four CGM metrics (MSG, GMI, GRI, and MODD) appear to be more sensitive than %TIR for detecting a significant effect of the intervention. The average daily range of glucose showed markedly lower t-values than nearly all of the other metrics, but its change was still statistically significant. Subjects in the present series had low levels of %TBR, which did not show a statistically significant change.

(B) T2 diabetes: The sensitivity of Q-Score was greater than that of %TIR and all other metrics, but by a smaller margin than for participants with T1D (cf. Fig. 1A). All metrics except %TBR and average daily range of glucose showed similar levels of sensitivity, as reflected in the paired Student’s t test.

(C) T1D and T2D, pooled data: Four metrics, including Q-Score, MSG, GMI, and GRI, show greater sensitivity (t-statistics) than %TIR.

Table 2.

Definition of CGM Metrics

	Admission		Discharge		Difference		Change
Metric	Mean	Sem	Mean	Sem	Mean	Semd	Student’s paired t test	Nominal P value (one-tailed)
Type 1 diabetes
Q-Score	15.35	0.43	11.73	0.32	−3.63	0.43	8.43	4.8 × 10⁻¹⁴
%TIR	51.67	1.99	65.63	1.71	+13.97	2.34	5.98	1.2 × 10⁻⁸
%TAR	46.00	2.10	32.20	1.74	−13.79	2.46	5.61	6.6 × 10⁻⁸
%TBR	2.34	0.36	2.16	0.28	−0.18	0.32	0.55	2.9 × 10⁻¹
MSG (mmol/L)	10.87	0.28	8.99	0.14	−1.87	0.28	6.79	2.4 × 10⁻¹⁰
GMI (%)	7.99	0.12	7.18	0.06	−0.81	0.12	6.79	2.4 × 10⁻¹⁰
GRI	60.16	2.93	37.81	2.06	−22.35	3.03	7.39	1.1 × 10⁻¹¹
Range (within days) (mmol/L)	12.84	0.30	10.93	0.26	−1.91	0.31	6.07	7.7 × 10⁻⁹
MODD (mmol/L)	3.83	0.12	2.80	0.10	−1.03	0.14	7.38	1.2 × 10⁻¹¹
Type 2 diabetes
Q-Score	11.82	0.50	8.31	0.34	−3.51	0.43	8.21	6.9 × 10⁻¹³
%TIR	57.08	2.95	77.36	2.22	+20.27	2.77	7.33	4.6 × 10⁻¹¹
%TAR	42.18	3.04	21.78	2.26	−20.41	2.77	7.38	3.6 × 10⁻¹¹
%TBR	0.74	0.21	0.87	0.29	+0.13	0.36	0.36	3.6 × 10⁻¹
MSG (mmol/L)	10.37	0.33	8.30	0.17	−2.06	0.28	7.48	2.3 × 10⁻¹¹
GMI (%)	7.77	0.14	6.89	0.07	−0.89	0.12	7.48	2.3 × 10⁻¹¹
GRI	48.22	4.12	22.11	2.32	−26.11	3.75	6.96	2.6 × 10⁻¹⁰
Range (within days) (mmol/L)	9.40	0.28	8.00	0.25	−1.41	0.27	5.15	7.7 × 10⁻⁷
MODD (mmol/L)	2.44	0.12	1.70	0.09	−0.74	0.12	5.99	2.1 × 10⁻⁸
All subjects (T1D and T2D)
Q-Score	13.82	0.35	10.24	0.26	−3.58	0.31	11.71	5.0 × 10⁻²⁵
%TIR	54.02	1.71	70.72	1.42	+16.70	1.80	9.31	8.6 × 10⁻¹⁸
%TAR	44.34	1.76	27.68	1.43	−16.66	1.85	9.02	5.9 × 10⁻¹⁷
%TBR	1.64	0.23	1.60	0.21	−0.04	0.24	0.18	4.3 × 10⁻¹
MSG (mmol/L)	10.65	0.21	8.69	0.11	−1.96	0.20	9.95	1.1 × 10⁻¹⁹
GMI (%)	7.90	0.09	7.05	0.05	−0.84	0.08	9.95	1.1 × 10⁻¹⁹
GRI	54.98	2.47	31.00	1.63	−23.98	2.36	10.16	2.7 × 10⁻²⁰
Range (within days) (mmol/L)	11.35	0.24	9.66	0.21	−1.69	0.21	7.90	7.6 × 10⁻¹⁴
MODD (mmol/L)	3.23	0.10	2.32	0.08	−0.90	0.10	9.44	3.6 × 10⁻¹⁸

Q-Score: Quality Score.^15,16

%TIR: %time in range (70–180 mg/dL or 3.9–10 mmol/L).

%TAR: %time above range, with 180 mg/dL (10 mmol/L) as upper limit of target range.

%TBR: %time below range, with 70 mg/dL (3.9 mmol/L) as lower limit of target range.

MSG: Mean Sensor Glucose, mg/dL or mmol/L.

GMI: Glucose Management Indicator, estimate of HbA1c as % (NGSP) or mmol/mol (IFCC) calculated from MSG.⁹

GRI: Glycemia Risk Index.²¹

Range: average daily range of glucose (mg/dL or mmol/L), a measure of within-day variability, a component of the Q-Score.^15,16

MODD: Mean of absolute Daily Differences, that is, average difference between glucose values exactly 24 h apart.¹⁷ MODD provides a measure of between-day glucose variability and also reflects the degree of similarity of glucose profiles between pairs of successive days.²⁵

Columns 2 and 3: Mean and standard error of the mean (sem) of each metric during the initial CGM data collection (admission).

Columns 4 and 5: Mean and standard error of the mean (sem) of each metric during the CGM data collection at the conclusion of the intervention (discharge).

Columns 6 and 7: Mean of the within-subject differences between values at the end and at the onset of the intervention (discharge minus admission), and standard error of the mean difference (Semd).

Column 8: Student’s paired t test²² for change between initial and final values within each subject, calculated as t = (mean difference within subjects)/(standard error of the mean difference). This value serves as one of the most important estimates of the sensitivity of the specified CGM metric.

Column 9: P values for a one-sided paired Student’s t test, with degrees of freedom (df) = N − 1, where N is the number of subjects.
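The t-statistic in column 8 can be checked directly from columns 6 and 7. The identical t-values for MSG and GMI within each group follow from GMI being a linear function of MSG. The sketch below uses hypothetical helper names. It first recomputes |t| for three T1D rows of Table 2; agreement is to within rounding of the published values. It then shows, on hypothetical MSG data, that a positive linear rescaling leaves the paired t unchanged.

```python
from math import isclose, sqrt
from statistics import mean, stdev


def t_from_summary(mean_diff, semd):
    """Column 8 of Table 2: t = mean difference / semd."""
    return mean_diff / semd


def paired_t(diffs):
    """Paired t-statistic from a list of within-subject differences."""
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def gmi_percent(msg_mmol):
    # Linear GMI equation (ref. 9), slope expressed per mmol/L.
    return 3.31 + 0.43056 * msg_mmol


# Type 1 diabetes rows of Table 2: (mean difference, semd, published t).
rows = {"Q-Score": (-3.63, 0.43, 8.43), "%TIR": (13.97, 2.34, 5.98),
        "GRI": (-22.35, 3.03, 7.39)}
for name, (diff, semd, t_pub) in rows.items():
    # Small mismatches reflect rounding of the published summary values.
    print(f"{name}: |t| = {abs(t_from_summary(diff, semd)):.2f} vs {t_pub}")

# Hypothetical MSG values (mmol/L). Because GMI is linear in MSG, the intercept
# cancels in each difference and the slope cancels in the ratio mean/semd.
msg_pre, msg_post = [11.2, 10.4, 12.9, 9.8], [9.1, 8.7, 9.9, 8.8]
d_msg = [b - a for a, b in zip(msg_pre, msg_post)]
d_gmi = [gmi_percent(b) - gmi_percent(a) for a, b in zip(msg_pre, msg_post)]
print(isclose(paired_t(d_msg), paired_t(d_gmi), rel_tol=1e-9))
```

The same argument implies that any metric that is an increasing linear transformation of another offers no additional sensitivity in this framework.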

Type 2 diabetes

The corresponding results for people with T2D are shown in Table 2 and Figure 1B. The Q-Score again yielded the highest paired t-value, although its margin over the other metrics was smaller than for people with T1D. For subjects with T2D, the t-values for %TIR, %TAR, MSG, GMI, and GRI were fairly similar.

All subjects (T1D and T2D, pooled)

The results for all subjects, pooling data from subjects with either T1D or T2D are shown in Table 2 and Figure 1C. In this case, the t-values were generally higher, due in part to the increased number of subjects (N = 212) as opposed to N = 120 or N = 92. Results were similar to those observed for people with T1D, such that Q-Score performed better (with a larger magnitude of the corresponding t-value) than %TIR or any of the other metrics examined.

The overall patterns of results were similar for T1D, T2D, and the combined T1D, T2D group, but the effects were most evident for people with T1D and the T1D, T2D pooled dataset. The effect size (t) for Q-Score was larger than any of the other metrics examined, that is, TIR, TAR, TBR, MSG, GMI, GRI, range, or MODD (Table 2, Fig. 1).

The other metrics, including %TIR, GMI, MSG, and GRI, showed smaller changes than the Q-Score relative to their corresponding standard errors of the mean difference. This was observed for subjects with T1D and for the pooled (T1D, T2D) group. For all subjects (T1D, T2D), the t-value for %TIR was smaller than those for Q-Score, GRI, MODD, MSG, and GMI.

The %TBR < 3.9 mmol/L (<70 mg/dL) showed no significant changes between pre- and postintervention at the P < 0.05 level for T1D, T2D, or when both groups were combined.

For all metrics except %TBR, the t-values were substantially higher when all subjects were included (N = 212) than for the T1D (N = 120) or T2D (N = 92) subgroups alone. This was likely due in large part to the larger number of subjects after pooling.
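One way to see why pooling raises t is to separate t into a standardized effect size and a sample-size factor. For a paired design, t = d_z × √N, where d_z is the mean paired difference divided by the standard deviation of the differences. The article does not report d_z. The sketch below, with hypothetical function names, derives it from the Q-Score rows of Table 2. The resulting values (about 0.77 for T1D, 0.86 for T2D, and 0.80 pooled) are of similar size. This is consistent with the larger pooled t arising mainly from the larger N rather than from a larger standardized effect.

```python
from math import sqrt


def dz_from_t(t, n):
    """Standardized mean paired difference, d_z = t / sqrt(N)."""
    return t / sqrt(n)


def t_from_dz(dz, n):
    """Expected paired t for a given d_z and number of subjects N."""
    return dz * sqrt(n)


# Q-Score paired t-values and sample sizes from Table 2.
for group, t, n in [("T1D", 8.43, 120), ("T2D", 8.21, 92), ("All", 11.71, 212)]:
    print(f"{group}: d_z = {dz_from_t(t, n):.3f}")

# Same standardized effect, larger N: t grows with sqrt(N).
print(f"{t_from_dz(0.8, 120):.2f} vs {t_from_dz(0.8, 212):.2f}")
```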

Discussion

Major findings

All of the metrics studied, with the sole exception of %TBR, showed a highly statistically significant change in response to the intervention. The Q-Score provided the largest t-statistic and, correspondingly, the smallest P values. The t-statistic for Q-Score was greater than those for TIR, TAR, TBR, MSG, GMI, and GRI (Table 2, Fig. 1). In some cases, the t-statistics for MSG, GMI, and GRI were also larger than that for %TIR. Accordingly, %TIR was not the most sensitive statistical criterion in the present study.

Criteria for “best” metric

The present study shows that the paired Student’s t test can be used to evaluate both the absolute and the comparative sensitivity of several metrics. We also evaluated three alternative criteria: two nonparametric methods (the Wilcoxon signed rank test and the sign test) and an unpaired Student’s t test. All three approaches gave similar results in terms of the relative sensitivity of the various metrics. The paired Student’s t test yielded the highest level of statistical significance.
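For readers who want to reproduce the nonparametric comparison, the following is a minimal sketch of the one-sided Wilcoxon signed rank test. It uses the large-sample normal approximation, without tie or continuity correction. It illustrates the test under those stated assumptions and is not the PASW/SPSS implementation; the function name and data are hypothetical. With N = 92 to 212, the normal approximation is usually adequate. For very small samples, such as the toy example, exact tables would be preferable.

```python
from math import sqrt
from statistics import NormalDist


def wilcoxon_one_sided_decrease(diffs):
    """Return (W-, one-sided P) for H1: changes tend to be negative.

    Zero differences are dropped; tied |d| values get average ranks.
    Uses the normal approximation without tie or continuity correction.
    """
    d = [x for x in diffs if x != 0]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute values.
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        i = j + 1
    w_minus = sum(r for r, x in zip(ranks, d) if x < 0)
    mu = n * (n + 1) / 4
    sigma = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_minus - mu) / sigma
    return w_minus, 1 - NormalDist().cdf(z)


# Hypothetical within-subject changes (post - pre) in Q-Score.
w, p = wilcoxon_one_sided_decrease([-4.0, -2.0, -4.0, 0.5, -4.0])
print(f"W- = {w}, one-sided P = {p:.3f}")
```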

Potential effect on sample size requirements for clinical trials

If the t-statistic for one metric (metric A) were two-fold larger than that for another metric (B), metric A would be expected to reach any specified t-value, and its corresponding P value, with approximately a four-fold smaller number of observations. This follows because t increases approximately in proportion to the square root of the number of subjects. Such a gain could help reduce the number of subjects, and hence the cost, of a clinical trial. Even the much smaller difference in t-values observed here gives a meaningful gain. Comparing Q-Score (t = 8.43) with %TIR (t = 5.98) for T1D (Table 2) gives a ratio of t-values of 8.43/5.98 = 1.41. Because 1.41² ≈ 2, using Q-Score rather than %TIR as the response variable might be expected to yield the same t- and P values with approximately a two-fold smaller sample size. One might also anticipate that a statistically significant difference would be observed earlier in the course of a longitudinal study.
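The arithmetic behind this argument can be written out explicitly. The sketch assumes, as the paragraph does implicitly, that the standardized within-subject effect stays the same as N changes, so that t increases in proportion to √N. The helper name is hypothetical, and the calculation is an approximation rather than a formal power analysis.

```python
def sample_size_ratio(t_a, t_b):
    """Approximate factor by which metric A needs fewer subjects than metric B.

    Assumes t grows as sqrt(N) for a fixed standardized effect, so the N needed
    to reach a given t scales as 1 / t**2; ignores the small df-dependence of
    the critical value.
    """
    return (t_a / t_b) ** 2


print(sample_size_ratio(2.0, 1.0))  # 4.0: a two-fold t ratio
ratio = sample_size_ratio(8.43, 5.98)  # Q-Score vs %TIR, T1D (Table 2)
print(f"{ratio:.2f}; about {120 / ratio:.0f} subjects instead of 120")
```

For the T1D comparison, the ratio is about 1.99. Under the stated assumption, roughly 60 subjects would then give a Q-Score t-value comparable to that obtained for %TIR with 120.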

Some sampling variability in t-values is expected, for two reasons. The numerator of t, the mean paired difference, was subject to a relative error of approximately 10%–15% (calculated as 100 × semd/mean difference). The denominator (semd) may have an error of similar or larger magnitude.
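The relative error quoted for the numerator is simply the reciprocal of t: 100 × semd/|mean difference| = 100/|t|. The short sketch below computes it both ways for selected T1D rows of Table 2. The helper name is hypothetical, and the values are copied from the table. The two calculations agree to within rounding of the published figures.

```python
def relative_error_pct(mean_diff, semd):
    """Relative error of the mean paired difference, 100 * semd / |mean difference|.

    Since t = mean_diff / semd, this equals 100 / |t|.
    """
    return 100.0 * semd / abs(mean_diff)


# (mean difference, semd, published t) for selected T1D rows of Table 2.
rows = {
    "Q-Score": (-3.63, 0.43, 8.43),
    "%TIR": (13.97, 2.34, 5.98),
    "MODD": (-1.03, 0.14, 7.38),
}
for name, (diff, semd, t_pub) in rows.items():
    print(f"{name}: {relative_error_pct(diff, semd):.1f}% vs 100/t = {100 / t_pub:.1f}%")
```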

Limitations of the current study

The robustness and replicability of this study design need to be further evaluated using multiple patient populations and clinical settings, with a wide range of potential therapeutic interventions. In addition to testing the primary hypothesis (e.g., “Is there a significant effect of a specified treatment?”), it would be desirable to conduct methodological studies to identify the metrics that appear to be most sensitive. Results of these studies could guide future study designs and the choice of primary response variables to be incorporated into Statistical Analysis Plans.

Duration of CGM data collection

The duration of CGM was very brief relative to current standard practice. Longer CGM data collection at the onset of and after the interventions would be preferable. It is commonly taught that one should have at least 14 days of CGM data, with 70% data accrual. However, this guidance applies only when the results are used to evaluate an individual patient. It does not apply to clinical trials or other clinical or real-world studies that pool and analyze data from multiple subjects simultaneously. In the present study, we had N = 120, 92, and 212 subjects in the three groups, respectively. Increasing the number of subjects (N) reduces the standard error of the mean (sem) by a factor of the square root of N. In addition, the paired t test, with each subject serving as their own control, substantially reduces the effects of between-subject variability. Previous studies have shown that mean or median glucose levels can be evaluated with only 2 or 3 days of CGM data.²⁶ Several early CGM sensors collected only 3 days of data but were nonetheless used successfully in clinical applications for individual patients.
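The benefit of letting each subject serve as their own control can be illustrated with hypothetical data in which subjects differ widely at baseline but all improve by about the same amount. The sketch compares the paired t-statistic with a pooled-variance unpaired Student’s t test (the alternative also examined in this study) on the same numbers. The function names and data are illustrative assumptions.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(pre, post):
    """Paired t for post - pre (each subject is their own control)."""
    d = [b - a for a, b in zip(pre, post)]
    return mean(d) / (stdev(d) / sqrt(len(d)))


def unpaired_t(pre, post):
    """Student's two-sample t with pooled variance, ignoring the pairing."""
    n1, n2 = len(pre), len(post)
    sp2 = ((n1 - 1) * stdev(pre) ** 2 + (n2 - 1) * stdev(post) ** 2) / (n1 + n2 - 2)
    return (mean(post) - mean(pre)) / sqrt(sp2 * (1 / n1 + 1 / n2))


# Hypothetical Q-Score-like values: wide between-subject spread at baseline,
# but every subject improves by roughly 3 units.
pre = [9.0, 12.0, 15.0, 18.0, 21.0, 24.0]
post = [6.2, 8.8, 12.1, 14.9, 18.2, 20.8]
print(f"paired t = {paired_t(pre, post):.1f}, unpaired t = {unpaired_t(pre, post):.2f}")
```

Both tests estimate the same mean change of −3 units. Only the paired test removes the between-subject spread from the denominator, which is why its |t| is much larger here.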

Some investigators may have refrained from the use of CGM in the mistaken belief that it requires 2 weeks or longer to obtain sufficient data. Recognition that even short CGM studies can be valid and informative, as in the present study, may encourage investigators to utilize CGM in more studies, with meaningful reductions in cost and potential burden on the patient.

Need for additional studies of metrics when hypoglycemia is the primary target of intervention

The sensitivity of various metrics may vary across studies. In the present results, %TBR < 3.9 mmol/L (<70 mg/dL) was the least sensitive CGM metric (Table 2, Fig. 1). The failure of %TBR to show a significant change was expected because glucose values below 3.9 mmol/L were infrequent in this study. Some studies are designed to evaluate the risk of hypoglycemia (e.g., in people at elevated risk of severe hypoglycemia). In such studies, %TBR (or its variants, e.g., %TBR level 1 and level 2) should be evaluated and further optimized to the extent possible. The same applies to other metrics for hypoglycemia (e.g., the Hypoglycemia Component of GRI,²¹ GRADE_hypoglycemia or %GRADE_hypoglycemia,¹⁴ Hypoglycemia Index,¹³ Hypoglycemia Intensity,¹⁸ and others).

Mean of (absolute) daily differences

We examined the performance of MODD,¹⁷ one of the five subcomponents of the Q-Score. MODD was one of the most sensitive metrics for people with T1D for the detection of response to the intervention in the present study. MODD showed large changes (t = 7.38) in T1D. In contrast, for people with T2D, MODD showed a much smaller t-value (5.99) than several other metrics (%TIR, %TAR, MSG, GMI, or GRI) and was only superior to the t-values for daily range and TBR. This contrast between T1D and T2D in terms of the relative sensitivities of CGM metrics deserves further investigation.

Special characteristics of Q-Score metric

Q-Score is the only one of the metrics considered here that includes MODD as a subcomponent. MODD was designed by Molnar et al.¹⁷ to evaluate both between-day variability and the day-to-day stability of the 24-h glucose profile pattern. This stability can be seen when the profiles for all days are superimposed, together with the median glucose by time of day (Ambulatory Glucose Profile [AGP]).^25,27 Q-Score preceded several composite scores that have been introduced during the past 10 years (CGP, PGS, COGI, GRI) (cf. Supplementary Data).
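MODD, as defined in the notes to Table 2, is the average absolute difference between glucose values exactly 24 h apart. It can be sketched as follows. This minimal illustration rests on two assumptions: readings are evenly spaced, and pairs with a missing value are skipped. It is not the Q-Score software, and the function name is hypothetical. A perfectly repeated daily profile gives MODD = 0, which is why MODD reflects the day-to-day stability of the glucose pattern.

```python
def modd(glucose, samples_per_day):
    """Mean of absolute differences between readings exactly 24 h apart.

    glucose: evenly spaced readings in time order (None marks a gap);
    samples_per_day: readings per 24 h (e.g. 96 for 15-min data).
    Pairs with a missing member are skipped, which is an assumption here.
    """
    diffs = [
        abs(glucose[i + samples_per_day] - glucose[i])
        for i in range(len(glucose) - samples_per_day)
        if glucose[i] is not None and glucose[i + samples_per_day] is not None
    ]
    if not diffs:
        raise ValueError("no pairs of readings 24 h apart")
    return sum(diffs) / len(diffs)


# Hypothetical 6-hourly readings (mmol/L) over two days.
print(modd([5.0, 9.0, 7.0, 6.0, 6.0, 11.0, 7.0, 5.0], samples_per_day=4))
# A perfectly repeated daily profile gives MODD = 0.
print(modd([5.0, 9.0, 7.0, 6.0] * 3, samples_per_day=4))
```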

None of the other CGM metrics evaluate the stability and reproducibility of glucose profile patterns from day to day. Other statistical and graphical methods have been proposed to examine similarity and stability of glucose patterns or profiles from day to day.²⁸

Use of one-sided versus two-sided t test: In the present study, we used a one-sided statistical test because we were primarily interested in detecting an improvement in each metric of the quality of glycemic control. Figure 1 shows the t-statistics, not the P values derived from them. The t-statistic is calculated in exactly the same way for a one-sided and a two-sided test. When the observed change is in the hypothesized direction, the P value for the two-sided test is exactly twice that for the one-sided test.

Evaluating individual patients: The availability of a composite score with higher sensitivity can facilitate the interpretation of clinical results for individual patients. If a metric is more sensitive overall, viewed in the context of multiple subjects, it is reasonable to assume that it is also likely to be more sensitive for individual subjects. It might therefore detect changes more rapidly than other metrics. However, even an excellent composite metric is not sufficient for overall clinical evaluation. The clinician will usually want to examine the individual metrics, whether or not they have been included in the composite metric. For example, this allows the clinician to identify the relative magnitude and clinical importance of hyper- and hypoglycemia, mean glucose, within-day variability (average within-day range), and between-day variability (MODD). The clinician also needs to evaluate and interpret the AGP,^25,27,28 e.g., to identify the times of day when each metric is at its maximum or nadir and the time periods when it is within acceptable and desirable limits. Clinical evaluation may also include trends in individual and composite metrics. No single metric, whether HbA1c, mean glucose, TIR, time in tight range (TITR), TAR, or TBR, is sufficient by itself. Additional metrics may be needed to evaluate circadian and postprandial changes and the effects of diet, exercise, illness, stress, and various forms of therapy.

Regulatory implications

Regulatory agencies should consider modifying their current policies to accept multiple forms of evidence, especially CGM metrics. These include MSG and GMI; composite metrics such as Q-Score,^16,17 CGP,¹⁸ PGS,¹⁹ COGI,²⁰ and GRI;²¹ and some previously developed metrics^1,2 that have their own advantages, properties, and rationale. Several of these metrics were disregarded for many years, possibly because they were considered “too mathematical,” “too theoretical,” “too difficult to calculate,” or “too complicated for clinicians or patients to understand.” However, these metrics can readily be calculated by the data processing systems associated with CGM devices, and they can be understood and used as scores of the quality of glycemic control. Some metrics, such as Q-Score,^15,16 CGP,¹⁸ PGS,¹⁹ COGI,²⁰ and GRI,²¹ make it easy to identify the component or components of glucose control most in need of correction. Examples include mean glucose (MSG), hyper- and hypoglycemia (%TAR, %TBR), within-day and between-day variability, and day-to-day stability of glucose profile patterns by time of day.¹⁷ Additional consideration of composite metrics is provided in the Supplementary Data.

The MODD metric is sensitive to the synchronization of glucose patterns from day to day. Such synchronization can result from standardized times of day for meals, standardized meal size and composition, and standardized medication dosage and timing (by time of day and/or in relation to the timing of meals, exercise, sleep, and awakening), as well as from other daily activities.

Moscardó et al.²⁹ have developed methods to evaluate the expected repeatability of many of the most important and frequently used CGM metrics in relation to the duration of CGM data collection.

Conclusions

The Q-Score appears to be one of the most sensitive CGM-derived metrics for detecting changes in response to a short-term intervention in the present study. The intervention consisted of hospitalization, with the opportunity for optimization of glycemic control by hospital-based physicians with greater experience in the management of diabetes. This was observed for 120 patients with T1D and 92 patients with T2D in a single-center, nonrandomized, retrospective observational study. The GRI also showed greater sensitivity than %TIR or %TAR for people with T1D. The results were consistent across several statistical criteria. The metrics %TIR (3.9–10 mmol/L, 70–180 mg/dL), MSG, GMI, and GRI showed statistically significant changes but with less sensitivity, i.e., a smaller mean within-subject change relative to its standard error of the mean difference (semd).

Authors’ Contributions

David Rodbard, MD, Petra Augstein, MD, DSc, and Peter Heinke, MSc, made equal contributions to this study.

D.R.: Conceptualization of the need to identify “best metric” or “most sensitive metric” for specific studies and the need for criteria and methods to discriminate between metrics, statistical data analysis, design of graphical displays, and drafting and editing of the article. P.A.: Original development, evaluation, and validation of the Q-Score metric, evaluating correlations among alternative metrics, evolution of Q-Score to use the generally accepted limit for %TAR (10 mmol/L), design of study protocol, overseeing of the clinical study, and writing and editing of the article. P.H.: Development and evaluation of the Q-Score metric, statistical data analysis using both metric and nonparametric methods, development of summary tables, graphical displays, and writing and editing of article. A.T.: Collaboration on the evaluation and validation of the Q-Score metric, evaluation of correlations between alternative metrics. Developer of the composite metric model “Glucose Pentagon.” Writing and editing of the article. E.S.: Contributed to the design of the study, edited, and reviewed the article. J.R.: Project administration, editing, and review of the article. All authors read, approved, and take responsibility for the final article.

Footnotes

Acknowledgments

The authors wish to express their appreciation to patients and to the expert physicians and staff at Klinikum Karlsburg who participated in this study. The authors wish to thank Editage for assistance with the preparation of the graphical abstract.

Author Disclosure Statement

D.R.—no duality of interests; P.A.—no duality of interests; P.H.—no duality of interests; A.T.—no duality of interests; E.S.—owner of patent protection regarding the Q-Score; and J.R.—no duality of interests.

Funding Information

None.

Supplemental Material

References

1. Nguyen, Han, Spanakis, et al. A Review of Continuous Glucose Monitoring-Based Composite Metrics for Glycemic Control. Diabetes Technol Ther, 2020; 22(8):613–622; doi: 10.1089/dia.2019.0434

2. Rodbard. Quality of Glycemic Control: Assessment Using Relationships Between Metrics for Safety and Efficacy. Diabetes Technol Ther, 2021; 23(10):692–704; doi: 10.1089/dia.2021.0115

3. Rodbard, Berger, Pernick. Computer, networking, and information systems to facilitate delivery of health care to patients with diabetes. In: Baba, Kaneko (Eds). Diabetes 1994, Proceedings of the 15th International Diabetes Federation Congress, Kobe, 6–11 November 1994. Elsevier, Amsterdam, pp. 800–803, 1995.

4. U.S. Food and Drug Administration. Diabetes Mellitus: Efficacy Endpoints for Clinical Trials Investigating Antidiabetic Drugs and Biological Products May 2023 (Draft Guidance). Available from: https://www.fda.gov/regulatory-information/search-fda-guidance-documents/diabetes-mellitus-efficacy-endpoints-clinical-trials-investigating-antidiabetic-drugs-and-biological [Last accessed: October 29, 2025].

Battelino

, Danne

, Bergenstal

, et al. Clinical Targets for Continuous Glucose Monitoring Data Interpretation: Recommendations From the International Consensus on Time in Range. Diabetes Care, 2019; 42(8):1593–1603; doi: 10.2337/dci19-0028

Phillip

, Nimri

, Bergenstal

, et al. Consensus Recommendations for the Use of Automated Insulin Delivery Technologies in Clinical Practice. Endocr Rev, 2023; 44(2):254–280; doi: 10.1210/endrev/bnac022

Rodbard

. Continuous glucose monitoring metrics (Mean Glucose, time above range and time in range) are superior to glycated haemoglobin for assessment of therapeutic efficacy. Diabetes Obes Metab, 2023; 25(2):596–601; doi: 10.1111/dom.14906

Beck

, Raghinaru

, Calhoun

, et al. A Comparison of Continuous Glucose Monitoring-Measured Time-in-Range 70–180 mg/dL Versus Time-in-Tight-Range 70–140 mg/dL. Diabetes Technol Ther, 2024; 26(3):151–155; doi: 10.1089/dia.2023.0380

Bergenstal

, Beck

, Close

, et al. Glucose Management Indicator (GMI): A New Term for Estimating A1C From Continuous Glucose Monitoring. Diabetes Care, 2018; 41(11):2275–2280; doi: 10.2337/dc18-1581

10.

Nathan

, Kuenen

, Borg

, et al.; A1c-Derived Average Glucose Study Group. Translating the A1C assay into estimated average glucose values. Diabetes Care, 2008; 31(8):1473–1478; doi: 10.2337/dc08-0545. Erratum in: Diabetes Care. 2009 Jan;32(1):207.

11.

Schlichtkrull

, Munck

, Jersild

. The M-value, an index of blood-sugar control in diabetics. Acta Med Scand, 1965; 177:95–102.

12.

Clarke

, Kovatchev

. Statistical tools to analyze continuous glucose monitor data. Diabetes Technol Ther, 2009; 11(Suppl 1):S45–S54; doi: 10.1089/dia.2008.0138

13.

Rodbard

. Interpretation of continuous glucose monitoring data: Glycemic variability and quality of glycemic control. Diabetes Technol Ther, 2009; 11(Suppl 1):S55–S67; doi: 10.1089/dia.2008.0132

14.

Hill

, Hindmarsh

, Stevens

, et al. A method for assessing quality of control from glucose profiles. Diabet Med, 2007; 24(7):753–758; doi: 10.1111/j.1464-5491.2007.02119.x

15.

Augstein

, Heinke

, Nowak

, et al. Q-Score Complements the Time in Range in the Evaluation of Short-Term Glycemic Control. J Diabetes Sci Technol, 2025; 19(5):1247–1256; doi: 10.1177/19322968241246209

16.

Augstein

, Heinke

, Vogt

, et al. Q-Score: Development of a new metric for continuous glucose monitoring that enables stratification of antihyperglycaemic therapies. BMC Endocr Disord, 2015; 15:22; doi: 10.1186/s12902-015-0019-0

17.

Molnar

, Taylor

, Ho

. Day-to-day variation of continuously monitored glycaemia: A further measure of diabetic instability. Diabetologia, 1972; 8(5):342–348; doi: 10.1007/BF01218495

18.

Vigersky

, Shin

, Jiang

, et al. The comprehensive glucose pentagon: A glucose-centric composite metric for assessing glycemic control in persons with diabetes. J Diabetes Sci Technol, 2018; 12(1):114–123; doi: 10.1177/1932296817718561

19.

Hirsch

, Balo

, Sayer

, et al. A simple composite metric for the assessment of glycemic status from continuous glucose monitoring data: Implications for clinical practice and the artificial pancreas. Diabetes Technol Ther, 2017; 19(S3):S38–S48; doi: 10.1089/dia.2017.0080

20.

Leelarathna

, Thabit

, Wilinska

, et al. Evaluating Glucose Control With a Novel Composite Continuous Glucose Monitoring Index. J Diabetes Sci Technol, 2020; 14(2):277–283; doi: 10.1177/1932296819838525

21.

Klonoff

, Wang

, Rodbard

, et al. A glycemia risk index (GRI) of hypoglycemia and hyperglycemia for continuous glucose monitoring validated by clinician ratings. J Diabetes Sci Technol, 2023; 17(5):1226–1242; doi: 10.1177/19322968221085273

22. Student's t-test. Wikipedia. Available from: https://en.wikipedia.org/wiki/Student%27s_t-test [Last accessed: September 12, 2025].

23. Wilcoxon signed-rank test. Wikipedia. Available from: https://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test [Last accessed: September 12, 2025].

24. Sign Test. Wikipedia. Available from: https://en.wikipedia.org/wiki/Sign_test [Last accessed: September 12, 2025].

25. Mazze, Lucido, Langer, et al. Ambulatory glucose profile: Representation of verified self-monitored blood glucose data. Diabetes Care, 1987; 10(1):111–117; doi: 10.2337/diacare.10.1.111

26. Foreman, Brouwers MCGJ, van der Kallen CJH, et al. Glucose Variability Assessed with Continuous Glucose Monitoring: Reliability, Reference Values, and Correlations with Established Glycemic Indices-The Maastricht Study. Diabetes Technol Ther, 2020; 22(5):395–403; doi: 10.1089/dia.2019.0385. Erratum in: Diabetes Technol Ther. 2021 May;23(5):397–398.

27. Mazze, Strock, Wesley, et al. Characterizing glucose exposure for individuals with normal glucose tolerance using continuous glucose monitoring and ambulatory glucose profile analysis. Diabetes Technol Ther, 2008; 10(3):149–159; doi: 10.1089/dia.2007.0293

28. Rodbard. New and improved methods to characterize glycemic variability using continuous glucose monitoring. Diabetes Technol Ther, 2009; 11(9):551–565; doi: 10.1089/dia.2009.0015

29. Moscardó, Herrero, Reddy, et al. Assessment of Glucose Control Metrics by Discriminant Ratio. Diabetes Technol Ther, 2020; 22(10):719–726; doi: 10.1089/dia.2019.0415

Supplementary Material


For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.

For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
