Mind the Gap: A Sequential 4-Step Missing Data Management for CGM-Derived Endpoints in Diabetes Clinical Trials

Keywords

continuous glucose monitoring data, epoch level, multi-layered structure, data quality, traceability, missing data

Continuous glucose monitoring (CGM) systems offer transformative potential in anti-diabetic drug development by providing real-time glycemic data, but their integration into clinical trials introduces complex statistical challenges related to data quality, traceability, and missing data management.¹ Missing CGM data can arise at two levels: at the epoch level, where systematic errors or disconnection from the receiver cause intermittent gaps, and at the subject level, where a participant discontinues CGM use entirely during the study.²,³ Large amounts of missing data directly affect the quality and analysis of CGM-derived endpoints such as time below range, time above range, and coefficient of variation.⁴,⁵ For example, frequent short gaps within a day can substantially reduce 24-hour CGM coverage and lead to underestimation of hypoglycemic event rates, because missing epoch-level data do not imply an absence of events. Appropriate missing data handling at each level is therefore essential for reliable analyses of CGM-derived clinical endpoints. We propose a 4-step framework that progressively handles missing data from the epoch level to the endpoint level while safeguarding the quality of data used in the final analysis.

Step 1. Epoch-Level Imputation: This step fills short gaps in raw CGM readings within a given day. Imputation is applied to short gaps (eg, up to 1 h) using methods such as, but not limited to, linear interpolation or local smoothing. The output is a combined data set of raw and imputed epoch-level readings. Notably, current methods remain limited in their suitability for longer gaps spanning several hours.
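As a minimal sketch of step 1, the following Python function applies linear interpolation to short interior gaps only. It assumes a regular sampling grid with missing epochs coded as None; the function name, the 5-minute epoch length, and the default of 12 epochs (1 h) are illustrative assumptions rather than part of the proposed framework.

```python
from typing import List, Optional, Tuple


def impute_short_gaps(
    readings: List[Optional[float]],
    max_gap_epochs: int = 12,  # assumption: 5-minute epochs, so 12 epochs = 1 h
) -> Tuple[List[Optional[float]], List[bool]]:
    """Linearly interpolate interior gaps no longer than max_gap_epochs.

    Returns the combined series and a flag marking which values were imputed,
    so that raw and imputed readings remain traceable.
    """
    filled = list(readings)
    imputed = [False] * len(readings)
    n = len(readings)
    i = 0
    while i < n:
        if readings[i] is not None:
            i += 1
            continue
        start = i
        while i < n and readings[i] is None:
            i += 1
        end = i  # index of the first observed value after the gap (or n)
        gap_len = end - start
        # Interpolate only if the gap is short and bounded on both sides.
        if start > 0 and end < n and gap_len <= max_gap_epochs:
            left, right = readings[start - 1], readings[end]
            step = (right - left) / (gap_len + 1)
            for k in range(gap_len):
                filled[start + k] = left + step * (k + 1)
                imputed[start + k] = True
    return filled, imputed


# Hypothetical readings in mg/dL; the 3-epoch gap exceeds max_gap_epochs=2.
series = [100.0, None, None, 130.0, None, None, None, 90.0]
combined, flags = impute_short_gaps(series, max_gap_epochs=2)
```

Gaps at the start or end of the series and gaps longer than the limit are left missing, so they still count against coverage in the later steps.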

Step 2. Daily-Level Validation: This step classifies each day as valid or invalid based on data coverage. A valid day requires a minimum percentage of epoch-level data across 24 hours, counting both raw and imputed readings from step 1. A commonly applied threshold is 70% coverage, below which days are excluded from further analysis. This step ensures that only days with sufficient data coverage contribute to higher-level summaries.
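The daily coverage check can be sketched as follows. The function names and the expected count of 288 epochs per day (5-minute sampling) are assumptions for illustration; the expected count depends on the device's sampling interval.

```python
from typing import List, Optional


def daily_coverage(
    day_readings: List[Optional[float]],
    expected_epochs: int = 288,  # assumption: 5-minute epochs over 24 hours
) -> float:
    """Fraction of expected epochs with a raw or imputed reading."""
    available = sum(1 for r in day_readings if r is not None)
    return available / expected_epochs


def is_valid_day(
    day_readings: List[Optional[float]],
    expected_epochs: int = 288,
    min_coverage: float = 0.70,  # threshold named in the text
) -> bool:
    """A day is valid when coverage meets or exceeds the threshold."""
    return daily_coverage(day_readings, expected_epochs) >= min_coverage


# Hypothetical day with 202 available epochs out of 288 (about 70.1%).
day = [120.0] * 202 + [None] * 86
valid = is_valid_day(day)
```

The denominator is the expected number of epochs, not the number of rows received, so epochs that were never transmitted still count as missing.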

Step 3. Monitoring Period-Level Qualification: This step classifies CGM monitoring periods of each subject as qualified or unqualified based on the number of valid days within a period. For each subject, data collected from a monitoring period are considered qualified only if a minimum number of valid days is achieved within the collection window. Data from unqualified periods are discarded entirely and excluded from CGM endpoint derivation.
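A minimal sketch of the qualification rule is shown below, using time below range as the example endpoint. The minimum number of valid days is left as a parameter because the text does not fix a value, and the 70 mg/dL cut-off and function names are assumptions for illustration.

```python
from typing import List, Optional


def is_qualified(valid_days: List[bool], min_valid_days: int) -> bool:
    """A monitoring period qualifies when it has enough valid days."""
    return sum(valid_days) >= min_valid_days


def time_below_range(
    days: List[List[Optional[float]]],
    valid_days: List[bool],
    min_valid_days: int,
    threshold: float = 70.0,  # assumption: mg/dL cut-off for "below range"
) -> Optional[float]:
    """Percent of available readings below threshold, using valid days only.

    Returns None for an unqualified period, so the endpoint is recorded as
    missing and left for endpoint-level imputation.
    """
    if not is_qualified(valid_days, min_valid_days):
        return None
    values = [
        r
        for day, ok in zip(days, valid_days)
        if ok
        for r in day
        if r is not None
    ]
    if not values:
        return None
    below = sum(1 for v in values if v < threshold)
    return 100.0 * below / len(values)


# Hypothetical 3-day period with 4 epochs per day; the third day is invalid.
period = [
    [60.0, 80.0, None, 100.0],
    [65.0, 90.0, 95.0, 110.0],
    [None, None, None, 50.0],
]
flags = [True, True, False]
tbr = time_below_range(period, flags, min_valid_days=2)
```

Readings from the invalid day are ignored even when the period qualifies, so the low value of 50.0 on the third day does not enter the endpoint.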

Step 4. Endpoint-Level Multiple Imputation: This step imputes missing CGM-derived endpoints arising from unqualified monitoring periods. For instance, if a monitoring period (eg, weeks 48-52) of a subject is deemed unqualified under the step 3 criteria, the corresponding CGM-derived endpoint at week 52 would be missing and subsequently imputed using an approach aligned with the study estimand.⁶
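The choice of imputation model depends on the estimand and is not specified here. Once M imputed data sets have been analyzed, the results are commonly pooled with Rubin's rules, which the sketch below illustrates; the function name and example numbers are hypothetical, and the imputation model itself is not shown.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(
    estimates: Sequence[float],
    variances: Sequence[float],
) -> Tuple[float, float]:
    """Combine M per-imputation results with Rubin's rules.

    estimates: point estimate from each imputed data set
    variances: squared standard error from each imputed data set
    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least 2 imputations with matching variances")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical treatment-effect estimates from 3 imputed data sets.
pooled_estimate, total_variance = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term is what carries the uncertainty due to the missing endpoints into the final standard error.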

Sensitivity analyses are recommended to evaluate the robustness of findings across these steps by exploring varying completeness thresholds at the daily level, alternative valid-day criteria per monitoring period, and imputation methods under different assumptions at the endpoint level.
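One simple way to explore the daily-level thresholds is to tabulate how many days stay valid, and whether the period still qualifies, under each candidate threshold. The sketch below assumes that daily coverage fractions have already been computed; the function name, thresholds, and minimum valid-day count are illustrative assumptions.

```python
from typing import Dict, Sequence, Tuple


def threshold_sweep(
    daily_coverage: Sequence[float],
    thresholds: Sequence[float],
    min_valid_days: int,
) -> Dict[float, Tuple[int, bool]]:
    """For each coverage threshold, count valid days and flag qualification."""
    results = {}
    for t in thresholds:
        n_valid = sum(1 for c in daily_coverage if c >= t)
        results[t] = (n_valid, n_valid >= min_valid_days)
    return results


# Hypothetical coverage fractions for a 5-day monitoring period.
coverage = [0.95, 0.72, 0.68, 0.81, 0.55]
sweep = threshold_sweep(coverage, thresholds=(0.6, 0.7, 0.8), min_valid_days=3)
```

In this hypothetical example the period qualifies at 60% and 70% coverage but not at 80%, which is the kind of threshold dependence a sensitivity analysis is meant to expose.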

The proposed 4 sequential steps to address missing data are crucial for deriving high-quality endpoints and ensuring reliable CGM analyses.

Yoonhee Kim, PhD; Wenda Tu, PhD; Roberto Crackel, PhD; Yuhao Li, PhD; Joanne Lin, PhD; Hye Soo Cho, MS; Yun Wang, PhD

Office of Biostatistics, Office of Translational Science, Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, MD, USA

Footnotes

Acknowledgements

The authors thank Mark Rothmann and Matilde Kam for their support.

Abbreviations

CGM, continuous glucose monitoring.

Authors’ Note

This paper reflects the views of the authors and should not be construed to represent the FDA’s views or policies.

References

1. Kim, Crackel, Cho, Wang. Beyond time in range: hidden statistical challenges of continuous glucose monitoring data in diabetes drug development. J Diabetes Sci Technol. 2026;20(1):229-230.

2. Food and Drug Administration. Submitting continuous glucose monitoring data in clinical trials. Guidance for industry, technical specifications document. 2026. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/submitting-continuous-glucose-monitoring-data-clinical-trials

3. Food and Drug Administration. Digital health technologies for remote data acquisition in clinical investigations guidance for industry, investigators, and other stakeholders. 2023. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/digital-health-technologies-remote-data-acquisition-clinical-investigations

4. Kok, Walter, Lee, Gaynanova. Impact of missing data and monitoring duration on downstream analyses in continuous glucose monitoring. Diabetes Care. 2026;49:1031-1039.

5. Zhan, Fisher, Zhang XD. Occurrence and impact of missing data on CGM metrics-analysis of data from Type 1 Diabetes Exercise Initiative (T1DEXI). Diabetes. 2025;74(suppl 1):233-OR.

6. Wang, Kim, et al. Statistical methods for handling missing data to align with treatment policy strategy. Pharm Stat. 2023;22(4):650-670.