Abstract
A paper by Drs Okada and Sengoku that appears in this issue of TIRS shows how data from ClinicalTrials.gov can be used for research on the pharmaceutical industry. This commentary identifies several challenges associated with using these data for research and concludes with 3 recommendations from a statistical perspective.
Recently, I had the pleasure of reviewing the paper by Drs Okada and Sengoku that appears in this issue of TIRS. 1 As readers will see, the authors used data from ClinicalTrials.gov to describe how strategic alliances have changed pharmaceutical companies’ portfolios over time. However, some of the limitations inherent in using the ClinicalTrials.gov database are not widely appreciated by researchers.
ClinicalTrials.gov, initially established in 1997, is a website maintained by the National Library of Medicine at the National Institutes of Health. 2 When it was first released to the public in 2000, ClinicalTrials.gov was a registry of clinical trials of experimental, FDA-regulated treatments for serious and/or life-threatening conditions. Today, the scope of ClinicalTrials.gov is much broader than at its inception. Legislation and regulation from 2007 to 2016 required more types of studies to be registered (interestingly, not phase 1), specified the many details of the study design to be reported, and required study sponsors to report summary results. 3 In addition to these legal mandates, several nongovernmental bodies have adopted policies to incentivize pharmaceutical and device companies to register their trials. For example, in 2005, the International Committee of Medical Journal Editors adopted a policy that required sponsors to register their trials as a precondition for publication in prominent medical journals.
As of July 2018, ClinicalTrials.gov had more than 277,000 studies registered, 4 making it an attractive source of data for research. However, the nature of the registry itself—with its frequently changing requirements—makes some research quite challenging. For example, suppose a research team wanted to investigate when a company first began development of products for diabetes. To answer this question, we might find the earliest record in ClinicalTrials.gov of the company sponsoring a diabetes trial meeting our operational definition (an “eligible study”). This approach might be sensible if we were assured that all diabetes trials were registered in ClinicalTrials.gov. However, this assumption is generally not valid because the registration requirements have changed substantially since 2000. A more valid approach would be to require a baseline period of 1 to 5 years starting in 2008 (when more studies were required by law to be registered) during which the company had not registered a diabetes trial and then find the earliest record of an “eligible study” after the baseline period. But we would still need to be concerned about the possibility that a company’s first research in diabetes was not registered because—as a phase 1 study—it was not (and is not currently) required. These issues of temporal changes in registry requirements are just a few we would need to consider for our research. We would also need to be aware that the accuracy of information in the registry is the responsibility of the sponsor and that the information can be updated at any time. Indeed, research using secondary data can be quite nuanced, and thus it requires great care.
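The baseline-period approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a validated method: the `Registration` record, its field names, and the study identifiers in the usage note are hypothetical, and the operational definition of an "eligible study" is assumed to have been applied already. The sketch also assumes that any eligible registration before the end of the baseline period makes the start of development unobservable, which is one possible reading of the rule.

```python
from datetime import date
from typing import Iterable, NamedTuple, Optional


class Registration(NamedTuple):
    """One registry record for a sponsor (hypothetical structure)."""
    study_id: str        # illustrative identifier, not a real registry number
    registered_on: date  # date the study was first registered
    eligible: bool       # True if it meets the operational definition


def first_eligible_after_baseline(
    records: Iterable[Registration],
    baseline_start: date = date(2008, 1, 1),
    baseline_years: int = 1,
) -> Optional[Registration]:
    """Return the earliest eligible study registered after a clean baseline.

    Returns None when the sponsor has no eligible study, or when an eligible
    study was registered before the baseline period ended (assumption: the
    start of development then cannot be determined from the registry).
    """
    # Note: replace() would fail for a 29 February start date; the default
    # of 1 January avoids that case.
    baseline_end = baseline_start.replace(
        year=baseline_start.year + baseline_years
    )
    eligible = sorted(
        (r for r in records if r.eligible),
        key=lambda r: r.registered_on,
    )
    if any(r.registered_on < baseline_end for r in eligible):
        return None
    return eligible[0] if eligible else None
```

For example, a sponsor whose eligible studies were registered in May 2010 and January 2012 yields the 2010 record with a 1-year baseline, but yields `None` with a 3-year baseline because the 2010 study falls inside the baseline period. Even with this rule, an unregistered phase 1 study would remain invisible, so the result is at best an upper bound on the true start date.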
On this note, I would like to conclude this commentary from my own statistical view of the world with 3 recommendations for researchers who are considering the use of data from ClinicalTrials.gov. First, incorporate the experience and insights of the experts at ClinicalTrials.gov 5 and of an experienced regulatory affairs professional who can shed light on how the practices and standards for trial registration vary from sponsor to sponsor. Second, consult a statistician or pharmacoepidemiologist with experience in the analysis of administrative or secondary data: the registry of studies in ClinicalTrials.gov has direct parallels to an electronic health record of patients. Last, emphasize the practical significance of your findings (eg, measures of association such as relative risks); because ClinicalTrials.gov is a large data source, clinically or practically trivial associations may have small P values.
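The last recommendation can be illustrated with a small numerical sketch. The counts below are invented for illustration and do not come from ClinicalTrials.gov; the test used is a standard pooled two-proportion z-test, chosen here only because it needs nothing beyond the Python standard library.

```python
import math


def relative_risk_and_p(events_a: int, n_a: int, events_b: int, n_b: int):
    """Relative risk (A vs B) and two-sided P value from a pooled z-test."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided P value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a / p_b, p_value


# Hypothetical counts: 51% vs 50% in both scenarios, so the relative risk
# is 1.02 either way; only the sample size differs.
rr_small, p_small = relative_risk_and_p(510, 1_000, 500, 1_000)
rr_large, p_large = relative_risk_and_p(51_000, 100_000, 50_000, 100_000)
```

In both scenarios the relative risk is the same, yet the P value is far from conventional significance thresholds with 1,000 records per group and very small with 100,000 per group. Reporting the measure of association alongside the P value keeps the size of the effect in view.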
With the appropriate methods, research using secondary data—such as those from ClinicalTrials.gov—can generate valuable insights. I applaud Drs Okada and Sengoku and all researchers who undertake this worthwhile and challenging endeavor.
This article does not contain any studies with human or animal subjects by any of the authors.
