Abstract
Monitoring the quality of clinical trial efficacy outcome data has received increased attention in the past decade, with regulatory guidance encouraging that it be conducted proactively and remotely. However, the methods used to develop and implement risk-based data monitoring (RBDM) programs vary, and there is a dearth of published material to guide these processes in the context of central nervous system (CNS) trials. We reviewed regulatory guidance published within the past 6 years, generic white papers, and studies applying RBDM to data from CNS clinical trials. Methodologic considerations and system requirements necessary to establish an effective, real-time risk-based monitoring platform in CNS trials are presented. Key RBDM terms are defined in the context of CNS trial data, such as “critical data,” “risk indicators,” “noninformative data,” and “mitigation of risk.” Additionally, potential benefits of, and challenges associated with, implementing data quality monitoring are highlighted. Applying these methodological and system requirement considerations to real-time monitoring of clinical ratings in CNS trials has the potential to minimize risk and enhance the quality of clinical trial data.
Introduction
Monitoring of clinical drug trials with respect to patient safety and data integrity is a federally regulated mandate dating back to 1938. 1 During the past 10 years, oversight of clinical data has received increased attention. Global regulatory guidance exists in the form of the US Food and Drug Administration’s (FDA’s) “Guidance for Industry Oversight of Clinical Investigations—A Risk-Based Approach to Monitoring,” 2 the European Medicines Agency’s “Reflection Paper on Risk-based Quality Management in Clinical Trials,” 3 the United Kingdom’s Medicines and Healthcare Products Regulatory Agency, 4 and the addendum to ICH E6(R2): “Guideline for Good Clinical Practice, section on monitoring.” 5
Although monitoring of clinical trial data serves both patient safety and data quality, this article focuses on monitoring efficacy outcome data to ensure the quality of that aspect of clinical data. Primary efficacy outcome data determine, to a large extent, the success or failure of a clinical drug study.
The FDA guidance encourages sponsors to change their approach to monitoring in 2 fundamental ways: first, to take a proactive, risk-based approach to ensuring clinical trial quality and, second, to monitor trial activity remotely, from a centralized location rather than from on-site reviews.
The past 5 years have seen a proliferation of companies that offer risk-based data monitoring (RBDM) programs, as well as the publication of white papers and “best practice” directives, 6 the most comprehensive being guidance from the nonprofit TransCelerate BioPharma. 7 These papers have defined risk-based data monitoring and documented its advantages over on-site monitoring practices. Surprisingly little has been published about the application of RBDM to central nervous system (CNS) clinical drug trials. Although the scales selected for use in these clinical trials have demonstrated acceptable levels of reliability and validity (we will not be discussing the psychometric properties of the scales themselves), their use during the course of a study does not always yield reliable and accurate outcome data.
Sponsors developing treatments for CNS disorders face substantial challenges, some of them shared with many therapeutic areas, such as heterogeneous patient populations, 8 medication nonadherence, 9 and the inclusion of professional research subjects. 9 CNS trials must also contend with the use of FDA/EMA-sanctioned subjective outcome measures, modest drug effects, and marked, highly variable placebo response rates. Although various approaches have been used to minimize these effects, including alternative trial designs such as adaptive randomization or group sequential designs, 10 the problem of high placebo response rates continues to plague the industry.
This article addresses the methodologic considerations and system requirements of a risk-based data monitoring platform for CNS trials, with a focus on real-time monitoring of clinical ratings at the individual subject level. The authors go beyond these topics to explain how an RBDM platform may positively affect subject selection and adherence, rater training, and investigator involvement. The article closes with some thoughts on the challenges facing RBDM of CNS trials today.
Methodology Considerations
CNS trials present unique challenges for monitoring the quality of clinical state data, primarily because of their reliance on subjective outcome measures that are difficult to standardize. Much safety data relies on standard physiologic and laboratory measures, but with CNS drugs, many side effects that are difficult to quantify are reported frequently, even by patients receiving placebo. Moreover, CNS trials, like those in other therapeutic areas, must contend with threats to data quality arising from globalization and increasingly large and complex protocols.
A data monitoring plan is determined by the objectives and purpose of the study, the trial design, the instruments used, and the size and complexity of the trial. Generally, complex protocols require more monitoring than relatively simple ones. For example, the inclusion of remote independent raters in addition to site raters will enlarge the scope of monitoring, as will multiple baseline and randomization visits. However, relatively simple and straightforward protocols can also harbor serious risk and should not be exempt from the process, since problems with eliciting and recording subjective ratings of severity will always be present in CNS trials.
Regulatory/ICH guidance provides considerable latitude in how and what to monitor, as long as monitoring satisfies the goal of ensuring adequate protection of the rights, welfare, and safety of human subjects and the quality of the clinical trial data. 2
The institutional review board (IRB) and principal investigator (PI) play an early and fundamental role in implementing quality assurance measures to protect human research subjects and data integrity. There are, however, aspects of protecting “data integrity” that go well beyond what is traditionally done by IRBs and PIs. Evaluation of the quality and consistency of data originating from research sites is best performed by an independent third party. Data Monitoring Committees (DMCs) provide an example of an independent body reviewing study data, although the frequency and scope of their monitoring differ from what is proposed for RBDM of CNS trials. The sponsor may use a DMC to review safety data in an unblinded fashion to protect the validity of the trial and its conclusions. DMC members are independent of the sponsor and knowledgeable in the subject area.
RBDM likewise requires independent, third-party analysis, which should be conducted in a blinded fashion so that data are always considered and compared the same way, regardless of subject, site, rater, or country.
The goal of RBDM is to prevent or mitigate important and likely sources of error that threaten the quality of critical data. What defines critical data, risk, error, and successful mitigation or prevention must be clearly stated beforehand and agreed to in discussions with the sponsor.
Definition of Critical Data
The term “risk” in the context of a clinical drug trial is most commonly associated with patient safety. However, from the standpoint of study outcome of a CNS trial, risk refers to those factors that have the potential to compromise the validity of the study results. From this perspective, risk may include such factors as the complexity of study design, number of countries/languages represented, type of assessments and training required, etc. A risk-based data-monitoring project plan should be thorough in its anticipation of risk as well as flexible in its ability to account for and manage risk that is identified during the course of the trial.
In the context of determining efficacy, “critical data” are data that are essential to the reliability of study findings, specifically, those that support the primary objectives of the study. The term generally refers to data from the primary and selected secondary outcome measures, as well as from other outcome measures for which the study team requests oversight and that are required to demonstrate signal detection. In addition to clinician-reported outcome measures (ClinROs), patient-reported outcomes (PROs) and caregiver reports may help inform the assessment of data quality and assist in determining the need for remedial intervention with clinicians.
Although all data points from relevant measures are typically collected, monitoring focuses on specific, predefined study visits and on scale features and relationships of particular relevance to data quality. The monitoring plan must prospectively specify the key “risk” indicators pertinent to clinical outcome data quality and, more importantly, the threshold for tolerating each specific risk, as well as a plan for resolving issues once they are identified. The 2013 FDA guidance addresses assessing risk but does not specify how it should be done. 2
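One way to make “prospectively determined” concrete is to hold the monitoring plan as structured data that can be checked for completeness before the study starts. The Python sketch below is a minimal illustration only; the class names, indicator names, visits, and threshold values are hypothetical and do not represent a standard or the authors’ format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RiskIndicator:
    """One prospectively defined key risk indicator (field names are hypothetical)."""
    name: str
    visits: tuple[str, ...]  # predefined visits the indicator applies to
    threshold: float         # tolerance limit agreed with the sponsor before study start
    resolution: str          # planned action once the limit is exceeded


@dataclass
class MonitoringPlan:
    study_id: str
    critical_scales: tuple[str, ...]
    indicators: list[RiskIndicator] = field(default_factory=list)

    def validate(self) -> list[str]:
        """List gaps that should be closed before the plan is signed off."""
        problems = []
        names = [kri.name for kri in self.indicators]
        if len(names) != len(set(names)):
            problems.append("duplicate indicator names")
        for kri in self.indicators:
            if not kri.visits:
                problems.append(f"{kri.name}: no visits specified")
            if kri.threshold <= 0:
                problems.append(f"{kri.name}: threshold must be positive")
            if not kri.resolution:
                problems.append(f"{kri.name}: no resolution plan")
        return problems


plan = MonitoringPlan(
    study_id="STUDY-001",
    critical_scales=("PANSS",),
    indicators=[
        RiskIndicator("identical_item_scores", ("screening", "baseline"), 0.9, "rater call"),
        RiskIndicator("unexpected_total_change", ("baseline",), 2.0, "audio review"),
    ],
)
print(plan.validate())  # an empty list means no gaps were found
```

Keeping thresholds and resolution steps in the same record as each indicator makes it harder to launch monitoring with a risk defined but no agreed tolerance or response.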
Rater error is a significant source of variance, and therefore of risk to data quality, in CNS clinical drug trials, 11 and the data monitoring process is designed, in part, to detect problematic scale use. Rater error can produce noninformative data, that is, data obtained through practices that violate the manner in which the scale was intended to be administered or scored in the population for which it was designed. Efforts to minimize measurement error include activities before study implementation, for example, rater training, and activities during the study, for example, the use of remote, independent raters and audio and video recording of clinical interviews for review. However, these activities are not uniformly applied across studies, and they have limited ability to remove all sources of error. 12,13 Examples of these kinds of errors are provided in the next section, “Key Risk Indicators.”
Many risk-based data monitoring programs use algorithms to identify inconsistencies in psychometric rating instruments. Algorithms are preset rules of item association that assist in evaluating psychometric data for risks associated with rater error. Creating algorithms requires a deep understanding of the psychometric properties of the instruments as well as of the clinical population for whom they were designed. Developing accurate probabilistic models also requires empirical validation of how well these algorithms predict actual error.
The more monitoring that has been done in specific CNS indications, the more likely the algorithms will correctly identify risk because the sensitivity of the algorithms is derived from a large sample with a wide range of illness severity. For example, using Positive and Negative Syndrome Scale (PANSS) data aggregated across 8 phase III schizophrenia trials (n=4096), Yavorsky et al 14 evaluated the sensitivity of algorithms to detect the risk of problematic administration of the PANSS. These algorithms were then formalized by a working group and applied to the Novel Methods leading to New Medications in Depression and Schizophrenia (NEWMEDS) and Clinical Antipsychotic Trials of Intervention Effectiveness (CATIE) databases, both containing large troves of PANSS data from schizophrenia trials. 15
Key Risk Indicators
Rater training is most commonly performed prior to study start-up, covers basic scale administration and scoring conventions of primary and secondary outcome assessments, and may include a performance component (eg, video scoring, mock interview). However, rater drift, or raters becoming less reliable in their ratings over time, is well documented. 16
It is difficult to know the extent to which rater training mitigates certain rating practices that are associated with noninformative data, such as assigning identical scores across all items of a scale. For example, the statistical probability that a patient’s presentation is accurately represented by no item fluctuation on the 30-item PANSS across several visits is extremely low, and such a pattern may reflect an issue unrelated to the quality of rater training. In a recent analysis of data from a risk-based data monitoring program for a global schizophrenia trial using the PANSS, Engelhardt et al 17 found that the scoring practices most frequently associated with noninformative data fell into two categories: Inconsistent Item Relationships (ie, assigning a score to one item that directly contradicts the score of another, similar item within the same scale) and Low Variability (ie, lack of item score change across visits). Another data-monitoring analysis from a PANSS trial characterized risks to PANSS data integrity as “large between-visit PANSS total score changes, erratic PANSS item score changes, and 100% identical PANSS item scores from visit to visit.” 18 The factors associated with these scoring patterns are multiple and may include an atypical but valid patient presentation, a lack of understanding of scoring conventions, poor interview technique, or fraud.
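The Low Variability patterns described above (uniform scores within a visit and 100% identical item scores from visit to visit) lend themselves to simple automated checks. The following Python sketch assumes item scores are held as a dictionary per visit; the item names and the cutoffs (`same_score_limit`, `unchanged_limit`) are hypothetical, and a flag is only a prompt for clinical review, not a finding of error.

```python
from collections import Counter


def identical_item_fraction(scores: dict[str, int]) -> float:
    """Share of items that carry the single most common score within one visit."""
    counts = Counter(scores.values())
    return max(counts.values()) / len(scores)


def unchanged_item_fraction(prev: dict[str, int], curr: dict[str, int]) -> float:
    """Share of items (present at both visits) whose score did not change."""
    shared = prev.keys() & curr.keys()
    if not shared:
        raise ValueError("visits share no items")
    return sum(prev[item] == curr[item] for item in shared) / len(shared)


def low_variability_flags(visits: list[dict[str, int]],
                          same_score_limit: float = 0.9,
                          unchanged_limit: float = 1.0) -> list[tuple[int, str]]:
    """Flag visits (by index) showing uniform scoring or no change from the prior visit.
    The default limits are illustrative, not validated cutoffs."""
    flags = []
    for k, scores in enumerate(visits):
        if identical_item_fraction(scores) >= same_score_limit:
            flags.append((k, "identical item scores within visit"))
        if k > 0 and unchanged_item_fraction(visits[k - 1], scores) >= unchanged_limit:
            flags.append((k, "no item change from prior visit"))
    return flags


# 30 hypothetical items scored 1-7: visit 1 repeats visit 0 exactly; visit 2 is all 4s.
v0 = {f"item{i}": (i % 4) + 2 for i in range(1, 31)}
print(low_variability_flags([v0, dict(v0), {item: 4 for item in v0}]))
# -> [(1, 'no item change from prior visit'), (2, 'identical item scores within visit')]
```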
The scoring anomalies mentioned above are not routinely addressed in rater training. When they do occur and are captured and investigated as part of an RBDM process, an opportunity is created during the course of the trial to identify, correct, and prevent further occurrence of error. Rater training serves many important functions, such as imparting protocol-specific information and basic scale administration and scoring guidelines. However, its potential to prevent or mitigate the kinds of risks to data quality identified by monitoring is currently limited.
A risk-based data monitoring platform is built upon the assumption that certain inconsistencies in data pose a threat, or risk, to meeting the objectives of the study. This section describes categories of risk and the process of determining the magnitude of risk. Not all risk carries with it the same potential to negatively impact the quality of the data. An RBDM platform should combine computerized technology with deep expertise in the design and execution of the type of trial in question to effectively identify and manage risk.
One category of risk indicators applies to the psychometric properties of the scale, specifically, the expected relationship between items of a scale (intra-scale item consistency) and the expected relationship between items and total scores on different scales (inter-scale consistency). Algorithms are created to define these relationships and the threshold that determines when an algorithm, or rule, is violated. If the risk is deemed low, then the threshold can be set high, and vice versa. The algorithms perform automated checks for consistency based on the binary and factorial relationships that exist between items on a given scale. It is important to note that algorithms take into account not only factor analytic properties but also previous data from similar studies to detect atypical, or outlier, presentations.
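A minimal sketch of a threshold-based consistency rule follows, assuming item scores keyed by hypothetical `scale:item` labels. It covers only the simplest form (an absolute difference between two items expected to agree); the algorithms described here also draw on factor structure and prior trial data, which the sketch does not model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsistencyRule:
    """Flags a visit when two items expected to agree differ by more than max_gap.

    Item keys are hypothetical placeholders; real rules would be derived from
    each scale's factor structure and from prior trial data. Inter-scale rules
    assume the two items are scored on comparable ranges.
    """
    name: str
    item_a: str
    item_b: str
    max_gap: int  # high tolerance for low-risk relationships, low for high-risk ones

    def violated(self, scores: dict[str, int]) -> bool:
        if self.item_a not in scores or self.item_b not in scores:
            return False  # missing data is left to a separate completeness check
        return abs(scores[self.item_a] - scores[self.item_b]) > self.max_gap


RULES = [
    ConsistencyRule("intra_A3_A7", "A:3", "A:7", max_gap=3),  # same scale
    ConsistencyRule("inter_A3_B2", "A:3", "B:2", max_gap=2),  # across scales
]


def check_visit(scores: dict[str, int]) -> list[str]:
    """Names of all rules violated by one visit's item scores."""
    return [rule.name for rule in RULES if rule.violated(scores)]


print(check_visit({"A:3": 6, "A:7": 2, "B:2": 5}))  # -> ['intra_A3_A7']
```

Holding `max_gap` on each rule keeps the risk tolerance explicit and adjustable per relationship, in line with setting thresholds high for low-risk rules and low for high-risk ones.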
Another category of risk indicators covers scoring that appears nonrandom. This includes score patterns, for example, a majority of items scored the same; low variability, for example, a majority of item scores unchanged from the prior visit; and unexpected change, for example, a change from the prior visit that differs significantly from the mean change, or a dramatic increase or decrease from the prior visit. Complexity and sensitivity are added by taking into account visit (eg, screening, randomization), frequency (one or several instances), rater (same or different), and illness severity (high or low total score). For example, low variability becomes a concern when it involves the same rater and the same subject and the total score is in the mild range or above. Similarly, unexpected change takes on particular significance when it occurs at certain visits, such as screening and randomization. Such patterns may reflect rater bias, patient expectation, or actual error; it is the role of the RBDM provider to determine what may be behind such potentially aberrant scores.
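The unexpected-change indicator and the contextual grading described above might be sketched as follows. Measuring change as a z-score against the mean change across all visit pairs, the `z_limit` of 2.0, the `Pair` layout, and the `mild_cutoff` parameter are all illustrative assumptions rather than validated settings.

```python
from dataclasses import dataclass
from statistics import mean, stdev

KEY_VISITS = {"screening", "randomization"}  # visits singled out in the text


@dataclass(frozen=True)
class Pair:
    """Two consecutive assessments of one subject (fields are hypothetical)."""
    subject: str
    visit: str        # visit of the later assessment
    prior_visit: str
    prior_total: int
    total: int


def unexpected_change_flags(pairs: list[Pair], z_limit: float = 2.0):
    """Flag changes lying more than z_limit SDs from the mean change across all
    pairs, and grade each flag by whether a key visit is involved."""
    changes = [p.total - p.prior_total for p in pairs]
    mu, sd = mean(changes), stdev(changes)
    flags = []
    for p, delta in zip(pairs, changes):
        z = (delta - mu) / sd if sd > 0 else 0.0
        if abs(z) < z_limit:
            continue
        at_key_visit = p.visit in KEY_VISITS or p.prior_visit in KEY_VISITS
        flags.append((p.subject, p.visit, round(z, 2), "high" if at_key_visit else "routine"))
    return flags


def low_variability_priority(same_rater: bool, total: int, mild_cutoff: int) -> str:
    """Low variability matters most with the same rater and at least mild illness.
    mild_cutoff is a hypothetical, scale-specific total score."""
    return "high" if same_rater and total >= mild_cutoff else "routine"


# Nine ordinary changes plus one 25-point drop between screening and randomization.
changes = [-2, -3, -1, -2, -4, -2, -3, -1, -2]
pairs = [Pair(f"S{i + 1}", "week2", "week1", 80, 80 + c) for i, c in enumerate(changes)]
pairs.append(Pair("S10", "randomization", "screening", 80, 55))
print(unexpected_change_flags(pairs))  # only S10 is flagged, graded "high"
```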
Algorithm-generated values indicating a potential risk to data quality require interpretation to have useful impact. The algorithms operate using the internal logic of the scales themselves paired with empirical data from multiple large trials; in this way, they operate largely independently of rater, visit, site, or country. This ensures that every individual “case” is treated objectively and analyzed in a systematic, repeatable form. When algorithms are violated, the RBDM clinical team judges the level of risk to the interpretability of data, and determines any appropriate follow-up action. Risk thresholds can be adjusted during the course of the trial to focus on evolving areas of greatest need or sensitivity/specificity concerns. For example, a specific algorithm that detects a discrepancy of 2 points between 2 similar items on 2 different scales may be modified or discarded because the algorithm had a high false positive rate.
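Threshold adjustment can be driven by the outcomes of clinical review. The Python sketch below assumes each reviewed case is stored as a (discrepancy, confirmed_error) pair and measures the share of flags that review did not confirm (1 minus the positive predictive value), which is one reasonable reading of “false positive rate” here. It also assumes review outcomes exist for every case the lowest candidate threshold would flag; the data and the 25% tolerance are hypothetical.

```python
from typing import Optional


def false_alarm_share(reviewed: list[tuple[int, bool]], threshold: int) -> Optional[float]:
    """Share of cases flagged at this threshold (gap > threshold) that clinical
    review did NOT confirm as error. reviewed holds (gap, confirmed_error) pairs."""
    flagged = [confirmed for gap, confirmed in reviewed if gap > threshold]
    if not flagged:
        return None
    return 1 - sum(flagged) / len(flagged)


def tune_threshold(reviewed: list[tuple[int, bool]], candidates: list[int],
                   max_false_alarm: float = 0.25) -> Optional[int]:
    """Smallest candidate threshold whose false-alarm share is acceptable;
    None means no candidate works and the rule may be a candidate for removal."""
    for t in sorted(candidates):
        share = false_alarm_share(reviewed, t)
        if share is not None and share <= max_false_alarm:
            return t
    return None


# Hypothetical review outcomes for a rule comparing two similar items on two scales.
reviewed = [(2, False), (2, False), (2, False), (2, True),
            (3, False), (3, True), (4, True), (4, True), (5, True)]
print(tune_threshold(reviewed, [1, 2, 3]))  # -> 2 (2-point gaps are no longer flagged)
```

In this toy case most 2-point discrepancies were judged legitimate on review, so the rule is relaxed to flag only gaps of 3 or more; if no candidate met the tolerance, the function would return `None`.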
Rule violations should undergo expert clinical (human intelligence) evaluation to determine the role, if any, of noninformative data. As mentioned previously, noninformative data are data collected during the course of a clinical trial that cannot be assumed to be correct; they reduce the statistical power of the study and can obscure a true treatment signal.
In this regard, data-monitoring algorithms may also be useful in a post hoc analysis to estimate the contribution of rater error. Retrospective use of an RBDM platform is a particularly useful tool to aid in future protocol design. Findings from retrospective analyses can alert sponsors to problematic instrument use or study design features that need to be reconsidered before moving forward. Additionally, a reanalysis of unblinded data based on identification of noninformative data may inform continued investment in a compound. A retrospective, unblinded analysis of a failed depression trial with a compound of known efficacy was conducted by partitioning the data into “probable” and “improbable” based on algorithm violations. Sites with noninformative data had reduced ability to distinguish between placebo and active compound compared with those sites that did separate drug and placebo and had a lower frequency of noninformative data. 19
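The partitioned reanalysis can be sketched as computing the drug-placebo effect size (here Cohen’s d with a pooled SD, an assumed choice of metric) separately within the “probable” and “improbable” data. The records below are toy values chosen only to show the mechanics; they are not data from the cited trial.

```python
from statistics import mean, stdev


def cohens_d(drug: list[float], placebo: list[float]) -> float:
    """Standardized drug-placebo difference using the pooled SD
    (positive when drug shows more improvement, if values are improvement scores)."""
    n1, n2 = len(drug), len(placebo)
    pooled_var = ((n1 - 1) * stdev(drug) ** 2 + (n2 - 1) * stdev(placebo) ** 2) / (n1 + n2 - 2)
    return (mean(drug) - mean(placebo)) / pooled_var ** 0.5


def effect_by_partition(records: list[tuple[str, str, float]]) -> dict[str, float]:
    """records: (partition, arm, improvement) tuples, where partition is
    'probable' or 'improbable' according to algorithm violations."""
    result = {}
    for part in ("probable", "improbable"):
        drug = [x for p, arm, x in records if p == part and arm == "drug"]
        placebo = [x for p, arm, x in records if p == part and arm == "placebo"]
        result[part] = cohens_d(drug, placebo)
    return result


# Toy improvement scores only.
records = ([("probable", "drug", x) for x in (10, 12, 14)]
           + [("probable", "placebo", x) for x in (6, 8, 10)]
           + [("improbable", "drug", x) for x in (9, 11, 13)]
           + [("improbable", "placebo", x) for x in (9, 11, 13)])
print(effect_by_partition(records))  # -> {'probable': 2.0, 'improbable': 0.0}
```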
Once data are determined to represent a risk to study outcome, several kinds of intervention can take place, including source document review, a discussion with the rater, or audio/video review of the interview, if available. A discussion with the rater has two purposes: to understand the source of the inconsistency in the data and to provide corrective feedback if necessary. It is important that such a call be conducted by a research clinician who has experience with the study population as well as the instruments being monitored, and who conducts it in a collegial and supportive manner. For example, if 2 item scores contradict one another, the research clinician seeks to understand the rationale for the incongruity. The rater may provide a good reason for the atypical scores, the inconsistency could reveal a lack of understanding of scoring conventions or scale administration guidelines, or the rater could acknowledge that an erroneous value had been entered through inattention or some other cause. In this way, case discussions can serve as targeted training events, tailored to each rater’s individual needs and using real in-study data.
Interventions in a prospective RBDM project plan need to specify a process for clearly communicating and documenting necessary corrective and preventive actions to ensure issues are not repeated over the course of the trial. For example, in the event that a severe issue is identified and the rater has failed to demonstrate improvement through remediation, the rater may be put on hold or removed from the study. These actions can be determined by the RBDM provider, the sponsor and, if applicable, the rater training provider.
Only with time and more experience can the effectiveness of various interventions be firmly established. There is a theoretical risk of unknown consequences from introducing such RBDM-based interventions at the site and rater level during a trial. But to ignore a problematic rater who would not have been identified without the RBDM approach makes little sense.
System Requirements
Information technology is an indispensable component of RBDM. 20 The patient outcome information (POI) to be monitored must be securely and efficiently stored and reliably and readily retrievable. For those ultimately responsible for assessing and alleviating potential risk, data must be presented in formats that support analysis and decision making.
System architecture must be capable of supporting both the operational and the clinical risk-based monitoring workflows. 21,22 Centrally monitored RBDM data are maintained in relational database management systems (RDBMSs) 23 that are distinct from the EDC systems in which site personnel and/or raters enter POI. RDBMSs are the industry standard for data systems; they support large data sets, can be optimized for frequently repeated search-and-retrieve routines and other data mining activities, and are easily accessed and maintained using Structured Query Language (SQL), another industry standard.
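As a sketch of how monitored POI might be laid out relationally, the following uses Python’s built-in `sqlite3` module as a stand-in for a production RDBMS. The table and column names are hypothetical; the point is that sites, raters, subjects, assessments, and item scores live in separate, keyed tables that SQL can join and aggregate.

```python
import sqlite3

# Hypothetical minimal schema for monitored POI; names are illustrative only.
SCHEMA = """
CREATE TABLE site    (site_id TEXT PRIMARY KEY, country TEXT NOT NULL);
CREATE TABLE rater   (rater_id TEXT PRIMARY KEY, site_id TEXT NOT NULL REFERENCES site);
CREATE TABLE subject (subject_id TEXT PRIMARY KEY, site_id TEXT NOT NULL REFERENCES site);
CREATE TABLE assessment (
    assessment_id INTEGER PRIMARY KEY,
    subject_id    TEXT NOT NULL REFERENCES subject,
    rater_id      TEXT NOT NULL REFERENCES rater,
    scale         TEXT NOT NULL,
    visit         TEXT NOT NULL,
    UNIQUE (subject_id, scale, visit)
);
CREATE TABLE item_score (
    assessment_id INTEGER NOT NULL REFERENCES assessment,
    item          TEXT NOT NULL,
    score         INTEGER NOT NULL,
    PRIMARY KEY (assessment_id, item)
);
"""


def assessment_totals(conn: sqlite3.Connection, scale: str):
    """Total score of every assessment on one scale, with its rater, subject, and visit."""
    return conn.execute(
        """SELECT a.rater_id, a.subject_id, a.visit, SUM(s.score) AS total
           FROM assessment AS a JOIN item_score AS s ON s.assessment_id = a.assessment_id
           WHERE a.scale = ?
           GROUP BY a.assessment_id, a.rater_id, a.subject_id, a.visit
           ORDER BY a.rater_id, a.subject_id, a.visit""",
        (scale,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO site VALUES ('S01', 'US')")
conn.execute("INSERT INTO rater VALUES ('R01', 'S01')")
conn.execute("INSERT INTO subject VALUES ('P001', 'S01')")
conn.execute("INSERT INTO assessment VALUES (1, 'P001', 'R01', 'PANSS', 'baseline')")
conn.executemany("INSERT INTO item_score VALUES (1, ?, ?)", [("P1", 4), ("P2", 3), ("N1", 5)])
print(assessment_totals(conn, "PANSS"))  # -> [('R01', 'P001', 'baseline', 12)]
```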
Extract-Transform-Load (ETL) data processing programs are used to incorporate study-specific EDC system data extracts into the RDBMS. ETL scripts must include subroutines able to identify data anomalies and inconsistencies and, where possible, to minimize rejection of input data by consolidating and cleaning variable data using heuristic data-processing methods. Because the timeliness of RBDM results is almost as important as their accuracy, ETL programs must be fast, efficient, and scalable enough to transform EDC extracts of varying volume, structure, and content. Both the ETL methodology and the RDBMS architecture must support the requirements common to all business intelligence data systems (source data variability, high-volume input/output, data integrity analytics, and customer-targeted reporting); critically, they must also facilitate the development and application of risk-categorizing algorithms.
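The transform step can be illustrated with a small Python function that maps an EDC extract onto canonical columns, cleans recoverable formatting differences, and sets aside unrecoverable rows with a reason rather than rejecting the whole file. The header aliases, the 1 to 7 score range, and the cleaning heuristics are assumptions for illustration; the load step into the RDBMS is omitted.

```python
import csv
import io

# Hypothetical mapping from vendor-specific EDC column headers to canonical names.
COLUMN_ALIASES = {
    "subjid": "subject_id", "subject": "subject_id",
    "raterid": "rater_id", "rater": "rater_id",
    "visitname": "visit", "visit": "visit",
    "itemcode": "item", "item": "item",
    "value": "score", "score": "score",
}
REQUIRED = ("subject_id", "rater_id", "visit", "item", "score")


def transform(extract_text: str, score_range: tuple[int, int] = (1, 7)):
    """Map one EDC extract onto canonical rows; returns (clean_rows, anomalies).

    Recoverable differences (header spelling, visit labels, "4.0" for 4) are
    cleaned; unrecoverable rows are set aside with a reason instead of
    aborting the whole load.
    """
    reader = csv.DictReader(io.StringIO(extract_text))
    low, high = score_range
    clean, anomalies = [], []
    for raw in reader:
        line = reader.line_num  # source line of this record, for traceability
        row = {COLUMN_ALIASES.get(key.strip().lower(), key.strip().lower()): (value or "").strip()
               for key, value in raw.items() if key is not None}
        missing = [col for col in REQUIRED if not row.get(col)]
        if missing:
            anomalies.append((line, "missing " + ", ".join(missing)))
            continue
        try:
            value = float(row["score"])
        except ValueError:
            anomalies.append((line, f"non-numeric score {row['score']!r}"))
            continue
        if not value.is_integer():
            anomalies.append((line, f"non-integer score {row['score']!r}"))
            continue
        if not low <= value <= high:
            anomalies.append((line, f"score {int(value)} outside {low}-{high}"))
            continue
        clean.append({
            "subject_id": row["subject_id"],
            "rater_id": row["rater_id"],
            "visit": row["visit"].lower().replace(" ", "_"),
            "item": row["item"].upper(),
            "score": int(value),
        })
    return clean, anomalies


extract = """SubjID,RaterID,VisitName,ItemCode,Value
S1,R1,Week 2,p1,4
S1,R1,Week 2,p2,4.0
S1,R1,Week 2,p3,9
S1,R1,Week 2,p4,
S1,R1,Week 2,p5,x
"""
rows, problems = transform(extract)
print(len(rows), problems)  # 2 clean rows; lines 4, 5, and 6 set aside with reasons
```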
Fundamental to RBDM is the uniform application of algorithms that identify anomalous or discrepant data relationships or statistical outliers within POI. 24,25 These findings may then be addressed by the sponsor, as per the study statistical analysis plan. Multi-arity rules (ie, rules with multiple arguments) can be constructed using the total, subtotal, or individual domain scores of a given psychometric scale, comparing domain scores within a particular scale or against scores on one or more of the other scales administered in a clinical trial. Discrete sets of rules can be established as “standards” for a given scale and then applied consistently across concurrent clinical trials. Rules can also be applied to aggregated subsets of POI (eg, by site, subject, rater, or protocol assessment sequence) to discover deviations from scoring conventions or from the protocol within a particular study, for example, a positive score on a suicide measure that the protocol stipulates is exclusionary at screening.
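A rule set of this kind might be represented as named predicates over a visit record, applied uniformly and then aggregated by any grouping field. In the Python sketch below, the record layout, the scale names (`SUICIDE`, `GLOBAL`, `SYMPTOM`), the item names, and the cutoffs are hypothetical placeholders, not real instrument items or validated thresholds.

```python
from collections import Counter
from typing import Callable

# A visit record (hypothetical layout):
# {"site": str, "rater": str, "visit": str, "scores": {scale: {item: score}}}
Rule = Callable[[dict], bool]


def scale_total(visit: dict, scale: str) -> int:
    return sum(visit["scores"].get(scale, {}).values())


def suicide_positive_at_screening(v: dict) -> bool:
    # Protocol-deviation rule for a hypothetical protocol in which any positive
    # ideation score at screening is exclusionary.
    return v["visit"] == "screening" and v["scores"].get("SUICIDE", {}).get("ideation", 0) > 0


def global_low_but_symptoms_high(v: dict) -> bool:
    # Multi-argument, inter-scale rule: global severity rated minimal while the
    # symptom-scale total is high. Both cutoffs are illustrative only.
    severity = v["scores"].get("GLOBAL", {}).get("severity")
    return severity is not None and severity <= 2 and scale_total(v, "SYMPTOM") >= 90


# A "standard" rule set, defined once and reused unchanged across concurrent studies.
STANDARD_RULES: dict[str, Rule] = {
    "suicide_positive_at_screening": suicide_positive_at_screening,
    "global_low_but_symptoms_high": global_low_but_symptoms_high,
}


def violations_by(visits: list[dict], rules: dict[str, Rule], key: str) -> Counter:
    """Count violations per (group, rule), grouping by a field such as 'site' or 'rater'."""
    counts = Counter()
    for v in visits:
        for name, rule in rules.items():
            if rule(v):
                counts[(v[key], name)] += 1
    return counts


symptoms = {f"item{i}": 3 for i in range(30)}  # total of 90
visits = [
    {"site": "A", "rater": "R1", "visit": "screening", "scores": {"SUICIDE": {"ideation": 1}}},
    {"site": "A", "rater": "R2", "visit": "baseline",
     "scores": {"GLOBAL": {"severity": 2}, "SYMPTOM": symptoms}},
    {"site": "B", "rater": "R3", "visit": "screening",
     "scores": {"SUICIDE": {"ideation": 0}, "GLOBAL": {"severity": 5}, "SYMPTOM": symptoms}},
]
print(violations_by(visits, STANDARD_RULES, "site"))
```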
Risk-stratified outcome data must be further aggregated and optimized for performance (eg, through normalization, table column indexing, and segregation into data subsets that permit efficient retrieval by perspective or type of analysis) to support risk assessment and mitigation activities. Data normalization and automated data-processing techniques that yield dependable analytical results and timely delivery of exception reports are essential. Distinct data schemas may be needed to support end-use (web-based) applications in which the results of expert clinical review are captured, analyzed, and assessed for their potential use in raising and resolving issues. Actionable risk-stratified data must reliably link only the pertinent POI with the raters, sites, or sponsor contacts with whom risk mitigation might occur. The extent and success of remediation efforts can be adequately gauged only when the RDBMS and its related applications maintain evidence of these outreach events, so dependable scheduling and communication-retention functionality is indispensable.
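Two of these requirements, indexing for efficient retrieval and retaining evidence of outreach, can be illustrated with Python’s `sqlite3`. The schema is a hypothetical sketch: flags raised by the rules are linked to outreach events, and a query lists open flags that no one has yet acted on.

```python
import sqlite3

# Hypothetical tables for flags raised by the rules and for the outreach that follows.
DDL = """
CREATE TABLE flag (
    flag_id   INTEGER PRIMARY KEY,
    rater_id  TEXT NOT NULL,
    rule_name TEXT NOT NULL,
    raised_on TEXT NOT NULL,  -- ISO 8601 date
    status    TEXT NOT NULL DEFAULT 'open' CHECK (status IN ('open', 'resolved'))
);
CREATE TABLE outreach (
    outreach_id  INTEGER PRIMARY KEY,
    flag_id      INTEGER NOT NULL REFERENCES flag,
    contacted_on TEXT NOT NULL,
    channel      TEXT NOT NULL,  -- e.g. 'call', 'email'
    summary      TEXT NOT NULL
);
-- Index the columns that review queries filter and join on most often.
CREATE INDEX idx_flag_rater_status ON flag (rater_id, status);
CREATE INDEX idx_outreach_flag ON outreach (flag_id);
"""


def open_flags_without_outreach(conn: sqlite3.Connection):
    """Open flags for which no outreach event has yet been recorded."""
    return conn.execute(
        """SELECT f.flag_id, f.rater_id, f.rule_name
           FROM flag AS f LEFT JOIN outreach AS o ON o.flag_id = f.flag_id
           WHERE f.status = 'open' AND o.outreach_id IS NULL
           ORDER BY f.raised_on""").fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
conn.executemany("INSERT INTO flag (rater_id, rule_name, raised_on) VALUES (?, ?, ?)",
                 [("R01", "intra_A3_A7", "2024-03-01"), ("R02", "inter_A3_B2", "2024-03-02")])
conn.execute("INSERT INTO outreach (flag_id, contacted_on, channel, summary) "
             "VALUES (1, '2024-03-03', 'call', 'scoring convention reviewed')")
print(open_flags_without_outreach(conn))  # -> [(2, 'R02', 'inter_A3_B2')]
```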
As with most computerized systems used in clinical trials, systems for RBDM need to comply with government and other regulatory requirements (US FDA, 21 CFR Part 11; US DHHS, HIPAA; European Medicines Agency, GCP and Clinical Trial Directives, etc). To that end, RDBMS and end-use application design and development must adhere to pre-established procedures for requirements gathering, documentation, testing, deployment, and maintenance. EDC system source files and data, and any records transacted and collected during risk analysis and mitigation (electronically stored and subject to end-user input or alteration), must be kept secure using current network firewall technology and archived in a way that retains a history of every change to the RBDM data set. Because of the sensitive nature of patient medical information in general, and the RBDM vendor’s responsibility for nondisclosure during an active clinical trial in particular, access to RBDM source data and results must also be permitted only according to a well-defined, role-based protocol.
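Two of these requirements, a retained history of every change and role-based access, can be sketched with SQLite triggers and a permission check in Python. This is a hypothetical illustration of the mechanics only; it does not by itself make a system compliant with 21 CFR Part 11 or any other regulation, and the table and role names are invented.

```python
import sqlite3

DDL = """
CREATE TABLE review (
    review_id  INTEGER PRIMARY KEY,
    flag_id    INTEGER NOT NULL,
    conclusion TEXT NOT NULL
);
CREATE TABLE review_audit (
    audit_id       INTEGER PRIMARY KEY,
    review_id      INTEGER NOT NULL,
    old_conclusion TEXT,
    new_conclusion TEXT,
    changed_at     TEXT NOT NULL DEFAULT (datetime('now'))
);
-- Every edit to a conclusion leaves a before/after record.
CREATE TRIGGER review_update_audit AFTER UPDATE OF conclusion ON review
BEGIN
    INSERT INTO review_audit (review_id, old_conclusion, new_conclusion)
    VALUES (OLD.review_id, OLD.conclusion, NEW.conclusion);
END;
-- Reviews are never deleted; a changed opinion is recorded as an update.
CREATE TRIGGER review_no_delete BEFORE DELETE ON review
BEGIN
    SELECT RAISE(ABORT, 'reviews may not be deleted');
END;
"""

# Hypothetical role-to-permission map.
PERMISSIONS = {
    "clinical_reviewer": {"read_scores", "write_review"},
    "sponsor_viewer": {"read_summary"},
}


def update_conclusion(conn: sqlite3.Connection, role: str, review_id: int, conclusion: str):
    """Change a review conclusion only if the caller's role allows it."""
    if "write_review" not in PERMISSIONS.get(role, set()):
        raise PermissionError(f"role {role!r} may not edit reviews")
    with conn:  # commit on success, roll back on error
        conn.execute("UPDATE review SET conclusion = ? WHERE review_id = ?",
                     (conclusion, review_id))


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
with conn:
    conn.execute("INSERT INTO review (flag_id, conclusion) VALUES (1, 'rater error')")
update_conclusion(conn, "clinical_reviewer", 1, "valid atypical presentation")
print(conn.execute("SELECT old_conclusion, new_conclusion FROM review_audit").fetchall())
# -> [('rater error', 'valid atypical presentation')]
```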
Conclusions
A centralized, risk-based approach to data monitoring in clinical trials is now a regulatory expectation. However, there is a lack of guidance specific to CNS trials. Methodological and system requirements must be carefully considered in the development of a risk-based data-monitoring program and coupled with an experienced clinical team to evaluate data risk. Such programs may preserve data integrity and increase signal detection.
One of the biggest challenges facing RBDM in CNS clinical drug trials is providing empirical support for its effectiveness. Monitoring data for completeness, consistency, and accuracy over time is not likely, in and of itself, to reliably improve data quality. Similarly, advocacy by regulatory authorities does not remove the need to examine the effectiveness of this approach.
Metrics that can demonstrate the effectiveness of the monitoring process must be developed before implementation. Several studies have demonstrated the effectiveness of RBDM in terms of decreased error rates and enhanced internal consistency for various outcome measures used in CNS trials. 26–28
One study retrospectively evaluated the sample size needed to meet the protocol’s primary objectives by considering the effect of RBDM on data quality. The sample size required with RBDM was 30% smaller than that required without it. The authors concluded that using an adequate sample size with high-quality data collection through RBDM could yield more reliable results, more timely trials, and reduced resource use. 29
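Why cleaner data can shrink a trial follows from the standard normal-approximation sample size formula for comparing two means, n per arm ≈ 2(z_{1-α/2} + z_{1-β})^2 / d^2, where d is the standardized effect size. If noninformative ratings behave like independent, zero-mean measurement error (a simplifying assumption), they inflate the outcome SD by 1/√r, where r is the reliability, so the observed effect becomes d√r and the required n scales with 1/r. The Python sketch below uses illustrative values, not those of the cited study.

```python
import math
from statistics import NormalDist


def n_per_arm(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-arm sample size for a two-sided, two-sample comparison of means
    (normal approximation) at standardized effect size d."""
    z = NormalDist().inv_cdf
    return math.ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / d ** 2)


def observed_d(true_d: float, reliability: float) -> float:
    """Effect size after adding independent, zero-mean measurement error:
    the mean difference is unchanged but the SD grows by 1/sqrt(reliability)."""
    return true_d * math.sqrt(reliability)


# Illustrative values only; not taken from the study cited above.
print(n_per_arm(observed_d(0.4, 1.0)))  # -> 99
print(n_per_arm(observed_d(0.4, 0.7)))  # -> 141
```

Under these assumptions, any process that raises the effective reliability of the ratings lowers the sample size needed for the same power, which is the mechanism the cited analysis points to.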
Retrospective analysis of the type and frequency of noninformative data is particularly useful to better understand the variables associated with failed trials or noninterpretable study results. Indicators as far-reaching as time to enter data in EDC, or large discrepancies between global clinical ratings and symptom-specific clinical ratings, may be associated with higher rates of rater error.
When identifying data as noninformative, it is important that the process guard against introducing its own bias. Atypical clinical presentations do occur. In an RBDM platform, all data are objectively evaluated in a standardized manner, but it is imperative that an experienced research clinician determine the likely source of any anomaly.
Enlisting the cooperation of clinical investigators and raters is also critical. Site clinicians may view data monitoring as an unwelcome intrusion and a challenge to their assessment skills, much like the request that clinical interviews be audio- or video-recorded for review. It is essential to ensure that research staff are aware of the goals, methods, and advantages of RBDM. RBDM creates an opportunity to join the clinician in the shared goal of providing informative data and offers the clinician a direct link to quality assurance. When RBDM reviews an atypical presentation, for example, or identifies a protocol violation manifested in a scoring profile, clinicians involved with patients in the study are reassured that quality checks are in place.
RBDM also creates a direct line of communication from investigators to the sponsor and from the sponsor back to investigators. It is mutually reassuring to know that the clinicians supporting RBDM will act as fair witnesses and report accurately to the sponsor, irrespective of how their reports may affect the outcome of the trial. Investigators also recognize the value of addressing rater drift and appreciate the supervisory role RBDM plays in ensuring that scales continue to be used correctly.
Footnotes
Declaration of Conflicting Interests
The authors CM, NE, CY, MM, and GD are full-time employees of Cronos CCS, Inc, a company that provides RBDM services to the clinical trial industry.
Funding
No financial support of the research, authorship, and/or publication of this article was declared.
