“All for One, One for All”: Evaluating Composite Endpoints in Clinical Trials


Introduction

The outcomes or endpoints that are chosen for a study are essentially the questions to be asked about the treatment: Will this treatment help patients live longer? Does this treatment cause more bleeding? (The terms “outcomes” and “endpoints” can be used interchangeably; “endpoints” is used in this article because it implies something that can be measured.) Ideally, the choice of endpoints should follow simply from the questions that the authors (and, hopefully, you, the readers) think need to be answered. However, there are important practical limits on which specific endpoints can be studied and on how many endpoints can be studied at the same time. Composite endpoints are one way to address either (or both) of these challenges, but they come with their own limitations, and these limitations need to be understood before the results of a study can be fully interpreted.

Definition of composite endpoints

A composite endpoint is one that encompasses several different individual endpoints (see Table 1). For example, investigators may set their study's endpoint as “total mortality or hospitalization or need to have surgery.” This means that if a patient dies, or is hospitalized, or needs surgery, the investigators will count that patient's experience towards the composite endpoint.
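The counting logic described here amounts to a logical OR over the component endpoints. The short Python sketch below illustrates it; the `PatientRecord` class, its field names, and the `meets_composite` helper are hypothetical, chosen only to mirror the example endpoint of total mortality, hospitalization, or surgery.

```python
from dataclasses import dataclass


@dataclass
class PatientRecord:
    """Hypothetical per-patient flags, one for each component endpoint."""
    died: bool = False
    hospitalized: bool = False
    needed_surgery: bool = False


def meets_composite(patient: PatientRecord) -> bool:
    """A patient counts towards the composite if ANY component occurred."""
    return patient.died or patient.hospitalized or patient.needed_surgery


patients = [
    PatientRecord(),                                # no events
    PatientRecord(hospitalized=True),               # one component
    PatientRecord(died=True, needed_surgery=True),  # two components, still one patient
]
print(sum(meets_composite(p) for p in patients))    # 2 patients meet the composite
```

Note that the third patient, who had 2 of the component events, still contributes only once to the composite.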

Table 1

Examples of composite endpoints

Field of study	Example
Cardiology	Cardiovascular death or hospital admission for CHF, MI, or stroke⁵
Respirology	Death, need for intubation and mechanical ventilation, or need to administer a steroid⁶
Pediatrics/neonatology	Death, shoulder dystocia, bone fracture, or nerve palsy⁷
Nephrology	Doubling of baseline serum creatinine, onset of ESRD, or death⁸
Neurology	Progression of disability or need for institutionalization⁹

CHF = congestive heart failure; ESRD = end-stage renal disease; MI = myocardial infarction.

Potential uses for composite endpoints

Using composite endpoints helps investigators address the following 2 potential limitations:

Measurement of rare endpoints

Rare endpoints (such as death) are often important outcomes to include in a clinical trial, but because they occur infrequently, they are difficult to study: the trial would need to include a large number of patients and/or follow these patients for a long time to obtain meaningful answers.¹ One solution to this problem is to use a composite endpoint. By combining a rare endpoint with other, more frequent endpoints, investigators increase the number of events they expect to observe, so they can conduct their study with fewer patients and/or complete it more quickly.
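To see why, consider the standard normal-approximation formula for the number of patients needed per group to compare two event proportions. The sketch below is illustrative only: the event rates (2% vs 1.5% for a rare endpoint alone, 20% vs 15% for a composite), the 5% two-sided significance level, and 80% power are assumptions, not figures from any trial cited in this article, and `n_per_group` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def n_per_group(p_control: float, p_treatment: float,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate patients needed per group to detect a difference between
    two event proportions (two-sided test, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    return math.ceil((z_alpha + z_beta) ** 2 * variance
                     / (p_control - p_treatment) ** 2)


# Hypothetical event rates; in both cases the treatment lowers risk by 25%.
n_rare = n_per_group(0.02, 0.015)      # rare endpoint alone (e.g., death)
n_composite = n_per_group(0.20, 0.15)  # composite with more frequent events
print(n_rare, n_composite)
```

With these assumed rates, the rare endpoint requires roughly 10,800 patients per group, whereas the composite requires roughly 900, about a 12-fold difference for the same relative treatment effect.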

Need to ask many questions

Having several endpoints (that is, asking many different questions) is often very important in a clinical trial, because looking at several endpoints may give investigators a better understanding of the overall effect of the treatment on patients. However, asking too many questions can also be problematic because of the rule of multiplicity (see Table 2): the more individual endpoints a study has, the greater the probability that a difference will be seen between the treatment and control groups by chance alone.⁷,⁸ Composite endpoints allow investigators to “consolidate” several individual questions into one, helping to address the problem of multiplicity.

Table 2

Rule of multiplicity

Number of endpoints/questions	Probability of finding a difference by chance alone (%)
1	5*
2	10
3	14
5	23
10	40
20	64

*The probability for a primary endpoint is usually set at 5%.
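The figures in Table 2 follow from the formula 1 - (1 - α)^k, the probability that at least one of k endpoints shows a difference by chance when each is tested at a significance level α of 0.05. This formula assumes the endpoints are independent, so it is a standard illustration rather than an exact figure for any particular trial. The Python sketch below (the function name is a hypothetical choice) reproduces the table.

```python
def chance_of_false_positive(k: int, alpha: float = 0.05) -> float:
    """Probability that at least one of k independent endpoints, each tested
    at significance level alpha, shows a difference by chance alone."""
    return 1 - (1 - alpha) ** k


for k in (1, 2, 3, 5, 10, 20):
    print(f"{k:>2} endpoints: {100 * chance_of_false_positive(k):.0f}%")
```

Rounded to whole percentages, the output matches Table 2: 5, 10, 14, 23, 40, and 64%.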

Potential problems with composite endpoints

Composite endpoints also present particular challenges, however. When judging the benefits and risks of the therapy studied in a trial that uses a composite endpoint, you have 2 choices: base your interpretation on the results for the composite endpoint alone, or also examine the results for each of its components. The questions below will help you identify the potential limitations that stem from the use of composite endpoints. In general, it is a good idea to look at the results for the individual components, and the more of these limitations apply, the more important it becomes to examine the components separately to fully understand the results.

1. Are the composite endpoint(s) specified beforehand?

The investigators of a trial should define the composite endpoint before the trial begins. This avoids the bias that arises when investigators see trends in the data and then choose endpoints that show the desired differences. Investigators should also describe how they will measure each component of the composite endpoint (e.g., who will judge whether a patient has had a myocardial infarction [MI], and what criteria will that person use?).

2. How do the authors present the results of the study?

Ideally, the authors of the study should show not only the results of the composite endpoint, but also the results for its individual components. Some have argued that in any study using a composite endpoint as a primary question, the secondary questions should always include the results for the individual components.⁴

3. How important are the individual components being gathered into a composite?

An endpoint is only as good as its significance to the patient. Thus, it is important to judge whether the individual components gathered into a composite are of similar importance. Take the respirology example from Table 1: death, need for intubation and mechanical ventilation, or need to administer a steroid.⁶ Death and the need for mechanical ventilation are clearly serious endpoints; it could be argued that the occurrence of either would significantly affect the patient's life. In contrast, the need to receive a steroid is likely much less serious than the other 2 components.

The idea of surrogate endpoints must also be kept in mind. Surrogate endpoints (e.g., an increase in blood pressure, an increase in potassium, or the need to start a certain drug) are, at best, substitutes for endpoints that patients actually experience (such as a stroke, kidney failure, or need for hospitalization). A composite that includes surrogate endpoints should therefore be interpreted with particular care.

4. Which individual endpoints occurred most often? Which occurred least often?

After determining which endpoints are most important, look at which ones contribute most and least to the overall result.¹⁰ For example, in a study that uses the composite endpoint of death or bone fracture (see Table 3), you may consider death to be the more important endpoint. The results for the composite endpoint seem to show that the treatment group did much better than the control group (65 events vs 100 events). However, notice that there were 5 deaths in the treatment group and none in the control group; that is, the more important endpoint actually favoured the control group. Because there were far fewer deaths in the study than bone fractures (5 vs 160), the more important endpoint is being diluted by a less important one. In other words, the composite result tells us more about the benefit or risk of treatment for bone fracture than about its effect on death. Thus, be cautious when interpreting the results of a composite endpoint, particularly when rare endpoints are combined with much more frequent ones.

Table 3

Total frequency of endpoints in a study

Patient group	Deaths in each group (n = 524)	Total deaths in all patients	Bone fractures in each group (n = 524)	Total bone fractures in all patients
Treatment	5	5	60	160
Control	0	5	100	160
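The arithmetic behind the death-or-fracture example can be checked directly against Table 3. The sketch below assumes, as the 65 vs 100 totals implicitly do, that no patient had both a death and a fracture, so composite events equal deaths plus fractures; the variable names are illustrative.

```python
# Event counts from Table 3 (524 patients per group).
table3 = {
    "Treatment": {"death": 5, "fracture": 60},
    "Control": {"death": 0, "fracture": 100},
}

# Assumes no patient experienced both components, so counts simply add.
composite = {group: sum(events.values()) for group, events in table3.items()}
print(composite)  # {'Treatment': 65, 'Control': 100}

total_deaths = sum(events["death"] for events in table3.values())
total_composite = sum(composite.values())
print(f"Deaths make up {100 * total_deaths / total_composite:.0f}% of composite events")
```

Deaths account for only about 3% of all composite events, so the composite result is driven almost entirely by fractures, even though the death component points in the opposite direction.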

5. Are the results for the individual endpoints similar?

It is important to consider whether the authors (and, more importantly, you) expect the benefit or harm of the treatment to be similar across the individual components of the composite endpoint.¹ For example, in a study that uses the composite endpoint of MI, stroke, or new-onset heart failure, one might argue that the treatment should have a similar effect on each; that is, if the treatment reduced MI by 3%, it would likely reduce the other endpoints by a similar amount. However, this assumption cannot simply be relied upon; look at the results of the study to determine whether the treatment seems to cause benefit or harm for each component, and how large that effect is.¹ Therefore, a good composite endpoint is one whose individual components are prespecified and of similar importance, and in which you would not expect (and do not see) any one component to be adversely affected.
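One way to make this check systematic is to compute the absolute risk difference (treatment minus control) for the composite and for each component, and flag any component whose difference points the other way. The sketch below applies this to the hypothetical data in Table 3. The helper name and the use of the risk difference (rather than the relative risk, which is undefined here because the control group had no deaths) are illustrative choices, not a method prescribed by the article.

```python
def risk_difference(events_tx: int, events_ctl: int, n_per_group: int) -> float:
    """Absolute risk difference, treatment minus control (negative = benefit)."""
    return (events_tx - events_ctl) / n_per_group


N = 524  # patients per group in Table 3
components = {"death": (5, 0), "fracture": (60, 100)}  # (treatment, control)

# Composite totals assume no patient had both a death and a fracture.
composite_rd = risk_difference(65, 100, N)

discordant = []
for name, (tx, ctl) in components.items():
    rd = risk_difference(tx, ctl, N)
    print(f"{name}: {100 * rd:+.1f} percentage points")
    # An opposite sign to the composite means this component moves the other way.
    if rd * composite_rd < 0:
        discordant.append(name)
print("Opposite direction to composite:", discordant)
```

Here the composite and the fracture component both favour treatment, while death is flagged as moving in the opposite direction.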

6. Are correct counting rules used in the study?

Counting rules refer to how the authors tally the incidence of the different individual endpoints and calculate the total for the composite endpoint. The authors should use the following rules:

Rule 1: Count the total incidence of individual endpoints for all the patients over the entire study.

Rule 2: Count each individual event only once per patient.

Rule 3: Count only the first event when counting total events for the composite.

Consider an example of a 3-year study that measured the composite endpoint of death or MI.

We will follow Jill and Jack, both patients in this study (Table 4). After 2 months in the study, Jill unfortunately had an MI. She recovered and remained healthy until 1 year into the study, when she had a second MI. She survived the second MI as well, but died of a stroke after 2 years. Jack, on the other hand, had an MI after 2 years in the study and then had no further health problems until the end of the study. Table 4 shows the use of correct counting rules. Note the following about how the data were handled:

Table 4

Counting rules

Patient	Composite endpoint	MI	Death
Jill	✓	✓	✓
Jack	✓	✓
Total	✓✓	✓✓	✓

✓ = a single tally; MI = myocardial infarction

Totals for MI and death that occurred during the entire study are determined (Rule 1).

Jill had 2 separate MIs, but only 1 is counted (Rule 2).

Even though Jill had 2 MIs and died, only a single tally for the composite endpoint is made (Rule 3).
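The 3 counting rules can be expressed as a short procedure. In the Python sketch below, each patient's events are stored as (month of study, endpoint) pairs; this data layout and the function name `apply_counting_rules` are hypothetical. Jill's death from stroke is recorded simply as "death", as in Table 4.

```python
from collections import Counter

# Events per patient as (month of study, endpoint) pairs.
study_events = {
    "Jill": [(2, "MI"), (12, "MI"), (24, "death")],
    "Jack": [(24, "MI")],
}


def apply_counting_rules(events_by_patient):
    component_totals = Counter()
    composite_total = 0
    first_events = {}
    for patient, patient_events in events_by_patient.items():
        # Rules 1 and 2: over the whole study, tally each endpoint type
        # at most once per patient (a set removes Jill's repeat MI).
        component_totals.update({endpoint for _, endpoint in patient_events})
        if patient_events:
            # Rule 3: the composite counts only the patient's first event.
            composite_total += 1
            first_events[patient] = min(patient_events)[1]
    return component_totals, composite_total, first_events


components, composite, first = apply_counting_rules(study_events)
print(dict(components), composite, first)
```

The result reproduces the totals in Table 4: 2 MIs, 1 death, and 2 composite events, with an MI as the first event for both patients.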

Look for a statement by the authors that they used appropriate counting rules. For example, authors may say: “For patients with multiple events, only the first event will be used for the composite.” Authors may also present the data as “Total occurrence of death as well as death as first event.” If you cannot determine whether appropriate counting rules were used, you may need to infer this from the numbers presented in the study. Consider the sample case above. A simple check is to first add up the totals for the individual endpoints (for Jill and Jack, the sum is 3 [2 MIs + 1 death]) and then compare this sum with the total for the composite (in this case, 2). Because the composite total is less than the sum of the individual endpoints, you can make an educated guess that the authors did not count every occurrence of the individual endpoints towards the composite and were (presumably) using correct counting rules. However, to know for sure, you may need to contact the authors for further information.
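The rough check described above can be written as a small helper. Like the check itself, it is a heuristic only; the function name and the wording of its verdicts are illustrative assumptions. A composite total equal to the sum of the component totals is treated as inconclusive, because it may simply mean that no patient experienced more than one type of endpoint.

```python
def infer_counting_rules(component_totals: dict[str, int], composite_total: int) -> str:
    """Heuristic based on reported totals; it cannot prove which rules were used."""
    component_sum = sum(component_totals.values())
    if composite_total < component_sum:
        return "likely correct: some individual events were not re-counted in the composite"
    if composite_total == component_sum:
        return "inconclusive: no overlap between components, or events may be double counted"
    return "suspect: composite exceeds the sum of its components"


# Jill and Jack: 2 MIs + 1 death = 3 individual endpoints, composite total of 2.
print(infer_counting_rules({"MI": 2, "death": 1}, 2))
```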

Conclusion

Endpoints in a clinical trial are the questions that the investigators set out to answer, and the strength of the answer depends on the appropriateness and validity of the original question. Composite endpoints can make the logistics and statistical interpretation of clinical trials easier, but they also have limitations of their own. Using the 6 questions outlined in this article, you should be able to avoid the possible pitfalls of using and interpreting composite endpoints.

Footnotes

Acknowledgements:

The authors wish to thank Mary Mah for assistance with formatting and editing of the final manuscript.

References

1. Montori, Permanyer-Miralda, Ferreira-Gonzalez. Validity of composite end points in clinical trials. BMJ 2005;330:594–6.

2. Schulz, Grimes. Multiplicity in randomized trials I: endpoints and treatments. Lancet 2005;365:1591–5.

3. Cook, Gebski, Keech. Subgroup analysis in clinical trials. Med J Aust 2004;180:289–91.

4. Freemantle, Calvert, Wood. Composite outcomes in randomized trials: greater precision but with greater uncertainty? JAMA 2003;289:2554–9.

5. Yusuf, Pfeffer, Swedberg. Effects of candesartan in patients with chronic heart failure and preserved left-ventricular ejection fraction: the CHARM-Preserved Trial. Lancet 2003;362:777–81.

6. Niewoehner, Erbland, Deupree. Effect of systemic glucocorticoids on exacerbations of chronic obstructive pulmonary disease. Department of Veterans Affairs Cooperative Study Group. N Engl J Med 1999;340:1941–7.

7. Crowther, Hiller, Moss; Australian Carbohydrate Intolerance Study in Pregnant Women (ACHOIS) Trial Group. Effect of treatment of gestational diabetes mellitus on pregnancy outcomes. N Engl J Med 2005;352:2477–86.

8. Lewis, Hunsicker, Clarke; Collaborative Study Group. Renoprotective effect of the angiotensin-receptor antagonist irbesartan in patients with nephropathy due to type 2 diabetes. N Engl J Med 2001;345:851–60.

9. Courtney, Farrell, Gray; AD2000 Collaborative Group. Long-term donepezil treatment in 565 patients with Alzheimer's disease (AD2000): randomised double-blind trial. Lancet 2004;363:2105–15.

10. Montori, Busse, Permanyer-Miralda. How should clinicians interpret results reflecting the effect of an intervention on composite endpoints: should I dump this lump? ACP J Club 2005;143:A8.