Abstract
In hand surgery trials, it is often possible to take several measurements from the same patient, because many disorders in this field affect bilateral or multiple structures, such as the hands themselves, the finger joints or the tendons. Most conventional statistical analyses that are carried out at the level of hands, digit rays or joints rather than patients violate the assumption that observations should be independent.
Furthermore, ignoring the multiplicity of data inflates sample size and thus may lead to spurious significance. This article describes three options to deal with such problems. First, the analysis can simply be restricted to only one measurement per patient. Second, a self-controlled design may be advantageous for conditions that usually have a bilateral pattern. Third, complex statistical modelling (involving generalized estimating equations) can be used to analyse all available measurements with adjustment for data dependency.
SCENARIO
You are planning a randomized controlled trial which compares two different prostheses for metacarpophalangeal (MP) joint replacement in rheumatoid arthritis. Your hypothesis is that the functional results, in terms of joint mobility, of the two prostheses are different. Let us further suppose that in your hospital about 90 MP joint replacement operations are performed per year, all of which would be eligible for your study.
However, in some of these operations, the pattern of disease requires the replacement of only one joint, while other patients need multiple joint replacements during the same operation. Some patients even undergo surgery of both hands (usually with an interval of a few months between the two operations). In effect, the 90 hands scheduled for surgery might actually belong to only 60 patients, because 30 patients contribute data from both hands into the study. Furthermore, the 90 operations comprise about 180 joint replacements, since on average about two joints are replaced per hand.
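The relation between the possible units of analysis can be tabulated in a few lines. The sketch below only restates the figures given above (60 patients, 30 of them bilateral, about two joints per hand); the variable names are illustrative.

```python
# Scenario figures from the text; variable names are illustrative.
patients = 60
bilateral_patients = 30                      # contribute both hands
unilateral_patients = patients - bilateral_patients
joints_per_hand = 2                          # approximate average

hands = unilateral_patients + 2 * bilateral_patients
joints = hands * joints_per_hand

units = {"patients": patients, "hands": hands, "joints": joints}
for name, count in units.items():
    # A ratio above 1 means that several observations share one patient.
    print(f"{name:8s} {count:4d}  ({count / patients:.1f} per patient)")
```

Any unit with more than one observation per patient produces correlated data.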
You wonder which units of analysis should be used in your study: patients, hands, fingers, or joints? Is any study design capable of adjusting for such correlated observations? This article explains some of the problems associated with such situations and describes different methods for resolving them (Table 1).
INDEPENDENT VERSUS CORRELATED DATA
Treatment outcomes are only partly determined by the treatment itself, because patient characteristics such as age, genetics, disease stage, comorbidity, social support, etc. also play important roles. In hand surgery, such covariables may be grouped into three clusters: those that vary (1) between patients, (2) between hands, and (3) between fingers. Of course, variation is larger between two different individuals than between two different fingers of the same individual.
To rule out such interdependencies of data, statistical tests usually require that observations are independent of each other. However, as many of our organs are bilateral and some are multiple (such as fingers or teeth), clinical researchers sometimes violate this assumption of independence. Statistical reviewers of major medical journals complained that about 4% of submitted articles used inappropriate units of analysis (Gore et al., 1992). In the surgical literature, too, “one consistent problem ... concerned the reporting of two operations on the same patient as if they were independent” (Morris, 1988).
Treating each finger of a patient as an independent observation is misleading for two reasons: First, it leads to an underestimation of within-group variability. Second, it inflates sample size. Both effects may lead to wrong results (Altman and Bland, 1997). In addition, an incorrect unit of analysis may sometimes lead to outright absurdities: Andersen noted a trial that “resulted in the apparent conclusion that after 1 year 22 per cent of the patients, but only 16 per cent of the legs, have expired” (Andersen, 1990).
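A small simulation may make both effects tangible. The sketch below is an illustration under assumed values, not an analysis of real data: two groups with identical true means, 20 patients per group, three joints per patient, and equal between-patient and within-patient standard deviations. It counts how often an unpaired t statistic exceeds a rough threshold of 2 when joints, or patient means, are used as the unit of analysis. All function names and parameters are hypothetical.

```python
import random
import statistics


def t_stat(a, b):
    """Unpaired (Welch-type) t statistic for two lists of numbers."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se


def simulate(n_sims=2000, patients_per_group=20, joints_per_patient=3, seed=1):
    """Return false-positive rates for joint-level and patient-level analysis."""
    rng = random.Random(seed)
    naive_hits = patient_hits = 0
    for _ in range(n_sims):
        groups = []
        for _group in range(2):  # both groups share the same true mean
            patients = []
            for _p in range(patients_per_group):
                patient_effect = rng.gauss(0, 1)  # shared by all joints of a patient
                patients.append(
                    [patient_effect + rng.gauss(0, 1) for _ in range(joints_per_patient)]
                )
            groups.append(patients)
        # Naive analysis: every joint counts as an independent observation.
        joints = [[x for p in g for x in p] for g in groups]
        # Patient-level analysis: one mean value per patient.
        means = [[statistics.mean(p) for p in g] for g in groups]
        naive_hits += abs(t_stat(*joints)) > 2
        patient_hits += abs(t_stat(*means)) > 2
    return naive_hits / n_sims, patient_hits / n_sims


naive_rate, patient_rate = simulate()
print(f"false-positive rate, joints as units:   {naive_rate:.3f}")
print(f"false-positive rate, patients as units: {patient_rate:.3f}")
```

Because no true difference exists in the simulated data, every "significant" result is spurious; under these assumptions the joint-level analysis produces such results considerably more often than the patient-level analysis.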
BACK TO THE SCENARIO
A study that analyses the results of 180 replaced joints as if these data came from 180 patients is scientifically unacceptable. Counting fingers instead of subjects underestimates variability and thus leads to spurious significance.
Similarly, analysing hands instead of patients also underestimates variability, but only if one patient contributes both hands to one treatment group. If patients receive different procedures on the left and right hands, neither the intervention nor the control group contains two sets of data from the same patient. Thus, calculating a summary measure for each hand, such as the mean and standard deviation of the measurements from each of the operated fingers, would be permissible. In consequence, an unpaired t-test would not be wrong, either.
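Reducing finger-specific measurements to one summary value per hand might look as follows. The records and field layout are hypothetical and serve only to show the aggregation step.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical joint-level records:
# (patient_id, side, prosthesis, flexion angle in degrees)
joint_records = [
    (1, "L", "A", 60), (1, "L", "A", 70),
    (1, "R", "B", 50), (1, "R", "B", 54), (1, "R", "B", 58),
    (2, "L", "B", 65),
    (2, "R", "A", 72), (2, "R", "A", 68),
]

# Collect all joint measurements that belong to the same hand.
by_hand = defaultdict(list)
for patient_id, side, prosthesis, flexion in joint_records:
    by_hand[(patient_id, side, prosthesis)].append(flexion)

# One value per hand: this is the unit that enters the comparison.
hand_summary = {key: mean(values) for key, values in by_hand.items()}
for (patient_id, side, prosthesis), value in hand_summary.items():
    print(patient_id, side, prosthesis, value)
```

Eight joint measurements are thereby condensed into four hand-level observations, with no patient contributing more than one hand to either treatment group.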
Hand surgeons, however, have experienced difficulties in performing two different types of surgery in the same patient (Agee et al., 1992). Patients, particularly those in whom the first operation was successful, may refuse to have another type of surgery on the second hand. Under these circumstances, the strictest solution would be to include only one operation per patient, e.g. the first operation.
In our exemplary trial, the partial exclusion of patients with bilateral disease and identical surgery on both sides would therefore allow a joint analysis of the remaining bilateral and unilateral cases. However, the power of the study (i.e. the ability to detect a difference if it exists) would not be very high, because unpaired testing does not take advantage of within-subject comparisons. In situations where only a few cases are bilateral, this approach would be easy to use without losing too much power. In diseases that typically involve both hands, however, it would be counterproductive to sacrifice the within-subject comparisons, because many more patients would then need to be recruited to the study. In consequence, you hope for a better option which allows you to analyse the bilateral cases with the same operation on both hands.
SELF-CONTROLLED DESIGNS
In situations where the majority of patients suffer from bilateral disease, the optimal research design is to perform one type of surgery on the left and the other on the right side, i.e. a completely self-controlled design (Louis et al., 1984). Such studies have various advantages. Each patient serves as his or her own control, which increases the internal validity of the study. Randomization still plays an important role, because chance (and not the surgeon) must decide which side receives which treatment. The analysis of such trials is easy, because the paired t-test not only takes data dependency into account, but uses it. The required sample size (in terms of patients) is at least halved, because two results per patient can be used for analysis. An even smaller sample may suffice, since the “background noise” in the data caused by interindividual variation is eliminated in the self-controlled design. Results can be presented simply by counting how many patients favoured one or the other type of surgery. Readers who are interested in an example should study a recently published comparison of endoscopic and open carpal tunnel release in bilateral cases (Ferdinand and MacLean, 2002), where both operations were carried out sequentially in the same session.
One potential problem may occur if much time elapses between the first and second operations. Patients may move away or die, thus causing missing data. As mentioned above, they may also withdraw their consent to the study protocol because the first operation was so successful. Such cases complicate analysis and should be avoided by keeping the time interval between the operations as short as possible.
Methodologically, there is little difference between multiple simultaneous interventions and multiple interventions over time. Multiple interventions over time, however, may also be performed on the same organ, e.g. revision surgery for recurrent disease. A comparison of first and second treatment courses on the same organ is called a cross-over design. It is used more often in drug than in surgical research, usually for chronic hand disorders, e.g. a comparison of two topical creams for rheumatoid arthritis. The main problems here are carry-over effects of the first treatment (e.g. due to long half-lives of drugs) and changes in disease severity over time (Louis et al., 1984). Again, randomization is necessary to determine the time sequence of treatments.
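The paired analysis described above can be sketched with the standard library alone. The outcome scores are hypothetical; the code computes the paired t statistic from the within-patient differences and counts how many patients fared better with either treatment.

```python
import math
import statistics

# Hypothetical outcome scores per patient:
# (hand treated with procedure A, hand treated with procedure B)
pairs = [(62, 55), (70, 66), (58, 60), (75, 68), (64, 59), (69, 69)]

# The paired t-test works on within-patient differences only,
# so variation between patients drops out of the comparison.
diffs = [a - b for a, b in pairs]
n = len(diffs)
t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

favour_a = sum(d > 0 for d in diffs)
favour_b = sum(d < 0 for d in diffs)
ties = n - favour_a - favour_b

print(f"paired t = {t:.2f} on {n - 1} degrees of freedom")
print(f"favour A: {favour_a}, favour B: {favour_b}, ties: {ties}")
```

The t statistic still has to be compared with the t distribution on n - 1 degrees of freedom, which a statistical package does automatically.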
BACK TO THE SCENARIO
In our study, those 30 patients who are scheduled for bilateral MP joint replacement surgery could be recruited into a self-controlled design. Patients would then be randomly assigned to undergo one type of surgery on one hand and the other type of surgery on the other hand. Of course, some hands require only one or two MP joints to be replaced, while others require four replacements, but with a sufficiently large number of randomized patients such differences will be distributed roughly evenly between the two groups.
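Side allocation by chance could be sketched as follows. This is simple randomization for illustration only; the function name, the labels "A" and "B" and the seed are hypothetical, and a real trial would additionally need concealed allocation.

```python
import random


def allocate_sides(patient_ids, seed=None):
    """Randomly decide, for each patient, which hand receives which prosthesis."""
    rng = random.Random(seed)
    allocation = {}
    for pid in patient_ids:
        # Draw the two treatments in random order: first to the left hand,
        # second to the right hand.
        left, right = rng.sample(["A", "B"], 2)
        allocation[pid] = {"left": left, "right": right}
    return allocation


allocation = allocate_sides(range(1, 31), seed=2024)
left_a = sum(sides["left"] == "A" for sides in allocation.values())
print(f"prosthesis A on the left hand in {left_a} of {len(allocation)} patients")
```

Every patient receives both prostheses, so only the side, not the treatment, is left to chance.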
When selecting the primary outcome variable, surgeons usually prefer objectively measurable data, such as the flexion angle of the replaced finger joint. From the patient’s perspective, however, a more appropriate outcome measure would be a hand function score instead of finger-specific measurements. It is more relevant that the patient is able to hold a glass, spoon, or key than achieve 90° flexion in a given joint (Troidl et al., 1987). Since movements of adjacent fingers depend on each other, the digit with the poorest outcome determines overall function.
A self-controlled trial, nevertheless, could only include the 30 bilateral surgery cases, whereas the remaining 60 unilateral cases would require a separate study as outlined before. Therefore, the analysis would be split into two different subsets, the results of which are difficult to combine (Curtin et al., 2002).
GENERALIZED ESTIMATING EQUATIONS
What we are looking for is a statistical method that takes into account the correlated nature of the data and is able to use all available data, regardless of whether a patient received one, two or more joint prostheses on one or both hands. Fortunately, a nearly ideal solution was developed in the 1980s by two American statisticians (Zeger and Liang, 1986). It is called generalized estimating equations (GEEs) and has recently been incorporated into the major statistical software packages, such as Stata and SAS, but not SPSS (Horton and Lipsitz, 1999). It is beyond the scope of this article to explain the idea behind GEE and, as it is a new and difficult technique, expert statistical advice is required anyway. Examples of successful GEE modelling of therapeutic and observational data on musculoskeletal diseases are available (Barrack et al., 2001, Wolfe and Pincus, 2001).
Currently, GEE modelling is able to control for data that are correlated on the level of patients. The results for different rays of the same hand, however, are more likely to be correlated than those obtained from the different hands of a single patient. Thus, it would be desirable to have two levels of data adjustment, one for intra-hand correlations and one for intra-patient correlations. Since this is at present little more than a theoretical option, hand surgeons should not mix both levels and should refrain from randomizing single joints: they should simply randomize hands or patients. Only when treating disorders that affect different rays of one hand without interfering with the outcomes of other rays in the same hand (probably digital nerve injuries) does a randomized self-controlled design seem useful, in which two different treatments are compared between different rays of the same hand. Functional results, however, may be difficult to interpret on the level of rays as compared to hands, so that it is preferable to randomize patients here, too.
RESOLUTION OF THE SCENARIO
After consulting your local statistician, you feel that all 90 patients can be recruited. These 90 patients will contribute 120 hands to the analysis, as 60 + (2 × 30) = 120. By using GEE, it is possible to analyse bilateral and unilateral cases jointly, with adjustment for between-subject and within-subject variability, thus decreasing the required number of patients. Special sample size calculations for GEE can be performed but, as a rule of thumb, an ordinary sample size calculation can be performed as if all hands belonged to different individuals. Free software for sample size calculation is available at http://www.mc.vanderbilt.edu/prevmed/ps.htm (Dupont and Plummer, 1997).
In practical terms, all patients presenting for MP replacement are classified as either unilateral or bilateral cases. Those who consent to the study protocol are randomized to one type of MP prosthesis. For bilateral cases, randomization is performed separately, i.e. in a “stratified” manner. Each patient receives only one type of prosthesis for the fingers of one hand because, after implantation of two different prosthesis types into the same hand, it would be impossible to determine which prosthesis was to blame for bad hand function. In the statistical database, one entry per hand is recorded: commonly the mean value of the operated fingers if finger-specific measurements are used. For patients who undergo bilateral surgery within the study, two separate lines are entered, together with a patient identifier, so that the computer can recognize that both entries are from the same patient. Because fewer patients are required for a meaningful comparison, the study might be finished earlier than previously thought.
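The database layout described above, and the way the patient identifier enters the analysis, can be illustrated with a deliberately simplified sketch. It assumes a two-group comparison with an identity link and an independence working correlation; in that special case the GEE estimate is the difference in group means, and its robust ("sandwich") variance, here without small-sample correction, can be written out by hand. Real analyses would use a statistical package and often a different working correlation. All data and names are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical database rows: (patient_id, prosthesis, hand-level outcome).
# Bilateral patients appear on two lines with the same patient_id.
rows = [
    (1, "A", 60), (1, "A", 64),
    (2, "A", 70),
    (3, "A", 52), (3, "A", 50),
    (4, "B", 58),
    (5, "B", 45), (5, "B", 47),
    (6, "B", 55), (6, "B", 51),
]


def group_estimates(rows, prosthesis):
    """Return the group mean with naive and cluster-robust variance of that mean."""
    values = [(pid, y) for pid, trt, y in rows if trt == prosthesis]
    n = len(values)
    mean = sum(y for _, y in values) / n
    # Naive variance: every hand is treated as an independent observation.
    naive = sum((y - mean) ** 2 for _, y in values) / n ** 2
    # Robust variance: residuals are first summed within each patient,
    # so two hands of one patient count as one cluster.
    per_patient = defaultdict(float)
    for pid, y in values:
        per_patient[pid] += y - mean
    robust = sum(s ** 2 for s in per_patient.values()) / n ** 2
    return mean, naive, robust


mean_a, naive_a, robust_a = group_estimates(rows, "A")
mean_b, naive_b, robust_b = group_estimates(rows, "B")

diff = mean_a - mean_b
naive_se = math.sqrt(naive_a + naive_b)
robust_se = math.sqrt(robust_a + robust_b)
print(f"difference in means: {diff:.1f}")
print(f"naive SE: {naive_se:.2f}, cluster-robust SE: {robust_se:.2f}")
```

In this example the two hands of each bilateral patient resemble each other, so the cluster-robust standard error is larger than the naive one, which is the adjustment for data dependency in miniature.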
Acknowledgements
We would like to thank Professor Douglas Altman for his expert advice.
