Abstract
Background
Detecting a benefit from closure of patent foramen ovale in patients with cryptogenic stroke is hampered by low rates of stroke recurrence and uncertainty about the causal role of patent foramen ovale in the index event. A method to predict patent foramen ovale-attributable recurrence risk is needed. However, individual databases generally have too few stroke recurrences to support risk modeling. Prior studies of this population have been limited by low statistical power for examining factors related to recurrence.
Aims
The aim of this study was to develop a database to support modeling of patent foramen ovale-attributable recurrence risk by combining extant data sets.
Methods
We identified investigators with extant databases including subjects with cryptogenic stroke investigated for patent foramen ovale, determined the availability and characteristics of data in each database, collaboratively specified the variables to be included in the Risk of Paradoxical Embolism database, harmonized the variables across databases, and collected new primary data when necessary and feasible.
Results
The Risk of Paradoxical Embolism database has individual clinical, radiologic, and echocardiographic data from 12 component databases, including subjects with cryptogenic stroke both with (n = 1925) and without (n = 1749) patent foramen ovale. In the patent foramen ovale subjects, a total of 381 outcomes (stroke, transient ischemic attack, death) occurred (median follow-up 2·2 years). While there were substantial variations in data collection between studies, there was sufficient overlap to define a common set of variables suitable for risk modeling.
Conclusion
While individual studies are inadequate for modeling patent foramen ovale-attributable recurrence risk, collaboration between investigators has yielded a database with sufficient power to identify those patients at highest risk for a patent foramen ovale-related stroke recurrence who may have the greatest potential benefit from patent foramen ovale closure.
Introduction
Cohnheim (1) described the association between cryptogenic stroke (CS) and patent foramen ovale (PFO) in 1877. Since then, progress has been slow at identifying the characteristics of those who are at high risk of recurrent paradoxical embolism. Some factors have been identified, e.g. atrial septal aneurysm (2), but have been inconsistently replicated (3). There is renewed interest in the subject given new diagnostic tools (transcranial ultrasound and contrast echocardiography) and therapeutic options, especially implantable closure devices. Given that PFOs are highly prevalent in the general population (4) and that CS has many potential causes, it is rarely possible to determine with certainty whether a discovered PFO is pathogenic or incidental in an individual patient (5).
Aims
The aim of the Risk of Paradoxical Embolism (RoPE) Study is to develop mathematical models that can be used to stratify patients by the conditional probability that (1) an index CS is ‘PFO-related’, and (2) stroke will recur (6).
These models can then be used to stratify patients in ongoing PFO closure trials with the goal of informing patient selection for PFO closure in clinical practice. Past studies attempting to identify risk factors for recurrent paradoxical embolism have yielded conflicting results (2,7) and have been limited by extremely low statistical power due to a very low overall stroke recurrence rate (8). Therefore, the first aim of the RoPE Study was to combine databases in order to perform an individual patient meta-analysis that could overcome these methodological and statistical limitations (9). Here we report our experience creating the largest database yet of CS subjects with and without PFO.
Methods
Selection of published and unpublished databases of patients with CS and the development of a collaborative team of international investigators
Potential databases of CS subjects were identified from a systematic review article (10) supplemented by our own literature search and personal communications among investigators. Corresponding authors were contacted directly and invited to participate in the RoPE Study. We invited their participation after reviewing their inclusion criteria, data elements, and availability of primary data (echocardiograms and neuroimaging studies) with the principal investigators. Cryptogenic stroke is defined according to the Trial of Org 10172 in Acute Stroke Treatment classification (11) and excludes subjects with large artery atherosclerosis, cardioembolism, small vessel disease, and strokes of other determined etiology (e.g. arterial dissection, arterial hypercoagulable states). For the RoPE Study, subjects with medium risk sources are considered cryptogenic. ‘Stroke’ is a sudden onset neurological deficit in a vascular territory presumed to be due to focal ischemia after a complete workup. If the deficit lasts for <24 h, it must be accompanied by acute magnetic resonance imaging (MRI) or computed tomography (CT) changes in appropriate locations; otherwise, it is considered a transient ischemic attack (TIA). If the deficit is >24 h, MRI or CT changes are not required. ‘Complete workup’ requires (1) MRI or CT; (2) intra- and extracranial vascular imaging; (3) inpatient or outpatient cardiac monitoring sufficient for the investigator to exclude atrial fibrillation; and (4) transesophageal echocardiography (TEE).
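The stroke/TIA distinction described above can be illustrated with a minimal sketch. The function and parameter names are hypothetical and are not part of the RoPE protocol. The text does not specify how a deficit of exactly 24 h is handled, so the sketch assumes it is treated as a stroke.

```python
def classify_event(duration_hours: float, acute_imaging_change: bool) -> str:
    """Classify an ischemic event as 'stroke' or 'TIA'.

    Illustrative only: names are hypothetical, and the handling of a
    deficit lasting exactly 24 h is an assumption (treated as stroke).
    """
    if duration_hours >= 24:
        # Long-lasting deficit: imaging confirmation is not required.
        return "stroke"
    # Short-lasting deficit: stroke only if MRI/CT shows acute changes
    # in an appropriate location; otherwise it is a TIA.
    return "stroke" if acute_imaging_change else "TIA"


print(classify_event(2, acute_imaging_change=False))   # TIA
print(classify_event(2, acute_imaging_change=True))    # stroke
print(classify_event(48, acute_imaging_change=False))  # stroke
```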
The 12 studies (Appendix S1) are Recurrent Stroke and Massive Right-to-Left Shunt Prospective Spanish Multicenter Study (CODICIA) (12), French Patent Foramen Ovale and Atrial Septal Aneurysm Study Group (PFO/ASA) (2), Aortic Plaques and Risk of Ischemic Stroke Study (APRIS) (13), Bern (14), Bern unpublished, Patent Foramen Ovale in Cryptogenic Stroke Study (PICSS) (7), Lausanne, Toronto (15), Sapienza (16), Tufts (17), German (3), and Northern Manhattan Stroke (NOMASS) (18). The principal investigators of each of the 12 databases were included in the RoPE Study Group, with additional coinvestigators as appropriate. The project is highly interactive requiring frequent communications between the RoPE coordinating center at Tufts Medical Center and the investigators at sites in North America and Europe. Communication has been through e-mail, telephone, teleconference and annual face-to-face RoPE Investigator Meetings.
Determining the characteristics of data in each database and specifying dependent and independent variables
Detailed summaries of each database were compiled from study protocols, data codebooks, and case report forms (CRFs), and compared with candidate RoPE variables, previously identified as potentially related to the presence or absence of PFO and to the risk of stroke recurrence (6). For each variable, we determined its definition, how the data were obtained, how the variable was coded, and what effect, if any, the relevant study's inclusion/exclusion criteria might have on the variable. The decision to include a variable was informed by the quality of data collection across databases and by a consensus among investigators regarding its importance.
Clinical variables
Included clinical variables are age (at index event), sex, race, hypertension, diabetes, coronary artery disease, hypercholesterolemia, smoking, a history of cerebral ischemia (TIA or stroke) prior to the index event, medications being taken at the time of the index event (antithrombotics, statins, hormone replacement therapy, oral contraceptives), and treatment medications after the index event (antiplatelets, anticoagulants). Index events were both strokes and TIAs. Some variables of interest were collected on a minority of databases. While these variables will not be included in the main analyses, they are potentially available for substudies. They are history of deep vein thrombosis/pulmonary embolism, Valsalva at symptom onset, history of migraine, alcohol, and neurological outcome scales.
Echocardiographic variables
The echocardiographic variables included in the RoPE database are hypermobility of the interatrial septum (yes/no), shunting across the PFO at rest, i.e. not requiring Valsalva (yes/no), and physiological shunt size (large/small).
Septal excursion was measured as the maximum distance from an estimated midline into either atrium (Bern published, German, Lausanne, Tufts) or as a total excursion between the left and right atria (APRIS, PICSS, CODICIA, French PFO/ASA, Toronto). We used the objective measurement of septal movement rather than the subjective impression of the shape of the septum to determine whether or not it was considered ‘aneurysmal’ (Fig. 1). The RoPE-defined ‘hypermobile’ septum is roughly equivalent to ‘atrial septal aneurysm’ used in the literature.

Fig. 1. Risk of Paradoxical Embolism (RoPE)-defined septal mobility. Numbers refer to interatrial septal excursion (mm) during the cardiac cycle during transesophageal echocardiography. Mobility in the green range was defined by RoPE as ‘hypermobile’. APRIS, Aortic Plaques and Risk of Ischemic Stroke Study; CODICIA, Recurrent Stroke and Massive Right-to-Left Shunt Prospective Spanish Multicenter Study; PFO/ASA, Patent Foramen Ovale and Atrial Septal Aneurysm Study Group; PICSS, Patent Foramen Ovale in Cryptogenic Stroke Study.
Regarding shunt size, past studies have used anatomy [e.g. tunnel length (19) or separation distance of the septum primum and secundum (20)]. Within the constraints of the existing data and the TEE images available for review we concluded that a dichotomous variable (large versus small) based on the number of bubbles detected in the left atrium would be used (Fig. 2), based on a consensus that finer gradations are unreliable particularly given the effects of time-varying factors such as volume status, depth of sedation, and strength of the Valsalva.

Fig. 2. Risk of Paradoxical Embolism (RoPE)-defined patent foramen ovale (PFO) shunt size. Numbers refer to the maximum number of observed bubbles in the left atrium ≤3 cardiac cycles after right atrial opacification during transesophageal echocardiography. Most studies had a cutoff at ≥10. In the French PFO/ASA, German, and Sapienza studies 1, 2, or 3 bubbles were considered ‘no PFO’. For RoPE: blue range, small; green, large; gray, indeterminable. APRIS, Aortic Plaques and Risk of Ischemic Stroke Study; CODICIA, Recurrent Stroke and Massive Right-to-Left Shunt Prospective Spanish Multicenter Study; PFO/ASA, Patent Foramen Ovale and Atrial Septal Aneurysm Study Group; PICSS, Patent Foramen Ovale in Cryptogenic Stroke Study.
Neuroradiology variables
The neuroradiologic variables from index event imaging are infarct seen (yes/no), infarct location (superficial, deep), infarct size (large/small), multiple acute infarcts (yes/no), and chronic (prior) stroke (yes/no). MRI took precedence over CT.
Infarct seen is an infarct in a location consistent with the clinical presentation. Restricted diffusion on acute MRI, or fluid-attenuated inversion recovery (FLAIR)/T2 hyperintensity on subacute imaging [especially with evidence of mass effect and/or T1 hypointensity but not as dark as cerebrospinal fluid (CSF)] would fit this criterion. The CT definition is an area that has decreased density compared with surrounding brain parenchyma but not as low as CSF, especially if associated with mass effect.
Superficial infarct is located in the cerebral or cerebellar cortex, or in the cortical border zones of the middle cerebral artery. Deep strokes are located in the deep white matter of the cerebral or cerebellar hemispheres, the subcortical grey matter, or the internal border zone of the middle cerebral artery.
Infarct size was collected by some studies as a continuous variable and so a large/small dichotomy can be made easily. However, other studies already dichotomized the variable using millimeters (Lausanne) or ‘lobes’ (PICSS, NOMASS, APRIS) as units of measurement. The cutoff for the large/small dichotomy is roughly 15 mm (Appendix S2). Infarcts that are small in size by this definition should not be interpreted as being due to small vessel (‘lacunar’) disease. Subjects with this infarct subtype would not have been included in these CS cohorts.
Multiple acute infarcts are ≥2 infarcts corresponding to the index event.
The prior stroke definition, adapted from the Cardiovascular Health Study (21), is a focal brain lesion seen on MRI that is hyperintense on T2 sequences, hypointense on T1 and ≥3 mm (maximum linear measurement in any direction). Surrounding FLAIR hyperintensity was seen as supportive. Prior stroke on CT was defined as a focal brain lesion ≥3 mm that is isodense with CSF or more hypodense than brain parenchyma with associated volume loss. For some subjects whose imaging could not be obtained, we reviewed clinically derived reports (n = 93). We tested the validity of this approach by assessing whether clinical reports agreed with data derived from direct review of neuroimaging in 183 cases where both types of data were available. For ‘prior stroke’, agreement was 148/183 cases (81%; Kappa 0·43, P = 0·0637). This compares with perfect agreement in 15/27 cases (56%) for the prior stroke variable (Kappa between reviewer pairs ranged from 0·27 to 0·56) when three readers read the neuroimages directly.
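For readers less familiar with the Kappa statistic, the following is a minimal sketch of how Cohen's Kappa corrects raw agreement for agreement expected by chance. The 2 × 2 counts are purely illustrative and are not RoPE data; the full cross-tabulation behind the 148/183 figure is not given in the text, so it cannot be reproduced here.

```python
def cohens_kappa(table):
    """Cohen's Kappa for a square agreement table.

    table[i][j] = number of cases that source A put in category i
    and source B put in category j.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of cases on the diagonal.
    observed = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: product of the marginal proportions per category.
    row_totals = [sum(table[i]) for i in range(k)]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n**2
    return (observed - expected) / (1 - expected)


# Hypothetical counts (NOT study data): rows = clinical report (yes, no),
# columns = direct image review (yes, no).
example = [[20, 5],
           [10, 65]]
print(round(cohens_kappa(example), 3))  # 0.625
```

In this illustration raw agreement is 85% while Kappa is 0·625, which shows why a fairly high percentage agreement can coexist with only moderate Kappa when one category dominates.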
Outcomes
The outcome of interest is recurrent stroke: a sudden onset neurological deficit in a vascular territory presumed to be due to cerebral ischemia. If the deficit lasts less than 24 h, it must be accompanied by MRI/CT changes in an appropriate location. If the deficit is greater than 24 h, MRI/CT changes are not required. This is more consistent with the ‘tissue definition’ of TIA (22) than with the arbitrary time-based definitions. We include TIA as a potential outcome but recognize its diagnostic subjectivity and unclear clinical significance without persisting neurological deficits.
All outcomes from the component databases were re-adjudicated. There were three goals to this process. First, we applied the ‘tissue definition’ of TIA (22) and recategorized events as strokes where appropriate, or as non-events when the clinical presentation did not suggest cerebral ischemia. Second, we categorized the mechanism of recurrent ischemia to distinguish those of known cause (unlikely to be due to paradoxical embolism) from those that were again cryptogenic or not otherwise specified. And third, we categorized death as stroke if it was due to stroke recurrence. Study sites completed a CRF for each outcome in their database. A narrative was written that described the event including symptoms, pace, duration, deficit, investigations (with reports) and clinical impression of mechanism. The Outcome Adjudication Committee consisted of three vascular neurologists (ME, PM, CW) and was chaired by a fourth (DET). Each member reviewed each CRF and categorized the event as a stroke, TIA, death, or no-event and assigned an ischemic mechanism (Appendix S3). Outcomes with insufficient data to determine an ischemic mechanism were classified as ‘Stroke/TIA: not otherwise specified’. Any disagreement (i.e. an absence of unanimity) was discussed at a consensus meeting where final adjudications were made.
Acquiring new primary data
In order to create a data set that would support the development of accurate predictive models, we identified the need to obtain additional primary data from medical records, neuroradiologic scans, TEE studies, or by telephone contact of study subjects. The reasons for needing additional primary data included:
missing data in some databases for variables potentially critical to the models,
inconsistent definitions across databases for variables potentially critical to the models, and
inadequate follow-up data with regard to outcome events.
Institutional review board approval was obtained as required at all local sites.
Clinical data
Some variables that were either not available in the original database or incomplete were obtained at the site through review of the written CRFs or medical record. This included medications that were being taken at baseline and following the index event (French PFO/ASA).
Echocardiography and neuroradiology
Transesophageal echocardiography data unavailable in the original databases were needed for review from three sites: Bern, Lausanne, and Tufts. We determined that RoPE echo variables were incompletely ascertained for 356 PFO subjects, and we were able to obtain echocardiograms for 222 of these. Three echocardiographers were trained to use RoPE-specific definitions that maximized consistency with the other echo variables already present in the RoPE database. Interatrial septal excursion was defined as maximal excursion of the interatrial septum away from the midline within a cardiac cycle. A hypermobile septum was defined as a maximal excursion of greater than 10 mm. While the grading of intracardiac shunting was inconsistently defined across databases, a simple grading system was devised to reconcile these inconsistencies and minimize the need for rereading. This was determined by counting the ‘maximal bubbles in a single frame within three cardiac cycles after arrival of bolus in the right atrium’. Shunt grades are defined as follows: Grade I = 1–10 bubbles; Grade II = between Grades I and III; Grade III = a ‘cloud’ of bubbles. Shunt grading was done both at rest and with Valsalva when possible. All centrally reviewed echoes were read by one of three echocardiographers without double reading. Variables were extracted and recorded for inclusion into RoPE. Echocardiographers were blinded to the clinical and radiologic data for those subjects. After deciding how best to harmonize this variable with the preexisting shunt size estimations in the other databases, we determined that Grades II and III in the reread studies would be considered large. Agreement between the three readers for all echocardiographic variables was assessed on a subset of 29 echoes from the French PFO/ASA study: hypermobile septum variable, Kappa 0·57; large PFO, Kappa 0·42; shunt at rest, Kappa 0·75.
Similarly, agreement was also tested using 31 echoes from the PICSS study: hypermobile septum variable, Kappa 0·33; large PFO, Kappa 0·14; shunt at rest, Kappa 0·33.
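The harmonization rules applied to the centrally reread echocardiograms can be summarized in a short sketch. Function names are hypothetical. Treating Grade I as ‘small’ is an inference from the statement that Grades II and III are considered large, and the 10 mm threshold is applied here only to excursion measured from the midline, as in the reread studies.

```python
HYPERMOBILE_THRESHOLD_MM = 10.0  # excursion from midline, reread studies


def is_hypermobile(excursion_from_midline_mm: float) -> bool:
    """Hypermobile septum: maximal excursion strictly greater than 10 mm."""
    return excursion_from_midline_mm > HYPERMOBILE_THRESHOLD_MM


def rope_shunt_size(grade: str) -> str:
    """Map a reread shunt grade (I, II, III) to the dichotomous RoPE variable.

    Grades II and III are 'large' per the text; Grade I as 'small' is
    an assumption inferred from that rule.
    """
    mapping = {"I": "small", "II": "large", "III": "large"}
    if grade not in mapping:
        raise ValueError(f"unknown shunt grade: {grade!r}")
    return mapping[grade]


print(is_hypermobile(12.0), rope_shunt_size("II"))  # True large
```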
The process for obtaining missing neuroradiologic data was analogous to that for the missing echo data. We identified 1853 subjects from five sites that needed review of the neuroimaging studies: Bern published, CODICIA, German, Lausanne, and Tufts. Of these, we were able to retrieve 1012, which were either read locally or transferred to Tufts Medical Center for central review under the direction of DET.
Follow-up and outcomes
Outcome ascertainment protocols differ among the RoPE databases. In an effort to avoid ascertainment bias, we required one-year follow-up (defined as last follow-up in the 12th month or beyond) on over 85% of patients for all databases. For databases in which follow-up was too short, or only done as clinically indicated (rather than with scheduled visits), subjects were contacted, consented, then administered a validated, structured questionnaire to determine stroke-free status (23) (sensitivity 0·97–1·0, negative predictive value 0·96–1·0). Positive responses, suggesting recurrent stroke/TIA, were reviewed by the local neurologist by telephone, medical records, or in person if clinically indicated. This protocol was applied to the Bern, Tufts, and Lausanne databases.
Results
The RoPE database includes a total of 12 component databases – four enrolled only CS subjects with PFO and eight enrolled CS subjects with and without PFO. The PFO prevalence in these eight studies ranges from 21% to 63%. Complete database descriptions and data have been published by nine RoPE collaborating sites (2,3,7,12,14–16,18,24). The remaining three databases are either unpublished (Bern unpublished, Lausanne) or have published substudies without a complete database description (17). Two sites (Tufts, Lausanne) are continuing to enroll subjects while the rest are no longer recruiting. The prevalence of each RoPE variable is shown in Table 1.
Table 1. Prevalence of each variable by database
Gray columns represent databases of CS and PFO. Otherwise databases included subjects both with and without PFO. Green cells show variables with ≥20% missingness. A dot indicates no data. PFO, patent foramen ovale; CODICIA, Recurrent Stroke and Massive Right-to-Left Shunt Prospective Spanish Multicenter Study; PFO/ASA, Patent Foramen Ovale and Atrial Septal Aneurysm Study Group; APRIS, Aortic Plaques and Risk of Ischemic Stroke Study; PICSS, Patent Foramen Ovale in Cryptogenic Stroke Study; NOMASS, Northern Manhattan Stroke Study; TIA, transient ischemic attack; TEE, transesophageal echocardiography; HRT, hormone replacement therapy; OCP, oral contraceptive pill.
The total number of subjects with CS in the RoPE database is 3674. Most of the subjects (n = 3023) are from CS studies that included those with and without PFO. The other 651 subjects come from databases that only included those with PFO. There are 1925 subjects with and 1749 without a PFO. A small subgroup of subjects (n = 120) had a known PFO by transthoracic echocardiography or transcranial Doppler but without TEE data.
The mean age of subjects is 54·6 years. The youngest subject was 15 at the time of the index event and the oldest was 92. There is a slight excess of males. Almost all subjects were white except for US-based studies (PICSS, NOMASS, APRIS) where other races were common. Overall, 16% had a clinical history of stroke/TIA prior to their index event but 28% had evidence of prior stroke on baseline imaging.
Appendix S4 outlines the echocardiographic data that were available from the outset, obtained through primary review, and those that could not be obtained. Prevalences of PFO characteristics vary widely between studies: large PFO (19–82%), hypermobile septum (10–48%), and shunting at rest (53–92%).
Appendix S4 also outlines the neuroradiologic process. The prevalence of subjects with positive neuroimaging across the studies varies widely perhaps due to the inclusion of TIA subjects in seven of the 12 studies or the use of different imaging modalities. There was a range of MRI use from 29% (PICSS) and 45% (NOMASS) to 96% (Tufts) and 100% (Bern) in part reflecting MRI availability when the studies were conducted.
In PFO subjects, there were 381 outcomes (stroke/TIA/death) over a median follow-up of 2·2 years (interquartile range 1·0–3·6) prior to readjudication. Of the adjudicated events, 10·3% of strokes (17/165) and 4·2% of TIAs (5/119) were recategorized into ischemia of known cause and so are not cryptogenic (Table 2). The mechanisms of these 22 events were cardioembolic (7), large vessel atherosclerosis (8), small vessel disease (7), and others (4). Four deaths were recategorized as stroke. Adjudication eliminated 9% of the original outcomes as non-events. Clinical data for adjudication were not available from three studies – the etiology for all events from these studies was categorized as ‘not otherwise specified’.
Table 2. Outcomes by site before and after adjudication
Outcomes in gray are not likely to be patent foramen ovale-related.
NOS, not otherwise specified. CODICIA, Recurrent Stroke and Massive Right-to-Left Shunt Prospective Spanish Multicenter Study; PFO/ASA, Patent Foramen Ovale and Atrial Septal Aneurysm Study Group; APRIS, Aortic Plaques and Risk of Ischemic Stroke Study; PICSS, Patent Foramen Ovale in Cryptogenic Stroke Study; TIA, transient ischemic attack.
Missingness was assessed for each variable in each component database. For many variables, collection was perfect (missingness = 0%). The following variables had ≤5% missingness from every database: age, gender, diabetes, hypertension, smoking, prior stroke/TIA. For patients with PFO, one-year follow-up was perfect (0% missing) for three databases (French PFO/ASA, PICSS, NOMASS), ≤10% missing for three databases (Bern, Toronto, German), and 10–15% missing for four databases (CODICIA, APRIS, Lausanne, Tufts). Thus, in those databases that will be used for modeling recurrence, the proportion of PFO subjects with ≥one-year follow-up was 93% in total (90% among all CS subjects).
Discussion
While no studies are sufficiently large on their own to support risk modeling for stratification of patients with CS and PFO, we were able to combine extant databases that will provide a substrate sufficient for the predictive models. Eight databases will contribute to the model that predicts the presence or absence of a PFO in patients with CS (CODICIA, French PFO/ASA, APRIS, PICSS, Lausanne, Sapienza, German, NOMASS). The variables that we anticipate will be used for this model, e.g. vascular risk factors and age, were collected with essentially no missingness. In turn, the predictive model can be transformed through the application of Bayes' theorem to estimate the PFO-attributable fraction (5,25).
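One way the Bayes' theorem step could work is sketched below. It assumes a simple two-component model that is not spelled out in this text: a patient with CS either has a PFO-related stroke (and therefore has a PFO) or has a stroke of another cause, in which case a PFO is present at the background (control) prevalence. Under that assumption, the probability that a discovered PFO is incidental is [control prevalence × (1 − CS prevalence)] / [CS prevalence × (1 − control prevalence)], and the attributable fraction is its complement. The function name and the 25% control prevalence are illustrative assumptions, not RoPE results.

```python
def pfo_attributable_fraction(prev_cs: float, prev_control: float) -> float:
    """Probability that a PFO found in a CS patient is stroke-related.

    prev_cs: PFO prevalence among CS patients like this one, e.g. the
        output of a model predicting PFO presence (hypothetical input).
    prev_control: background PFO prevalence (illustrative assumption).
    """
    if not (0 < prev_cs < 1 and 0 <= prev_control < 1):
        raise ValueError("prevalences must be probabilities in (0, 1)")
    incidental = (prev_control * (1 - prev_cs)) / (prev_cs * (1 - prev_control))
    # If prevalence in CS does not exceed the background rate, none of
    # the PFOs are attributable; clamp so the fraction stays >= 0.
    return 1 - min(incidental, 1.0)


# Illustrative values only: 50% PFO prevalence in a CS subgroup,
# 25% assumed background prevalence.
print(round(pfo_attributable_fraction(0.50, 0.25), 3))  # 0.667
```

The sketch makes the intuition explicit: the more the predicted PFO prevalence in a patient subgroup exceeds the background prevalence, the more likely it is that a discovered PFO is pathogenic rather than incidental.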
Ten databases will be used to model recurrence risk (APRIS, Bern, CODICIA, French PFO/ASA, German, Lausanne, NOMASS, PICSS, Toronto, Tufts). Together, these databases have 133 outcomes (stroke/TIA). Eleven of those 133 events were adjudicated as being of known cause. Based on the heuristic of 10 outcomes per variable (26), this should be sufficient for modeling recurrence risk. While missingness of some variables is a concern, imputation techniques will be used to retain patients in the database and avoid bias.
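The events-per-variable heuristic cited above amounts to simple arithmetic, sketched here with a hypothetical helper function. It gives only a rough ceiling on the number of candidate predictors and is not a formal power calculation.

```python
def max_candidate_variables(n_events: int, events_per_variable: int = 10) -> int:
    """Rough upper bound on candidate predictors under the heuristic."""
    return n_events // events_per_variable


all_events = 133       # stroke/TIA outcomes in the ten databases
known_cause = 11       # adjudicated as being of known cause

print(max_candidate_variables(all_events))                # 13
print(max_candidate_variables(all_events - known_cause))  # 12
```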
The Bern unpublished study will not be used for either model, as it is a PFO-only database (inappropriate for predicting presence of PFO) and did not collect sufficient outcome data (inappropriate for predicting recurrence). It will nonetheless be retained in the RoPE database, as it may contribute to other RoPE studies. During our modeling process, we will continually assess consistency of effects across databases. If there is large heterogeneity of effects, we will seek explanations in the variable definition, collection, or in the component study's inclusion and exclusion criteria. We will pay careful attention to variables with definitions that vary somewhat between databases (e.g. large vs. small stroke). These steps during modeling will serve as additional checks on data quality, such that a variable in a component database felt to correspond to a RoPE-defined variable may not in fact be useable or that a database may need to be excluded from a model based on anomalous effects. Modeling procedures will also account for clustering within databases, by including the component site as a random effect using a generalized linear (or other similar) modeling approach.
The process of combining individual patient data from multiple studies is laborious and methodologically fraught. However, because individual databases are not sufficiently large and outcomes are relatively rare, there is no single database that can provide sufficient information and statistical power to answer the clinical question of interest. Further, the effort, time, and expense required for an adequate prospective study is incrementally much more than that required to pool data from prior studies. For these reasons, investigating stroke recurrence in patients with CS and PFO seems an excellent candidate for this approach.
In addition to statistical power, combining databases provides other advantages. Predictive models developed on data from multiple sources can be checked for consistency across sources, thereby improving generalizability. As in any multicenter database, the development of a model using data acquired at different locations lessens the dependency on any particular database and thereby reduces susceptibility to the idiosyncrasies of specific data sets. Additionally, combining data sets permits models to be developed and tested on a wider range of patients than might otherwise be available, providing a better substrate to model risk heterogeneity. There are large differences in the distribution of risk factors and outcomes as can be seen by comparing patients in published studies. For example, APRIS (13) and the French PFO/ASA (2) differ greatly in mean age (70 vs. 43 years), hypertension (82% vs. 15%), diabetes (37% vs. 4%), and one-year stroke recurrence (6·1%, SE = 2·9% vs. 2·0%, SE = 0·6%).
The RoPE Study marks a large collaborative effort to pool research on CS and PFO. We have successfully merged a dozen databases of existing cohort studies of subjects with CS and known PFO status. This is the largest database of such patients with detailed clinical, neuroradiologic, and echocardiographic data. Outcome events have been adjudicated so that recurrent stroke can be modeled excluding those with a non-PFO-related mechanism. These data should be able to describe the natural history of this group of patients and inform decisions regarding CS diagnosis and secondary prevention strategies (6). This database can also provide a substrate to explore hypotheses in this population that are as yet unanticipated.
