Abstract
Purpose
To evaluate accuracy and inter-rater reliability of RetCam fundus images and digital camera fluorangioscopic images in acute retinopathy of prematurity (ROP) by comparing diagnoses given by trainee ophthalmologists with those provided by expert ophthalmologists.
Methods
This is a multicenter retrospective observational study of diagnostic data from 48 eyes of 24 premature infants with classical stage II ROP, evaluated by RetCam 3 and fluorescein angiography (FA). Mean gestational age was 25.4 weeks and mean weight was 804.7 g. A staging grid dividing the ocular fundus into 3 concentric zones and 24 sectors of 15° centered on the optic papilla was superimposed on 360° retinal photomontages, assembled in Photoshop from the RetCam and FA images. Agreement between non-expert and expert diagnoses was measured for each sector with Cohen kappa (Fleiss, 1981).
Results
A high degree of concordance was found. Inter-rater agreement between expert and non-expert interpretations of the retinal photomontages was greater for FA images than for RetCam images: κ was 0.61–1 for 120/152 (78.9%) of the sectors examined on RetCam images and for 168/198 (84.8%) of the sectors examined on FA images.
Conclusions
FA images appear to be easier to interpret than RetCam images for both expert and non-expert ophthalmologists. The results indicate that FA is a highly reliable examination technique, even when trainee practitioners are involved. This suggests that ROP management could be improved by entrusting diagnostic responsibilities to trainee ophthalmologists, extending access to correct diagnosis, recognition of threshold lesions, and prompt treatment.
Introduction
Retinopathy of prematurity (ROP) remains one of the leading causes of blindness in infancy (1), despite the progress made in neonatal care for preterm babies. Reports on ROP in industrialized countries are conflicting: some studies show a decrease in its incidence and severity, whereas others report an increase in surgical interventions for ROP (2, 3). ROP is a multifactorial vasoproliferative disorder typical of preterm infants, secondary to incomplete and immature retinal vascularization. The classical form develops through recognizable stages of increasing severity; although it regresses spontaneously in most cases, retinal detachment can occur in the most severe cases, especially when treatment is not undertaken promptly. This preliminary study focuses on classical ROP, even though a new form of the disease, aggressive posterior ROP, has emerged in recent years. Aggressive posterior ROP affects extremely preterm infants (gestational age [GA] <26 weeks, weight <700 g) (4, 5), does not progress through the stages recognized for classical ROP, is difficult to manage, and has a poor prognosis even when treated.
Although many risk factors have been identified over the years, 3 are most closely linked to the disease: low GA, low birth weight, and excessive or uncontrolled oxygen therapy. Laser photocoagulation is now the treatment of choice for ROP, preferred over cryotherapy because it is easier to manage, more effective, and has fewer adverse effects; cryotherapy is still used in some specific cases, however.
Careful, systematic observation is the only way to detect the early signs of ROP and to tell whether the disease is progressing towards threshold or regressing. Studies and new diagnostic techniques introduced over the years have made it possible to recognize the stages through which the classical, non-aggressive form of the disease progresses.
RetCam is a digital imaging device designed specifically to visualize the retina of premature infants, and it has very high sensitivity and specificity in identifying the different stages of ROP. It can also perform fluorescein angiography (FA), and this combination of techniques has proven indispensable in ROP management: FA allows better classification of the disease and optimal timing of treatment.
The aims of our study were to evaluate the accuracy and inter-rater reliability of RetCam fundus images and digital camera fluorangioscopic images in acute ROP by comparing diagnoses given by trainee ophthalmologists with those provided by expert ophthalmologists.
To our knowledge, this is the first study to statistically compare expert and non-expert evaluations of FA images.
Materials and Methods
We conducted a multicenter retrospective observational study of diagnostic data from 24 premature infants (48 eyes), 10 girls and 14 boys, with stage II classical ROP. Seventeen of these infants (34 eyes) were patients at the Neonatal Pathology Department of San Matteo Hospital, Pavia (IRCCS San Matteo di Pavia), between May 2009 and December 2011, and 7 (14 eyes) were patients at the Maria Vittoria Hospital, Turin, in the same period. Mean GA was 25.4 weeks (SD 1.74) and mean weight was 804.7 g (SD 223.2).
Diagnosis was made by indirect binocular ophthalmoscopy (6). Fundus images were then acquired with RetCam 3 (Clarity, Pleasanton, California, USA), and digital video FA was performed, with a 10% fluorescein solution administered intravenously as a bolus at a dose of 0.1 mL/kg. Because RetCam 3 cannot visualize all sectors of the retina simultaneously, the fundus photographs and FA images were assembled in Photoshop into 360° montages of the retina.
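For reference, the volume-based dose can be converted into an amount of fluorescein per kilogram. This worked calculation assumes that "10%" means weight per volume (10 g per 100 mL), as is usual for fluorescein sodium injection, and uses a body weight of 0.8 kg purely as an illustrative value close to the cohort's mean weight.

```latex
\begin{aligned}
c &= 10\%\ \text{(w/v)} = \frac{10\ \text{g}}{100\ \text{mL}} = 100\ \text{mg/mL}\\
\frac{m}{w} &= 0.1\ \text{mL/kg} \times 100\ \text{mg/mL} = 10\ \text{mg/kg}\\
w = 0.8\ \text{kg}:\quad V &= 0.1 \times 0.8 = 0.08\ \text{mL}, \qquad m = 0.08 \times 100 = 8\ \text{mg}
\end{aligned}
```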
The RetCam fundus images and the FA images were analyzed by 2 groups of observers: 2 expert ophthalmologists experienced in ROP management and 3 trainee ophthalmologists doing an internship at the pediatric ophthalmology clinic.
To standardize the analysis and obtain comparable data for the pictures examined, the fundus and FA images were superimposed on a grid dividing the fundus into 3 concentric zones (Fig. 1), in accordance with the International Classification of Retinopathy of Prematurity (1984–1987), and into 24 sectors of 15° (Fig. 2) centered on the optic papilla. From the retinal images and FAs gathered (Figs. 3–6), the 350 sectors with the best contrast, brightness, and extent were selected for statistical analysis.
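As a rough illustration of how the zone part of such a grid works, the Python sketch below assigns a point on a montage to a zone by its distance from the optic disc. It assumes the standard ICROP zone definitions (zone I: a circle centered on the disc with a radius twice the disc-to-fovea distance; zone II: out to the nasal ora serrata; zone III: the remainder) and treats every zone as a full circle, which simplifies the temporal crescent of zone III. The function and parameter names are hypothetical; the source does not describe how its grid was drawn.

```python
def icrop_zone(dist_from_disc: float, disc_to_fovea: float, disc_to_nasal_ora: float) -> str:
    """Hypothetical helper: ICROP zone of a retinal point, with zones drawn as
    circles centered on the optic disc (all distances in the same units,
    e.g. pixels on the montage).

    SIMPLIFICATION: zone III is really only the temporal crescent left outside
    zone II; here anything beyond the zone II radius is called zone III.
    """
    if dist_from_disc <= 2 * disc_to_fovea:
        return "I"    # radius = twice the disc-to-fovea distance
    if dist_from_disc <= disc_to_nasal_ora:
        return "II"   # out to the nasal ora serrata
    return "III"      # everything farther out


# Example with made-up pixel distances: disc-fovea 30 px, disc-nasal ora 200 px
print(icrop_zone(50, 30, 200))   # I
print(icrop_zone(100, 30, 200))  # II
print(icrop_zone(250, 30, 200))  # III
```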

Figure 1. Retina divided into 3 concentric zones centered around the optic papilla (International Classification of Retinopathy of Prematurity 1984–1987).

Figure 2. Retina divided into 24 sectors of 15° centered around the optic papilla.
A scoring system was devised in which each diagnostic sign was scored for its presence or absence or, for some signs, by the number of occurrences. The scores for the individual signs were summed to obtain a total score for each sector.
On the RetCam images, posterior plus disease, peripheral plus disease, stage, and vascular anomalies were scored as follows: posterior plus disease (present = 1, absent = 0); vascular anomalies (number counted); peripheral plus disease (present = 1, absent = 0); localization/extent of lesions (peripheral = 1, posterior = 2).
On the FA images, leakage, ischemic areas, peripheral plus disease, and vascular anomalies were scored as follows: leakage (absent = 0, present = 1); ischemic areas posterior to the ridge (absent or of limited extent = 0, present = 1); peripheral plus disease (present = 1, absent = 0); popcorn and tangle anomalies, visible at FA because of leakage (number counted); localization/extent of lesions (peripheral = 1, posterior = 2).
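The scoring scheme above can be sketched as a simple sum: binary signs contribute 0 or 1, counted anomalies contribute their count, and the localization code adds 1 or 2. This is a minimal Python sketch under stated assumptions. The record types and field names are hypothetical, and coding a lesion-free sector's localization as 0 is an assumption, because the text does not say how such sectors were scored.

```python
from dataclasses import dataclass

# Localization/extent codes from the scoring scheme (peripheral = 1, posterior = 2).
# ASSUMPTION: "none" = 0 for a sector with no lesion; the source does not specify this.
LOCALIZATION_SCORE = {"none": 0, "peripheral": 1, "posterior": 2}


@dataclass
class RetCamSector:
    """Hypothetical record of the signs read on one RetCam sector."""
    posterior_plus: bool      # posterior plus disease present
    peripheral_plus: bool     # peripheral plus disease present
    vascular_anomalies: int   # number of vascular anomalies counted
    localization: str         # "none", "peripheral" or "posterior"


@dataclass
class FASector:
    """Hypothetical record of the signs read on one FA sector."""
    leakage: bool             # leakage present
    ischemia: bool            # ischemic areas posterior to the ridge (limited extent counts as absent)
    peripheral_plus: bool     # peripheral plus disease present
    popcorn_tangles: int      # number of popcorn and tangle anomalies
    localization: str         # "none", "peripheral" or "posterior"


def retcam_score(s: RetCamSector) -> int:
    """Total sector score: binary signs as 0/1, counts as-is, plus the localization code."""
    return (int(s.posterior_plus) + int(s.peripheral_plus)
            + s.vascular_anomalies + LOCALIZATION_SCORE[s.localization])


def fa_score(s: FASector) -> int:
    """Total FA sector score, built the same way."""
    return (int(s.leakage) + int(s.ischemia) + int(s.peripheral_plus)
            + s.popcorn_tangles + LOCALIZATION_SCORE[s.localization])


# Posterior plus, no peripheral plus, 3 anomalies, peripheral lesions: 1 + 0 + 3 + 1
print(retcam_score(RetCamSector(True, False, 3, "peripheral")))  # 5
# Leakage, ischemia, no peripheral plus, 2 popcorn/tangles, posterior lesions: 1 + 1 + 0 + 2 + 2
print(fa_score(FASector(True, True, False, 2, "posterior")))     # 6
```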
Sectors were scored starting from the 12 o'clock position and proceeding clockwise.
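With 24 sectors of 15° numbered clockwise from 12 o'clock, a position on the montage maps to a sector by integer division of its angle. The Python sketch below assumes that sector 1 spans 0–15° clockwise from 12 o'clock and that angles are measured on the montage as displayed, with no mirroring between right and left eyes. The names are hypothetical.

```python
SECTOR_WIDTH_DEG = 15                    # 24 sectors x 15 degrees = 360 degrees
N_SECTORS = 360 // SECTOR_WIDTH_DEG      # 24


def sector_index(angle_deg: float) -> int:
    """1-based sector number for an angle measured clockwise from 12 o'clock.

    ASSUMPTION: sector 1 spans 0-15 degrees clockwise from 12 o'clock.
    Negative or >360 angles are wrapped into [0, 360).
    """
    return int((angle_deg % 360) // SECTOR_WIDTH_DEG) + 1


def sector_span(index: int) -> tuple:
    """Start and end angles (degrees, clockwise from 12 o'clock) of a sector."""
    if not 1 <= index <= N_SECTORS:
        raise ValueError("sector index must be between 1 and 24")
    start = (index - 1) * SECTOR_WIDTH_DEG
    return (start, start + SECTOR_WIDTH_DEG)


print(sector_index(90))   # 7: the 3 o'clock position opens sector 7
print(sector_span(7))     # (90, 105)
```

Under this convention each clock hour (30°) covers exactly 2 sectors.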
A coefficient of agreement was obtained for each sector by comparing observer evaluations with Cohen kappa (Fleiss, 1981), which can measure either interobserver variation, as in this case, or intraobserver variation. Cohen kappa (7) corrects for agreement expected by chance, which here arises largely from the unequal sizes of the groups. All analyses were carried out with the STATA package (StataCorp, 2011, Stata Statistical Software: Release 11.0, College Station, Texas, USA).
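Cohen kappa compares the observed agreement p_o with the agreement p_e expected by chance from each rater's marginal score distribution: κ = (p_o - p_e) / (1 - p_e). The minimal Python sketch below computes the unweighted statistic for two raters who scored the same items. The function name is hypothetical, and this is not the STATA routine the authors used. The text does not specify how the scores of the 2 experts and 3 trainees were paired for each sector, so the sketch simply takes two lists of scores.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen kappa for two raters who scored the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty lists of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same score by both raters
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: for each score, product of the two raters' marginal
    # proportions, summed over all scores
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one and the same score for every item: kappa is undefined
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


expert = [2, 3, 3, 1, 2, 4]    # hypothetical total scores for six sectors
trainee = [2, 3, 2, 1, 2, 4]
print(round(cohen_kappa(expert, trainee), 3))  # 0.769
```

For these made-up lists, p_o = 5/6 and p_e = 10/36, so κ = 20/26 ≈ 0.77. If both raters give every item the same single score, p_e = 1 and kappa is undefined, so the sketch returns NaN rather than 1. scikit-learn's `sklearn.metrics.cohen_kappa_score` computes the same unweighted statistic.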
This report was prepared in accordance with the STARD guidelines (8).

Figure 3. Retinopathy of prematurity fundus image acquired by RetCam 3 and by fluorescein angiography of the same eye.

Figure 4. Retinopathy of prematurity fundus image acquired by RetCam 3 and by fluorescein angiography of the same eye.
Results
Overall, complete agreement (κ = 1) between expert and trainee ophthalmologists was found for almost 22% (76/350) of the fundus sectors examined (Table I). Agreement was complete or very good (κ between 0.81 and 1) for about 45% (157/350) of the sectors. On the whole, agreement was higher for FA images than for RetCam photographs (9, 10).

Figure 5. Retinopathy of prematurity fundus image acquired by RetCam 3 and by fluorescein angiography of the same eye.

Figure 6. Retinopathy of prematurity fundus image acquired by RetCam 3 and by fluorescein angiography of the same eye.
Combining RetCam and FA evaluations, agreement between experts and trainees fell into the good-to-complete categories (κ = 0.61–1) for 298/350 sectors, i.e., 85% of the sectors evaluated (11). The largest number of sectors fell into the good agreement category (141 sectors), and only 10 sectors (2.9%) fell into the fair or poor categories. Table I shows agreement by fundus sector, comparing RetCam (right and left eyes) with FA. Agreement levels were calculated from the total score for each sector, and the results are shown in more detail in Figures 7–10.
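The overall counts reported above can be cross-checked arithmetically. The very good count is not stated directly, so it is derived here by subtraction: the 157 sectors with complete or very good agreement minus the 76 with complete agreement.

```latex
\begin{aligned}
n_{\text{very good}} &= 157 - 76 = 81\\
n_{\text{complete}} + n_{\text{very good}} + n_{\text{good}} &= 76 + 81 + 141 = 298\\
298/350 &\approx 0.851 = 85.1\%\\
10/350 &\approx 0.029 = 2.9\%
\end{aligned}
```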

Figure 7. Percentage of agreement expressed by Cohen kappa in the analysis of RetCam images of the right eye.

Figure 8. Percentage of agreement expressed by Cohen kappa in the analysis of RetCam images of the left eye.

Figure 9. Percentage of agreement expressed by Cohen kappa in the analysis of fluorescein angiography images of the right eye.

Figure 10. Percentage of agreement expressed by Cohen kappa in the analysis of fluorescein angiography images of the left eye.
Table I. Cohen kappa of agreement between expert and non-expert ophthalmologists
FA = fluorescein angiography.
For FA left-eye sectors, agreement was complete for 42.2% of evaluations, very good for 23.3%, good for 27.7%, and moderate (0.41≤κ<0.61) for 7%.
Agreement between expert and trainee evaluations of right-eye RetCam images, although lower than for FA images, was still very good.
For right-eye RetCam sectors, agreement between experts and trainees was complete for 9.2% of sectors, very good for 38.1%, good for 39.4%, moderate for 9.2%, and fair (0.2≤κ<0.41) for 3.9%; kappa was never below 0.2. For left-eye RetCam sectors, agreement was complete for 23.7% of sectors, good (0.61≤κ<0.81) for 39.5%, and moderate (0.41≤κ<0.61) for 19.7%. Left-eye RetCam sectors had the highest percentages of fair (3.9%) and poor (5.2%) agreement.
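The agreement bands used in this section can be expressed as a small lookup that turns per-sector kappa values into the category percentages reported. This is a minimal Python sketch under stated assumptions. The function names are hypothetical. Lower bounds are taken as inclusive (0.81, 0.61, 0.41, 0.20) because the text states the boundaries slightly inconsistently. The 0.20 to 0.40 band is labeled "fair", and "poor" is assumed to cover κ < 0.20.

```python
import math
from collections import Counter

# Agreement labels, from highest to lowest
LABELS = ["complete", "very good", "good", "moderate", "fair", "poor"]


def agreement_category(kappa: float) -> str:
    """Map a Cohen kappa to an agreement label (ASSUMED inclusive lower bounds)."""
    if math.isnan(kappa):
        raise ValueError("kappa is undefined for this sector")
    if math.isclose(kappa, 1.0):
        return "complete"
    if kappa >= 0.81:
        return "very good"
    if kappa >= 0.61:
        return "good"
    if kappa >= 0.41:
        return "moderate"
    if kappa >= 0.20:
        return "fair"
    return "poor"


def category_percentages(kappas):
    """Percentage of sectors in each agreement category, rounded to 0.1%."""
    counts = Counter(agreement_category(k) for k in kappas)
    n = len(kappas)
    return {label: round(100 * counts[label] / n, 1) for label in LABELS}


# Four made-up sector kappas
print(category_percentages([1.0, 0.85, 0.7, 0.7]))
# {'complete': 25.0, 'very good': 25.0, 'good': 50.0, 'moderate': 0.0, 'fair': 0.0, 'poor': 0.0}
```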
Discussion
Recognizing ROP and its stages is fundamental, because treatment is effective only if it is prompt. This requires diagnostic means capable of detecting the earliest lesions of retinopathy and monitoring their development. Until the introduction of RetCam, the semiologic characterization of ROP was limited to what could be observed by indirect binocular ophthalmoscopy. Our study was inspired by a similar one (11) carried out at Columbia University in 2011, in which expert and non-expert evaluations of RetCam images were compared (12); however, we also included evaluations of FA images, and we measured expert-trainee agreement with Cohen kappa. We found good agreement between trainee and expert ophthalmologists in the evaluation of both RetCam and FA images. Agreement was particularly high for FA images, which appear to be the easiest to interpret for both expert and trainee ophthalmologists. This supports FA as an excellent diagnostic examination for ROP that is readily interpretable even by trained non-experts. Even non-experts were able to detect lesions that are missed at ophthalmoscopic or funduscopic examination, and detecting these lesions allows better disease staging and timely treatment (13).
As with the Columbia University study, ours has limitations, in particular the small number of participating expert and trainee ophthalmologists. In addition, the quality of the RetCam and FA images, which depends on the expertise of the imaging technician, was suboptimal, which limited the number of sectors suitable for analysis.
As far as we know, this is the first study to statistically compare expert and non-expert evaluations of FA images. Our results suggest that FA images provide an accurate and reliable basis for diagnosis. To better estimate the sensitivity and specificity of the examination, the study will be extended to include more cases of ROP, and further statistical analyses will be carried out at a later stage (14).
Involving non-expert ophthalmologists in the diagnosis and management of ROP could allow more timely recognition of threshold lesions and better timing of treatment in settings where a shortage of expert personnel can lead to critical delays (15).
Footnotes
Acknowledgement
The authors thank Dr. Claire Archibald for translation.
