Abstract
Purpose
The Boston naming test (BNT), a simple, fast, and easily administered neuropsychological test, has been shown to be useful for assessing language function. In this study, we investigated whether the BNT could serve as a screening tool for early postoperative cognitive dysfunction (POCD).
Methods
This prospective observational cohort study included 132 patients undergoing major noncardiac surgery and 81 nonsurgical controls. All participants completed the mini-mental state examination (MMSE) and the BNT 1 day before and 7 days after surgery. Early POCD was determined using the reliable change index, based on the control group results.
Results
Seven days after surgery, POCD was detected in 30 of 132 patients (22.7%; 95% CI, 15.5%-30.0%) based on the MMSE, and 45 patients (34.1%; 95% CI, 26.3%-41.9%) showed postoperative language function decline based on the BNT and MMSE. Agreement between BNT spontaneous naming and the MMSE total score was moderate (Kappa = .523), and the sensitivity of BNT spontaneous naming for detecting early POCD was .767. The areas under the receiver operating characteristic curves (AUCs) did not differ significantly between BNT spontaneous naming (AUC .862) and either the MMSE language functional subtests (AUC .889) or the MMSE non-language functional subtests (AUC .933).
Conclusion
This study indicates the feasibility of implementing the BNT spontaneous naming test to screen early POCD in elderly patients after major noncardiac surgery.
Introduction
Postoperative cognitive dysfunction (POCD) is a subtle disorder of intellectual function that occurs after full recovery of consciousness and persists well beyond the expected effects of anesthetics.1 As a mild neurocognitive disorder of unspecified etiology, POCD is thought to encompass acute or persistent deficits in attention, concentration, language function, learning, and memory after surgery.2,3 According to the International Study of Postoperative Cognitive Dysfunction (ISPOCD1), POCD was present in 25.8% of elderly patients 7 days after major noncardiac surgery,4 a condition usually termed early POCD,5 and it is associated with markedly reduced working capacity and increased long-term mortality.6
Despite its high prevalence and association with adverse outcomes, POCD is often overlooked by clinicians. Its symptoms are usually not dramatic enough to attract attention, and neuropsychological diagnostic tests are time-consuming and require specialized training. A simple, fast, and easily implemented bedside screening tool for POCD assessment and diagnosis is therefore needed in clinical practice. Several studies have noted that specific neuropsychological tests can be useful for screening mild cognitive impairment or decline.7,8 Accordingly, we sought a simple neuropsychological test for POCD screening that is easy to implement and has good reliability and validity.
The Boston naming test (BNT) is a widely used neuropsychological test consisting of black-and-white line drawings of animate and inanimate items; through visual confrontation naming, it is sensitive to impaired lexical retrieval and aphasia.9 The 30-item BNT used in this study, adapted for the Chinese population, has been validated for assessing cognitive disorders.10-13 Because the patients in our study were in relatively severe condition, which can hinder reading and writing, the BNT, which requires spoken naming rather than reading or writing, might be suitable for identifying POCD at the bedside.
The main objective of this study was to evaluate the validity and sensitivity of the BNT for identifying early POCD (7 days after surgery) in Chinese elderly patients undergoing major noncardiac surgery. A secondary aim was to better understand language function in POCD research more generally.
Methods
Participants and Design
This prospective observational cohort study of elderly patients was conducted in XX Hospital between May 2017 and May 2019. The hospital ethics committee approved the study (approval number PJ-NBEY-KY-2017-003-01), and written informed consent was obtained from all participants before enrollment. This is a sub-study of a trial registered at the Chinese Clinical Trial Registry (chictr.org.cn) (identifier ChiCTR-ROC-17010610).
Eligible patients were ≥60 years old and scheduled to undergo elective major noncardiac surgeries, including open radical gastrointestinal, urology, or thoracic surgeries, and total hip or knee arthroplasty.
Exclusion criteria were: difficulty in comprehension (eg, inadequate Mandarin, blindness, or deafness); concomitant diseases that may lead to severe complications (ASA physical status ≥ IV); pre-existing neurological or clinically identified cerebral disorders (eg, history of brain injury or surgery, stroke, Alzheimer’s disease, schizophrenia, or Parkinson’s disease); a preoperative MMSE score indicating dementia (≤17 for illiterate participants, ≤20 for those with 1-6 years of education, and ≤24 for those with 7 or more years of education); surgery cancellation or change of approach (eg, conversion to a laparoscopic or other non-major surgical approach); and test interruption for any reason.
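The education-adjusted MMSE dementia cutoffs in the exclusion criteria can be expressed as a small helper. This is a minimal illustrative sketch, not the study’s screening software; the function name is hypothetical, and encoding “illiterate” as 0 years of education is an assumption not stated in the study.

```python
def indicates_dementia(mmse_score: int, years_of_education: int) -> bool:
    """Return True if a preoperative MMSE score falls at or below the
    education-adjusted dementia cutoff used for exclusion.

    Assumption (not stated in the study): "illiterate" is encoded as
    0 years of education.
    """
    if not 0 <= mmse_score <= 30:
        raise ValueError("MMSE scores range from 0 to 30")
    if years_of_education < 0:
        raise ValueError("years_of_education must be non-negative")
    if years_of_education == 0:      # illiterate
        cutoff = 17
    elif years_of_education <= 6:    # 1-6 years of education
        cutoff = 20
    else:                            # 7 or more years of education
        cutoff = 24
    return mmse_score <= cutoff
```

For example, an MMSE score of 22 would lead to exclusion for a participant with 9 years of education but not for one with 4 years.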
Family members of hospitalized patients (not limited to those of enrolled patients) were recruited as controls. They met similar inclusion and exclusion criteria, apart from the surgery-related criteria. Controls who underwent unexpected surgery during the study were excluded.
All individuals underwent neuropsychological tests at the same time points. Patients enrolled in this study received general perioperative care. Monitoring, anesthetic technique, and postoperative analgesia were prescribed at the discretion of the attending anesthesiologist.
Neuropsychological Testing
Dedicated research staff, trained under a neuropsychologist’s supervision, administered all tests. Education, medical history, and current medication were recorded as demographic data before testing. All patients were tested in a quiet environment, without family members present, 1 day before surgery (baseline) and 7 days after surgery. The MMSE and the BNT were administered in that order between 1 pm and 5 pm.
The Chinese version of the MMSE that we used was adapted from the English version14 and has demonstrated reliability and validity among illiterate or minimally educated elderly Chinese people.14,15 The modifications were as follows. Dates are presented in both the lunar and Gregorian calendars, because elderly Chinese people tend to prefer the lunar calendar. Orientation and registration items were adapted to Chinese idiomatic expressions. The calculation item was changed to a purchase-balance calculation because “subtraction” was poorly understood. The phrase in the repetition item is a Chinese tongue twister. The reading item was kept unchanged, although illiterate participants often scored zero because they could not read the instruction. The writing item was changed to reading a sentence, with the same scoring method. The naming, three-step command, and figure-copying items are unchanged. In addition to the MMSE total score (MMSE-Total), we analyzed the MMSE language functional subtests (MMSE-Lang, full score = 9) and non-language functional subtests (MMSE-non-Lang, full score = 21).
Compared with the MMSE, the BNT focuses mainly on language ability and verbal fluency. In spontaneous naming (BNT-Spon), participants were asked to name each presented object within a 20-second response time, and the number of correct answers was taken as the BNT-Spon score. If the answer was incorrect, a previously established semantic cue was given, followed by an additional 20 seconds to name the object. In the standard BNT, the decision to give a semantic cue is subjective, based on the examiner’s judgment that the participant may not have perceived the item properly. The number of correct answers after a semantic cue was recorded as the BNT-Sem (n) score, and BNT-Sem (%) was BNT-Sem (n) divided by the number of semantic cues given. If the naming was still incorrect, a selective cue was given: in this auditory task, the participant had to identify the correct item from 3 choices. The number of correct answers after a selective cue was recorded as the BNT-Selec (n) score, and BNT-Selec (%) was BNT-Selec (n) divided by the number of selective cues given. Note that the selective cue is exclusive to the Chinese version of the BNT.10
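The BNT scoring rules above can be summarized in code. This is a hedged sketch: the per-item outcome labels and function names are hypothetical, and it assumes that every item missed spontaneously received a semantic cue and every item still missed received a selective cue, which ignores the examiner discretion described for the standard BNT. BNT-Total (BNT-Spon plus semantic-cued correct answers) follows the definition given with the ROC results; percentages are expressed on a 0-100 scale by assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical per-item outcome labels: the stage at which an item was
# named correctly, or "failed" if it was never named correctly.
OUTCOMES = {"spontaneous", "semantic", "selective", "failed"}


@dataclass
class BNTScores:
    spon: int                   # BNT-Spon
    sem_n: int                  # BNT-Sem (n)
    sem_pct: Optional[float]    # BNT-Sem (%), None if no semantic cue given
    selec_n: int                # BNT-Selec (n)
    selec_pct: Optional[float]  # BNT-Selec (%), None if no selective cue given
    total: int                  # BNT-Total = BNT-Spon + BNT-Sem (n)


def score_bnt(outcomes: Sequence[str]) -> BNTScores:
    unknown = set(outcomes) - OUTCOMES
    if unknown:
        raise ValueError(f"unknown outcome labels: {sorted(unknown)}")
    spon = sum(o == "spontaneous" for o in outcomes)
    # Assumption: each spontaneous miss received a semantic cue ...
    sem_cues = sum(o != "spontaneous" for o in outcomes)
    sem_n = sum(o == "semantic" for o in outcomes)
    # ... and each item still missed received a selective cue.
    selec_cues = sum(o in ("selective", "failed") for o in outcomes)
    selec_n = sum(o == "selective" for o in outcomes)
    sem_pct = 100 * sem_n / sem_cues if sem_cues else None
    selec_pct = 100 * selec_n / selec_cues if selec_cues else None
    return BNTScores(spon, sem_n, sem_pct, selec_n, selec_pct, spon + sem_n)
```

For a hypothetical 30-item record with 25 spontaneous, 2 semantic-cued, and 2 selective-cued correct answers plus 1 failure, this yields BNT-Spon = 25, BNT-Sem (%) = 40, and BNT-Total = 27.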
Calculation of POCD
The reliable change index (RCI) was used to diagnose POCD and postoperative language function decline.1,4,16-19 To calculate the RCI of a test for an individual patient, the baseline score (X1) was subtracted from the postoperative score at 7 days (X2), giving ΔX = X2 − X1 for each patient on each neuropsychological test. The same was done for the controls, giving ΔXc. The mean of ΔXc on that test was then subtracted from ΔX to remove practice effects, and the result was divided by the SD of ΔXc to account for natural variation in test performance, yielding a Z-score: Z = (ΔX − mean ΔXc) / SD(ΔXc).18 POCD in an individual patient was defined as a Z-score of −1.96 or less on MMSE-Total. Likewise, postoperative language function decline and other cognitive function decline were defined as Z-scores of −1.96 or less on the corresponding tests.
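The RCI Z-score calculation can be sketched as follows for a single test. The function names are hypothetical, and using the sample SD (n − 1 denominator) for ΔXc is an assumption, since the text does not specify which SD estimator was used.

```python
import statistics
from typing import List, Sequence


def rci_z_scores(
    patient_baseline: Sequence[float],
    patient_day7: Sequence[float],
    control_baseline: Sequence[float],
    control_day7: Sequence[float],
) -> List[float]:
    """Reliable-change Z-score for each patient on a single test."""
    if len(patient_baseline) != len(patient_day7):
        raise ValueError("patient score lists differ in length")
    if len(control_baseline) != len(control_day7):
        raise ValueError("control score lists differ in length")
    # Change from baseline (X1) to postoperative day 7 (X2): delta X = X2 - X1
    patient_delta = [x2 - x1 for x1, x2 in zip(patient_baseline, patient_day7)]
    control_delta = [x2 - x1 for x1, x2 in zip(control_baseline, control_day7)]
    # The mean control change estimates the practice effect ...
    practice_effect = statistics.mean(control_delta)
    # ... and the control SD scales for natural test-retest variation.
    # Assumption: sample SD (n - 1 denominator); requires >= 2 controls.
    sd_control = statistics.stdev(control_delta)
    if sd_control == 0:
        raise ValueError("control change scores have zero variance")
    return [(d - practice_effect) / sd_control for d in patient_delta]


def declined(z: float, threshold: float = -1.96) -> bool:
    """Decline on a test is defined as Z <= -1.96."""
    return z <= threshold
```

With toy data in which controls improve by 0.8 points on average, a patient whose score is unchanged still receives a negative Z-score, which illustrates why the practice effect is subtracted.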
Data Analysis
Sample size was calculated with the “tests for paired sensitivities” procedure in PASS 11.0 (NCSS, LLC, Kaysville, Utah, USA).20 Based on previous studies of POCD incidence on postoperative day 7 in elderly patients undergoing noncardiac surgery,4,16,17,19 we set the prevalence at .3. Test 1, the MMSE, was taken as the POCD diagnostic tool, with its sensitivity set to .883.21 Test 2, the BNT, was the tool whose effectiveness for POCD screening was being explored, with its sensitivity set to .7. The proportion of discordant results was set at .2. This yielded a sample size of 153 patients (power 90%, two-tailed α = .05), which was increased to 170 to allow for a 10% dropout rate. The control group was made large enough to reduce variability and allow matching.
Groups were compared using the independent t-test for continuous variables, the Mann-Whitney U test for ranked data, and the chi-square or Fisher exact test for dichotomous data. For multiple comparisons, the family-wise type I error rate was controlled with the Holm-Bonferroni step-down procedure, or the false discovery rate was controlled with the Benjamini-Hochberg procedure.22
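Both multiple-comparison procedures can be expressed as adjusted p-values. The sketch below is a standard textbook formulation written for illustration; it is not the SPSS routine used in the study, and the function names are hypothetical. An adjusted p-value below .05 is significant at the corresponding error rate.

```python
from typing import List, Sequence


def holm_adjust(pvals: Sequence[float]) -> List[float]:
    """Holm-Bonferroni step-down adjusted p-values (controls the FWER)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ascending p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):  # rank 0 = smallest p
        value = min(1.0, (m - rank) * pvals[i])
        running_max = max(running_max, value)  # keep adjusted p monotone
        adjusted[i] = running_max
    return adjusted


def bh_adjust(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (controls the FDR)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)  # descending p
    adjusted = [0.0] * m
    running_min = 1.0
    for idx, i in enumerate(order):
        rank = m - idx  # 1-based rank in ascending order
        value = min(1.0, pvals[i] * m / rank)
        running_min = min(running_min, value)  # keep adjusted p monotone
        adjusted[i] = running_min
    return adjusted
```

For raw p-values of .01, .04, .03, and .005, Holm gives .03, .06, .06, and .02, whereas Benjamini-Hochberg gives .02, .04, .04, and .02, showing that FDR control is less conservative.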
Agreement between the BNT and MMSE and their subtests was analyzed using unweighted Cohen’s Kappa with 95% CI, together with the proportions of specific positive and negative agreement. The positive agreement index estimates the conditional probability of a positive diagnosis on one test given a positive diagnosis on the other; the negative agreement index does likewise for a negative diagnosis. In this study, a positive diagnosis means a Z-score ≤ −1.96 on the test, and a negative diagnosis means a Z-score > −1.96. These indices are analogous to sensitivity and specificity when a gold standard classification is available.
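The agreement indices can be computed from a 2×2 table of the two tests’ diagnoses. This sketch assumes the conventional formulas for Cohen’s Kappa and for the proportions of specific agreement (PA = 2a / (2a + b + c), NA = 2d / (2d + b + c)); the text does not give the exact formulas used, the function name is hypothetical, and the 95% CI for Kappa is not computed here. Sensitivity and specificity are included for the case where the comparison test (eg, MMSE-Total) is treated as the reference.

```python
from typing import Dict


def agreement_indices(a: int, b: int, c: int, d: int) -> Dict[str, float]:
    """Agreement statistics for two binary diagnoses in a 2x2 table.

    Rows: index test (eg, BNT-Spon); columns: comparison test (eg, MMSE-Total).
    a = both positive, b = index positive / comparison negative,
    c = index negative / comparison positive, d = both negative.
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("empty table")
    p_observed = (a + d) / n
    # Agreement expected by chance from the marginal totals
    p_expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return {
        "kappa": kappa,
        "positive_agreement": 2 * a / (2 * a + b + c),
        "negative_agreement": 2 * d / (2 * d + b + c),
        # Treating the comparison test as the reference standard:
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
    }
```

For hypothetical counts a = 20, b = 5, c = 10, d = 65 (not study data), observed agreement is .85, chance agreement is .60, and Kappa is .625.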
To further compare the predictive value of the BNT and MMSE subtests for POCD, the change score (day 7 score − baseline score) of each test was calculated. Receiver operating characteristic (ROC) curves and the corresponding areas under the curve (AUC) were then calculated and compared using MedCalc 19 (MedCalc Software bvba, Ostend, Belgium). Other analyses were conducted with IBM SPSS Statistics 20 (IBM Corp, Zurich, Switzerland). All hypothesis tests were two-tailed, and P < .05 was considered statistically significant.
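Because a larger drop in score (a more negative change score) points toward POCD, the empirical AUC of a change score equals the probability that a randomly chosen POCD patient has a lower change score than a randomly chosen patient without POCD (the Mann-Whitney formulation, with ties counted as one half). This is a minimal illustrative sketch with a hypothetical function name; it does not reproduce MedCalc’s statistical comparison of correlated AUCs.

```python
from typing import Sequence


def auc_lower_is_positive(
    pocd_scores: Sequence[float], non_pocd_scores: Sequence[float]
) -> float:
    """Empirical ROC AUC for change scores where a lower (more negative)
    change indicates POCD. Ties between groups count as 0.5."""
    if not pocd_scores or not non_pocd_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for p in pocd_scores:
        for q in non_pocd_scores:
            if p < q:        # POCD patient declined more: correctly ranked
                wins += 1.0
            elif p == q:     # tie: half credit
                wins += 0.5
    return wins / (len(pocd_scores) * len(non_pocd_scores))
```

With hypothetical change scores of −5, −3, and −2 in POCD patients and 0, 1, −2, and −1 in patients without POCD, 11.5 of the 12 pairs are correctly ranked, giving an AUC of about .958.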
Results
Demographic and Preoperative Medical Data
A total of 473 patients scheduled for elective major noncardiac surgery were screened; patient flow through the study is summarized in Figure 1. Over the same period, 97 family members of patients were recruited as controls. Three controls were excluded because of baseline dementia (based on baseline MMSE scores), 2 because they underwent surgery during the study, and 11 because they did not return for the second assessment, leaving 81 controls for Z-score calculation.
Figure 1. Study flow. Note: MMSE, mini-mental state examination.
Patient and Control Demographics and Medical History at Baseline.
Note: Continuous variables are presented as mean (SD), except education, which is reported as median (interquartile range); categorical variables are presented as number (%).
Abbreviation: ASA, American Society of Anesthesiologists.
ᵃAdjusted P-values according to the Benjamini-Hochberg procedure for multiple testing correction based on controlling the false discovery rate.
Neuropsychological Test Scores at Baseline
MMSE and BNT Score of Patients and Controls at Baseline.
Note: Continuous data are presented as mean (SD) or median (interquartile range).
Abbreviations: MMSE, mini-mental state examination; BNT, Boston naming test; MMSE-Total, the MMSE total scoring; MMSE-Name, the MMSE naming subtest scoring; MMSE-Lang, the MMSE language functional subtests scoring including MMSE-Name; BNT-Spon, the BNT spontaneous naming scoring; BNT-Sem (n), the number of correct answers after a semantic cue; BNT-Sem (%), BNT-Sem (n) divided by the number of semantic cues given; BNT-Selec (n), the number of correct answers after a selective cue; BNT-Selec (%), BNT-Selec (n) divided by the number of selective cues given.
ᵃAdjusted P-values according to the Benjamini-Hochberg procedure for multiple testing correction.
Effects of Anesthesia and Surgery on Neuropsychological Test Scores
Change Scores of Neuropsychological Assessment at 7 Days in Surgery Group.
Note: Continuous data are presented as mean (SD) or median (interquartile range).
Abbreviations: MMSE, mini-mental state examination; BNT, Boston naming test; MMSE-Total, the MMSE total scoring; MMSE-Name, the MMSE naming subtest scoring; MMSE-Lang, the MMSE language functional subtests scoring including MMSE-Name; MMSE-non-Lang, the MMSE non-language functional subtests scoring; BNT-Spon, the BNT spontaneous naming scoring; BNT-Sem (n), the number of correct answers after a semantic cue; BNT-Sem (%), BNT-Sem (n) divided by the number of semantic cues given; BNT-Selec (n), the number of correct answers after a selective cue; BNT-Selec (%), BNT-Selec (n) divided by the number of selective cues given.
ᵃAdjusted P-values according to the Benjamini-Hochberg procedure for multiple testing correction.
The MMSE-Lang and MMSE-non-Lang scores were significantly lower in patients with POCD than in those without, whereas MMSE-Name scores did not differ significantly between the 2 groups. For the BNT, patients with POCD had significantly lower BNT-Spon and BNT-Selec (%) scores, with no differences in the other BNT scores.
The incidences of postoperative language function decline, as defined by Z-scores on the different tests, are shown in Figure 2. The incidence of postoperative language function decline diagnosed by BNT-Spon was significantly higher than that diagnosed by MMSE-Lang.
Figure 2. Comparison of the incidence of postoperative language function decline with different diagnostic methods. Note: MMSE, mini-mental state examination; BNT, Boston naming test; MMSE-Lang, the MMSE language functional subtests scoring; BNT-Spon, the BNT spontaneous naming scoring. The significance level was adjusted to .017 according to the Bonferroni correction.
Comparisons of the BNT and MMSE
Comparison of BNT Spontaneous Naming and MMSE, and Its Subtests.
Note: Data are presented as number or ratio (95% CI).
+ indicates a positive result (Z-score ≤ −1.96); − indicates a negative result (Z-score > −1.96).
Abbreviations: MMSE, mini-mental state examination; BNT, Boston naming test; PA, index of positive agreement between the 2 tests, analogous to sensitivity; NA, index of negative agreement between the 2 tests, analogous to specificity; Total, the MMSE total scoring; Lang, the MMSE language functional subtests scoring; non-Lang, the MMSE non-language functional subtests scoring; Spon, the BNT spontaneous naming scoring.
Pearson correlation analysis showed that the change scores of MMSE-Total and BNT-Spon were significantly correlated (Pearson’s r = .923, P < .001). ROC curve analysis was then used to further evaluate and compare the change score of each subtest for predicting POCD (Figure 3). The AUC of the BNT-Selec (%) change score was significantly smaller than those of the other 3 subtests’ change scores (all P < .001). The AUCs of the MMSE-non-Lang and BNT-Total change scores also differed significantly (z statistic = 2.178, P = .0294).
Figure 3. Receiver operating characteristic (ROC) curves for change scores of BNT and MMSE subtests. Note: MMSE, mini-mental state examination; BNT, Boston naming test; MMSE-Lang, the MMSE language functional subtests scoring; MMSE-non-Lang, the MMSE non-language functional subtests scoring; BNT-Spon, the BNT spontaneous naming scoring; BNT-Total, the sum of BNT-Spon scoring and BNT semantic cued correct counts; BNT-Selec (%), the BNT selective cued correct counts divided by the number of selective cues given.
Discussion
Since the ISPOCD1 study,4 researchers have broadly agreed that a neuropsychological test battery combined with reliable change index-based Z-scores is suitable for POCD assessment and diagnosis.1,2,18 POCD in an individual is usually defined as a Z-score of −1.96 or less on at least 2 different tests, or a combined Z-score of −1.96 or less.16,17,19 Most conclusions in this study are also based on Z-scores. Based on MMSE scores, the incidence of POCD was 22.7% (95% CI, 15.5%-30.0%). Because MMSE and BNT performance are known to be influenced by age, gender, and education,9,23 the controls were matched to the surgical group on these variables. Consequently, patients and controls showed no apparent differences in demographic characteristics or in most baseline neuropsychological test values.
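The battery-based definition of POCD used in the cited studies can be sketched as a decision rule. This is an illustrative sketch under stated assumptions, not the procedure of this study (which used MMSE-Total alone): Z-scores are oriented so that negative values mean decline, and the combined Z-score is assumed to be the sum of a patient’s test Z-scores divided by the SD of the controls’ summed Z-scores. The function name is hypothetical.

```python
import statistics
from typing import Sequence


def battery_pocd(
    patient_z: Sequence[float],
    control_z: Sequence[Sequence[float]],
    threshold: float = -1.96,
) -> bool:
    """Classify one patient as POCD from per-test RCI Z-scores.

    Rule: Z <= threshold on at least 2 tests, OR combined Z <= threshold.
    Assumption: combined Z = sum of the patient's Z-scores divided by the
    SD of the controls' summed Z-scores (one row of Z-scores per control).
    """
    n_declined = sum(z <= threshold for z in patient_z)
    control_sums = [sum(row) for row in control_z]
    sd_sum = statistics.stdev(control_sums)  # sample SD; needs >= 2 controls
    combined_z = sum(patient_z) / sd_sum
    return n_declined >= 2 or combined_z <= threshold
```

Under this rule, a patient with moderate declines on several tests can be classified as having POCD through the combined Z-score even if no single test reaches −1.96.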
Based on MMSE-Lang and BNT-Spon scores, the incidence of postoperative language function decline on postoperative day 7 was 34.1% (95% CI, 26.3%-41.9%) in patients undergoing major noncardiac surgery. This value was similar to that diagnosed by BNT-Spon alone but significantly higher than that diagnosed by MMSE-Lang alone, suggesting that the MMSE is less sensitive than BNT-Spon for detecting postoperative language function decline. Note that the 34.1% incidence counts patients who were positive on either BNT-Spon or MMSE-Lang; if both tests had to be positive, the incidence would fall to 6.8% (95% CI, 2.5%-11.2%).
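The gap between 34.1% and 6.8% comes from combining two binary diagnoses with an “either positive” rule versus a “both positive” rule. The sketch below illustrates this with hypothetical per-patient flags; the function name is an assumption, and no study data are used.

```python
from typing import Sequence, Tuple


def either_and_both_incidence(
    test_a_positive: Sequence[bool], test_b_positive: Sequence[bool]
) -> Tuple[float, float]:
    """Return (incidence if either test is positive,
               incidence if both tests are positive)."""
    if len(test_a_positive) != len(test_b_positive) or not test_a_positive:
        raise ValueError("need two equal-length, non-empty sequences")
    n = len(test_a_positive)
    pairs = list(zip(test_a_positive, test_b_positive))
    either = sum(a or b for a, b in pairs)   # union of positives
    both = sum(a and b for a, b in pairs)    # intersection of positives
    return either / n, both / n
```

The “either” incidence can never be lower than either test alone, and the “both” incidence can never be higher, so the two rules bracket the single-test incidences.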
In our study, BNT-Spon and BNT-Selec (%) scores were significantly lower in patients with POCD. Because BNT-Selec (%) exists only in the Chinese version and ROC analysis showed poor predictive value for POCD (AUC = .615, 95% CI, .497-.733, P = .056), we chose BNT-Spon as the main BNT indicator, consistent with other studies.24,25 Agreement between BNT-Spon and MMSE total scores was moderate (Kappa = .523), suggesting that BNT-Spon can serve as a simple screening tool for early POCD. Agreement between the BNT-Spon and MMSE-non-Lang diagnoses was fair (Kappa = .303), suggesting some association between the 2 tests.
Some studies have suggested that tests composed of subtests covering various cognitive domains have good construct validity.26 Our unpublished data also suggest that the BNT correlates significantly with the Benton Visual Retention Test and the Symbol Digit Modalities Test. In this study, ROC curve analysis showed that BNT-Spon had discriminative ability for POCD similar to that of MMSE-Lang and MMSE-non-Lang. Because some researchers use BNT-Total (the sum of the BNT spontaneous naming and semantic-cued naming scores) as the main indicator,23,27 we examined it further and found that, unlike BNT-Spon, BNT-Total did not agree well with MMSE-non-Lang, suggesting that BNT-Total is more specific to language function assessment. In an analysis of all preoperative data collected in this study (n = 264), BNT-Spon correlated significantly with the MMSE total score (Pearson’s r = .576, P < .001), MMSE-Lang (Pearson’s r = .576, P < .001), and MMSE-non-Lang (Pearson’s r = .467, P < .001).
The MMSE, the most widely recognized clinical cognitive screening scale, is often used as a “gold standard” to validate new or improved neuropsychological tests.28,29 In this study, patients classified as positive by MMSE-Lang and MMSE-non-Lang were counted among the POCD patients diagnosed by MMSE-Total. We found that 13 patients identified as having POCD had no language function decline, whereas only 2 cases were missed by BNT-Spon. Taken together, these findings indicate that the MMSE has limited sensitivity and specificity for diagnosing POCD. Overall, combining neuropsychological tests, such as the BNT with the MMSE, is an effective cognitive evaluation strategy.
The BNT does not depend on surgery type and has local versions23,30,31 suitable for elderly people with lower education levels,23,27,28 which minimizes language interference.31 However, we found that the Chinese 30-item BNT also has certain shortcomings. Among non-dementia participants, the correct-answer rate for items such as “tree” and “pencil” was as high as 100%, whereas cultural differences made items such as “harp” and “icehouse” much harder to name, with correct rates below 10%. The Chinese version of the BNT therefore has room for optimization. In addition, some items were answered correctly by chance, so a method to detect such false positives is also needed; adding further test content might reduce the probability of these going undetected. Future studies are needed to verify these hypotheses.
The BNT is generally considered more suitable than the MMSE for specialized language function assessment. However, the results of this study preliminarily confirm that the BNT and MMSE agree well in evaluating overall cognitive function. This may be due not to excellent reliability and validity of the BNT for overall cognitive assessment, but to the limitations of the BNT in assessing mild cognitive impairment such as POCD. We therefore speculate that single-domain neuropsychological tests, such as auditory verbal learning tests, trail-making tests, and symbol digit modalities tests, may perform similarly in clinical studies of POCD, although further clinical research is needed to confirm this.
According to current recommendations,1 early postoperative cognitive impairment occurring within 30 days after surgery should be classified as “delayed neurocognitive recovery.” Nevertheless, we assessed cognition on postoperative day 7, consistent with previous clinical studies of POCD. The screening value of the BNT for longer-term cognitive function (eg, 3 months after surgery) requires further study.
Conclusion
Many consensus meetings have proposed using neuropsychological test batteries for POCD assessment and diagnosis.1 However, the widespread practical application of these tests is hindered by cultural and linguistic factors, especially in non-English-speaking regions and among less-educated elderly people, who are also at high risk of POCD. Developing a set of neuropsychological tests that can be used globally and are suitable for low-education populations is therefore crucial. The BNT may fit this need, as our study supports its use for early POCD screening, although further research is needed.
Footnotes
Author Contributions
Xiaojie Zhai designed the study. Bo Meng, Xiaoyu Li, and Ruichun Wang recruited participants and executed study procedures. Bo Lu and Bo Meng analyzed the data. Zhang Chen and Xiaojie Zhai wrote the paper under the supervision of Bo Meng and Junping Chen. All authors critically reviewed the paper.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Natural Science Foundation of Zhejiang Provincial China (No. LQ21H090004), Zhejiang Provincial Public Service and Application Research Foundation, China (No. LGF22H090012), Ningbo Science and Technology Service Technology Foundation, China (No. 2020F040), and Medical Scientific Research Foundation of Zhejiang Province, China (No. 2020PY023).
