Abstract
In spite of advances in neuroimaging and other brain biomarkers to assess preclinical Alzheimer’s disease (AD), cognitive assessment has relied on traditional memory paradigms developed well over six decades ago. This has led to a growing concern about their effectiveness in the early diagnosis of AD which is essential to develop preventive and early targeted interventions before the occurrence of multisystem brain degeneration. We describe the development of novel tests that are more cognitively challenging, minimize variability in learning strategies, enhance initial acquisition and retrieval using cues, and exploit vulnerabilities in persons with incipient AD such as the susceptibility to proactive semantic interference, and failure to recover from proactive semantic interference. The advantages of various novel memory assessment paradigms are examined as well as how they compare with traditional neuropsychological assessments of memory. Finally, future directions for the development of more effective assessment paradigms are suggested.
Despite the advances in the identification of biological markers related to Alzheimer’s disease (AD) that are commonly used to assess for the presence of brain pathology early in the disease course, traditional neuropsychological assessment of memory disorders has remained largely unchanged for six decades or more (Brooks & Loewenstein, 2010). Most commonly employed memory assessments have focused on list-learning paradigms that examine different aspects of memory. These include but are not limited to the storage and consolidation of to-be-remembered information, contrasting immediate with delayed recall, and recognition of target stimuli. Other memory paradigms have assessed immediate and delayed memory for story passages, paired associate learning, and retention of simple and more complex geometric designs. While these procedures have proven valuable in the assessment of conditions such as traumatic brain injury, cerebrovascular impairment, and dementia, it has become apparent that they are largely insensitive in capturing the early prodromal or preclinical stages of AD and other neurodegenerative disorders (Rentz et al., 2013; Sperling, 2007). This is increasingly problematic for several reasons. First, recently discovered AD biomarkers, such as abnormal amyloid and tau deposition visible on positron emission tomography (PET) imaging and abnormal levels of Aβ42, tau, and phosphorylated tau in the cerebrospinal fluid (CSF), have been observed in cognitively normal (CN) individuals who have normal scores on initial traditional memory and other neuropsychological testing (Rentz et al., 2013; Pettigrew et al., 2015).
The accumulation of abnormal brain amyloid in the precuneus, posterior cingulate, anterior cingulate and frontal, temporal, and parietal cortical regions may indicate the presence of early formation of fibrillar plaques in CN individuals 20 years or more before the emergence of cognitive symptoms on traditional neuropsychological measures and constitute a risk factor for the later development of AD (Chételat et al., 2013; Vlassenko, Benzinger, & Morris, 2012).
In addition, targeted therapeutic interventions and emerging therapies for early AD are much more likely to be effective when employed in the earliest stages of disease before the widespread multisystem brain deterioration has occurred even in the mild cognitive impairment (MCI) stage of illness (Brooks & Loewenstein, 2010). Finally, there is increasing recognition that therapies will only be recognized as effective when they are associated with changes in clinically meaningful end points (whether cognitive or functional; Vellas et al., 2015) necessitating measures that are sensitive to the earliest stages of pathology. Thus, the identification and development of cognitive measures that are (a) sensitive to detecting early disease states and (b) converge with biological markers of AD pathology, have become ever more necessary in terms of identifying individuals at risk, monitoring disease progression, and ascertaining treatment efficacy (Edmonds et al., 2015).
Typical neuropsychological measures, which will be described below, are traditionally administered in optimal conditions such as a quiet environment that minimizes any potential distractors. This is at odds with demands in the real-world environment in which persons are forced to allocate attentional resources, multitask, and deal with a welter of competing stimuli. An example of the disconnection between the results of cognitive testing and real-world function can be seen in different aspects of clinical practice. For instance, a brain-injured individual may successfully navigate a number of neuropsychological tasks tapping executive function but then “fall apart” when trying to function in the real-world environment. As an example, a worker employed as a receptionist often has to switch back and forth between the demands of persons at the front desk, take information over the telephone, and hand the physician the chart for the next patient to be seen. With all of these competing demands, the receptionist may forget an urgent telephone call for the physician that had been placed on hold. Accordingly, it has been observed that in the “optimal” testing environment associated with traditional neuropsychological tasks, a number of persons are able to employ cognitive reserve and individualized compensatory strategies to mask actual underlying neuropsychological deficits (Stern, 2009).
Another issue with traditional memory measures is that they are often marked by considerable individual variability (e.g., cognitive reserve, individualized learning, and retrieval strategies as well as motivational levels). These and other aspects of traditional neuropsychological paradigms often result in modest sensitivity to preclinical disease states and in large observed variability in the performance of older adults. This often results in a low signal-to-noise ratio, making it exceedingly difficult to assess the earliest stages of cognitive deficits and to track changes over time (Brooks & Loewenstein, 2010; Vellas et al., 2015).
Recently, there has been a focus on composite cognitive scores as primary clinical outcomes given pressures to maximize the information gleaned from a plethora of cognitive measures that have long been part of Alzheimer’s clinical trials (Donohue et al., 2014; Lim et al., 2016). While potentially valuable, these measures are part of traditional neuropsychological paradigms, many of which are decades old and largely insensitive to subtle changes in memory. As reviewed below, there has also been increasing interest in computerizing neuropsychological tests, which may aid in ease of administration and increased portability. However, many of these measures tap recognition memory rather than free or cued recall. More important, these paradigms share many of the paradigmatic difficulties associated with traditional paper-and-pencil cognitive tests.
It would be desirable to develop cognitive paradigms that are less susceptible to individual variability in learning strategies and compensatory mechanisms and that are sensitive to the earliest behavioral manifestations of brain impairment. Such measures would be designed to specifically stress the cognitive system and minimize the successful use of individualized compensatory mechanisms that might mask subtle memory or other cognitive deficits. This is analogous to an exercise electrocardiogram, which is often much more effective than a resting state electrocardiogram for detection of underlying cardiac deficits that are only identified when stress is applied to the system.
If such “cognitive stress tests” were developed to identify cognitive deficits resulting from the earliest identifiable brain pathology in AD, such as the deposition of beta amyloid or abnormal phosphorylated tau (Loewenstein et al., 2015; Papp et al., 2015), these measures could then serve as both highly powerful cognitive markers and in turn, clinically significant end points. Furthermore, if these measures were strongly linked to beta amyloid and tau deposition in the neocortex in AD, this could have tremendous utility in avoiding the expense and burden of amyloid PET scans or CSF studies.
A Review of Traditional Memory Paradigms Used in the Evaluation of Neurodegenerative Brain Disorders
An obvious advantage of traditional memory paradigms, such as list learning or paired associate learning tasks, over other memory paradigms is that to-be-remembered targets can be recalled over repeated trials which is sensitive to both learning and retrieval deficits and encourages maximum storage and consolidation of information that can be compared with measures of delayed recall. This is not the case with memory for story passages and visual reproduction that are often based on one-trial learning and retrieval that may be particularly sensitive to deficits in attention. In addition to assessing an individual’s learning curve, list-learning and paired associate tests afford the opportunity to distinguish between storage and retrieval deficits through immediate versus delayed recall and recognition memory measures (Schneider, Boyle, Arvanitakis, Bienias, & Bennett, 2007). While delayed recall and rate of forgetting were previously considered the hallmark cognitive features of medial temporal lobe dysfunction in AD, it has become increasingly recognized that deficits in initial learning may be as sensitive as or more sensitive in the identification of MCI (Greenaway et al., 2006; Loewenstein et al., 2003).
Examples of widely used list-learning measures include the Rey Auditory Verbal Learning Test (RAVLT; Schmidt, 1996), the Hopkins Verbal Learning Test–Revised (HVLT; Brandt & Benedict, 2001), the Buschke Selective Reminding Test (Buschke & Fuld, 1974), the California Verbal Learning Test–Second edition (Delis, Kramer, Kaplan, & Ober, 2000), the Brief Visual Memory Test–Revised (Benedict, 1997), and the Consortium to Establish a Registry for Alzheimer’s Disease List-Learning Test (Morris et al., 1989).
One of the major limitations of these traditional memory measures is the lack of controlled learning which allows a participant to employ individualized strategies, and to rehearse and organize to-be-remembered information. Considerable variability in attentional resources and learning styles may have a significant impact on memory performance and the ability of a test to capture underlying cognitive deficits (Buschke et al., 1999; Buschke, 2014; Salmon & Bondi, 2009).
In contrast, controlled learning paradigms provide a format within which the to-be-learned information is organized. For example, a specific category cue could be provided, such as the semantic superordinate category “fruits,” among several other targets, with the goal of increasing the depth of processing so as to establish the basis of encoding specificity (Thomson & Tulving, 1970). This same cue can subsequently be used to elicit a correct response during recall. Indeed, it has been established that AD patients are deficient in the ability to use proper category cues (Adam et al., 2007; Grober & Buschke, 1987; Grober, Buschke, Crystal, Bang, & Dresner, 1988). Controlled learning minimizes the use of individualized learning strategies, ensures proper encoding of the to-be-remembered material, and allows the use of retrieval-specific cues to access memory for what was learned.
Vulnerability to Proactive and Retroactive Semantic Interference
In addition to deficits in the ability to use semantic cues, persons with preclinical AD may be especially susceptible to semantic interference, which is defined as difficulty with managing competing representations of targets within a semantic category (Loewenstein et al., 2003; Loewenstein et al., 2004). For example, individuals provided with a list of vegetables, over repeated learning trials, may exhibit proactive semantic interference (PSI) when they are asked to remember a new target list of vegetables. PSI occurs when old semantic learning interferes with the learning of new semantic targets. On the other hand, recall of the original targets might be affected by retroactive semantic interference (RSI), which occurs when the learning of a new list of semantically related targets interferes with memory for the original targets. While existing measures may include competing to-be-remembered lists (e.g., California Verbal Learning Test–Second edition, RAVLT), there is an insufficient number of shared to-be-remembered targets belonging to the same semantic category to adequately identify PSI and RSI effects. For these traditional list-learning measures, controlled learning is not emphasized nor are the effects of semantic interference optimized (Brooks & Loewenstein, 2010). In addition, traditional measures do not have multiple trials of the second semantically related list to examine issues with recovery from proactive interference.
Another advantage of novel semantic interference paradigms is that a person’s vulnerability to proactive and retroactive interference and, more important, the ability to recover from PSI can be referenced to the strength of their initial learning and memory. Thus, performance on a particular interference trial is not only related to age or educationally related normative groups (e.g., 1.5 to 2.0 SD below a specific normative value) but can be directly compared with initial acquisition and retrieval. This can enhance sensitivity to detect preclinical conditions and very mild deficits (Buschke, 2014). More important, this practice tests memory performance referenced against an individual’s own performance and initial memory capacity. Indeed, it is possible to optimize testing methods to tap the vulnerability in memory performance characteristic of early AD. Such methods include (a) increasing encoding specificity and depth of processing at baseline using the same category cues at both acquisition and retrieval so as to maximize storage and (b) employing cues to bring out the difficulties with binding targets to cues and with semantic interference when distractor targets are presented. Optimizing encoding specificity provides a measure of maximum learning capacity, which may be more useful than unstructured free-recall measures (see Buschke, 2014).
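The idea of referencing an interference trial to a person’s own initial learning can be illustrated with a simple ratio. The following is a minimal sketch under stated assumptions: the function name, the ratio itself, and the example scores are hypothetical and are not a published scoring rule for any of the tests discussed here.

```python
def self_referenced_ratio(initial_cued_recall: int, interference_cued_recall: int) -> float:
    """Return interference-trial recall as a proportion of the person's own initial recall.

    Hypothetical index (an assumption for illustration): values near 1.0 mean
    recall held up under interference; lower values mean a larger drop
    relative to that person's own baseline.
    """
    if initial_cued_recall <= 0:
        raise ValueError("initial cued recall must be positive to form a ratio")
    return interference_cued_recall / initial_cued_recall


# Two hypothetical examinees with the same interference-trial score (6 targets)
# but different initial learning.
strong_learner = self_referenced_ratio(initial_cued_recall=14, interference_cued_recall=6)
weak_learner = self_referenced_ratio(initial_cued_recall=8, interference_cued_recall=6)
```

Although both hypothetical examinees recall six targets on the interference trial, the first has lost a larger share of what was initially acquired, which a comparison against group norms alone would not reveal.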
Memory Binding
It has long been recognized that binding of associations (name–face) and other associative memory is impaired in conditions such as AD. With regard to list-learning tasks, memory binding refers to associative binding of targets on multiple lists through the use of a common semantic cue. The lack of such memory binding may reflect an early sign of presymptomatic memory impairment (Buschke, 2014; Parra et al., 2010). Buschke’s Memory Capacity Test, also known as the Memory Binding Test (MBT; see Frey et al., 2009), involves the learning of an initial list of 16 targets, each of which is associated with a distinct category cue at encoding. The same category cues are then employed to recall a different list of 16 targets. For example, the semantic cue “fruit” may be associated with “strawberries” on the first list and “pears” on the second list. Associative binding can be assessed through this type of paradigm, something that cannot be done with widely employed traditional memory measures. Another type of memory binding paradigm is Parra-Rodriguez’s Short-term Visual Memory Binding Test (SVMBT; Parra et al., 2010). This measure relies on feature detection embedded in a recognition paradigm. The participant is presented sequentially with two arrays of various shapes and colors. The two visual arrays are separated by a short delay, after which the participant is required to detect whether there is a difference between the first and second array. In this test, memory-binding effects can be tested using polygon shape and color combinations, contrasted with memory for polygon shape alone or polygon color alone.
The SVMBT, which utilizes feature binding, as opposed to semantic binding, has been shown to be very sensitive in detecting memory deficits in early AD (Della Sala et al., 2012) as well as changes in asymptomatic carriers of the E280A single presenilin-1 mutation (Parra et al., 2011), and it can differentiate mild AD from depression and other non-AD disorders (Della Sala et al., 2012).
Table 1 depicts the limitations of commonly employed list-learning measures and some possible solutions that may lead to better sensitivity to detect conditions such as prodromal AD and other related neurodegenerative brain disorders.
Disadvantages of Traditional Memory Tests in the Detection of Subtle Cognitive Impairment in Those With Early Neurodegenerative Brain Disease.
Emerging Paradigms for the Development of Cognitive Stress Tests
Vulnerability to semantic interference was first evaluated by having the older adult remember two competing lists of semantically related targets (Loewenstein et al., 2003; Loewenstein et al., 2004). Performance on the second target list was susceptible to the effects of PSI and demonstrated excellent sensitivity in distinguishing MCI from CN elders. Limitations of this original paradigm were the lack of controlled learning, the absence of cued recall consistent with original encoding, and the fact that a number of visually presented objects could be stored in ways not limited to the semantic memory system. Subsequently, a more refined paradigm was developed, namely, the Loewenstein–Acevedo Scales of Semantic Interference and Learning (LASSI-L; Crocco, Curiel, Acevedo, Czaja, & Loewenstein, 2014; Curiel et al., 2013), which demonstrated high test–retest reliabilities for both amnestic MCI and CN subjects. In this paradigm, learning is organized around three semantic categories (fruits, musical instruments, and articles of clothing), with each category containing five targets. After learning with free and cued recall of the initial 15 targets, comprising List A, this list is readministered to maximize encoding and storage of to-be-remembered information. After the second cued recall of List A targets, PSI is elicited by presenting another list of 15 targets (List B) within the same semantic categories. List B is then readministered followed by cued recall of List B, which represents the extent of recovery from PSI.
Subsequent recall of List A targets is used to assess RSI effects. A 20-minute interval allows for the assessment of delayed recall. The LASSI-L measures learning and the effects of semantic interference; in addition, among MCI patients with suspected early AD, shared semantic cueing across both lists produced significant numbers of semantic intrusion errors. On the initial List B cued recall, 52.9% of amnestic mild cognitive impairment (aMCI) patients and 72.5% of AD patients, but only 6.3% of CN elders, had an equivalent or greater number of semantic intrusions for List B targets than correct cued recall of the targets themselves (Crocco et al., 2014; Curiel et al., 2013). Moreover, a combination of cued recall measures tapping maximum storage and susceptibility to proactive interference on the LASSI-L differentiated between aMCI and normal elderly subjects with 87.9% sensitivity and 92.5% specificity. Because the design of the LASSI-L cued recall condition magnifies semantic interference effects, it has been shown to have greater sensitivity and specificity in detecting very early and subtle cognitive impairment among asymptomatic older adults with apparently normal cognition.
The strengths of the LASSI-L depicted in Figure 1 include the following: (a) explicit identification of the semantic categories around which learning should be organized when target words are initially presented; (b) use of a second list in which each to-be-remembered target is semantically related to a target on the first list; (c) an increased emphasis on encoding by increasing depth of initial processing of to-be-remembered information through repeated exposure to List A stimuli; and (d) evaluation of both proactive and retroactive interference, as well as the second presentation and cued recall of List B targets, which provides a unique measure of recovery from proactive interference. It should be noted that there are various delayed recall formats for the two competing semantically related lists using free and cued recall formats.
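Two quantities in this paragraph are simple arithmetic: whether an examinee’s semantic intrusions equal or exceed correct cued recall, and the sensitivity and specificity of a classification. The sketch below illustrates both. The function names and all counts are hypothetical assumptions chosen for illustration; they are not the data from the cited studies.

```python
def intrusions_match_or_exceed(correct_cued_recall: int, semantic_intrusions: int) -> bool:
    """True when intrusions equal or exceed the number of correctly recalled targets."""
    return semantic_intrusions >= correct_cued_recall


def sensitivity_specificity(true_pos: int, false_neg: int, true_neg: int, false_pos: int):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = true_pos / (true_pos + false_neg)  # impaired cases correctly flagged
    specificity = true_neg / (true_neg + false_pos)  # unimpaired cases correctly cleared
    return sensitivity, specificity


# Hypothetical examinee: 4 correct List B targets and 5 intrusions on cued recall.
flag = intrusions_match_or_exceed(correct_cued_recall=4, semantic_intrusions=5)

# Hypothetical classification counts (not taken from any cited study).
sens, spec = sensitivity_specificity(true_pos=40, false_neg=10, true_neg=45, false_pos=5)
```

With these illustrative counts, 40 of 50 impaired cases and 45 of 50 unimpaired cases are classified correctly.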

Description of LASSI-L paradigm.
Loewenstein et al. (2016) studied 91 older adults who were administered the LASSI-L. Thirty-one of these individuals were CN, and 18 persons had subjective memory complaints (SMC) but were otherwise judged to be CN by the examining clinician and by independent evaluation by a neuropsychologist. Twenty-nine persons were diagnosed with amnestic MCI and 15 individuals were diagnosed with PreMCI (evidence of memory decline on clinical examination but normal scores on neuropsychological testing).
Using previously established cutoffs for impairment, controlled learning on two trials of the LASSI-L (which represented learning capacity) was impaired in 31.1% of aMCI patients, 5.6% of persons with SMC, and 0% of CN and PreMCI participants. In contrast, List B1 cued recall (vulnerable to the effects of PSI) was impaired in 12.5% of CN, 33.3% of SMC, 46.7% of PreMCI, and 78.6% of aMCI participants. Interestingly, when participants were given an additional opportunity to learn and retrieve List B objects on cued recall, 0% of CN participants, 16.7% of SMC participants, 26.7% of PreMCI subjects, and 60.7% of individuals with amnestic MCI had difficulties with recovery from proactive interference. Both proactive interference (PSI) and failure to recover from PSI showed monotonic increases as a function of the level of disease severity. This was also the first investigation to show that failure to recover from PSI could distinguish between different nondemented groups.
Amyloid imaging was obtained on 23 of the aforementioned participants without MCI or who had normal scores on traditional neuropsychological tests (Loewenstein et al., 2016).
Using Spearman’s rho coefficient and a p value of ≤.01 to adjust for multiple comparisons, failure to recover from PSI was strongly related to increased amyloid deposition in the whole brain (rs = −.60), precuneus (rs = −.62), posterior cingulate (rs = −.50), and anterior cingulate (rs = −.48). There were also statistically significant correlations between initial problems with learning List A and amyloid deposition in the anterior cingulate (rs = −.49) and frontal lobe (rs = −.44). These findings indicate that amyloid load was related to the failure to recover from semantic interference but not to other aspects of the LASSI-L or to other traditional memory measures.
In another recently completed study, we examined the relationship between the LASSI-L and MRI volumes in Alzheimer’s signature regions on magnetic resonance imaging (MRI) in 32 individuals with MCI. Failure to recover from PSI on the LASSI-L was uniquely associated with reduced volumes in the superior parietal lobules (rs = .49; p < .01) and precuneus (rs = .54; p < .01) and with increased volumes of the inferior lateral ventricle (rs = −.51; p < .01), none of which were associated with other traditional neuropsychological measures such as the HVLT (Total and Delayed Recall), Delayed Story Passage on the Wechsler Memory Scale, Trails A and B, Category Fluency, and Block Design. Reduced hippocampal and inferior lateral temporal volumes were associated with failure to recover from PSI on the LASSI-L but were also related to performance on memory for delayed passages or category fluency. For CN elders, only increased inferior lateral ventricular size was associated with vulnerability to PSI (rs = −.57), the failure to recover from PSI (rs = −.58), and delayed recall on the HVLT-Revised (rs = −.45).
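For readers less familiar with the statistic, Spearman’s rho is the Pearson correlation computed on ranks, with tied values given their average rank. The following is a minimal sketch of that computation; the function names and the five paired values are hypothetical assumptions for illustration and are not study data.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, assigning tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r applied to average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)


# Hypothetical cued-recall scores and amyloid uptake ratios for five people.
recall_scores = [12, 10, 9, 7, 4]
amyloid_load = [1.0, 1.1, 1.3, 1.2, 1.6]
rho = spearman_rho(recall_scores, amyloid_load)  # -0.9 for these values
```

A negative coefficient here means that lower recall goes with higher amyloid load, which is the direction of the associations reported above.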
Taken together, these findings indicate that the inability to recover from proactive interference may constitute an early and unique cognitive deficiency in AD and is associated with early changes in AD-sensitive biomarkers.
Matías-Guiu et al. (2016) recently validated the LASSI-L among Spaniards and suggested that the LASSI-L is a reliable and valid test for the diagnosis of aMCI and mild AD. The study found that internal consistency was 0.932, and convergent validity with the Free and Cued Selective Reminding Test was moderate. LASSI-L raw scores were correlated with age and years of education, but not gender. The area under the curve for discriminating between healthy controls and aMCI was 0.909, and between healthy controls and mild AD was 0.986. LASSI-L subscores representing maximum storage capacity, recovery from proactive interference, and delayed recall yielded the highest diagnostic accuracy.
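The area under the curve reported above can be read as the probability that a randomly chosen impaired participant scores lower than a randomly chosen healthy control, with ties counted as one half. The sketch below computes it by direct pairwise comparison. The function name and the scores are hypothetical assumptions for illustration, and it assumes that lower scores indicate impairment.

```python
def auc_lower_is_impaired(impaired_scores, control_scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney formulation).

    Counts the fraction of (impaired, control) pairs in which the impaired
    participant has the lower score; ties contribute 0.5.
    """
    if not impaired_scores or not control_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for s_imp in impaired_scores:
        for s_ctl in control_scores:
            if s_imp < s_ctl:
                wins += 1.0
            elif s_imp == s_ctl:
                wins += 0.5
    return wins / (len(impaired_scores) * len(control_scores))


# Hypothetical cued-recall scores (not data from the cited study).
controls = [14, 13, 12, 11]
impaired = [12, 9, 8, 6]
auc = auc_lower_is_impaired(impaired, controls)  # 14.5 of 16 pairs -> 0.90625
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.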
Novel Paradigms Involving Memory Binding
The MBT, developed in Buschke’s (2014) laboratory, has demonstrated high test–retest reliabilities for participants with MCI. Good test–retest reliabilities have also been established by Gramunt et al. (2016). On the MBT, the participant is asked to point to a word belonging to a particular semantic category (e.g., country). A separate category cue is provided for each of 16 targets to be remembered, and this allows for controlled learning. When a second list of 16 targets (List B) is presented with the same category cues as on List A, the subject is administered cued recall for the B targets and then has to recall both pairs of targets related to the semantic cues (which provides a measure of memory binding), followed by free recall of both List A and List B targets. Long-term delayed recall is then assessed. Figure 2 depicts the MBT procedure. The MBT is available for Spanish speakers (Buschke, 2014; Gramunt et al., 2016).

Memory Binding Test.
Frey et al. (2009) observed that the MBT, previously known as the Memory Capacity Test, was a more challenging paradigm than standard selective reminding tests and free and cued selective reminding tests and that it was more strongly associated with amyloid load in the brain among community-dwelling elders. More recently, Papp et al. (2015) found that the MBT was distinctive from standard free and cued recall selective reminding tests in terms of its relationship to amyloid burden in the brain. More specifically, it was found that among cognitively intact individuals, deficits in free recall alone were associated with high amyloid deposition but no evidence of neurodegeneration in the brain, while deficits in both free recall and cued recall were observed in persons with both high amyloid load and evidence of neurodegeneration. The MBT has also differentiated aMCI from normal elderly subjects (Buschke et al., 2016) and predicted incident aMCI longitudinally (Mowrey et al., 2016).
Additional Novel Cognitive Paradigms
The Face–Name Test
The Face–Name Test is a challenging paradigm based on 16 face–name pairs and 16 face–occupation pairs for a total of 32 paired associates to be remembered. The addition of a face–occupation component to the face–name component makes the test unique and more challenging than traditional face–name paradigms. Excellent reliability and concurrent validity have been reported by Amariglio et al. (2012). This paradigm has the advantage of tapping ecologically relevant cognitive associative skills, and scores on this test have been shown to be correlated with amyloid load among normal elderly individuals, some of whom presumably have Preclinical AD (Rentz et al., 2011). The Face–Name Test is currently being employed in the Dominantly Inherited Alzheimer Network and in the Anti-Amyloid Treatment in Asymptomatic AD trial for secondary prevention of AD (Morris et al., 2012; Sperling, Donohue, & Aisen, 2012).
Short-term Visual Memory Binding Test
As described above, another innovative paradigm is Parra-Rodriguez’s SVMBT. The SVMBT, which employs binding (polygon shape and color combination), has been shown to be very sensitive in detecting memory deficits in Preclinical AD (i.e., among asymptomatic carriers of the E280A single presenilin-1 mutation, which results in autosomal dominant early onset AD; Parra et al., 2011). The SVMBT can also differentiate early AD from depression and other non-AD disorders (Della Sala et al., 2012).
Spatial Pattern Recognition Test
While spatial pattern recognition and spatial discrimination or location have been studied for a number of years, there is a renewed interest in relating these paradigms to neurodegenerative disease. There is increasing recognition that thinning of the parahippocampal gyrus may account for impaired pattern recognition in early AD (Bar & Aminoff, 2003; J. Liu et al., 2015). An interesting measure is the Spatial Pattern Recognition Memory Test, in which participants view a single dot on a screen for three seconds followed by different delay intervals that require them to view new dots and to identify the dot in its original location. Performance on the Spatial Pattern Recognition Test has been shown to be sensitive to a ratio of Aβ42 and phosphorylated tau among participants with Preclinical AD (Lau et al., 2012).
In addition to this measure, Stark, Yassa, and Stark (2010) describe a Spatial Pair Distance task, which requires participants to notice changes in the positions of an array of objects after a brief interval. They contend that this task is sensitive to deficits in the dentate gyrus seen in early AD. K. Y. Liu et al. (2016) provide a comprehensive review of different available tests of pattern completion and separation.
Tau deposition and volumetric loss occur early in the hippocampus and entorhinal cortex in subjects with prodromal AD, which may explain why spatial deficits are seen on these tasks. Similarly, deficits in the ventral stream of the occipital–temporal cortex affect object discrimination, and impaired dorsal pathways involving the occipital–temporal cortex are likely related to Alzheimer’s pathology (Possin, 2010). Lithfous, Dufour, and Després (2013) argue that both structural and functional changes in the hippocampus, parahippocampal gyrus, caudate nucleus, retrosplenial cortex, prefrontal cortex, and parietal lobe are all important determinants of spatial navigation. Benke, Karner, Petermichl, Prantner, and Kemmler (2014) demonstrate that both MCI and mild AD patients have deficits in actual route learning.
The Use of Cognitive Composite Scores
As mentioned previously, there have been attempts to increase the range of cognitive scores by employing composites of various traditional memory and nonmemory measures. Donohue et al. (2014) combined the z scores of the total recall score from the Free and Cued Selective Reminding Test, delayed recall from the Logical Memory II subtest of the Wechsler Memory Scale, the total Mini–Mental State Examination (MMSE) score, and the Wechsler Adult Intelligence Scale–Fourth Edition Digit Symbol substitution score to form the Alzheimer Disease Cooperative Study–Preclinical Alzheimer Cognitive Composite score. As expected, decline in this composite score of memory, global cognitive abilities, and a processing speed task, which was described as “timed executive dysfunction,” was worse among normal elderly with high amyloid load, and the composite is being used as the primary outcome measure in the Anti-Amyloid Treatment in Asymptomatic Alzheimer’s (A4) study (Sperling et al., 2012). Other composites using traditional cognitive measures are being employed in other anti-amyloid trials. More recently, Lim et al. (2015) reported that the Alzheimer Disease Cooperative Study–Preclinical Alzheimer Cognitive Composite score was more sensitive to amyloid-associated decline if the MMSE score was replaced by the FAS Controlled Oral Word Association Test. Coley et al. (2016) pointed out that these composites were derived from retrospective analyses of a large number of measures in observational trials and did not focus on a clinically significant end point. By combining the orientation subscales of the MMSE, Trails B, category fluency, and the Free and Cued Selective Reminding Test, they found that they could define clinically relevant cut-points that could aid in prediction of longitudinal decline.
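The arithmetic behind a z-score composite can be sketched as follows. This is an illustration under stated assumptions, not the published scoring procedure: the text above does not specify how the component z scores are combined, so the unweighted mean used here is an assumption, and the measure names, reference means, and standard deviations are hypothetical.

```python
from statistics import mean


def z_score(raw: float, ref_mean: float, ref_sd: float) -> float:
    """Standardize a raw score against a reference group's mean and SD."""
    if ref_sd <= 0:
        raise ValueError("reference SD must be positive")
    return (raw - ref_mean) / ref_sd


def composite_score(raw_scores: dict, reference: dict) -> float:
    """Unweighted mean of component z scores (the combination rule is an assumption)."""
    return mean(z_score(raw_scores[name], *reference[name]) for name in reference)


# Hypothetical reference values: measure -> (mean, SD). Not published norms.
reference = {
    "fcsrt_total_recall": (45.0, 5.0),
    "logical_memory_delayed": (12.0, 4.0),
    "mmse_total": (28.0, 2.0),
    "digit_symbol": (50.0, 10.0),
}

# Hypothetical participant raw scores.
participant = {
    "fcsrt_total_recall": 40.0,
    "logical_memory_delayed": 8.0,
    "mmse_total": 27.0,
    "digit_symbol": 45.0,
}

composite = composite_score(participant, reference)  # mean of -1, -1, -0.5, -0.5
```

Standardizing each component first puts measures with different ranges on a common scale, so no single test dominates the composite because of its raw-score units.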
The focus on composite cognitive scores as a primary clinical outcome is understandable given pressures to maximize the information gleaned from a plethora of cognitive measures that have long been part of Alzheimer’s clinical trials. On the other hand, all of these measures are part of traditional neuropsychological paradigms, many of which are decades old and largely insensitive to subtle changes in memory. This leaves important unanswered questions such as: (a) How would these composites fare against more novel cognitive paradigms with regard to the earliest changes in brain biomarkers? (b) Can constructs such as memory binding or vulnerability to and recovery from PSI provide additional explanatory power? and (c) Are cognitive tests sensitive to the earliest brain changes necessarily the best measures for monitoring cognitive changes over time?
Trends Toward Computerized Assessment
It is becoming increasingly recognized that traditional paper-and-pencil neuropsychological assessments are lengthy, labor-intensive, vulnerable to human error, and associated with practice effects. With advances in computer science, computerized testing batteries for older adults have been advocated as offering more standardized administration, reducing the need for well-trained psychometrists, providing accessibility to distant sites, promoting efficiency, providing real-time data entry, and increasing the accuracy of recording responses and response time.
Several computerized tests have been developed, including the CogState, Computer Assessment of Mild Cognitive Impairment (CAMCI), Cambridge Neuropsychological Test Automated Battery, CNS Vital Signs, and the Cognition Battery from the National Institutes of Health Toolbox, but these too have limitations in the early detection of cognitive impairment. For example, many of these computerized batteries are relatively successful at distinguishing between participants with normal cognition and those with dementia or late-stage MCI, but lack the predictive power needed to move the field forward, namely the ability to correctly classify individuals with MCI and/or those earlier in the disease continuum among different ethnic and cultural groups. This highlights a major problem with many traditional computerized batteries: they are often automated versions of traditional neuropsychological tests that lack sensitivity to detect AD-related cognitive decline, and they frequently employ the same paradigms originally developed for the assessment of dementia or traumatic brain injury.
One of the most widely used traditional computerized cognitive batteries for the assessment of MCI is the CogState (Darby, 2004). As part of the Mayo Clinic Study of Aging, Mielke et al. (2015) administered the CogState to 1,660 nondemented older adults aged 50 to 97 years and found that computerized assessment was both feasible and acceptable among older adults with a wide range of age and education. In fact, a touchscreen platform was preferred by this population. In this study, 86 MCI participants were assessed and found to have worse performance than cognitively healthy individuals; however, it is likely that a significant number of individuals classified as MCI were in the later stages of the MCI continuum, which is more cognitively similar to early dementia in terms of neuropsychological test performance. Furthermore, the authors note that their results are not generalizable to other ethnicities due to the demographic makeup of the region (Minnesota, USA). Another study conducted by Mielke et al. (2014) examined the association between performance on the CogState and neuroimaging biomarkers (MRI, FDG PET [fluorodeoxyglucose PET], and amyloid PET) among CN participants aged 51 to 71 years; however, only weak associations were found between CogState subtests and biomarkers of neurodegeneration.
Another measure available to assess MCI is the CAMCI (Saxton, 2009). This is a battery of tests intended for use in conjunction with other neuropsychological tools and includes paper-and-pencil tests modified for computer presentation along with a uniquely developed virtual reality task. The CAMCI was found to show good sensitivity (86%) and specificity (94%) for the identification of MCI among community-dwelling elders; however, there were significant limitations in the criteria employed to classify individuals as MCI, such as the lack of a reliable informant. Other weaknesses of this semicomputerized measure are that it is currently available only in English and is intended only for English-speaking populations in the United States and Canada, that there are no alternate forms, and that it requires a trained examiner to complete the entire battery.
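For readers less familiar with these indices, the short Python sketch below shows how sensitivity and specificity are derived from a two-by-two classification table. The counts are hypothetical and were chosen only so that the arithmetic reproduces the percentages quoted above; they are not the data from the CAMCI validation study.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of truly impaired individuals the test flags."""
    total = true_pos + false_neg
    if total == 0:
        raise ValueError("no impaired cases in the sample")
    return true_pos / total


def specificity(true_neg, false_pos):
    """Proportion of truly unimpaired individuals the test clears."""
    total = true_neg + false_pos
    if total == 0:
        raise ValueError("no unimpaired cases in the sample")
    return true_neg / total


# Hypothetical counts: 100 individuals with MCI and 100 without.
tp, fn = 86, 14   # MCI cases flagged / missed by the test
tn, fp = 94, 6    # unimpaired cases cleared / wrongly flagged

print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"specificity = {specificity(tn, fp):.2f}")
```

Both indices depend on the reference classification being accurate, which is why limitations in the MCI criteria bear directly on how the reported figures should be interpreted.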
The Cognition Battery from the NIH Toolbox for the Assessment of Neurological and Behavioral Function (NIH-TB) was developed in 2004 to supplement outcome measures in epidemiologic and longitudinal research and clinical trials (Beaumont et al., 2013; Weintraub et al., 2013). The computer-administered tests were validated in English and Spanish in a sample of 4,859 community-dwelling participants ranging in age from 3 to 85 years. A total of 1,446 older adults were included in the normative sample (English-speaking adults aged 60-85 years, n = 1,038; Spanish-speaking adults aged 60-85 years, n = 408). The cognitive domains assessed by the NIH-TB Cognition Battery are general and include executive functioning, attention, processing speed, language, working memory, and episodic memory. Episodic memory is assessed by the Picture Sequence Memory Test, in which subjects are asked to recall increasingly lengthy sequences of up to 18 pictures with corresponding audio-recorded phrases over the course of two learning trials, and the Auditory Verbal Learning Test (AVLT), which is a 15-item list-learning task based on the RAVLT. A major limitation of the NIH-TB Cognition Battery is the small number of Hispanic older adults included in the normative sample (Picture Sequence Memory Test: aged 60-85 years, n = 35; AVLT: aged 60-85 years, n = 33). Another major limitation lies in the AVLT; this test differs from the traditional RAVLT in that there are three, rather than five, learning trials, and it provides only a measure of immediate memory. Finally, no studies have validated the use of the NIH-TB with diverse MCI or aMCI populations or demonstrated adequate sensitivity/specificity.
Other major issues with the NIH-TB Cognition Battery and other computerized tests are that they employ the older cognitive test paradigms discussed earlier and do not tap more novel paradigms sensitive to AD pathology, such as vulnerability to semantic interference, memory binding, or prospective memory (remembering to remember an intended action), nor do they incorporate the latest state-of-the-art speech recognition and virtual reality technologies.
The shift to computer-based platforms is advantageous, but state-of-the-art technology has not immediately translated into paradigmatic shifts that advance the field. In many cases, these are the same measures and paradigms that have been employed for many decades. We welcome improvements in test administration that rely on sophisticated speech recognition, virtual reality, and other advanced human–computer interface technologies that might make the testing system more sensitive, reliable, and available for remote delivery.
Summary and Conclusions
Traditional memory tasks have been successfully employed for many decades and have proven quite useful for the diagnosis and longitudinal evaluation of many neurological and neuropsychiatric conditions. Having used these traditional memory tests for many decades, we believe that they have made a significant contribution to both clinical work and scientific investigation. However, with the advent of advanced neuroimaging and CSF biomarkers, there is now the capacity to detect pathological changes in the brain, even in preclinical stages of AD and related disorders. This emphasizes the need for neuropsychological measures that can identify cognitive changes resulting from “preclinical” disease and that can track the progression of these cognitive deficits and their response to treatment (Dubois et al., 2010; Lau et al., 2012; Sperling et al., 2011). Recent FDA guidelines suggest that biomarkers alone will not be sufficient as surrogate outcome measures to demonstrate the effectiveness of a medication for the treatment of AD and ultimately to gain its approval.
In light of the increasing emphasis placed on detection and monitoring of the earliest changes in AD and other neurodegenerative disorders, we believe that uncontrolled learning paradigms are far too susceptible to attentional issues and individual differences in learning strategies. Controlled learning and encoding specificity, as well as assessment of performance in the context of individual memory capacity, can be essential to the quantity and quality of information that is processed and can reduce intersubject variability in performance associated with attention deficits. There is increasing evidence that memory tests such as those utilizing memory binding, proactive interference, and retroactive interference effects are effective in this regard. Such tests include the MBT, which focuses on proactive interference and memory binding, and the LASSI-L, which maximizes both proactive interference effects and retroactive interference effects while uniquely measuring recovery from proactive interference.
Despite rapid changes in the neurosciences, including neuroimaging, biomarkers, and genetics, there have not been significant paradigmatic advances in traditional neuropsychological measures that assess memory function. Moving forward, greater consideration should be accorded to memory paradigms that: (a) promote active encoding and maximize the depth of processing of to-be-remembered information, (b) explicitly identify the semantic categories around which learning should be organized, (c) maximize encoding specificity by using the same category cues at both encoding and recall, and (d) evaluate semantic interference and release from semantic interference and/or associative binding. Paradigms that optimize initial learning provide an individual’s baseline against which to judge proactive semantic interference, recovery from proactive interference, and retroactive interference. Using an individual’s initial performance as a benchmark allows a person to be used as his or her own control and reduces sole reliance on comparing an individual’s given score to group normative data.
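The idea of using a person as his or her own control can be illustrated with a small Python sketch. The index below, later recall divided by initial cued recall, is a hypothetical illustration of baseline-relative scoring; it is not the published scoring procedure of the LASSI-L, the MBT, or any other instrument, and the scores are invented.

```python
def relative_to_baseline(baseline_correct, later_correct):
    """Express later recall as a proportion of the person's own baseline.

    Hypothetical index for illustration only; it is not an official
    scoring rule for any named test.
    """
    if baseline_correct <= 0:
        raise ValueError("baseline recall must be positive")
    return later_correct / baseline_correct


# Two invented examinees who both recall 9 targets after an
# interfering list, but who differed in initial learning.
examinee_a = relative_to_baseline(baseline_correct=14, later_correct=9)
examinee_b = relative_to_baseline(baseline_correct=10, later_correct=9)

print(f"examinee A retained {examinee_a:.0%} of baseline")
print(f"examinee B retained {examinee_b:.0%} of baseline")
```

In this invented case the two identical raw scores correspond to quite different proportional drops from initial learning, which is the kind of information a purely norm-referenced comparison of the later score would miss.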
The steps outlined above can potentially enhance the signal-to-noise ratio and maximize the ability to detect the earliest possible memory changes in incipient AD and other neurodegenerative conditions. These efforts should by no means be limited to just memory measures. There also needs to be further work on developing more sensitive indices of language, executive, and visuospatial function.
Adding dimensions to existing memory measures, such as recovery from proactive interference, is especially intriguing given increasing evidence that such measures may fare better than traditional measures in their sensitivity to the buildup of brain amyloid in cognitively unimpaired community-dwelling elders. Longitudinal studies are currently underway to compare the predictive validity of such paradigms as they relate to progression of cognitive decline over time and changes in established brain biomarkers. Furthermore, we are developing computerized versions of these measures to aid in ease of administration and portability using advanced computer interfaces and voice recognition technology.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by 1 R01 AG047649-01A1, David Loewenstein, PI, the Ed and Ethel Moore Research Program, ALZ002-David Loewenstein, PI, 5 P50 AG047726602 1Florida Alzheimer’s Disease Research Center (Todd Golde, PI).
