Abstract
The use of psychological tests to help identify the noncredible overreporting of psychiatric disorders is a long-standing practice that has received considerable attention from researchers. The purpose of this study was to experimentally determine whether feigning specific psychiatric disorders moderated the influence of coaching on the detection of noncredible overreporting using the Minnesota Multiphasic Personality Inventory–2 (MMPI-2) and the Personality Assessment Inventory (PAI). Using a 2 × 3 experimental analogue design, 265 undergraduates were asked to feign schizophrenia, posttraumatic stress disorder, or generalized anxiety disorder and were either coached about validity scales and disorders or not. The results of this study indicated that the specific psychiatric disorder being feigned did moderate the impact coaching had on the detection of overreported psychopathology using several scales on the MMPI-2 and PAI. Future research examining noncredible overreporting should take into account the impact caused by the interaction of psychiatric disorder with coaching on the detection of symptom overreporting and also identify other important moderating/mediating variables in order to develop more effective means of identifying response bias.
Determining the credibility of a client’s symptom presentation is a task that, depending on the specifics of one’s practice, can range in importance from very minor to nearly primary (e.g., pretrial forensic evaluations). Naturally, clinicians use a number of techniques to verify symptom credibility, of which psychological testing is but one. Psychological tests are, of course, vulnerable to noncredible responding as well. In fact, Gordon Allport (1937) wrote that one of the major limitations of self-report testing is “the necessity for co-operative and competent subjects” (p. 451). However, some psychological tests include validity scales designed to measure some of the common forms of response bias in which respondents might engage. Ben-Porath (2003) emphasized the importance of using validity scales to evaluate protocol validity, which “refers to the results of an individual test administration” (p. 566). If the protocol is not valid because the client used a response style different from the test’s instructions, then the results are uninterpretable. There are a number of different types of response bias that invalidate test results (for a review, see Ben-Porath, 2003), one of which is overreporting. Burchett and Ben-Porath (2010) define overreporting as “exaggeration or fabrication of symptoms” (p. 497).
A considerable body of research literature has focused on examining the detection of noncredible overreporting using psychological tests in general and the Minnesota Multiphasic Personality Inventory–2 (MMPI-2; Butcher et al., 2001) in particular. This interest in the MMPI-2’s efficacy at detecting noncredible symptom presentation likely stems from the test’s ability to provide clinicians not only with information about client response bias but also with valuable clinical information. Indeed, the MMPI-2 is the most commonly used measure of personality and psychopathology by clinical psychologists in North America (Camara, Nathan, & Puente, 2000).
Although it does not have the historical legacy of the MMPI-2, the Personality Assessment Inventory (PAI; Morey, 2007) is another multiscale instrument designed to assess a broad range of clinically relevant personality characteristics and symptoms of psychopathology. It too has validity scales designed to detect response bias and thus can be used by clinicians to evaluate both the veracity and the scope of their clients’ reported psychiatric symptomatology.
Given the ease of access to information about mental illness available in the age of the Internet, it seems likely that most individuals could gain some knowledge about a disorder they are feigning with even minimal effort. It makes intuitive sense that this information would aid that person in his or her ability to feign successfully. However, most studies examining the impact of symptom knowledge on the MMPI-2 have concluded that symptom information does not improve the ability of most individuals to avoid being identified as overreporters of psychiatric disorders such as schizophrenia (Bagby, Rogers, Buis, et al., 1997; Bagby, Rogers, Nicholson, et al., 1997; Rogers, Bagby, & Chakraborty, 1993; Wetter, Baer, Berry, Robison, & Sumpter, 1993) and posttraumatic stress disorder (PTSD; Arbisi, Ben-Porath, & McNulty, 2006; Bury & Bagby, 2002; Elhai, Gold, Frueh, & Gold, 2000; Elhai, Gold, Sellers, & Dorfman, 2001; Lange, Sullivan, & Scott, 2010; Marshall & Bagby, 2006; Moyer, Burkhardt, & Gordon, 2002; Wetter et al., 1993). A few studies have found that symptom information about specific disorders, namely somatoform disorder (Sivec, Lynn, & Garske, 1994) and depression (Bagby, Rogers, Buis, et al., 1997; Walters & Clopton, 2000; although for contrasting evidence, see Bagby, Nicholson, Buis, & Bacchiochi, 2000; Lange et al., 2010), seems to result in somewhat decreased detection rates of overreporting on the MMPI-2. A meta-analysis examining feigning on the MMPI-2 conducted by Rogers, Sewell, Martin, and Vitacco (2003) provides some support for the notion that overreporters of some specific disorders may be more difficult to detect than others. Rogers et al. report that across studies the effect sizes for the Infrequency (F), the back-F (Fb), and the Infrequency-Psychopathology (Fp) scales are smaller when comparing feigners of PTSD with genuine PTSD patients than when comparing schizophrenia feigners with genuine schizophrenia patients.
These results indicate that feigned PTSD may be more difficult to detect than feigned schizophrenia as the mean differences in F, Fb, and Fp scale elevations between feigning samples and clinical samples are smaller when examining PTSD.
Research examining the impact of symptom information on the PAI’s ability to detect noncredible overreporting has also supported the notion that symptom information does not improve the ability of individuals feigning schizophrenia to avoid detection (Rogers, Ornduff, & Sewell, 1993; Rogers, Sewell, Morey, & Ustad, 1996). As with the MMPI-2, some studies have indicated that participants provided with symptom information about some specific disorders are somewhat less likely to be detected as overreporters on the PAI. Specifically, research has suggested that feigned PTSD (Calhoun, Earnst, Tucker, Kirby, & Beckham, 2000; Lange et al., 2010; although see Liljequist, Kinder, & Schinka, 1998, for contrasting evidence), generalized anxiety disorder (GAD; Rogers et al., 1996; Rogers, Ornduff, et al., 1993), and depression (Lange et al., 2010; Rogers et al., 1996; Rogers, Ornduff, et al., 1993) may be more difficult to detect. Providing additional support for the notion that some disorders are easier to feign than others, Hawes and Boccaccini (2009) conducted a meta-analysis of PAI noncredible overreporting studies and found smaller effect sizes for the Negative Impression Management scale (NIM), the malingering index (MAL), and the Rogers discriminant function (RDF) when participants were instructed to feign mood or anxiety disorders as compared with participants feigning psychotic conditions. These results can be interpreted as indicating that it may be more difficult to detect feigned mood or anxiety disorders than it is to detect feigned psychotic disorders using the PAI.
A potentially more serious complication to the matter of the detection of noncredible responding is if individuals obtain information about an instrument’s validity scales. Any information about how a test detects noncredible responding might aid individuals in successfully feigning while avoiding detection. Several studies have examined the impact of coaching about validity scales on the MMPI-2.
Viglione, Wright, Moynihan, DuPuis, and Pizitz (2001) found that merely cautioning participants to avoid dramatic presentations and instructing them to be realistic in their feigned responding decreased the sensitivity of MMPI-2 validity scales. Several other studies have also reported a significant decrease in mean validity scale scores, along with decreased sensitivity in identifying overreporting, when participants were provided with validity scale information (Efendov, Sellbom, & Bagby, 2008; Storm & Graham, 2000; Walters & Clopton, 2000).
In contrast, several studies (Bagby, Nicholson, Bacchiochi, Ryder, & Bury, 2002; Bury & Bagby, 2002) found no significant impact of validity scale coaching on mean scale scores although Bury and Bagby did report a small decrease in validity scale sensitivity. Blanchard, McGrath, Pogge, and Khadivi (2003) gave participants a warning similar to the warning given by Viglione et al. (2001). However, Blanchard et al. found that the MMPI-2’s validity scales had very high sensitivity rates, successfully identifying almost all overreporters, although they did not use an uncoached comparison group to directly gauge the impact of coaching.
There is little research examining the impact of validity scale coaching on the PAI. Eakin, Weathers, Benson, Anderson, and Funderburk (2006) found the mean validity scale scores on the PAI produced by coached feigners were not significantly different from the mean scores of a clinical comparison sample. Guriel-Tennant and Fremouw (2006) found that coached feigners of PTSD had significantly lower mean validity scale scores and were also significantly less likely to elevate these scales above a cutoff than uncoached feigners. Also troubling, Blanchard et al. (2003) found much lower sensitivity rates for PAI validity scales than MMPI-2 validity scales. In contrast, Bagby et al. (2002) not only found that both coached and uncoached feigners produced significantly higher mean scores on PAI validity scales but also reported that there were no significant differences between coached and uncoached feigners.
In summary, the literature seems somewhat conflicted in determining the impact of symptom and validity scale information in the detection of noncredible overreporting on both the MMPI-2 and PAI. Baron and Kenny (1986) suggest moderating variables are especially likely to be present when inconsistent relationships between variables are found across different studies and/or samples. They define a moderator as a “variable that affects the direction and/or strength of the relation between an independent or predictor variable and a dependent or criterion variable” (Baron & Kenny, 1986, p. 1174).
It is admittedly unclear, and perhaps somewhat arbitrary, whether the specific psychiatric disorder being feigned moderates the impact of coaching on the detection of overreporting by the MMPI-2 and PAI or vice versa. However, if one defines coaching as the receipt of any information which might aid in successful overreporting, as is done in this study, then coaching includes both symptom information and validity scale information. Thus, it seems to be somewhat more readily conceptualized that the degree of sophistication represented by the feigner (i.e., coached vs. uncoached) is the independent variable that might directly affect the scores on the dependent variables (i.e., validity scales of the MMPI-2 and PAI), whereas the specific psychiatric disorder being feigned might be best understood as moderating how strong the impact of coaching is on the dependent variables. Schizophrenia, PTSD, and GAD were selected for examination in this study because they collectively provide broad coverage of important aspects of psychopathology including symptom manifestation, etiology, symptom severity, and functional impairment. Therefore, the purpose of this study is to experimentally examine the moderating effect of schizophrenia, PTSD, and GAD on the impact of coaching on the validity scale scores of the MMPI-2 and the PAI.
Method
Participants
The initial sample was composed of 295 undergraduates recruited at a medium-sized Midwestern university. Participants were screened on both the MMPI-2 and the PAI for incomplete protocols and random responding during the administration under standard instructions. During the administration under experimental instructions, participants were screened for incomplete protocols only. Protocols completed under experimental instructions were not screened for random responding because previous research has demonstrated that some participants instructed to feign mental illness will produce protocols consistent with random responding (e.g., Arbisi & Ben-Porath, 1998). Screening criteria used for non-content-based invalid responding on the MMPI-2 were cutoff scores of Variable Response Inconsistency (VRIN) > 79T, True Response Inconsistency (TRIN) > 79T, or Cannot Say (CNS; i.e., the number of omitted or unscorable items) > 29. Non-content-based invalid responding on the PAI was assessed using cut scores of Infrequency (INF) > 74T, Inconsistency (ICN) > 72T, or CNS > 17. A total of 30 participants were excluded from all subsequent analyses as a result of having at least one scale elevated above the cutoffs on either or both instruments. The remaining sample was composed of 85 men and 180 women, with a mean age of 18.84 (SD = 1.30), and a predominantly European American ethnic background (95.5%).
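The screening rule described above can be sketched in a few lines of Python. This is only an illustration: the cutoffs are those stated in the text, whereas the dictionary layout, the function names, and the example participant are hypothetical and not taken from the study.

```python
# Cutoffs as stated in the text; a protocol is flagged when a score is
# strictly above its cutoff. Names and data layout are illustrative only.
MMPI2_SCREEN = {"VRIN": 79, "TRIN": 79, "CNS": 29}
PAI_SCREEN = {"INF": 74, "ICN": 72, "CNS": 17}


def exceeds_any(scores, cutoffs):
    """Return True if any scale score is strictly above its cutoff."""
    return any(scores[scale] > limit for scale, limit in cutoffs.items())


def is_excluded(mmpi2_scores, pai_scores):
    """Exclude a participant flagged on either or both instruments."""
    return exceeds_any(mmpi2_scores, MMPI2_SCREEN) or exceeds_any(pai_scores, PAI_SCREEN)


# Hypothetical participant: TRIN is above 79T, so the participant is excluded.
example_mmpi2 = {"VRIN": 65, "TRIN": 80, "CNS": 3}
example_pai = {"INF": 55, "ICN": 60, "CNS": 0}
excluded = is_excluded(example_mmpi2, example_pai)
```

Because the criteria use strict inequalities, a score exactly at a cutoff (e.g., VRIN = 79T) does not trigger exclusion in this sketch.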
Procedures
All participants were administered the MMPI-2 and the PAI in a single session under standard instructions. The order of test administration was counterbalanced, and at the end of the first session the participants were randomly assigned to one of six experimental conditions. The six experimental conditions were uncoached schizophrenia, coached schizophrenia, uncoached PTSD, coached PTSD, uncoached GAD, and coached GAD. There were no statistically significant demographic differences between experimental groups based on sex, χ2(5) = 0.84, p = .97, Cramer’s V = .06; ethnicity, χ2(25) = 28.04, p = .31, Cramer’s V = .15, or age, F(7, 257) = 1.36, p = .22, η2 = .04.
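As a rough check on the effect sizes reported for the demographic comparisons, Cramer’s V can be recovered from a chi-square statistic with the standard formula V = sqrt(χ2 / (n × min(r − 1, c − 1))). The sketch below assumes n = 265 and six experimental groups (both stated in the text); the six ethnicity categories are an inference from the reported df of 25 and are not stated in the source.

```python
import math


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramer's V for an r x c contingency table."""
    return math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))


# Sex (2 categories) by experimental group (6 conditions), n = 265.
v_sex = cramers_v(0.84, 265, 2, 6)

# Ethnicity by experimental group; 6 categories is ASSUMED from df = 25 = 5 x 5.
v_ethnicity = cramers_v(28.04, 265, 6, 6)

print(round(v_sex, 2), round(v_ethnicity, 2))
```

Under these assumptions the computed values round to the reported V = .06 and V = .15.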
One week after completing the MMPI-2 and PAI under standard instructions, participants returned for administration under experimental instructions. Participants were informed at the second session of testing that they would be administered the same two measures, but they would complete them under different instructions. They were encouraged to put forth their best effort and were apprised of the chance to win one of six $50 prizes awarded to the participants best able to follow the instructions for completing the MMPI-2 and PAI during the second testing session (i.e., administration under experimental conditions).
Participants in the three uncoached conditions received the details of a scenario describing why they might feign a specific psychiatric disorder along with the name of the disorder they were to feign. The scenario read,
Imagine that you are seeking insurance payments for long term disability and these disability payments are contingent on you having schizophrenia. To convince the insurance company that you are eligible for this money you must respond in a way that makes it appear that you actually have schizophrenia.
Participants in all six conditions received the same scenario information. The only difference in the scenarios presented across the six experimental groups was the name of the psychiatric disorder. Participants in the three coached experimental conditions also received a checklist of symptoms commonly associated with their disorder written using jargon-free language. Finally, participants in the coached conditions were given instructions that said, “It is important to keep in mind that both of these questionnaires have been developed in such a way as to detect malingerers (people attempting to fake mental illness).” They were then provided with a list of five suggestions to avoid being identified as an overreporter. Each individual participant completed both the MMPI-2 and the PAI under the same experimental condition. Test administration was again counterbalanced.
Instruments
MMPI-2
The MMPI-2 is a self-report instrument designed to measure abnormal personality. The validity scales are designed to detect distortions in test response patterns, whereas the clinical scales measure responses to questions about personality and psychopathology. For information about the psychometric properties of the MMPI-2, see the test manual (Butcher et al., 2001) or Graham (2011).
The validity scales of interest in this study include the set of F (Infrequency) scales. They are designed to measure the number of items on the MMPI-2 that are infrequently endorsed by others. This set of scales includes the F scale, the Fb scale, and the Fp scale. The F scale measures infrequent responses occurring in the first 370 items of the MMPI-2, whereas Fb covers the remainder of the test. The Fp scale measures the number of items endorsed by the respondent that are seldom endorsed by both normal and clinical populations. Elevations on these scales could reflect several types of response bias, including symptom overreporting. The Symptom Validity Scale (FBS) is the most recent addition to the MMPI-2 validity scales. It has been described as useful “. . . in identifying potentially exaggerated claims of disability, primarily in the context of forensic neuropsychological evaluations” (Ben-Porath & Tellegen, 2011, para. 2).
PAI
The PAI is also a self-report measure designed to assess personality variables of clinical interest. The PAI is composed of 22 independent scales designed for four different purposes: validity, clinical, treatment, and interpersonal assessment. The scales of primary interest for this study are three of the validity scales. For information about the psychometric properties of the PAI, see the test manual (Morey, 2007).
The PAI validity scales of interest in this study include NIM, RDF, and MAL. NIM was designed to detect individuals endorsing unusual psychological symptoms and/or attempting to malinger. The RDF is an actuarial function that was developed to take into account the scale configuration of the non-NIM/PIM (Positive Impression Management) scales typically found on feigned PAI profiles. This scale taps into elements of profile distortion mostly independent of the other PAI validity scales. MAL measures eight aspects of profile configuration found in feigned profiles more often than in authentic clinical profiles. Factor analyses indicate that MAL taps into a factor described as self-reporting of uncommon psychotic symptoms as well as a factor described as indicating a negative view of the self and the world (Morey, 1996).
Results
The means and standard deviations on the clinical scales of the MMPI-2 and PAI under both standard and experimental instructions are reported in Table 1. It is clear that participants instructed to feign, regardless of coaching, had much higher scores on almost all the clinical scales on both instruments when compared with the mean scale scores of the tests when administered under standard instructions. The exceptions to this generalization, regardless of experimental condition, were a scale measuring identification with stereotypical gender roles on the MMPI-2 (Scale 5) and a hypomania scale on the PAI (MAN). For only the coached conditions, the hypomania scale on the MMPI-2 (Scale 9) and scales measuring antisocial traits and substance abuse on the PAI (Antisocial Features [ANT], Alcohol Problems [ALC], and Drug Problems [DRG]) were also exceptions. These results indicate that participants endorsed items associated with psychiatric disturbance on both instruments when under instructions to feign psychopathology.
Table 1. MMPI-2 and PAI Clinical Scale Elevations Under Standard Instructions, Uncoached Overreporting Instructions, and Coached Overreporting Instructions
Note. Scale values are reported as T scores. MMPI-2 = Minnesota Multiphasic Personality Inventory–2; Scale 1 = Hypochondriasis (Hs); Scale 2 = Depression (D); Scale 3 = Hysteria (Hy); Scale 4 = Psychopathic Deviate (Pd); Scale 5 = Masculinity-Femininity (Mf); Scale 6 = Paranoia (Pa); Scale 7 = Psychasthenia (Pt); Scale 8 = Schizophrenia (Sc); Scale 9 = Hypomania (Ma); Scale 0 = Social Introversion (Si); PAI = Personality Assessment Inventory; SOM = Somatic Complaints; ANX = Anxiety; ARD = Anxiety Related Disorders; DEP = Depression; MAN = Mania; PAR = Paranoia; SCZ = Schizophrenia; BOR = Borderline Features; ANT = Antisocial Features; ALC = Alcohol Problems; DRG = Drug Problems.
n = 265.
n = 133.
n = 132.
Effect size of difference between standard instructions and uncoached overreporting instructions.
Effect size of difference between standard instructions and coached overreporting instructions.
Effect size of difference between uncoached overreporting instructions and coached overreporting instructions.
To assess whether the participants in the uncoached conditions were less able to successfully feign than those in the coached conditions in each of the three specific disorder conditions, cutoff scores of F > 99T, Fb > 109T, Fp > 99T, and FBS > 27 on the MMPI-2 were used to distinguish invalid from valid protocols. An elevation above any one of these cutoffs identified that protocol as being invalid. These scores were chosen to identify invalid protocols because they are the cutoff scores recommended in the official MMPI-2 product publications (Ben-Porath & Tellegen, 2011; Butcher et al., 2001). The number of participants in each experimental group able to produce a valid protocol can be found in Table 2. There was no difference in the rates of valid protocols produced by participants uncoached and coached to feign schizophrenia on the MMPI-2, χ2(1) = 0.001, p = .98, φ = .00. However, there was a statistically significant difference between coached and uncoached participants asked to feign PTSD, χ2(1) = 16.51, p < .0001, φ = .43, and GAD, χ2(1) = 13.56, p = .0002, φ = .40. In both cases, there were medium-sized effects indicating that coached participants were more likely to produce valid protocols.
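For readers who want to see how the χ2 and φ values above relate, the following sketch computes a Pearson chi-square (without continuity correction) for a 2 × 2 table and converts it to φ = sqrt(χ2 / n). The counts in the worked example are hypothetical, since the cell counts from Table 2 are not reproduced here. The per-disorder sample sizes (91 for PTSD, 85 for GAD) are an assumption inferred from the group sizes listed in the table notes and are not stated explicitly in the text.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def phi_from_chi2(chi2, n):
    """Phi coefficient for a 2 x 2 table given its chi-square and total n."""
    return math.sqrt(chi2 / n)


# Hypothetical counts: rows = uncoached/coached, columns = invalid/valid.
demo_chi2 = chi_square_2x2(20, 10, 10, 20)
demo_phi = phi_from_chi2(demo_chi2, 60)

# Reported chi-square values with ASSUMED per-disorder sample sizes.
phi_ptsd = phi_from_chi2(16.51, 91)
phi_gad = phi_from_chi2(13.56, 85)

print(round(phi_ptsd, 2), round(phi_gad, 2))
```

With those assumed sample sizes, the computed coefficients round to the reported φ = .43 and φ = .40.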
Table 2. MMPI-2 and PAI Valid Profile Rates by Experimental Condition
Note. MMPI-2 = Minnesota Multiphasic Personality Inventory–2; PAI = Personality Assessment Inventory; PTSD = posttraumatic stress disorder; GAD = generalized anxiety disorder.
The validity scale cutoffs used to determine if a PAI protocol was invalid were NIM > 91T, MAL > 4, and RDF > 0.99. Again, a single elevation above the cutoff of any one of these three scales resulted in that protocol being designated as invalid. These cut scores were recommended in Morey’s (1996) interpretive guide as indicative of invalid protocols. The number of participants in each experimental group who produced valid PAI protocols can also be found in Table 2. Statistical analyses indicated that there was no statistically significant difference in the rates of valid protocols produced by uncoached and coached participants instructed to feign schizophrenia, χ2(1) = 2.26, p = .13, φ = .16. There was a statistically significant difference between coached and uncoached feigners of PTSD, χ2(1) = 12.24, p = .0005, φ = .37, and GAD, χ2(1) = 10.26, p = .001, φ = .35, on the PAI. For feigning of both PTSD and GAD, there were medium-sized effects demonstrating that coached participants were more likely to produce valid protocols.
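The decision rule used for both instruments (a protocol is invalid if any one overreporting indicator exceeds its cutoff) and the resulting valid-protocol rate can be illustrated as follows. The cutoffs are those given in the text; the function names and the four example PAI protocols are hypothetical.

```python
# Overreporting cutoffs as stated in the text (T scores except FBS, MAL, RDF).
OVERREPORT_CUTOFFS = {
    "MMPI-2": {"F": 99, "Fb": 109, "Fp": 99, "FBS": 27},
    "PAI": {"NIM": 91, "MAL": 4, "RDF": 0.99},
}


def is_valid(instrument, scores):
    """A protocol is valid only if no indicator is strictly above its cutoff."""
    cutoffs = OVERREPORT_CUTOFFS[instrument]
    return all(scores[scale] <= limit for scale, limit in cutoffs.items())


def valid_rate(instrument, protocols):
    """Proportion of protocols in a group classified as valid."""
    flags = [is_valid(instrument, p) for p in protocols]
    return sum(flags) / len(flags)


# Hypothetical PAI protocols for one experimental group.
hypothetical_pai = [
    {"NIM": 85, "MAL": 2, "RDF": 0.10},    # valid
    {"NIM": 110, "MAL": 5, "RDF": 1.50},   # invalid: all three elevated
    {"NIM": 70, "MAL": 1, "RDF": 1.20},    # invalid: RDF alone elevated
    {"NIM": 60, "MAL": 0, "RDF": -0.50},   # valid
]
rate = valid_rate("PAI", hypothetical_pai)
```

The third example shows why the any-one-scale rule matters: a single elevated indicator is enough to classify the protocol as invalid.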
The results reported in Table 2 not only indicated that coaching decreased the ability of both the MMPI-2 and PAI to detect overreporting, but they also seemed to suggest schizophrenia may be more easily detected despite any impact that coaching might offer. However, additional analyses were needed to rigorously test the influence of coaching and the disorder being feigned on the ability of the validity scales of both instruments to detect symptom overreporting. Specifically, a two-by-three factorial multivariate analysis of variance (MANOVA) was conducted separately for the MMPI-2 validity scales and the PAI validity scales. Coaching was entered as the first independent variable with two levels: uncoached and coached. The second independent variable was the feigned disorder with schizophrenia, PTSD, and GAD serving as the three levels. Finally, an interaction term was included in the model to answer the question of whether or not the specific disorder being feigned moderated the impact of coaching on the ability of the validity scales to detect response bias.
Results of the MANOVA conducted on the MMPI-2 validity scales indicated that there was a significant multivariate main effect for coaching, Wilks’ Λ = .764, F(4, 256) = 19.79, p < .01, η2 = .24; a significant multivariate main effect for disorder, Wilks’ Λ = .611, F(8, 512) = 17.88, p < .01, η2 = .22; and a significant multivariate interaction term, Wilks’ Λ = .902, F(8, 512) = 3.40, p = .001, η2 = .05. Follow-up univariate analyses demonstrated that a significant interaction effect was present for the F scale, F(2, 259) = 9.67, p < .01, η2 = .07; the Fb scale, F(2, 259) = 3.97, p = .02, η2 = .03; and the Fp scale, F(2, 259) = 4.93, p = .01, η2 = .04. Univariate analyses of FBS indicated that the interaction effect was not statistically significant, F(2, 259) = 0.86, p = .42, η2 = .01. However, there was a significant main effect for coaching, F(1, 259) = 42.47, p < .01, η2 = .14, on FBS, which showed that scores were significantly lower for coached participants than uncoached participants. The FBS main effect for disorder was also found to be significant, F(2, 259) = 14.35, p < .01, η2 = .10. Pairwise comparisons using the Tamhane post hoc test indicated that feigners of GAD had significantly higher scores on FBS than feigners of schizophrenia and PTSD.
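The univariate effect sizes above can be checked against their F ratios. The text does not say which variant of η2 was computed; the sketch below assumes partial η2 = (F × df_effect) / (F × df_effect + df_error), which reproduces the reported values to two decimals. The function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    ss_ratio = f_value * df_effect
    return ss_ratio / (ss_ratio + df_error)


# Univariate MMPI-2 results reported above: (F, df_effect, df_error).
reported = {
    "F interaction": (9.67, 2, 259),
    "Fb interaction": (3.97, 2, 259),
    "Fp interaction": (4.93, 2, 259),
    "FBS coaching": (42.47, 1, 259),
    "FBS disorder": (14.35, 2, 259),
}

effect_sizes = {
    label: round(partial_eta_squared(*stats), 2) for label, stats in reported.items()
}
print(effect_sizes)
```

Under this assumption the recovered values are .07, .03, .04, .14, and .10, matching the figures in the text.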
Interpretation of the significant interaction terms for F, Fb, and Fp was somewhat more complex. In order to better examine how the disorder being feigned was moderating the effect of coaching, a new dummy variable was created that coded each of the six experimental conditions as a single variable so that pairwise comparisons could be conducted to determine which group means were significantly different. The cell means and standard deviations for each experimental group are reported in Table 3.
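The recoding step can be sketched as follows: each combination of coaching and disorder is mapped to a single categorical code, and all pairs of cells are then enumerated for post hoc comparison. The particular code numbers and variable names are hypothetical, as the text does not report how the six conditions were numbered.

```python
from itertools import combinations, product

COACHING = ("uncoached", "coached")
DISORDER = ("schizophrenia", "PTSD", "GAD")

# One code per cell of the 2 x 3 design (the numbering itself is arbitrary).
CELL_CODE = {cell: i for i, cell in enumerate(product(COACHING, DISORDER), start=1)}


def cell_code(coaching, disorder):
    """Collapse the two factors into a single six-level grouping variable."""
    return CELL_CODE[(coaching, disorder)]


# Every pair of cells that a pairwise post hoc test would compare.
pairs = list(combinations(CELL_CODE, 2))
print(len(pairs))
```

Treating the design as one six-level factor yields 15 pairwise comparisons among cell means.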
Table 3. MMPI-2 and PAI Means, Standard Deviations, and Effect Sizes by Experimental Condition
Note. SCH = schizophrenia; PTSD = posttraumatic stress disorder; GAD = generalized anxiety disorder; MMPI-2 = Minnesota Multiphasic Personality Inventory–2; F = Infrequency Scale; Fb = Back Infrequency Scale; Fp = Infrequency Psychopathology scale; FBS = Symptom Validity Scale; PAI = Personality Assessment Inventory; NIM = Negative Impression Management scale; MAL = malingering index; RDF = Rogers discriminant function. Means and standard deviations are reported as T-score values for all scales except FBS, MAL, and RDF which are reported as raw score values. For the MMPI-2 scales, means in the same row with different subscripts differ significantly at p < .002 by the Tamhane post hoc test. For the PAI scales, means in the same row with different subscripts differ significantly at p < .002 by the Scheffe post hoc test. No post hoc comparisons were made between each individual experimental condition for FBS, NIM, or MAL because the interaction effect was not statistically significant.
n = 45.
n = 45.
n = 43.
n = 44.
n = 46.
n = 42.
Results for F and Fp were nearly identical. In both cases, the impact of coaching was significantly reduced if the disorder being feigned was schizophrenia. Evidence of this can be seen in Table 3 as the means for coached PTSD and coached GAD were significantly smaller than the means for the other four experimental groups. Also note in Table 3 that for F and Fp the effect size for coaching was large for PTSD and GAD, but it was only a medium effect for schizophrenia.
In the case of Fb, it appeared that the impact of coaching was heightened for GAD overreporters. Examination of the cell means indicated that coached GAD was the only coached condition significantly smaller than its uncoached pair. Examination of effect sizes supported this interpretation as the effect for coaching was greater among GAD overreporters (i.e., large) than among feigners of PTSD or schizophrenia (i.e., medium).
Results of the MANOVA conducted on the PAI validity scales indicated that there was a significant multivariate main effect for coaching, Wilks’ Λ = .673, F(3, 257) = 41.64, p < .01, η2 = .33; a significant multivariate main effect for disorder, Wilks’ Λ = .811, F(6, 514) = 9.46, p < .01, η2 = .10; and a significant multivariate interaction, Wilks’ Λ = .921, F(6, 514) = 3.59, p < .01, η2 = .04. Follow-up univariate analyses demonstrated that a significant interaction effect was present for the RDF scale, F(2, 259) = 4.39, p = .01, η2 = .03. Univariate analyses of NIM, F(2, 259) = 2.61, p = .08, η2 = .02, and MAL, F(2, 259) = 0.22, p = .80, η2 = .00, indicated that the interaction effect was not statistically significant for either of these validity scales. However, there were significant main effects for coaching on both NIM, F(1, 259) = 121.21, p < .01, η2 = .32, and MAL, F(1, 259) = 61.56, p < .01, η2 = .19, which in both cases indicated that scores were significantly lower for coached participants than uncoached participants. The main effects for disorder were also found to be significant for both NIM, F(2, 259) = 24.50, p < .01, η2 = .16, and MAL, F(2, 259) = 12.11, p < .01, η2 = .09. Pairwise comparisons using the Tamhane post hoc test indicated that for both NIM and MAL the only significant difference was that feigners of schizophrenia had higher scores than feigners of GAD and PTSD.
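The multivariate effect sizes can likewise be related to Wilks’ Λ. The text does not state the formula used; the sketch below assumes the common multivariate partial η2 = 1 − Λ^(1/s), where s is the smaller of the number of dependent variables and the effect degrees of freedom. With three PAI validity scales, this assumption reproduces the reported values.

```python
def multivariate_eta_squared(wilks_lambda, n_dependent, df_effect):
    """Multivariate partial eta squared from Wilks' lambda: 1 - lambda ** (1 / s)."""
    s = min(n_dependent, df_effect)
    return 1 - wilks_lambda ** (1 / s)


N_PAI_SCALES = 3  # NIM, MAL, and RDF

eta_coaching = multivariate_eta_squared(0.673, N_PAI_SCALES, 1)
eta_disorder = multivariate_eta_squared(0.811, N_PAI_SCALES, 2)
eta_interaction = multivariate_eta_squared(0.921, N_PAI_SCALES, 2)

print(round(eta_coaching, 2), round(eta_disorder, 2), round(eta_interaction, 2))
```

The recovered values round to .33, .10, and .04, in line with the coaching, disorder, and interaction effects reported for the PAI.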
Interpretation of the interaction effect found for RDF was again based on the use of a dummy variable coded such that pairwise comparisons could be conducted to determine where the six experimental group means differed significantly (see Table 3 for cell means and standard deviations). Results indicated that the impact of coaching was heightened for PTSD feigners. Examination of the cell means shows that coached PTSD was the only coached condition whose mean was significantly smaller than the mean for its uncoached condition. Examination of effect sizes supported this interpretation as the effect for coaching was greater among PTSD overreporters (i.e., large) than among feigners of GAD (i.e., medium) or schizophrenia (i.e., small).
Discussion
Overall, the finding of a main effect for coaching on both instruments that resulted in decreased validity scale scores is consistent with previous research. The main effect for the specific psychiatric disorder feigned indicated that on six of seven validity scales (the only exception being FBS), scores were higher for feigned schizophrenia than for feigned PTSD or GAD. These results were also consistent with previous research. Perhaps more surprising than these findings were the results reported in Table 2, which indicated that across both instruments a significantly larger percentage of participants who were coached to feign PTSD and GAD avoided detection. Why were these coached participants more successful than their uncoached counterparts when the coached schizophrenia feigners were not? The presence of moderator effects can help us answer this question.
The results of this study indicated that in a majority of the validity scales examined, namely F, Fb, Fp, and RDF, the specific psychiatric disorder being feigned moderates the impact of coaching on validity scale scores. Specifically, feigning schizophrenia decreased the effectiveness of coaching, thereby resulting in increased scores on F and Fp. In other words, those two validity scales are less vulnerable to feigned schizophrenia regardless of whether or not the participant had been coached (although careful inspection of Table 3 suggests that F and Fp are possibly no less vulnerable to feigned schizophrenia than Fb and RDF). In contrast, feigning GAD increased the effectiveness of coaching and resulted in decreased scores on Fb just as feigning PTSD increased the effectiveness of coaching and resulted in decreased scores on RDF. In other words, Fb is more vulnerable to the effects of coaching when GAD is being feigned. Likewise, RDF is more vulnerable to the influence of coaching when PTSD is being feigned.
One possible explanation for how feigning schizophrenia decreased the effectiveness of coaching for F and Fp focuses on the types of items that compose those two scales. These scales are composed of items rarely endorsed by the MMPI-2 normative sample and a sample of psychiatric inpatients, respectively. Schizophrenia is marked by the presence of psychotic symptoms such as hallucinations and delusions. These symptoms are, by their very nature, rare. Therefore, participants attempting to feign schizophrenia may have endorsed some of the items that compose F and Fp in an attempt to portray themselves as psychotic. In contrast, symptoms of PTSD and GAD are not as markedly rare. Alternatively, many of the symptoms of schizophrenia are more abstract and difficult to encapsulate when writing test items in comparison with the symptoms of PTSD and GAD. Therefore, it is possible that there are fewer items easily identified as symptoms of schizophrenia on the MMPI-2 and PAI; thus, feigners might endorse items that appear to be bizarre, but are in fact not reflective of true psychosis and are instead scored on the tests’ validity scales.
A number of important implications for response bias researchers can be taken from these findings. First, and most important, the factors affecting validity scale detection of noncredible overreporting are more complexly related to one another than indicated by previous research. The interaction of coaching and disorder affected four of seven validity scale scores above and beyond any additive influence these two factors had alone. The presence of a moderating interaction helps explain some of the inconsistencies in results between the studies reviewed earlier. Therefore, future research on response bias should continue to account for the moderating impact of psychiatric disorder on coaching as well as examine other possible moderating/mediating variables. The moderator effects found in this study indicated that in some cases this complexity reflects decreased effectiveness of coaching on overreporting of psychotic symptomatology and in other cases increased effectiveness of coaching on overreporting of internalizing problems. Perhaps careful investigation of these effects on specific scales will result in new scale construction strategies that will ultimately improve detection of noncredible overreporting.
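To make "above and beyond any additive influence" concrete, the following is a minimal sketch of how departure from additivity can be read off a 2 × 3 table of cell means. It assumes a balanced design (unweighted marginal means), and all cell means are hypothetical, not values from this study. The function name `interaction_residuals` is illustrative.

```python
def interaction_residuals(cells):
    """Return cell mean - row mean - column mean + grand mean for every cell.

    `cells` maps row label -> {column label -> cell mean}. Under a purely
    additive model (two main effects, no interaction) every residual is zero.
    """
    rows = list(cells)
    cols = list(next(iter(cells.values())))
    grand = sum(cells[r][c] for r in rows for c in cols) / (len(rows) * len(cols))
    row_mean = {r: sum(cells[r][c] for c in cols) / len(cols) for r in rows}
    col_mean = {c: sum(cells[r][c] for r in rows) / len(rows) for c in cols}
    return {
        (r, c): cells[r][c] - row_mean[r] - col_mean[c] + grand
        for r in rows
        for c in cols
    }


# HYPOTHETICAL means: coaching lowers every disorder's score by the same 5 points.
additive = {
    "uncoached": {"schizophrenia": 80.0, "PTSD": 70.0, "GAD": 68.0},
    "coached": {"schizophrenia": 75.0, "PTSD": 65.0, "GAD": 63.0},
}

# HYPOTHETICAL means: the coaching effect differs by disorder (1, 12, and 5 points).
moderated = {
    "uncoached": {"schizophrenia": 80.0, "PTSD": 70.0, "GAD": 68.0},
    "coached": {"schizophrenia": 79.0, "PTSD": 58.0, "GAD": 63.0},
}

additive_res = interaction_residuals(additive)
moderated_res = interaction_residuals(moderated)

print(max(abs(v) for v in additive_res.values()))   # approximately zero
print(max(abs(v) for v in moderated_res.values()))  # clearly nonzero
```

In the first table the two main effects fully account for the cell means; in the second, the residuals that remain after removing both main effects are what an interaction term captures.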
There are also several applied implications of this study. Clinicians can have increased confidence in the ability of validity scale scores to detect feigned psychotic symptoms even when coaching is suspected because the validity scales of the MMPI-2 and PAI are relatively robust in these circumstances. In contrast, if coached feigning of internalizing problems is likely, then clinicians would be wise to carefully consider collateral information before deciding that within-normal-limits validity scale scores are indicative of honest responding. The results of this study indicate that several validity scales may be especially susceptible to coached feigning of these types of problems.
In terms of limitations of this study, it has been observed in the literature that the use of analogue experimental design for examining symptom feigning results in limitations to generalizability because of the emphasis on maximizing internal validity (Sellbom, Toomey, Wygant, Kucharski, & Duncan, 2010). However, Berry and Schipper (2007) reported that for the MMPI-2 the biggest differences between analogue and known-groups designs are a decrease in the cut scores used for scale F and a drop in sensitivity for Fp. Since neither cut scores nor clinical utility metrics (e.g., sensitivity and specificity) were examined in this study, perhaps the use of analogue design is of lesser concern than might otherwise be the case.
Another limitation of this study is the absence of appropriate clinical comparison samples. If the purpose of this study had been to examine and compare cut scores and clinical utility metrics, then this limitation would have been a fatal flaw in the study design. However, given the purpose of this study, the biggest drawback of not having appropriate clinical comparison samples was the inability to use clinical scale profile elevations in determining whether or not an individual participant had successfully feigned a disorder without being identified as an overreporter. As a consequence, at least some participants identified in this study as having produced valid test protocols may have produced clinical scales that did not at all resemble the disorder they were instructed to feign, perhaps limiting the practical implications of their undetected overreporting.
The use of a low-probability chance to win a modest cash prize as motivation to put forth strong effort in feigning psychopathology without being detected as an overreporter is another limitation. In the real world, people who decide to feign psychopathology are likely to have stronger incentives motivating their behavior, if not higher probabilities of success. It is unknown how motivated participants in this study were to put forth their best effort, but their motivation seems unlikely to have approached the degree that might be found in the real world.
The absence of a check on the adherence to the experimental manipulation used in this study is an additional limitation. Perhaps a posttest interview or questionnaire asking participants about the instructions they received and how closely they adhered to those instructions could be included in future studies. Without such a manipulation check, it is impossible to know how many participants in this study actually attempted to follow the instructions, how motivated those participants who wanted to follow the instructions were to put forth a strong effort, how well the instructions were understood by those who did attempt to follow them, and if there were any differences between experimental conditions in terms of their motivation, adherence to, and understanding of the instructions.
There are many future directions researchers interested in noncredible overreporting could examine to build on the results of this study. Schizophrenia, PTSD, and GAD were chosen for use in this study because collectively they provide broad coverage of symptom manifestation (e.g., thought dysfunction, mood dysregulation, and physiological symptoms), and individually, these disorders have distinct etiologies, symptom presentations and severity, and expected degrees of functional impairment. Therefore, it was felt that these three disorders provided a broad test of the moderating influence of specific psychiatric disorders on detecting coached overreporting. However, especially given the differing responses the validity scales had to the moderating impact of disorder on coaching, the disorders examined in this study by no means provide a comprehensive understanding of this issue. Therefore, future research might explore the moderating influence of other specific disorders (e.g., depression) on the impact of coaching.
Another set of variables likely to mediate or moderate the impact of coaching is the coaching variables themselves. For example, characteristics of who is doing the coaching (e.g., self-coaching vs. a lawyer conducting the coaching vs. a psychologist providing the coaching) may influence the effectiveness of coaching. The duration of coaching as well as the quantity and quality of information that the coaching is based on are also strong candidates for variables that might influence the effectiveness of coaching. Future research might also fruitfully examine population characteristics (e.g., education) and other factors that could moderate or mediate the impact of coaching on overreporting. Perhaps of special interest would be a closer examination of the impact of motivation on overreporting, which could conceivably even affect the detection of overreporting in a more complex manner such as mediating the moderating effect of psychiatric disorder on coaching. Researchers might also use research techniques besides experimental design to examine many of these questions. Indeed, a carefully designed meta-analysis of the literature might yield valuable insight into the mediating and moderating influences of a number of different variables.
Future researchers would also be well advised to examine any or all of these issues using the Minnesota Multiphasic Personality Inventory–2–Restructured Form (MMPI-2-RF; Ben-Porath & Tellegen, 2008). The MMPI-2-RF is a relatively recently developed multiscale inventory that uses a subset of 338 items taken from the MMPI-2 item pool to measure a broad array of clinically relevant response styles, personality characteristics, and manifestations of psychopathology. The MMPI-2-RF includes nine validity scales designed to measure a variety of response styles that might threaten protocol validity. The MMPI-2-RF’s validity scales include revised versions of the MMPI-2 validity scales examined in this study. A sizable number of publications have already appeared in the literature introducing and examining the psychometric properties and clinical utility of the MMPI-2-RF’s validity scales (e.g., Gervais, Ben-Porath, Wygant, & Green, 2007; Gervais, Wygant, Sellbom, & Ben-Porath, 2011; Handel, Ben-Porath, Tellegen, & Archer, 2010; Marion, Sellbom, & Bagby, 2011; Rogers, Gillard, Berry, & Granacher, 2011; Sellbom & Bagby, 2010; Wygant et al., 2011). Initial investigations into these scales appear very promising; however, a need to gather additional empirical evidence remains. In line with the emphasis of this study, there has not yet been an examination of factors that might moderate or mediate the ability of the MMPI-2-RF’s validity scales to detect response bias.
Footnotes
Authors’ Note
Portions of this article were presented in 2005 at the 40th Annual Symposium on Recent Developments on the MMPI-2/MMPI-A, Fort Lauderdale, Florida, and the 2005 Annual Meeting of the Society for Personality Assessment, Chicago, Illinois.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
