Abstract

All good stories should have a beginning, a middle, and an end. The related pair of articles that earned me a place in this Special Section, “Masculine Instrumentality and Feminine Expressiveness: Their Relationships with Sex Role Attitudes and Behaviors” (Spence & Helmreich, 1980) and “Instrumental and Expressive Traits, Trait Stereotypes, and Sexist Attitudes: What Do They Signify?” (Spence & Buckner, 2000), represent the middle and the end of a story, at least the end as far as my involvement is concerned. To tell this evolutionary tale and explain how it came to be, I must present its beginnings, so long ago. But first I will provide a necessary preface about other changes, both societal and personal, that is, how it used to be. The original articles can be found at pwq.sagepub.com/content/5/2/147 and pwq.sagepub.com/content/24/1/44.
A Bit of History: How It Used to Be
I entered graduate school in the fall of 1945, just a few weeks after the end of World War II. The immediate postwar years, that is, the rest of the 1940s and the 1950s, were spent, in the phrase of the time, “getting back to normal.” Rosie the Riveter and her sisters were widely expected to relinquish their jobs to men and to go back to being secretaries; willingly or not, many of them did. Similarly, working women with children who were the heads of their households during the war years often reverted to being stay-at-home housewives when their husbands returned. Many who married in the immediate postwar years were busy establishing families. Time-honored traditions separating men and women were back in place for many Americans, and all appeared to be serene.
Men and women with advanced degrees in psychology also expected to have different careers. A fair number of women were earning PhDs during the war and in the immediate postwar period, most in clinically oriented fields. But, as before the war, few women with academic aspirations could hope to find positions except in women’s colleges. Those would-be academicians who married psychologists with academic appointments were faced with anti-nepotism rules that further limited their employment. A number of women who managed to achieve outstanding research careers initially did so as their husbands' research assistants, sometimes paid, sometimes not. I was one of the lucky ones. When I received my degree in 1949, I was hired “as an experiment” at a major university and had an established research program before I married and first encountered the nepotism problem.
Aside from certain subfields such as clinical psychology and child development, mainstream psychology had paid very little attention to gender and gendered phenomena. A search of Psychological Abstracts through the 1950s reveals only a small handful of entries related to gender. Almost all investigators during this period were, of course, male. Later analyses of published studies involving undergraduate students showed that, quite often, only males were included as participants even when women were available and there was no discernible reason for their exclusion.
Thus, although the surface appeared calm during the postwar period in the United States at large, rumblings of discontent among women could be detected by those who listened. These rumblings were given full voice by Betty Friedan in her blockbuster 1963 book The Feminine Mystique, in which she described “the problem that has no name,” that is, the myth of the happy, fulfilled, suburban housewife. It quickly became apparent that many women were ready to rebel and to support the newly emerging feminist movement. The cause of women was soon swept up in the larger Civil Rights movement that in the 1960s became a powerful force in the country as a whole and led to supportive federal legislation. Colleges and universities were beginning to discover the existence of members of the “second sex” in their midst who were worthy of being hired; anti-nepotism rules that had worked against women were beginning to crumble.
Research-oriented psychologists, particularly women psychologists, were quick to take advantage of these societal upheavals. A few articles related to gender began to appear in the 1960s, and by the 1970s the trickle had become a flood. Several new journals were founded to accommodate the outpouring, most notably Psychology of Women Quarterly, whose first 35 years these Special Sections are designed to commemorate.
This new field of gender studies, like feminism itself, was not met with universal enthusiasm. In some quarters such studies were regarded with suspicion as a fad, as trivial, or, at best, as second rate. Young women early in their careers who chose to follow this line of inquiry did so at some risk. Establishing such investigations as “respectable” and able to hold their own in journals not specifically confined to gender-related issues has taken time and can still be something of a challenge. Although I was hardly young at the time, one might still ask how, in that less than hospitable climate, I quite suddenly became involved in gender research.
How Did a Nice Girl Like Me …?
I stumbled into this newly emerging field of gender studies and the psychology of women in the early 1970s when I was on the faculty of the University of Texas at Austin (where I remained until I retired in 1997). My life at the time was in chaos. I was finishing up a major project that had nothing to do with gender and had no idea where to go next. My husband had recently died after a brief illness. Cruelly, the barrier that had kept me from an appointment in the department of psychology was now gone. I soon moved to psychology from my position in the department of educational psychology in the College of Education to face a new set of challenges. Thinking about my future research was not one of my top priorities.
One day I happened to be reading an article by two of my Texas colleagues, Elliot Aronson and Robert Helmreich, reporting an experiment in which judges were shown either a video of a highly competent male student purportedly being interviewed for a job or a video of a less competent one. As the authors predicted, the competent student was liked better than the less competent one, and liked even better in a version in which he ended the interview with a pratfall—a clever study, I thought, with counterintuitive results.
Apparently, in the phrase of the day, my consciousness had been raised by the events in the larger world. It suddenly struck me that not only were the stimulus persons male, but so were all the student judges, a feature unmentioned and unexplained by the two male investigators. The question that immediately popped into my head was “But who likes competent women?” Having no new research projects to command my attention, I went to Bob Helmreich with the suggestion that it would be fun to do a little study to find out.
The “little study,” however, kept growing in complexity. Among other things, it occurred to me that people’s beliefs about gender roles might influence the outcomes. So we developed an attitude scale, dubbed the Attitudes toward Women Scale (AWS; Spence, Helmreich, & Stapp, 1973), which, as it has turned out, has gone on to a long life of its own. The results of the “Who Likes Competent Women?” study, later published (Spence & Helmreich, 1972), were complicated and intriguing. The upshot was that I was hooked. A whole new field of research was opening up under the feminist umbrella, and after some serious thought, I decided I wanted to be part of it.
The Beginning: Spence, Helmreich, and Stapp (1975)
At last, I am ready to describe the beginning of my story with the first of a series of interrelated studies in which I was to engage over the next 25 years. Spence, Helmreich, and Stapp (1975) belonged to the first generation of investigations questioning traditional views about the fundamental psychological differences between men and women, views widely accepted both inside and outside psychology. Studies demonstrating stereotyped beliefs about differences between the characteristics of men and women had already been conducted. When participants were asked to rate themselves as well as the typical man and woman, the results suggested that these stereotypes had some validity. But questions remained about the source of these self-perceptions and their importance to our understanding of a broad range of gender phenomena. We therefore centered our initial inquiries around our new measure, the Personal Attributes Questionnaire (PAQ).
For reasons that will become apparent, I first present a barebones description of the study and its results, largely bereft of the theoretical context in which we embedded them. I follow this account by remarks on what we subsequently recognized were common conceptual confusions that we helped to perpetuate and later attempted to correct.
In preliminary work, we drew a number of items from the extensive list of “masculine” and “feminine” trait stereotypes developed by Rosenkrantz, Vogel, Bee, Broverman, and Broverman (1968) and obtained ratings from several samples of college students of both the typical and the ideal male and female student. We then selected a shorter list of masculine and feminine items that were rated as socially desirable to some degree in both men and women as well as more characteristic of the typical male or the typical female. Self-ratings on these two sets of items made up the PAQ.
In the study itself, our measures included the PAQ, ratings of the typical student of each gender on the same items, and two other inventories: the Attitudes toward Women Scale (AWS) as a measure of beliefs about appropriate roles for men and women, and a measure of social self-esteem. As expected, the male and female students differed significantly in the predicted direction in their self-ratings on the PAQ items. In their ratings of the typical male and female student, both men and women also exhibited significant stereotypes on all the items. Nothing exciting here; these results merely confirmed what several previous studies by others had suggested. Further analyses, however, provided several surprises.
We examined the proposition, long accepted as a given, that the wide array of “masculine” and “feminine” characteristics (i.e., attributes and behaviors that distinguish normatively between men and women in a given society) are incompatible. The possession of one to any degree, it was presumed, tends to preclude possession of the other. Together they constitute a bipolar dimension with feminine characteristics (and most women) falling at one end and masculine characteristics (and most men) at the other. Reflecting this bipolar presumption, stereotype ratings on our masculine and feminine items were negatively related. In both genders, the relationship was particularly strong among those who were traditional in their gender role attitudes (AWS scores).
As is well known by now, the results of the self-ratings on the PAQ were radically different. In both men and women, the correlations between their perceptions of their own masculine and feminine characteristics were not only very low in value but also positive rather than negative in sign. Furthermore, correlations of the PAQ scales with the AWS were close to zero. At the time, such findings came as a revelation.
Our self-esteem measure also produced outcomes that did not support conventional wisdom. Contrary to the view that possession of gender-congruent attributes and absence of cross-gender attributes were associated with higher self-esteem and other indices of mental health, our data indicated that self-esteem was substantially and positively correlated with the PAQ’s masculine scale not only in men but also in women. Furthermore, a smaller but still significant positive correlation between self-esteem and the feminine scale emerged not only in women but also in men. These findings also created quite a stir among those who were eager to challenge traditional “truths.”
Some New Insights: Masculinity, Femininity, and Sex Roles
Earlier I confessed that our initial article helped perpetuate several conceptual confusions endemic to the field. First, echoing the title given by Rosenkrantz et al. (1968) to their list of trait stereotypes, we mistakenly described both our stereotype measure and the PAQ as assessing “sex-role attributes.” In this we were joined by Sandra Bem (1974), whose well-known questionnaire is similar in content to the PAQ: indeed, her measure is titled the “Bem Sex Role Inventory” (BSRI). However, traits are not roles, gender-related or not. Describing “masculine” and “feminine” traits as “sex roles” confounds two different sets of phenomena.
In re-reading our article, I find it a little embarrassing that in the Discussion we mentioned that our PAQ scales were heavily loaded with items describing personality traits often identified as instrumental and expressive, or as tapping the agentic–communal distinction proposed by Bakan (1966), but without recognizing that we inappropriately gave them the “sex role” label. We did this despite the fact that we included in our study the AWS, our home-grown measure of sex-role attitudes, and we knew (or should have known) the difference between traits and roles.
After both Bem with the BSRI and we with the PAQ demonstrated that the masculine and feminine scores on our respective instruments were essentially uncorrelated, we each went on to refer to the two scales as measuring two independent global concepts: masculinity and femininity. The implicit premise behind this assertion is that the full assortment of feminine attributes, attitudes, and behaviors contributes to a single monolithic factor, and that the full assortment of masculine ones contributes to another. Under this premise, individuals who are masculine or feminine in one psychological domain are likely to exhibit the same degree of masculinity or femininity in all other domains, and it does not matter which specific characteristics are used to identify an individual’s degree of masculinity and femininity in a given instance. Yet we had no evidence to support this sweeping premise. We had simply taken its truth value for granted, just as many of our predecessors had assumed without question the validity of an earlier premise: masculinity and femininity are end points of a single hypothetical dimension (Constantinople, 1973). In time, our findings led us to question both presumptions and to doubt the utility of the concepts of masculinity and femininity themselves.
PAQ Versus BSRI
Over the course of our investigations we had become increasingly aware not only that the original PAQ was heavily made up of gender-differentiating instrumental and expressive personality traits, but also that these and other similar distinctions had often been regarded as among the most basic and consequential psychological differences between men and women. Accordingly, we revised our instrument so that each scale contained fewer items, all of which were either instrumental or expressive. We also adopted the position that gender phenomena are multidimensional in nature and that the PAQ measured only what its manifest content indicated. At the same time, we contended that the implications of these two trait dimensions were, per se, important to discover. It was this perspective that guided our subsequent research.
In contrast to this limited perspective, Bem (1974) proposed in her initial article, and continued to pursue, a global theoretical model operationally tied to BSRI scores. Her theory basically represents a different take on the old monolithic bipolar conception of masculinity and femininity. (To make life simpler, I hereafter refer to the two BSRI scales as M and F scales, a practice we adopted in our later articles.) After obtaining each respondent’s M and F scores, she combined the two essentially by subtracting one from the other. Men falling at the masculine extreme (M > F), she proposed, and women falling at the feminine extreme (F > M) are sex-typed, prone to act in accord with the behavioral standards expected of their gender and to inhibit those behaviors associated with the other gender.
Her attention was focused on respondents falling at the middle of the difference-score distribution, those who have approximately equal scores on the two scales, whatever their level. These “balanced” individuals, she postulated, were non-sex-typed and therefore flexible in their self-concepts and behaviors. Thus, they might be “both masculine and feminine, both assertive and yielding, both instrumental and expressive—depending on the situational appropriateness of these various behaviors” (Bem, 1974, p. 153). She further suggested that it is the non-sex-typed who are emotionally healthier.
Research centered on the PAQ and BSRI thus generated two sets of hypotheses, which created a fair amount of controversy and left a number of onlookers confusing the two. Bem’s very appealing and far-reaching theory about sex-typing or its lack, defined by difference scores on the BSRI, attracted more attention. Our less ambitious contention generated less excitement: that gender phenomena are multifactorial in nature, so that one cannot automatically generalize from one domain to another, and that the BSRI (as compared to the revised PAQ) is simply a flawed measure of certain desirable instrumental and expressive personality traits.
The Middle: Spence and Helmreich (1980)
In our 1980 article, we reviewed the fairly substantial literature involving the PAQ and BSRI that had already accumulated since the appearance of the two studies introducing these inventories. Our particular interest was in evaluating the rival hypotheses this pair of instruments had generated. We reported the outcomes of several factor analyses of the revised PAQ that, not surprisingly, produced two orthogonal, unidimensional factors in each gender—one consisting of instrumental traits (I) and the other of expressive traits (E).
In contrast, the BSRI yielded a four-factor structure in both men and women. Although the BSRI is heavily loaded with desirable instrumental and expressive traits, it contains other kinds of items as well, and the factor analyses reflected this diversity. The most theoretically suggestive was a bipolar factor made up of two items, the adjectives “masculine” and “feminine.”
We examined evidence from both the BSRI and PAQ bearing on Bem’s flexibility hypothesis: the expectation that balanced individuals with approximately equal scores on the M and F scales were non-sex-typed and, as such, would manifest both masculine and feminine behaviors according to situational demands. Relatively few studies have tested this prediction, and those that have done so have found mixed results. Her prediction that respondents with balanced scores would be psychologically healthier has been disconfirmed. In retrospect, I realize that the problem with Bem’s theory may lie less in her ideas about sex-typing, considered in the abstract, than in linking her measure of the sex-typing dimension to scores on the BSRI. It remains a theory yet to be adequately tested.
More broadly, our review of the literature led us to reaffirm our conclusion that instrumentality and expressiveness, as measured by the PAQ and largely by the BSRI, have little or no relationship with gender-related attitudes and behaviors that are not influenced by these trait dimensions per se. The implications of these personality dimensions, we concluded, could be pursued most profitably by being disentangled from global and unverified notions about sex-typing and masculinity and femininity.
This restricted conclusion about what the PAQ measures, I should quickly point out, does not imply that instrumentality and expressiveness have little significance for gender issues. Role-related attitudes and preferences, for example, can join together with these trait dimensions to influence life choices. On the one hand, a woman high in expressiveness might choose nursing as a profession not only because it appeals to this aspect of her personality but also because she believes that nursing is an occupation more suitable for a woman than a man. On the other hand, a man high in expressiveness might shy away from nursing for the same reason and choose being a school counselor instead.
The End of the Story: Spence and Buckner (2000)
The theoretical climate in which we wrote the final article in this set of three had changed markedly in the 20 years following our 1980 review. Psychology had laid to rest the old imperialistic concepts of masculinity and femininity. Demonstrations that gender-related phenomena fall into multiple classes with complex interrelationships had opened up profitable new lines of research. The once-lively theoretical controversies in which we had been enmeshed no longer needed to be addressed.
Our final article reported an investigation that was an expansion of our original 1975 study. Some of the students who took part in this follow-up might well have been the children of those who took part at the same university some 25 years before. In the intervening years, a good many changes had taken place in society at large. Our concern was how the results would compare with those we found in our original study.
In our battery of questionnaires, which we gave to two large samples of undergraduates, we obtained self-report ratings and stereotype ratings of items on the PAQ’s I and E scales plus all the items on the BSRI describing desirable instrumental and expressive traits. We also included the two BSRI items, “masculine” and “feminine,” which we scored separately. To these trait measures we added the AWS and several more recently developed measures of role-related attitudes and behaviors: Modern Sexism (Swim, Aikin, Hall, & Hunter, 1995), Benevolent Sexism and Hostile Sexism (Glick & Fiske, 1997), and the Male-Female Relations Questionnaire (MFRQ; Spence, Helmreich, & Sawin, 1980). The latter is a little-known measure we had earlier devised to assess unmarried students' tendency to purposely adopt “masculine” or “feminine” role behaviors to impress a date.
Most of our original findings were replicated; some were not. Most of the outcomes were expected; others were not. We anticipated that because of societal changes, women’s self-perceptions of their instrumental characteristics would now be closer to men’s. In confirmation of this expectation, we found nonsignificant differences between the men and women in both samples (and significant differences in both samples on another 41% of the items). This outcome was a striking change. We could only speculate post hoc about the differences between the two sets of items. The outcomes clearly suggested that a finer analysis of traits falling into the instrumental category might reveal that women and men continue to differ in some subclasses but not in others.
More to our regret than surprise, men continued to rate themselves significantly lower than women on all of the expressive items. The results of the stereotype measures were also disappointing. Except for one BSRI item on each scale, significant stereotypes continued to be found on all items in both men and women in both samples. Stereotype ratings of the I and E traits were moderately negatively correlated in all comparisons, further suggesting that in the public at large, the popular assumption that “masculine” and “feminine” traits are incompatible still had some life. However, other results suggested that something more was going on in men.
Based on prior data, we expected that there might still be little or no relationship between I and E self-ratings and our several role measures, as well as between stereotype ratings and the role measures. This pattern was indeed the case in women but not in men. In both samples, men’s self-ratings on the expressiveness items were modestly but significantly negatively correlated with the AWS and MFRQ; that is, men higher in expressiveness tended to be more egalitarian in their gender role attitudes and less likely to role-play on dates.
The reverse pattern of correlations was found in both samples of men between their scores on the AWS and the MFRQ and their ratings of instrumentality in the typical male student: those who perceived the typical male student as higher in instrumentality than did their peers tended to be less egalitarian and more prone to role-play on dates. On the optimistic assumption that these were reliable findings (closely replicated outcomes in two large samples), we offered speculations about what these two sets of findings might imply. I admit, however, that as intriguing as they are, the only safe conclusion from the findings described so far is that tracking gender relationships over time has value.
Self-Ratings on the Adjectives “Masculine” and “Feminine”
The terms “masculinity” and “femininity,” along with the words “masculine” and “feminine,” remain popular terms in the media and everyday speech. Once my own thinking about these constructs was clarified, I began to recognize how rarely attempts have been made to define the words “masculinity” and “femininity,” even in dictionaries. Why this failure? Obviously, they have meaning to those who employ them, but what is it?
Years ago (post-enlightenment), I became concerned with these questions and attempted to find answers (e.g., Spence, 1984). I suggested that in everyday usage, these gender-related terms have two major types of meanings. The first is empirical; they are labels people use to describe observable qualities, objects, or behaviors that are widely perceived in a given culture to distinguish one gender from the other. They are often used loosely, but the information these terms are meant to convey can usually be inferred from the context in which they occur. For example, “I decided to skip the blue jeans and go for the feminine look at tonight’s party” or even “Alfie just radiates masculinity.”
In other contexts, people’s references to “masculinity” and “femininity” may have an alternate (or additional) meaning: they imply the presence in themselves or others of some basic, underlying essence. This essence, I suggested, is people’s ineffable sense of their gender identity, a psychological sense that for the vast majority of women and men is a fundamental aspect of self. This sense of belonging is acquired relatively early in children’s lives and parallels their conscious awareness of their biological sex. In part because gender identity develops so early, its meaning remains unarticulated and unelaborated. From one’s phenomenological perspective, gender identity just is. In pursuing this line of inquiry, I have attempted to capitalize on the research of other investigators on gender identity and its development by bringing their findings to bear on a broader range of gender phenomena.
Including the adjectives “masculine” and “feminine” in our test battery provided a low-cost opportunity to further explore the implications of this hypothesis about gender identity. We proposed that our participants' ratings of themselves on this pair of adjectives would, to some extent, be influenced by their ratings of their own instrumental and expressive traits and their gender attitudes. At the same time, their ratings would also be heavily influenced by their basic sense of gender identity. The results were, of course, more complex than we had initially imagined, but they essentially confirmed our hypotheses.
Final Remarks
Plus ça change, plus c'est la même chose (The more it changes, the more it’s the same thing), so goes the French saying. In some ways, our findings appear to refute this observation. By the beginning of the new millennium, psychology’s conceptions of certain core gender phenomena had markedly changed, and new areas of inquiry had broadened and deepened our knowledge. Our society has changed in significant ways related to gender, and some of our earlier findings changed in accordance with them. But in other ways, the saying remains dismayingly accurate. Most discouraging was the persistence of stereotypes, even among college-age students, who are usually quicker to change than their parents. I suspect that is still the case, a decade later. The work on stereotype threat, which has included women as well as members of minority groups, indicates that a woman’s sheer knowledge of stereotypes may have deleterious effects on her performance that reinforce the stereotype in the eyes of others. The search for ways to modify these false beliefs, and to put those ways into action, goes on.
Footnotes
Almost all of the studies mentioned in this essay in which I was involved were coauthored. For simplicity, I will always use the pronoun “we” in referring to these studies even though my coauthors were not always the same. They are identified in the references. They have my gratitude.
