Abstract
The gender gap in academic publishing has been surprisingly understudied across all disciplines and over longer timeframes. Our study fills this gap by analyzing how the proportion of women authors in academic publications changed over 20 years, from 2001 to 2021, across all fields in 31,219 journals. Our results indicate that the ratio of female to male authors has been increasing steadily across disciplines. The increases are field-neutral: they are not larger in, for example, science, technology, engineering, and mathematics (STEM), despite multiple initiatives focusing specifically on STEM. The increases are also decelerating over time, which could suggest that the ratio of female to male authors may be approaching a plateau. Finally, although the within-field gender gap is decreasing, the gap between fields has widened. Our results therefore have major consequences for science policy on the gender gap.
Introduction
The academic realm has historically been characterized by systemic gender disparities that manifest in various ways. Authorship in academic publications, a tangible representation of contribution and acknowledgment, has long been a focus of gender studies across various disciplines. Over the past few decades, considerable efforts have been made to bridge the gender gap in academia, particularly in Science, Technology, Engineering, and Mathematics (STEM) fields, which historically showed vast disparities. However, whether these efforts are translating into a more equitable distribution of authorship remains a pertinent question.
To truly understand the trends and dynamics of gender representation in academic publishing, a comprehensive, long-term, cross-disciplinary examination is required. Past research has been confined either to specific disciplines or to limited time frames, offering only a partial view of the larger picture. In this paper, we expand this scope, providing a panoramic view of gender distribution in academic authorship over two decades across diverse fields, drawing on an extensive dataset of 31,219 journals. This effort not only offers insights into the current state of gender distribution in academic publishing but also provides cues about future trends, highlighting the critical areas that need policy intervention.
Background
There Is a Well-Documented Gender Gap in Academic Publications
A decade ago, Nature initiated a discussion on its own sexism by reporting that only 19% of its Comment and World View articles included a female author and only 14% of its reviewers were women (Editorial, 2012). Some limited progress has been made since then: for example, in 2015, 22% of authors were women (Editorial, 2017). As other studies show, women scholars face a funding gap, publish fewer articles, receive fewer citations on average, and receive less credit for their citations (Beaudry & Larivière, 2016; Bol et al., 2022; Budrikis, 2020; Kaatz et al., 2014; Larivière et al., 2013; Lerchenmueller & Sorenson, 2018; Ross et al., 2022). There is also strong gender inequality among academic editors (Liu et al., 2023). Growing publish-or-perish pressure creates a toxic, competitive environment in which marginalized groups are less and less likely to be fairly recognized in submissions, in publications, or even in reviewer pools (Anlar & Phillips, 2023; Bell et al., 2021).
So far, most studies of gendered publication patterns have focused either on specific disciplines or on short periods of time. For instance, we know that between 2008 and 2012, women accounted for a little over 29% of the estimated 27.3 million authors who published 5.5 million articles indexed in the Web of Science (Larivière et al., 2013). Women's share of first authorships in six high-impact medical journals increased from 27% in 1997 to 37% in 2014, although progress stalled after 2009 (Filardo et al., 2016). We also know that women are less likely to have their papers accepted or cited in fields such as chemistry (Schiermeier, 2019), neuroscience (Budrikis, 2020), and dentistry (Wadia, 2021), to mention just a few. However, fields differ significantly in the number of authors, author ordering, co-authorship practices, and productivity (Andersen, 2023; Parish et al., 2018).
The reasons for gender disparities in academic publishing are complex (Lundine et al., 2018), and many factors may play a role. For instance, male authors are more likely to use sensationalist titles, which may boost their citations (Woolston, 2020), and they are more likely to receive better reviewer scores on funding applications (Bol et al., 2022). Perceptions of gender bias also vary. For example, women academics report having been disproportionately affected by lockdowns (Górska et al., 2021), although the data on publication output are not yet clear (Jemielniak et al., 2022; Son & Bell, 2022). Their work has also been systematically undervalued (Remery et al., 2022), and the pandemic has deepened pre-existing inequalities in academia (Plotnikof et al., 2020; Özkazanç-Pan & Pullen, 2020). Women scholars also have different career paths available to them, which affects their overall productivity (Huang et al., 2020), and they find it more difficult to fit career shocks into the expected academic career script (Lisanne et al., 2023). It is also well established that implicit gender bias exists in research evaluation (Kim et al., 2022). Because of these and many other phenomena affecting outcomes, as well as major differences between fields, it is difficult to obtain a full picture of the role of gender in academic publishing without solid big-data studies covering all disciplines over a long timeframe.
Method
We studied the gender of authors in all academic fields over a span of 20 years, using publication records retrieved from CrossRef for a total of 31,219 journals listed in Scopus. Each journal was counted in every field to which it is assigned in the Scopus database; therefore, some journals are counted in multiple fields.
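Retrieval from CrossRef, with the sampling parameters given in the next paragraph (up to 100 works per journal for each of five years), could be sketched as follows. This is a minimal sketch, not the authors' code: the function names are hypothetical, and we assume that journals are queried by ISSN through CrossRef's `/journals/{issn}/works` route with a publication-date filter, and that "date of addition to the database" corresponds to CrossRef's `created` sort key. The sketch only builds request URLs and parses an already-downloaded response, so it makes no network calls.

```python
from urllib.parse import urlencode

CROSSREF_API = "https://api.crossref.org"
SAMPLE_YEARS = (2001, 2006, 2011, 2016, 2021)


def works_query_url(issn: str, year: int, rows: int = 100) -> str:
    """URL for up to `rows` works of one journal published in `year`,
    oldest-added first (assumption: 'added' maps to CrossRef's `created`)."""
    params = {
        "filter": f"from-pub-date:{year}-01-01,until-pub-date:{year}-12-31",
        "sort": "created",
        "order": "asc",
        "rows": rows,
    }
    return f"{CROSSREF_API}/journals/{issn}/works?{urlencode(params)}"


def given_names(response: dict) -> list[str]:
    """Given names of all authors in a parsed CrossRef works response."""
    names = []
    for item in response.get("message", {}).get("items", []):
        for author in item.get("author", []):
            # Some author entries (e.g. organizations) have no given name.
            if author.get("given"):
                names.append(author["given"])
    return names
```

To collect the sample, one would loop over the journals and `SAMPLE_YEARS`, fetch each URL (for example with `urllib.request.urlopen`, adding a `mailto` parameter so that CrossRef can contact heavy users), and pass the decoded JSON to `given_names`.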
For each journal in the Scopus sources database, we queried CrossRef (crossref.org) for up to 100 publications per year for 2001, 2006, 2011, 2016, and 2021 (for journals with more than 100 publications in a given year, we took the first 100 articles by date of addition to the database). We analyzed the last 20 years but limited data collection to five years spread evenly over this period, because collection is very time-consuming and places a load on the external CrossRef API.
Next, we attempted to determine the gender of all authors (Berg, 2019), applying the following procedure to each name in our database. First, we excluded all authors whose first name was one letter long (that is, consisted only of an initial), which accounted for 32% of all records. Second, we used the gender-guesser Python library to establish gender, an approach that is already well established (Santamaría & Mihaljević, 2018; Squazzoni, Bravo, Farjam, et al., 2021; Squazzoni, Bravo, Grimaldo, et al., 2021); it identified 77.7% of all names as male or female. If a name was found and was unambiguously male or female, this value was applied. For the remaining, unidentified names, we applied the same library to the name with diacritics converted to ASCII (using the unidecode Python package) and stored the unambiguous results. Because name-based gender identification is the cornerstone of this research, we used additional reputable sources (Zehetbauer et al., 2022) to identify the gender of as many names as possible. In the next step, we looked up names that were still unidentified in the database available at https://github.com/MatthiasWinkelmann/firstname-database, in both their original and simplified forms, which identified an additional 3.8% of names.
Finally, we attempted to determine the gender of the remaining names using a database of Chinese names (Song et al., 2022), available at https://github.com/psychbruce/ChineseNames, which identified a further 0.7% of names. This database does not provide a binary classification but rather the numbers of men and women who use a given name, so we assigned a gender only if more than 80% of the people using a name were of that gender.
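The name-based cascade above, followed by the per-field aggregation implied by counting each journal in all of its Scopus fields, can be sketched as a chain of lookups tried in order. This is an illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, lookups are modeled as plain callables, only the first token of the given name is looked up, shares are computed among gender-identified authors only, and ASCII folding uses the standard library's `unicodedata` as a simpler stand-in for the unidecode package used in the study.

```python
import unicodedata
from collections import defaultdict
from typing import Callable, Iterable, Optional

# A lookup maps a first name to a label. Only the exact strings "male" and
# "female" count as identified; anything else (None, "unknown", "mostly_male",
# ...) sends the name on to the next source.
Lookup = Callable[[str], Optional[str]]


def is_initials_only(given_name: str) -> bool:
    """True when the first given name is a single letter, e.g. 'J.' or 'J. R.'."""
    tokens = given_name.replace(".", " ").replace("-", " ").split()
    return not tokens or len(tokens[0]) == 1


def to_ascii(name: str) -> str:
    """Remove combining diacritics ('José' -> 'Jose'). Unlike unidecode, this
    leaves letters without a decomposition, such as 'ł' or 'ø', unchanged."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def share_rule(men: int, women: int, threshold: float = 0.8) -> Optional[str]:
    """Label a name only if more than `threshold` of its bearers share a gender."""
    total = men + women
    if total == 0:
        return None
    if men / total > threshold:
        return "male"
    if women / total > threshold:
        return "female"
    return None


def assign_gender(given_name: str, lookups: Iterable[Lookup]) -> Optional[str]:
    """Try each source in order, first on the name as written, then ASCII-folded."""
    if is_initials_only(given_name):
        return None  # initials only: the record is excluded
    first = given_name.split()[0]
    for lookup in lookups:
        for candidate in (first, to_ascii(first)):
            label = lookup(candidate)
            if label in ("male", "female"):
                return label
    return None


def female_share_by_field(authors, journal_fields):
    """Share of female authors per field among gender-identified authors.

    authors:        iterable of (journal_id, gender) pairs; gender may be None.
    journal_fields: journal_id -> list of Scopus fields; a journal assigned to
                    several fields is counted in each of them.
    """
    counts = defaultdict(lambda: {"male": 0, "female": 0})
    for journal_id, gender in authors:
        if gender not in ("male", "female"):
            continue  # unidentified authors do not enter the shares
        for field in journal_fields.get(journal_id, []):
            counts[field][gender] += 1
    return {f: c["female"] / (c["male"] + c["female"]) for f, c in counts.items()}
```

Sources would be passed in the order described above. The gender-guesser detector's `get_gender` method can be passed as-is, because its "mostly_male", "mostly_female", "andy", and "unknown" answers are not accepted and fall through to the next source. The lookups for the firstname-database and ChineseNames data would be hypothetical wrappers (the latter applying `share_rule` to the men/women counts) whose loading code depends on those repositories' file formats.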
Table 1. Statistics Describing Change Over Time of the Share of Female Authors.
Note. avg. = average; p.p. = percentage points; 5 y. = five-year. The table reports the share of female authors in percentage points between 2001 and 2021, the relative change between 2001 and 2021 as well as in 10-year periods, and the average 5-year relative increase. Fitted r is the estimated coefficient (ratio) of the geometric series fitted to the relative rate of growth.
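The quantities in Table 1 can be derived from the five sampled shares of each discipline. In the sketch below, relative change means end share divided by start share minus 1, the 10-year periods are taken to be 2001–2011 and 2011–2021, and the average 5-year increase is taken as the arithmetic mean of the four 5-year relative changes; these definitions and the function name are our assumptions, not confirmed details of the table.

```python
from statistics import fmean

SAMPLE_YEARS = (2001, 2006, 2011, 2016, 2021)


def table_stats(shares):
    """Table-1-style statistics for one discipline.

    shares: share of female authors (fractions, e.g. 0.25) in each sampled
            year, in the order of SAMPLE_YEARS.
    """
    if len(shares) != len(SAMPLE_YEARS):
        raise ValueError("expected one share per sampled year")
    s2001, _, s2011, _, s2021 = shares
    # Relative change over each 5-year interval: end / start - 1.
    five_year = [end / start - 1 for start, end in zip(shares, shares[1:])]
    return {
        "change_pp_2001_2021": 100 * (s2021 - s2001),
        "relative_2001_2021": s2021 / s2001 - 1,
        "relative_2001_2011": s2011 / s2001 - 1,
        "relative_2011_2021": s2021 / s2011 - 1,
        # Assumption: arithmetic mean of the four 5-year relative changes.
        "avg_5y_relative": fmean(five_year),
    }
```

For a hypothetical discipline whose share starts at 0.20 and grows by exactly 10% every five years, `table_stats([0.2, 0.22, 0.242, 0.2662, 0.29282])` gives a 9.282 p.p. change, a 46.41% relative change over 2001–2021, 21% per decade, and a 10% average 5-year increase.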
We observed that in most disciplines the pace of relative growth decreases over time. Therefore, to make a forecast (see Figure 1), we treated this relative growth as a geometric series and, for every discipline, found the ratio (see Table 1, column fitted r) that best fits the data by minimizing the mean squared error. We report the average of four geometric fits per discipline with different starting points in time and different starting values (starting points in 2011 and 2016, with initial growth taken from the 2011–2016 and 2016–2021 intervals). Averaging four fits reduces the variance of the fitted r. This approach may give reasonable forecasts for fields with r below 1, but not for fields with r of 1 or more, for which the projected growth never decelerates and the projected share grows without bound. Therefore, the forecasts should be interpreted very cautiously for disciplines with the highest fitted r.
Figure 1. Share of female authors over time across disciplines. Each circle represents the observed share of female authors in a discipline for 2001–2021 and our forecast for 2026–2046. A vertical line at 0.5 marks parity between male and female authors.
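The geometric-series fit and forecast described above can be sketched as follows. This is one plausible reading, not the authors' code: we assume each fit anchors the series at one observed 5-year growth g_a and models the growth in interval k as g_a * r^(k - a), that r is found by a plain grid search over the mean squared error, and that averaging runs over anchors at the 2011–2016 and 2016–2021 intervals; the exact parameterization of the paper's four fits is not spelled out, and all function names are hypothetical.

```python
from statistics import fmean


def fit_ratio(growths, anchor, r_min=0.01, r_max=2.0, step=1e-4):
    """Grid-search r so that growths[k] ~ growths[anchor] * r ** (k - anchor).

    growths: relative growth of the share in successive 5-year intervals,
             oldest first (0.10 means the share rose by 10% in that interval).
    anchor:  index of the interval whose observed growth seeds the series.
    Returns the r with the smallest mean squared error over all intervals.
    """
    g0 = growths[anchor]
    best_r, best_mse = r_min, float("inf")
    for i in range(int(round((r_max - r_min) / step)) + 1):
        r = r_min + i * step
        mse = fmean([(g - g0 * r ** (k - anchor)) ** 2 for k, g in enumerate(growths)])
        if mse < best_mse:
            best_r, best_mse = r, mse
    return best_r


def fit_averaged_ratio(growths, anchors=(2, 3)):
    """Average the fitted r over several anchors to reduce its variance."""
    return fmean([fit_ratio(growths, a) for a in anchors])


def forecast_shares(last_share, last_growth, r, periods):
    """Project the share forward, shrinking each 5-year growth by a factor r.

    For r >= 1 the growth never shrinks and the projected share eventually
    exceeds 1, which is why such forecasts are unreliable.
    """
    shares, share, growth = [], last_share, last_growth
    for _ in range(periods):
        growth *= r
        share *= 1 + growth
        shares.append(share)
    return shares
```

Applied to the cross-discipline averages reported in the Results (11.1%, 10.5%, 8.5%, and 6.0%), `fit_averaged_ratio([0.111, 0.105, 0.085, 0.060])` returns a ratio below 1, consistent with deceleration. Table 1 is fitted per discipline, so this aggregate value is only illustrative.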
The implications of these trends for gender parity are further complicated by the relative distribution of higher-impact versus lower-impact publications; our analysis does not account for the journals' prestige.
Results and Discussion
The first conclusion of our study is that the gender gap in academic publishing remains strongly discipline-dependent. In 2021, the share of women authors is lowest in STEM fields: below 22% in Mathematics and in Physics and Astronomy, and slightly above 24% in Energy and in Engineering. The highest shares of women are found in health-related fields, with Psychology and Nursing being the only fields where women make up more than 50% of authors. Social Sciences and Arts and Humanities are also among the fields where the share of female authors exceeds 40%. This result is not surprising, but it confirms expectations on a large dataset covering all fields.
Over the study period, the share of women authors increased in all disciplines. The increase also holds for almost every discipline within each of the chosen time intervals (2001–2006, 2006–2011, 2011–2016, and 2016–2021): of the 108 changes (each discipline in each of the four intervals), 105 (97.2%) are increases in the share of women authors. The growth in the proportion of women authors in academic publishing is therefore stable, continuous, and occurring across all disciplines (see Figure 1). Note, however, that these percentages may reflect gender imbalances in PhD production and in the hiring of scientists as much as patterns of authorship.
Before presenting further results, we note that two disciplines can be considered free of a gender gap and in a relatively steady state. In Health Professions and Nursing, not only is the share of female authors close to or above 50%, but the growth of that share (in both relative and absolute terms) is also close to zero in the analyzed years.
In disciplines where the share of female authors was below the median in 2001, the mean relative growth of that share from 2001 to 2021 was much higher (53%) than in disciplines above the median (30%). Nevertheless, the difference in average shares of female authors between the top and bottom groups widened over the same period, because a smaller relative gain on a larger base can still produce a larger absolute gain. Thus, although the within-field gender gap is decreasing, the gap between fields has widened: while the proportions of women authors are increasing within disciplines, the differences between disciplines are growing.
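The arithmetic behind this seemingly paradoxical result is simple: the absolute gap between the two groups changes by s_high * g_high - s_low * g_low, so it widens whenever the upper group's share exceeds the lower group's by more than the ratio of their relative growths (here 0.53 / 0.30, about 1.77). In the sketch below, only the 53% and 30% growth figures come from our results; the starting shares are hypothetical values chosen for illustration, and the function name is ours.

```python
def group_gap(low_share, low_growth, high_share, high_growth):
    """Absolute gap in female-author share between two groups of disciplines,
    before and after each group's share grows by its relative rate."""
    before = high_share - low_share
    after = high_share * (1 + high_growth) - low_share * (1 + low_growth)
    return before, after


# Hypothetical 2001 shares; growth rates are the group means reported above.
before, after = group_gap(low_share=0.20, low_growth=0.53, high_share=0.45, high_growth=0.30)
print(f"gap before: {before:.3f}, gap after: {after:.3f}")  # gap before: 0.250, gap after: 0.279
```

Even though the lower group grows 23 p.p. faster in relative terms, its absolute gain (0.106) is smaller than the upper group's (0.135), so the gap widens.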
Disciplines reached their largest increases in the ratio of female to male authors at different times. Between 2001 and 2011, the highest relative change was observed in Economics, Econometrics and Finance; Pharmacology, Toxicology and Pharmaceutics; Environmental Science; Energy; and Dentistry. Between 2011 and 2021, it was observed in Neuroscience, Engineering, Computer Science, Decision Sciences, Dentistry, and Multidisciplinary.
However, the overall relative pace of change gradually decreased over time. Averaged across all disciplines, the relative change was 11.1%, 10.5%, 8.5%, and 6.0% in the intervals 2001–2006, 2006–2011, 2011–2016, and 2016–2021, respectively. Moreover, of the 26 disciplines, only 4 had a higher relative pace of growth in the second decade than in the first. Only for Computer Science can the difference be called noticeable (a 24.5% increase in the second decade compared with 15.5% in the first); for Neuroscience, Health Professions, and Nursing, relative growth was higher in the second decade by no more than 1.5 p.p.
Treating the relative growth over the 5-year periods from 2001 to 2021 as a geometric series, we fitted the coefficient r for every discipline (see Table 1). In all but four disciplines (Computer Science, Neuroscience, Health Professions, and Nursing), the fitted coefficient is below 1, which further confirms that the relative pace of growth in the share of female authors is steadily decreasing. Nine of the 13 disciplines with the lowest share of female authors in 2021 have a fitted r below 0.9; the gender gap is therefore not closing quickly in the fields where it is largest. Furthermore, 8 disciplines have a fitted coefficient below 0.8. This shows that the process of closing the gender gap is not only slowing down but could come to a halt in multiple fields.
Conclusions
Taken together, these observations lead us to conclude that the share of women changes at a similar relative pace in all fields and that the reduction of the gender gap in academic publishing is, in fact, field-neutral. Consequently, because similar relative gains translate into larger absolute gains in fields that start from higher shares, the gap between fields widened in absolute terms. This further suggests that, despite many efforts, STEM-focused interventions have not yielded progress beyond the (diminishing) positive change seen in academic publishing generally.
The overall relative pace of closing the gender gap has slowed over time. If this deceleration continues at the same rate, the gender gap will stop closing in about 25 years, and gender parity may never be achieved in some disciplines. Making a precise forecast of the share of female authors in science is difficult, if not impossible. Still, although such forecasts should be interpreted with caution, the consistency of this pattern across disciplines suggests that it may materialize unless other factors come into play. We include the forecast mainly to place our results in additional context rather than to predict the future.
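Under the geometric-series model described in the Method section, the plateau can be made explicit. In our own notation (not used elsewhere in the paper), let $s_t$ be a discipline's share of female authors at the end of the latest observed 5-year interval, $g_t$ its relative growth over that interval, and $r$ the fitted ratio, so that the $j$-th future interval has relative growth $g_t r^{j}$:

$$
s_{t+k} = s_t \prod_{j=1}^{k} \left(1 + g_t\, r^{j}\right).
$$

Since $\ln(1+x) \le x$ for every $x > -1$, when $0 \le r < 1$ the projected share is bounded:

$$
\lim_{k \to \infty} s_{t+k} \;\le\; s_t \exp\!\left(g_t \sum_{j=1}^{\infty} r^{j}\right) \;=\; s_t \exp\!\left(\frac{g_t\, r}{1 - r}\right).
$$

For example, taking the cross-discipline average growth of 6.0% for 2016–2021 as $g_t$ and an illustrative $r = 0.8$, the bound is $s_t e^{0.24} \approx 1.27\, s_t$, so under these values a discipline whose share is below about 0.39 would never reach parity (0.5). When $r \ge 1$ and $g_t > 0$, the sum diverges and the projected share grows without bound, which is why forecasts for such disciplines are unreliable.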
Other studies indicate that the gender gap is unlikely to close without additional interventions (Holman et al., 2018) and changes in research policy. Our study suggests that field-specific interventions have so far likely not brought major results, as the growth of women's share of authorship is fairly stable across all fields and not visibly correlated with any targeted efforts.
To slay the “unbeatable seven-headed dragon” of gender inequality (van den Brink & Benschop, 2012), a more targeted approach is clearly needed, and relying on data is key to avoiding dispersed efforts (Ceci et al., 2023).
Footnotes
Author Contributions
DJ led the literature review and the writing and contributed to the interpretation. MW collected the dataset, conducted the data analysis, and contributed to the literature review, the writing, and the interpretation.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Code Availability
Our code is available upon reasonable request.
