Just another clickbait title: A corpus-driven investigation of negative attitudes toward science on Reddit

Abstract

The public understanding of science has produced a large body of research about general attitudes toward science. However, most studies of science attitudes have been carried out via surveys or in experimental conditions, and few make use of the growing contexts of online science communication to investigate attitudes without researcher intervention. This study adopted corpus-based discourse analysis to investigate the negative attitudes held toward science by users of the social media website Reddit, specifically the forum r/science. A large corpus of comments made to r/science was collected and mined for keywords. Analysis of keywords identified several sources of negative attitudes, such as claims that scientists can be corruptible, poor communicators, and misleading. Research methodologies were negatively evaluated on the basis of small sample sizes. Other commenters negatively evaluated social science research, especially psychology, as being pseudoscientific, and several commenters described science journalism as untrustworthy or sensationalized.

Keywords

corpus-based discourse analysis, public understanding of science, science attitudes and perceptions

1. Introduction

A research agenda known as the public understanding of science (PUS) was mobilized in the late twentieth century to understand the public’s knowledge about and attitudes toward science (Sturgis and Allum, 2004). However, this agenda to “scientize” the public garnered mixed results (Miller, 2001: 116), and the traditional PUS model has been criticized for its deficiency view of the public and reliance on large-scale survey instruments (Sturgis and Allum, 2004: 59). Alternative methods adopted for PUS include more contextualized methods like case studies and interviews (e.g. Putsche et al., 2017), and emphasis has been placed on multidirectional, as opposed to one-way, models of science communication (Trench, 2008: 132). This new paradigm attributes the public’s crisis of confidence and deficit of trust not (wholly) to the public’s scientific unawareness but (also) to scientific communities’ attitudes toward the public (Bauer and Falade, 2014: 145).

In recent years, many scholars have shifted their attention to the study of online science communication (Davies and Hara, 2017). Examples of online science communication include YouTube videos, science blogs and science podcasts, and social media websites like Facebook, Twitter, and Reddit. Recently, several studies have investigated discourse at one particular Reddit forum (or “subreddit”) called r/science, especially for the ways that scientists interact with users via Ask Me Anything (AMA) posts (e.g. Edwards and Ziegler, 2022; Hara et al., 2019; Hubner and Bond, 2022; Moriarty and Mehlenbacher, 2019). These studies have been particularly interested in the types and quality of interaction found there, such as whether communication is unidirectional or multidirectional (e.g. Lai et al., 2020).

However, while r/science AMA posts represent intriguing examples of scientist–public interaction in an online setting, these posts make up a very small portion of the overall discourse at r/science, as most posts link users to science news articles and offer a place for discussing them in the comments section (Jones et al., 2019). Moreover, due to a change in Reddit’s underlying architecture, the moderators of r/science discontinued AMAs as of 2018 (Edwards and Ziegler, 2022: 485), making the posting and commenting of science news the primary discourse there.

To “pay due attention to the defining feature” of r/science, namely “the actual engagement and discussion with redditors” (Moriarty and Mehlenbacher, 2019: 522), the current study adopts a corpus-driven discourse analysis (Baker, 2006; Partington et al., 2013) approach to the investigation of online science communication at r/science. Against the backdrop of a so-called public mistrust of science (Wynne, 2006: 211), I adopt corpus linguistic methods to investigate the content of commenters’ negative science attitudes. As science attitudes research has traditionally been investigated via large-scale surveys (Bauer and Falade, 2014) and experimental designs (e.g. Gierth and Bromme, 2020), a corpus-driven approach and a wealth of online science communication offer an additional method for tapping into attitudes unsolicited by the researcher.

Below, I review relevant literature relating to attitudes toward science, studies of science communication on Reddit, and corpus-driven discourse analysis and evaluative language. Then, I provide an overview of the current study and its guiding research questions.

2. Literature review

Attitudes Toward Science

Since the well-known “Bodmer report” was published in 1985 (Miller, 2001), large institutions across several continents have regularly surveyed their population’s attitudes toward science and science literacy (see Bauer and Falade, 2014: 141–143). Early research was interested in measuring the public’s knowledge of scientific facts (e.g. “the center of the earth is very hot,” true, false, don’t know), as well as their attitudes toward science (e.g. “The benefits of science are greater than any harmful effects,” agree = positive) (examples from Bauer and Falade, 2014: 146). This approach often adopted large-scale survey instruments, wherein participants would indicate positive or negative responses via a Likert scale (Bauer et al., 2000: 30). Findings often showed consistently positive responses (Bodmer, 1985: 13–14), a trend which has continued in more recent surveys (e.g. National Science Board, National Science Foundation, 2020).

Despite these positive attitudes, friction between scientific communities and lay publics remains a concern. The UK House of Lords Select Committee on Science and Technology reported in their 2000 publication that “public unease, mistrust and occasional outright hostility are breeding a climate of deep anxiety among scientists themselves.” These tensions often coalesce around particular issues, such as vaccine hesitancy and autism (Goldenberg, 2016), COVID-19 denial (Rothmund et al., 2022), dichlorodiphenyltrichloroethane (DDT), and bovine spongiform encephalopathy (BSE) (Sturgis and Allum, 2004: 56). Events and scandals like these have led to a so-called public mistrust of science (Wynne, 2006: 211), which remains a concern to researchers today (Putsche et al., 2017: 2).

Recent research has investigated the factors that predict such attitudes (e.g. Blank and Shaw, 2015; Gauchat, 2011; Lee and Kim, 2018; Putsche et al., 2017). While this body of research provides important contextual information for understanding science attitudes, many studies record participant attitudes via (partially or wholly) close-ended surveys or experiments. As a result, what participants contribute is mediated by the researcher. Sites of online science communication provide opportunities for examining attitudes toward science as they are expressed naturally. The social media website Reddit.com, in particular, has been the subject of several recent studies, which will be reviewed next.

Reddit as a source of online science communication

In 2020, the social media platform Reddit hosted over 220 million users in the United States alone (Statista Research Department, 2021). The subreddit r/science, just one of many on the website, describes itself as a “place to share and discuss new scientific research” (r/science, n.d.), and is the largest subreddit dedicated to science with over 26 million members in late 2021. As a result, it has been the subject of several recent studies of online science communication.

Several studies have investigated scientist–user interactions in AMAs, during which a scientist or team of scientists respond to written comments/questions pertaining to scientific topics in an online forum. For example, Hara et al. (2019) cataloged how many questions were asked by r/science users, their content, and whether they were answered, among other things. Hubner and Bond (2022) investigated how, if at all, female and male scientists engaged in AMAs differently or were treated differently by users. Edwards and Ziegler (2022) adopted a two-pronged approach, which they call “assembled” and “disassembling,” performing a content analysis of AMAs and discussing how the organization of r/science influences the kind of discussion that can be had there. Moriarty and Mehlenbacher (2019) also considered the infrastructure of Reddit and r/science and how it mediates the scientist’s performance of expertise. Lai et al. (2020) turned their attention to a similar subreddit, r/Coronavirus, and examined whether communication was one-way or multidirectional during AMAs hosted by COVID-19 scientists.

Rather than focusing solely on AMAs, Jones et al. (2019) and August et al. (2020) collected data from posts and comments made to r/science, generating large datasets. August et al. (2020) applied a bevy of machine learning techniques to learn how, if at all, the language of r/science is different from that of other subreddits, while Jones et al. (2019) interviewed r/science users, posters, and moderators and applied grounded theory to the analysis of posts and comments. An important contextual detail uncovered by their interviews is that r/science users are diverse, ranging in occupation from graduate student to science outreach professional to construction manager. The authors conclude that many r/science users are science-interested and/or science-educated (Jones et al., 2019: 2; also Edwards and Ziegler, 2022: 475).

Most studies of r/science have focused on the interactions between scientists and r/science users, and no studies have examined r/science discourse specifically for user attitudes toward science. The current study seeks to contribute to the growing body of literature on online science communication by investigating science attitudes expressed in comments made to the non-AMA posts at r/science, which make up the bulk of discourse there. To do so, it adopts corpus-driven discourse analysis, which will be described next.

Corpus-driven discourse analysis and evaluative language

The current study adopts the methods and perspectives of corpus linguistics and discourse analysis (Baker, 2006; Partington et al., 2013). A “corpus” is a collection of authentic language representing a specific domain (Paltridge, 2012: 144), usually collected and analyzed using computational tools. “Discourse” is less easily summed up, but a useful definition is the complex relationship between language and the contexts in which it is used (Paltridge, 2012: 3). Three oft-cited approaches to discourse analysis include (1) the study of language use, (2) the study of linguistic structure beyond the sentence, and (3) the study of social practices and ideological assumptions associated with communication (Biber et al., 2007: 1). While the first two approaches pay explicit attention to linguistic form, the third has a socio-cultural orientation that attempts to understand “the broader social contexts of the discourse” (Biber et al., 2007: 6). The usefulness of corpus-driven discourse analysis to the study of online science communication largely falls under this third category.

Corpus-driven discourse analysis may target a range of micro (e.g. vocabulary) and macro (e.g. organization) features, but the current study focuses on the expression of evaluation, or the subjective attitudes taken toward a person, situation, or entity (Hunston, 1994: 210). Evaluative language lends itself to understanding the attitudes of the speaker or writer because attitude is arguably the central concern of evaluation (Thompson, 2014: 80). Forms of evaluation are diverse, including lexical (e.g. adverbs and adjectives), lexicogrammatical (e.g. noun + that-complement clauses), and even whole paragraphs. At its most fundamental, evaluation expresses goodness (positive) or badness (negative) (Thompson, 2014: 80), which are most clearly expressed via overt linguistic items indicating positive (e.g. thankfully, hopeful, beautiful) or negative (e.g. unfortunately, failure, lose) consequences (Partington et al., 2013: 44–45).

There is general agreement that identifying evaluation from a corpus linguistic approach often means reading through many concordance lines, often looking for patterns in commonly co-occurring language (Hunston, 2010: 57–59). For example, the verbs cause and bring about hold little evaluative meaning on their own but can pattern with other language that together paint a clearer evaluative picture (Table 1).

Table 1.

Concordance lines around the verbs cause and bring about.

CAUSE
people come to therapy because they cause pain to others and have become so
and distorted build ups of calcium cause enlarged joints. Many people suffer
on viruses that make eyes bleed and cause lethal diarrhoea in infants and
on for bikini lines because it does not cause rashes. Erma says that when the
There is nothing more likely to cause trades unions to blow a fuse over a
BRING ABOUT
people that will empower them to bring about their own liberation. Even in
the twentieth century was bound to bring about a transformation of African
the hope that mass action can bring about a fairer society is not just
Sihanouk, as part of efforts to bring about peace. He said this matched
s personal role in helping bringing about the agreement in Berlin and his. Jesus

Source: Adapted from Hunston (2010: 59).

These concordance lines demonstrate that cause can co-occur with negative circumstances, while bring about can pattern with more positive scenarios. While this approach is largely manual and thus can introduce reliability concerns, researchers can add transparency to their process by focusing on overt markers of evaluation, providing sufficient examples of concordance lines, and generally abiding by the principle that substantial concordance line review is a necessary and best practice (Baker, 2006: Ch. 4).
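To make the notion of a concordance line concrete, the following is a minimal Python sketch of a keyword-in-context (KWIC) search. It is an illustration only and not the tool used to produce Table 1: the function name `concordance`, the `width` parameter, and the sample sentence are hypothetical, tokenization is simple whitespace splitting, and only single-word nodes are handled (a multiword node such as bring about would need additional matching logic).

```python
def concordance(tokens, node, width=6):
    """Return (left context, node, right context) for each hit of `node`.

    `width` is the number of tokens kept on each side (an illustrative default).
    """
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Invented sample text, used only to show the output format.
sample_tokens = "small samples can cause misleading results and cause confusion".split()
kwic_lines = concordance(sample_tokens, "cause", width=2)
for left, node_word, right in kwic_lines:
    # Right-align the left context so that the node words line up vertically.
    print(f"{left:>20}  {node_word}  {right}")
```

Aligning the node in a central column, as in Table 1, is what allows recurring patterns to the left and right of the node to be read off by eye.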

Partington et al. (2013: 52–53) describe different categories of evaluative lexis. One such category is denotational nouns, which without context appear neutral but alongside context may attract positive or negative evaluation. For example, the noun phrase former president Donald Trump in isolation does not express evaluation or attitude, but investigation of the contexts around this phrase in a particular corpus might reveal patterns of evaluation. For the current study, denotational nouns serve as a productive means for identifying contexts wherein evaluation of science may be present (see section “Forms and identification of evaluative language”).

In sum, the current study adopts corpus-driven discourse analysis to study online science communication and learn about what aspects of science a group of science-interested social media users hold potentially negative attitudes toward, and why. The research question that guided this study is:

Toward what aspects of science do r/science users express negative attitudes and why?

Before moving to the methods section, it is important to stress that the current study does not seek to question and then ultimately confirm a public mistrust in science, which would be particularly easy to do with a large corpus of social media comments. Rather, this study proceeds by accepting the premise that tensions can and do arise between institutions of science and lay publics and seeks to understand the content of those negative attitudes using less adopted methods for attitude research. In doing so, it may be able to contribute novel insights into the largely survey- and experiment-based body of science attitudes literature.

3. Data and methods

Corpus collection

The target discourse was discussion of scientific research found in contexts outside of formal scientific spaces (e.g. academic journals and conferences; what Hilgartner (1990: 528) would call “upstream science”). Social media, in particular the Reddit forum r/science, was found to be a good match for this kind of discourse. Posts to r/science always include a title and a link to one of a variety of sources reporting on scientific research. Accordingly, a script was written in the programming language Python to scrape comments made to posts at r/science from early 2017 through July 2021, resulting in about one million comments made to about 3750 posts. The data were then entered into the programming environment R (R Core Team, 2020) for cleaning and analysis.

Several steps were taken to help clean the corpus. Only posts which attracted >199 comments were retained. The corpus included few posts from 2017, so only those from 2018, 2019, and 2020 were retained. Very short (<5 words) and very long (>499 words) comments were removed. No single commenter (identified via their Reddit username) contributed >5 comments to the corpus, and no one source was over-represented, which was achieved by taking near-equal random samples of comments made to the various linked sources.
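The cleaning steps described above can be sketched roughly as follows. This is a minimal Python illustration and not the procedure actually used (cleaning was carried out in R). The record fields (`post_id`, `author`, `year`, `body`) and the function name `clean_corpus` are hypothetical; counting words by whitespace, counting comments per post before any other filtering, and capping each commenter by random sampling are all assumptions; and the per-source balancing step is omitted.

```python
import random
from collections import Counter, defaultdict


def clean_corpus(comments, min_post_comments=200, years=(2018, 2019, 2020),
                 min_words=5, max_words=499, max_per_author=5, seed=0):
    """Filter a list of comment dicts using the thresholds given in the text."""
    # Keep only comments on posts that attracted >199 comments.
    # Assumption: the count is taken over the raw, unfiltered comments.
    per_post = Counter(c["post_id"] for c in comments)
    kept = [c for c in comments if per_post[c["post_id"]] >= min_post_comments]

    # Keep the retained years and drop very short (<5 words)
    # and very long (>499 words) comments.
    kept = [c for c in kept
            if c["year"] in years
            and min_words <= len(c["body"].split()) <= max_words]

    # Cap each commenter at `max_per_author` comments.
    # Assumption: the cap is enforced by random sampling.
    by_author = defaultdict(list)
    for c in kept:
        by_author[c["author"]].append(c)
    rng = random.Random(seed)
    cleaned = []
    for author_comments in by_author.values():
        if len(author_comments) > max_per_author:
            author_comments = rng.sample(author_comments, max_per_author)
        cleaned.extend(author_comments)
    return cleaned
```

Fixing the random seed keeps the sampled subset reproducible across runs, which matters when later keyword and collocation figures depend on exactly which comments were retained.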

The final r/science corpus consists of 7,740,287 words from 177,296 comments across 3 years, and the typical comment was about 44 words in length (Table 2). Similar to Jones et al. (2019: 6), posts represented in this corpus largely link to popular science or science news sources (e.g. inverse.com, theguardian.com) (N = 50), followed by sources which share the original research directly (e.g. thelancet.com, pnas.org) (N = 31) and university press releases (e.g. news.osu.edu, cam.ac.uk) (N = 19). A list of all sources can be found in Supplemental Materials A.

Table 2.

Summary statistics of the r/science corpus.

Year	Number of comments	Mean number of words (SD)
2018	45,448	45.7 (50)
2019	62,049	42.6 (48.2)
2020	69,799	43.3 (48.4)
Total	177,296	7,740,287 (total words)

SD: standard deviation.

In presenting excerpts in section 4, I provide a shorthand reference to which of these categories the comment comes from. Specifically, I consider comments to sources linking directly to original research as [prof sci] (i.e. “professional science”), and comments to all other sources as [pop sci] (i.e. “popular science”).

Forms and identification of evaluative language

As noted earlier, I searched for what Partington et al. (2013: 53) call denotational nouns, or nouns that pick out something in the world but do not on their own express evaluation. Specifically, I selected denotational nouns referring to science and science communication. To do so, a keyword analysis (Scott, 1997) was first carried out. Keywords refer to words that are found to occur more frequently than expected in one corpus compared to another. A comparison corpus of general language at Reddit from six subreddits totaling ~39 million words was collected (see Supplemental Materials B). The software AntConc (Anthony, 2021) was used to generate a keyword list using a log-likelihood test statistic and Bonferroni-corrected .05 p value. All resulting keywords were reviewed for relevant denotational nouns to be investigated.
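As an illustration of how a keyword test of this kind works, the sketch below computes a commonly used two-corpus log-likelihood statistic for each word and applies a Bonferroni-corrected threshold. It is a hedged approximation written in Python, not AntConc's implementation: the function names are hypothetical, the number of tests is assumed to equal the number of word types in the target corpus, and the p value is taken from a chi-square distribution with one degree of freedom.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood for one word.

    a, b: frequency of the word in the target and reference corpus.
    c, d: total number of words in the target and reference corpus.
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return max(2 * ll, 0.0)  # guard against tiny negative rounding errors


def keywords(target_tokens, reference_tokens, alpha=0.05):
    """Return (word, log-likelihood) pairs that are key in the target corpus."""
    target, reference = Counter(target_tokens), Counter(reference_tokens)
    c, d = sum(target.values()), sum(reference.values())
    # Assumption: one test per word type in the target corpus.
    threshold = alpha / len(target)
    results = []
    for word, a in target.items():
        b = reference[word]
        if a / c <= b / d:
            continue  # keep only words relatively more frequent in the target
        ll = log_likelihood(a, b, c, d)
        # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
        p = math.erfc(math.sqrt(ll / 2))
        if p < threshold:
            results.append((word, ll))
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```

Sorting by the test statistic corresponds to ordering keywords by decreasing keyness.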

In analyzing selected denotational nouns for evaluation, I utilized collocational and concordance analysis. Collocations refer to words that appear near one another more often than their individual frequencies would predict (Baker, 2006: Ch. 5). This study adopts a modified version of Mutual Information (MI) score called MI2. MI2 measures the strength of association between a node word (the searched term) and a collocate (any word appearing around the node). The formula used to calculate MI2 is (Brezina, 2018: 72):

(1) MI2 score

\mathrm{MI2} = \log_{2} \frac{(\text{observed frequency})^{2}}{\text{expected frequency}}

where

(2) Observed frequency

\text{observed frequency} = \text{frequency of occurrence of the node word together with the collocate}

(3) Expected frequency

\text{expected frequency} = \frac{\text{node frequency} \times \text{collocate frequency} \times \text{window size}}{\text{number of words in corpus}}

The expected frequency used here is a corrected version in which window size, or the number of places on either side of a node word that a collocate can appear in, is taken into account. By squaring the observed frequency, MI2 reduces MI’s bias toward very-low-frequency collocates (Brezina, 2018: 74). Nonetheless, a minimum frequency threshold of 3 was put in place for all collocational analyses in this study, and the default window size was 4 to the left and 4 to the right of the node. However, because evaluation is not always found within a certain window, concordance analysis, or the reviewing of short excerpts of text around a queried term, was adopted to identify attitudes in a broader context. A program was written in R to identify collocates and produce concordance lines.
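The MI2 calculation and the collocate search can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the R program used in the study: the function names are hypothetical, tokens are assumed to be pre-split, and the window size entered into the expected frequency is assumed to be the total span (4 left + 4 right = 8).

```python
import math
from collections import Counter


def mi2_score(observed, node_freq, collocate_freq, corpus_size, window_size):
    """MI2 = log2(observed^2 / expected), with a window-corrected expected frequency."""
    expected = node_freq * collocate_freq * window_size / corpus_size
    return math.log2(observed ** 2 / expected)


def collocates(tokens, node, left=4, right=4, min_freq=3):
    """Rank words co-occurring with `node` inside the window by MI2."""
    freq = Counter(tokens)
    cooccurrence = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        # Count every token within `left` places before and `right` places after.
        window = tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]
        cooccurrence.update(window)
    scores = {}
    for word, observed in cooccurrence.items():
        if observed >= min_freq:  # minimum frequency threshold of 3
            scores[word] = mi2_score(observed, freq[node], freq[word],
                                     len(tokens), left + right)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
```

For instance, with an observed frequency of 4, node and collocate frequencies of 10 each, a window of 10, and a corpus of 1000 words, the expected frequency is 1 and the MI2 score is log2(16) = 4.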

Before presenting the results, it is worth highlighting that the identification of attitudes here was qualitative rather than quantitative, as there was no attempt to quantify the frequency of any attitude/evaluation. For this reason, the attitudes presented below should be taken as suggestive and exploratory.

4. Results

Keywords

From the initial list of ~4000 keywords, ~50 denotational nouns relevant to science communication were identified (Table 3).

Table 3.

List of denotational nouns—keywords.

Denotational nouns related to science communication

(1) General: stud(y/ies), science(s), article(s), researcher(s), scientist(s), academic(s), academia

(2) Methods: data, theory, sample, experiment(s), survey(s), methods, stats, statistical, correlation, empirical, measurement, framework, outlier(s)

(3) Implications & limitations: result(s), conclusion(s), finding(s), limitation(s)

(4) Field: psychology, engineering, physics, biology, economics, humanities

(5) Source: NIH, journal(s), news, PubMed, Lancet, universit(y/ies), PLOS, NEJM, ScienceMag, Newsweek, Facebook

(6) Other: read, headline(s), fund(ed/ing), reporting, expert(s), journalism, lay(man/men/person), journalists, citation(s), title, specialist(s)

Keywords within each category are ordered by decreasing keyness.

Keywords were grouped thematically into six categories and analyzed for potential negative attitudes. Below, discussion of the most relevant findings is presented in two sections. Section “Negative evaluations of professional science” reports on a selection of keywords from categories (1)–(4), which generally report on attitudes aimed at what may be called “professional” science, including scientific fields, scientists, and their research. In section “Negative evaluations of popular science,” I present a selection of keywords from categories (5) and (6), which focus on what may be called “popular” science, including the reporting of science through online media.

Negative evaluations of professional science

Science and scientist(s)

The keywords science and scientist(s) were of particular interest, as they relate closely to the research question of this study. To get a better sense of these keywords, all concordance lines for science and scientist(s) were reviewed for negative evaluation. Comments relating to three common evaluations were identified, namely that scientists are corruptible, poor at communication, and can be misleading when sharing their research with the broader public.

Complaints about the fallibility of science and scientists were fairly common. Frequently, such complaints rebutted the idea that science is methodological, unbiased, and objective: positive characteristics which were frequently attested in other comments. Varieties of corruptibility include financial corruptibility and political bias, among others. For example, as shown in example (1), the commenter laments a shift from science “just being science,” to science as having a political agenda, leading to mistrust.

(1)Science used to just be science, but these days science is politics and can’t be trusted [. . .] now you have to question every scientific finding to see what political bias was used in forming the question [. . .] [prof sci]

Another theme was a perceived lack of communication skills on the part of scientists, including a difficulty with communicating science to laypeople (2) and treating the general public with respect (3).

(2)[. . .] I hate how many times “cancer has been cured!” and then I’m left trying to explain why that isn’t true to my entire family. . . I just wish scientists did a better job communicating in bite sized nuggets for lay peoples. [prof sci]

(3)[. . .] Scientists need to engage with communities beyond simply educating them [. . .] Overall, scientists need to put in more effort to communicate and make use of evidence from science communication literature if they are actually interested in building trust in science. [prof sci]

While not surprising, comments (1) and (3) reinforce the idea of a public mistrust in science (Wynne, 2006). Interestingly, while the mistrust crisis was reported more than two decades ago (House of Lords, 2000), the commenter in (1) suggests that, for some, mistrust has increased in more recent years.

Statistics and sample

The keyword statistics was often found alongside negative language; however, concordance lines suggested that these evaluations were frequently aimed at other commenters rather than at the use of statistics in research. Statistical misunderstanding was also attributed to members of the general public, who were positioned as being more susceptible to misinterpreting statistics in research, as shown in example (4).

(4)[. . .] this is exactly how statistics can get misrepresented to a lay-person. [. . .] It is very important how you present statistics, as there will be many who hear of the statistic as you present it, whilst not fully understanding statistics. [prof sci]

Overall, the use of statistics was strongly associated with good science, and an inaccurate understanding of statistics was attributed to a science consumer’s lack of training.

In contrast, the keyword sample did frequently attract negative attitudes toward science research. Table 4 presents the top 10 collocations for sample and sample size(s), with most collocations showing a preoccupation with the size of samples.

Table 4.

Top 10 collocations of sample and sample size(s).

Sample			Sample size(s)
collocate	Raw freq	MI2	collocate	Raw freq	MI2
size	443	16.84	small	125	13.22
sizes	74	14.53	bugger	3	11.31
small	154	12.73	large	39	10.08
representative	29	11.93	blah	6	9.92
large	65	10.46	larger	21	9.61
bugger	3	10.22	smaller	14	8.78
larger	32	9.74	low	22	8.28
blah	6	8.83	tiny	9	8.25
random	21	8.79	study	35	8.06

Function words were removed from the list.

Concern about sample sizes was often expressed relative to the need for large samples in order to generalize to a larger population. Example (5) portrays a typical comment, wherein the commenter reports the sample size of the study and cautions other readers against drawing conclusions from the study.

(5)Extremely small sample (N = 31). Be *very* cautious generalizing the results beyond the study. Personally, I would not have published this without a larger sample. [pop sci]

Criticism of sampling procedure, reflected in the collocations representative, random, and several others not shown in Table 4, was also present.

Conclusion(s) and finding(s)

Among keywords related to implications and limitations of research, conclusion(s) and finding(s) most often attracted criticism of research. Example collocates of these keywords expressing negative evaluation include drawing (MI2 = 11.66), jumping (MI2 = 10.3), and reaching (MI2 = 7.35). Most instances of these attitudes expressed the concern that a study’s conclusions could not be supported by the data, including the complaint that a study’s implications were exaggerated, as shown in example (6).

(6)[. . .] Public perception always assumes that “science doesn’t lie” [. . .] but reporters certainly do embellish, and researchers almost ALWAYS overreach in their conclusions (how else are we to sell papers?!?) [prof sci]

This commenter expresses not only the less surprising attitude that science news embellishes findings (see Hilgartner, 1990), but also the more surprising one that the actual authors of research also “overreach” to “sell papers,” and thus are also subject to pressures to attract readership.

The keyword finding(s) showed a greater variety of attitudes, including many neutral ones. However, several commenters cautioned against taking one study’s findings as confirmation of its implications, instead remaining skeptical until more research can replicate and confirm them (7).

(7)In short, interesting findings, but stay skeptical until more research comes out. [prof sci]

Attitudes like those pertaining to conclusion(s) and finding(s) portray r/science users as cautious and at times skeptical of research; however, they rarely portray themselves as “anti-science” but more so as science-educated, consuming research and news thoughtfully and critically.

Psychology and social sciences

In reviewing keywords related to academic fields/disciplines, it soon became clear that psychology attracted attitudes rarely found alongside similar keywords like physics and engineering, and that these attitudes extended to the bigram social science(s). Commonly expressed sentiments toward these keywords included the belief that conclusions are irreproducible and untrustworthy (8) or simply obvious (9), and that social science research is particularly susceptible to subjective bias (10).

(8)Honestly we should probably stop accepting conclusions from any psy study until the reproducibility crisis is sorted out. [pop sci]

(9)These psychology articles always have such vague premises and words, what makes a person “nice” and “agreeable.” This sub has too many of these making it to hot, more often than not the conclusions they come to are common sense. [prof sci]

(10)It really shows a disturbing confirmation bias and an agenda in these social sciences. [. . .] This is the opposite of science and should be a red flag so that the real scientific community can protect themselves from any of this leaking over. [pop sci]

Examples (8)–(10) portray attitudes at an array of levels. Example (8) delegitimizes psychology research on the grounds that results are irreproducible; example (9) takes issue with both the conclusions of psychology research and, presumably, the motivations for such research; and example (10) defines “real” science by contrasting it with the biases and agendas of social science research.

Not all commenters were so quick to critique this area of science, however. Several commenters also highlighted what appeared to be a confirmation bias among commenters who originally leveled the critiques, as shown in example (11).

(11)It’s weird how Reddit only trusts the social sciences when it affirms their bias. Any other time you’d have hordes of people talking about sample size, cultural differences, researcher bias, etc. [pop sci]

These commenters suggest that r/science posts which link to social science research can attract predictable discourses wherein users either agree or disagree on the basis of pre-established beliefs, a pattern of communication which can foster polarization (Knobloch-Westerwick et al., 2015: 576).

Negative evaluations of popular science

Several keywords within the categories source and other related to attitudes toward popular science, which can be thought of as a range of genres that “involve the transformation of specialized knowledge into ‘everyday’ or ‘lay’ knowledge” (Calsamiglia and Van Dijk, 2004: 370). While difficult to accurately and precisely define (see Hilgartner, 1990, for a good discussion), intended audience is thought to be key to differentiating many examples of popular science from other forms (Gotti, 2014: 16). A prototypical example of popular science could be a news article, penned by a science journalist, about recently published research.

When reviewing keywords like journalism, news, and layman, it became clear that this discourse involved negative evaluation in a way that other similar keywords, like journal(s), university, and NEJM, did not. Critiques of discourse relating to popular science include the beliefs that these articles can be untrustworthy, inaccurate, and sensationalist. Example (12) below illustrates this general attitude, where the “bad” journalist is contrasted with the more trustworthy scientist.

(12)Bad science journalism has been around as long as science has been. It’s not some new “activist streak,” and you can still trust scientists (be careful with science journalists). [prof sci]

Earlier in example (1), a commenter lamented that science today is influenced by political bias. For others, popular science articles may exacerbate this issue by concealing whatever conflicts of interest may have been present and disclosed in the original research, as shown in example (13).

(13)Too bad this looks like a news article and not an actual research article. Research and journal articles list conflict of interests in order for their papers to be published. [prof sci]

Another reason commenters felt the need to be cautious of popular science is the fact that these articles tend to be summaries of the original research, and thus can misrepresent the source by lack of full, appropriate detail (14).

(14)The linked article is a terrible summation of the actual scientific article, and cherrypicks a remark by the researcher at the expense of the actual findings. [pop sci]

Moreover, other commenters were irked by popular articles’ tendency to frame findings as newsworthy (15), misrepresenting the scientific process by focusing too much on implications and not giving due time to methods and procedures (16).

(15)[. . .] But why does this article tells us it is “previously unknown effect”? It is exactly why it is used! [. . .] I am always triggered by these articles that make you think it is a major breakthrough, even when it was totally suspected. Stop making regular science news like this! [prof sci]

(16)[. . .] I’m honestly slightly pissed that these kind of press releases/articles never indicate how much time and effort went into the development of these treatments. The article makes it sound so easy and fast, as if they just came up with this treatment on the fly, then quickly grew a few cells, put them on a heart and done. [. . .] [pop sci]

Given the above negative attitudes, I then detoured away from the keywords in Table 4, searching manually through the corpus for references to sources of popular science/science news, including Psychology Today, BBC, CNN, The Guardian, Bloomberg, and ScienceAlert, among others. Contrary to expectations, named sources did not appear very frequently in the corpus. For example, ScienceAlert was mentioned by name only twice, though both instances were negatively evaluated, as in example (17).

(17)did anyone notice the name of the website is science alert? as if the name of the site itself didn’t scream click bait, just look at the site. [pop sci]

A few other sources, such as BBC and CNN, were also negatively evaluated. For example, the commenter in (18) compares the wording of a news article with its original scientific publication, criticizing how exclamatory language made its way into the popularization.

(18)The word “unachievable” isn’t even used in the Lancet article. Not once. Yet the CNN article puts the words in quotes. Very shoddy if you ask me. [. . .] [pop sci]

By contrast, in the few instances that professional science sources were criticized, other commenters were quick to defend the source’s credibility, as does the commenter in (19), who lists several top STEM journals.

(19)Anyone involved in medical research, however, knows the Lancet and the NEJM are the top English-language medical journals, just like Nature and Science are considered top science journals, and Cell a top biology journal. [pop sci]

Headline(s) and title

Despite the small space devoted to titles relative to the body of news articles, the keywords title and headline(s) were some of the most negatively evaluated keywords in the corpus, with just about all instances leveled at the titles presented by media outlets and r/science posts. Many top collocates of these keywords reflect the primary concern about titles/headlines, namely that they can be sensationalized (Table 5).

Table 5. Top 10 collocates of headline(s) and title.

Headline(s)			Title
Collocate	Raw freq	MI2	Collocate	Raw freq	MI2
editorialized	16	13.3	misleading	147	14.28
misleading	58	12.61	post	126	12.23
sensationalized	18	12.57	clickbait	25	11.55
clickbait	18	11.62	sensationalistic	3	11.27
read	81	10.49	editorialized	10	10.94
catchy	5	10.43	clickbaity	8	10.71
sensationalist	7	10.4	exploitativeness	3	10.68
flashy	5	10.36	brats	4	10.51
bait	10	10.25	article	105	10.36

Note. Function words were removed from the list.
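For readers unfamiliar with the MI2 scores reported in Table 5, the sketch below illustrates one common formulation of the statistic, MI2 = log2(O²/E), where O is the observed co-occurrence frequency of a node word and a collocate, and E is the co-occurrence frequency expected by chance. The function name, parameter names, and counts are hypothetical and are not taken from the r/science corpus. Corpus software may compute the expected frequency differently (for instance, by adjusting for the collocation window), so this sketch should not be expected to reproduce the values in Table 5.

```python
import math


def mi2(observed: int, node_freq: int, collocate_freq: int, corpus_size: int) -> float:
    """Return MI2 = log2(O^2 / E), with E = node_freq * collocate_freq / corpus_size.

    This is an illustrative formulation only; it ignores window-size
    adjustments that some corpus tools apply to the expected frequency.
    """
    if observed <= 0:
        raise ValueError("observed co-occurrence frequency must be positive")
    if min(node_freq, collocate_freq, corpus_size) <= 0:
        raise ValueError("frequencies and corpus size must be positive")
    expected = node_freq * collocate_freq / corpus_size
    return math.log2(observed ** 2 / expected)


# Hypothetical counts: a node word occurring 100 times, a collocate occurring
# 64 times, and 8 co-occurrences in a corpus of 1,000,000 tokens.
score = mi2(observed=8, node_freq=100, collocate_freq=64, corpus_size=1_000_000)
print(round(score, 2))
```

Because the observed frequency is squared, MI2 gives more weight to frequent co-occurrences than the plain mutual information score does, which may help explain why relatively frequent collocates such as misleading rank highly alongside rarer ones.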

Some commenters portray the issue of sensationalized headlines as a problem of popular science in general. For example, the commenter in (20) contrasts the honest presentation of the scientific publication with the “clickbaity” orientation of science journalism, similar to the way that example (21) cautions against construing professional and popular science as being essentially part of the same institution and thus affected by each other’s credibility.

(20)The clickbaity headlines are usually a problem of pop-science journalism though, not the original papers. You can’t fault the scientists if their data gets twisted [. . .] in a petridish-style by journalists. [pop sci]

(21)It’s really important not to demonize researchers for the way journalists report their headlines. [. . .] If journalists report the findings in a twisted way to implicate something it was not, or if readers take that as gospel without reading the article, that’s on them. [prof sci]

To complicate matters, a review of titles posted to r/science suggests that they do not always align with the titles of their respective popular science articles (see Supplemental Materials C for several examples). While often similar, r/science titles often include additional detail such as sample size and can include short quotes illustrating immediate findings and implications. Thus, it is difficult to ascertain whether commenters’ attitudes toward titles and headlines on r/science are legitimate criticisms of science journalism or just reactions to the customs of r/science. Nevertheless, some commenters attributed the negative evaluation of titles/headlines to science journalism in general, as example (22) suggests.

(22)[. . .] How long until we see some breathless headlines about a study on sensationalist journalism eroding public trust in science. [pop sci]

5. Discussion and conclusion

Discussion

This study sought to contribute to the PUS literature by applying corpus-driven discourse analysis to a corpus of comments made to the Reddit community r/science. Using keyword, collocation, and concordance analysis, this study set out to examine the content of and motivations behind negative attitudes held toward science. Below, I discuss three themes stemming from section 3, namely, negative attitudes toward popular science and sensationalization, the challenges of communicating social science research, and the criticism of research methodology and sample sizes. I conclude with future directions for further research.

Many comments in the r/science corpus replied to posts that linked to sites which report on, comment on, and summarize scientific research. I have referred to these sites as popular science because they report on rather than contribute original research (Gotti, 2014: 16). While media scholars have sought to understand the quality of interactions between scientists and the media (e.g. Peters et al., 2008), and sociologists have sought to dismantle the stereotype that popular science is vulgarized science (e.g. Hilgartner, 1990), fewer studies have empirically investigated what views science consumers hold toward popular science.

Examples highlighted in this study suggest that, for some r/science users, science news written by media members can be seen as untrustworthy. One simple complaint was that some articles did not provide a link to the original research. A more common complaint was that headlines were sensationalized to the point that they misinterpreted or exaggerated the original work. While scholars and journalists have long been engaged in advancing and improving the ways that science is communicated to the public (see, for example, Bauer and Bucchi, 2007), it is notable that keywords investigated here, such as journalist, news, and headlines, stood out, in particular, as attracting negative attitudes. As several commenters contrasted the negative attitudes toward popular science with positive ones toward professional science, the communication of science via mediators like journalists can be seen as an important influence on the public’s trust in science.

PUS research has historically placed greater focus on the natural sciences than on the social sciences and humanities (Schäfer, 2012). Traditional media tend to cover the natural sciences more (Massarani et al., 2007; Weiss and Singer, 1988), and journalists can be tougher on social science research than on natural science research (Schmierbach, 2005: 272–273). Against this backdrop, Cassidy (2014) poses the question, “is communicating the social sciences a specific challenge?” (p. 186). Given that online science communication is growing (Davies and Hara, 2017), and sites of online science communication like r/science focus on social science research more than other domains (Jones et al., 2019: 7), it is a relevant question to ask.

Several commenters in this study were particularly critical of psychology, though this may not be surprising: psychology was one of the first social sciences to take its relationship with the mass media seriously (Cassidy, 2014: 188), and online sites dedicated to writing about psychology research are now common (e.g. psypost.com, psychologytoday.com, psychnewsdaily.com). Thus, for many consumers of online science communication, psychology is the face of social science research. Which psychology studies are most shared online is an important question. The r/science moderators interviewed by Jones et al. (2019: 7) revealed that the most viral and common posts cover topics like gender, sex, and drugs. In my experience reviewing online science news sources, much of this research originates in social science and psychology journals. Thus, in addition to asking whether the reporting of social science presents a unique problem, an additional question is what kind of social science research gets reported by the increasing number of science-dedicated websites.

Finally, certain aspects of research methodology, including statistics and sampling procedure, were found to be a sticking point among some r/science commenters. In particular, the critique of small sample sizes was prevalent, and several comments suggested that it negatively affected the perceived credibility of the research. This is not a novel finding. Several decades ago, Fenton et al. (1998) reported from interviews with journalists that research with large sample sizes was more highly valued than research with smaller ones, an attitude which may also affect the interpretation of study conclusions. Schmierbach’s (2005) experimental study suggests that journalists view quantitative studies with larger sample sizes as more accurate and newsworthy, even if they reach the same conclusion as qualitative studies.

It is notable that similar sentiments were found in the r/science corpus, some two decades later. Certainly, there are benefits to quantitative research and large sample sizes, and the fact that a study has been published is not justification for withholding criticism of its methods. However, the “crude reliance on N” sizes arguably should not be the primary means by which to judge the importance of a study (Schmierbach, 2005: 285), and accounts of minimal or missing discussion of methodology in science reporting (Hyland, 2010: 121; Pellechia, 1997: 51) suggest that this is an area deserving of more attention by researchers, popularizers, and consumers.

Future directions

There are several future directions for science communication scholars to explore. The current study only examined negative attitudes, ignoring the wide spectrum of other attitudes that Reddit and other social media users may have. Examining positive attitudes may be especially important, since previous research has found that consumption of online media can increase positive attitudes toward science (Dudo et al., 2011), though how social media use impacts attitudes has been underexamined (Huber et al., 2019). Brookes and Baker’s (2017) study of positive and negative evaluation in health care comments provides a good example of a corpus-based perspective. The researchers first created a list of the most frequent positive and negative adjectives identified in a word frequency list of their corpora. They then reviewed concordance lines around those words to learn what aspects of health care experiences were being evaluated, and summarized the results with basic descriptive statistics. Thus, widening the scope of attitudes explored, as well as the contexts of online communication examined, is a promising direction for future research. In general, the growing amount and variety of online discourse has put online science communication increasingly into the spotlight, and corpus linguistic and discourse analytic perspectives arguably provide useful tools for examining the attitudes and other practices of the science-interested public.
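The two-step procedure described above (counting evaluative adjectives, then inspecting concordance lines around them) can be sketched roughly as follows. This is only an illustration under stated assumptions: the adjective list, the function names, and the sample comments are hypothetical, and the code is not the procedure used by Brookes and Baker (2017), who worked from a frequency list of their own corpora using dedicated corpus software.

```python
import re
from collections import Counter

# Hypothetical seed list of evaluative adjectives; in practice such a list
# would be derived from the corpus's own word frequency list.
EVALUATIVE_ADJECTIVES = {"good", "great", "misleading", "terrible"}


def tokenize(text):
    """Lowercase a comment and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def adjective_frequencies(comments):
    """Count how often each seed adjective occurs across all comments."""
    counts = Counter()
    for comment in comments:
        counts.update(t for t in tokenize(comment) if t in EVALUATIVE_ADJECTIVES)
    return counts


def concordance(comments, keyword, width=30):
    """Return keyword-in-context lines with `width` characters on each side."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    lines = []
    for comment in comments:
        for match in pattern.finditer(comment):
            left = comment[max(0, match.start() - width):match.start()]
            right = comment[match.end():match.end() + width]
            lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


# Invented sample comments, not drawn from the r/science corpus.
comments = [
    "The title is misleading, but the study itself is great.",
    "Great paper, terrible headline.",
]
freqs = adjective_frequencies(comments)
for line in concordance(comments, "great"):
    print(line)
```

Reading the resulting concordance lines, rather than relying on the counts alone, is what allows the analyst to see which entity (for example, a title as opposed to a study) each adjective evaluates.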

Supplemental Material

Supplemental material, sj-docx-1-pus-10.1177_09636625221146453 for Just another clickbait title: A corpus-driven investigation of negative attitudes toward science on Reddit by Jordan Batchelor in Public Understanding of Science

Footnotes

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Jordan Batchelor

Supplemental material

Supplemental material for this article is available online.

Author biography

Jordan Batchelor is a doctoral candidate in the Department of Applied Linguistics & ESL at Georgia State University. His research interests are related to corpus-based discourse analysis, science communication, and health communication. Some of his other research examines how genres of popular science use language to make science writing accessible to audiences.

References

Anthony (2021) AntConc (Version 4.0.0) (Computer software). Tokyo, Japan: Waseda University.

August, Card, Hsieh, Smith and Reinecke (2020) Explain like I am a scientist: The linguistic barriers of entry to r/science. In: Proceedings of the 2020 CHI conference on human factors in computing systems, Honolulu, HI, 25–30 April, pp. 1–12. New York, NY: ACM Digital Library.

Baker (2006) Using Corpora in Discourse Analysis. London: Continuum.

Bauer and Bucchi (eds) (2007) Journalism, Science and Society. London; New York, NY: Routledge.

Bauer, Petkova and Boyadjieva (2000) Public knowledge and attitudes to science: Alternative measures that may end the “science war.” Science, Technology, & Human Values 25(1): 30–51.

Bauer and Falade (2014) Public understanding of science: Survey research around the world. In: Bucchi and Trench (eds) Routledge Handbook of Public Communication of Science and Technology. New York, NY: Routledge, pp. 140–159.

Biber, Connor and Upton (2007) Discourse on the Move: Using Corpus Analysis to Describe Discourse Structure. Amsterdam; Philadelphia, PA: John Benjamins Publishing Company.

Blank and Shaw (2015) Does partisanship shape attitudes toward science and public policy? The case for ideology and religion. Annals of the American Academy of Political and Social Science 658: 18–35.

Bodmer (1985) The Public Understanding of Science. London: Royal Society.

Brezina (2018) Statistics in Corpus Linguistics: A Practical Guide. Cambridge: Cambridge University Press.

Brookes and Baker (2017) What does patient feedback reveal about the NHS? A mixed methods study of comments posted to the NHS Choices online service. BMJ Open 7: e013821.

Calsamiglia and Van Dijk (2004) Popularization discourse and knowledge about the genome. Discourse & Society 15(4): 369–389.

Cassidy (2014) Communicating the social sciences: A specific challenge? In: Bucchi and Trench (eds) Routledge Handbook of Public Communication of Science and Technology. New York, NY: Routledge, pp. 186–197.

Davies and Hara (2017) Public science in a wired world: How online media are shaping science communication. Science Communication 39(5): 563–568.

Dudo, Brossard, Shanahan, Scheufele, Morgan and Signorelli (2011) Science on television in the 21st century: Recent trends in portrayals and their contributions to public attitudes toward science. Communication Research 38(6): 754–777.

Edwards and Ziegler (2022) Examining science communication on Reddit: From an “Assembled” to a “Disassembling” approach. Public Understanding of Science 31(4): 473–488.

Fenton, Bryman and Deacon (1998) Mediating Social Science. London: SAGE.

Gauchat (2011) The cultural authority of science: Public trust and acceptance of organized science. Public Understanding of Science 20: 751–770.

Gierth and Bromme (2020) Attacking science on social media: How user comments affect perceived trustworthiness and credibility. Public Understanding of Science 29(2): 230–247.

Goldenberg (2016) Public misunderstanding of science? Reframing the problem of vaccine hesitancy. Perspectives on Science 24(5): 552–581.

Gotti (2014) Reformulation and recontextualization in popularization discourse. Iberica 27: 15–34.

Hara, Abbazio and Perkins (2019) An emerging form of public engagement with science: Ask Me Anything (AMA) sessions on Reddit r/science. PLoS ONE 14(5): e0216789.

Hilgartner (1990) The dominant view of popularization: Conceptual problems, political uses. Social Studies of Science 20(3): 519–539.

House of Lords (2000) Science and Society. London: Her Majesty’s Stationary Office.

Huber, Barnidge, De Zúñiga and Liu (2019) Fostering public trust in science: The role of social media. Public Understanding of Science 28(7): 759–777.

Hubner and Bond (2022) I am a scientist . . . Ask me anything: Examining differences between male and female scientists participating in a Reddit AMA session. Public Understanding of Science 31(4): 458–472.

Hunston (1994) Evaluation and organization in academic discourse. In: Coulthard (ed.) Advances in Written Text Analysis. London: Routledge, pp. 191–218.

Hunston (2010) Corpus Approaches to Evaluation: Phraseology and Evaluative Language. London: Routledge.

Hyland (2010) Constructing proximity: Relating to readers in popular and professional science. English for Academic Purposes 9: 116–127.

Jones, Reinecke, Colusso and Hsieh (2019) r/science: Challenges and opportunities for online science communication. In: Proceedings of the CHI conference on human factors in computing systems, Glasgow, 4–9 May, pp. 1–14. New York, NY: ACM Digital Library.

Knobloch-Westerwick, Johnson, Silver and Westerwick (2015) Science exemplars in the eye of the beholder: How exposure to online science information affects attitudes. Science Communication 37(5): 575–601.

Lai, Wang, Calvano and Raja (2020) Addressing immediate public coronavirus (COVID-19) concerns through social media: Utilizing Reddit’s AMA as a framework for public engagement with science. PLoS ONE 15(10): e0240326.

Lee and Kim S-H (2018) Scientific knowledge and attitudes toward science in South Korea: Does knowledge lead to favorable attitudes? Science Communication 40(2): 147–172.

Massarani, Buys, Amorim and Veneu (2007) Growing, but foreign source dependent: Science coverage in Latin America. In: Bauer and Bucchi (eds) Journalism, Science and Society. London; New York, NY: Routledge, pp. 71–79.

Miller (2001) Public understanding of science at the crossroads. Public Understanding of Science 10: 115–120.

Moriarty and Mehlenbacher (2019) The coaxing architecture of Reddit’s r/science: Adopting ethos-assessment heuristics to evaluate science experts on the Internet. Social Epistemology 33(6): 514–524.

National Science Board, National Science Foundation (2020) Science and technology: Public attitudes, knowledge, and interest. Science and Engineering Indicators 2020. NSB-2020-7. Alexandria, VA: National Science Board, National Science Foundation. Available at: https://ncses.nsf.gov/pubs/nsb20207/ (accessed 15 December 2021).

Paltridge (2012) Discourse Analysis, 2nd edn. London: Bloomsbury.

Partington, Duguid and Taylor (2013) Patterns and Meanings in Discourse: Theory and Practice in Corpus-Assisted Discourse Studies (CADS). Amsterdam; Philadelphia, PA: John Benjamins Publishing Company.

Pellechia (1997) Trends in science coverage: A content analysis of three US newspapers. Public Understanding of Science 6(1): 49–68.

Peters, Brossard, De Cheveigne, Dunwoody, Kallfass, Miller, et al. (2008) Interactions with the mass media. Science 321: 204–205.

Putsche, Hormel, Mihelich and Storrs (2017) “You end up feeling like the rest of the world is kind of picking on you”: Perceptions of regulatory science’s threats to economic livelihoods and Idahoans’ collective identity. Science Communication 39(6): 1–26.

r/science (n.d.) Reddit Science. Available at: https://www.reddit.com/r/science/ (accessed 20 December 2021).

R Core Team (2020) R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing. Available at: https://www.R-project.org/

Rothmund, Farkhari, Ziemer C-T and Azevedo (2022) Psychological underpinnings of pandemic denial—Patterns of disagreement with scientific experts in the German public during the COVID-19 pandemic. Public Understanding of Science 31(4): 437–457.

Schäfer (2012) Taking stock: A meta-analysis of studies on the media’s coverage of science. Public Understanding of Science 21(6): 650–663.

Schmierbach (2005) Method matters: The influence of methodology on journalists’ assessments of social science research. Science Communication 26(3): 269–287.

Scott (1997) PC analysis of key words—And key key words. System 25(2): 233–245.

Statista Research Department (2021, August) Reddit—Statistics & facts. Available at: https://www.statista.com/topics/5672/reddit/#dossierKeyfigures (accessed 1 December 2021).

Sturgis and Allum (2004) Science in society: Re-evaluating the deficit model of public attitudes. Public Understanding of Science 13: 55–74.

Thompson (2014) Introducing Functional Grammar. New York, NY; London: Routledge.

Trench (2008) Towards an analytical framework of science communication models. In: Cheng, Claessens, Gascoigne, Metcalfe, Schiele and Shi (eds) Communicating Science in Social Contexts: New Models, New Practices. New York, NY: Springer, pp. 119–135.

Weiss and Singer (1988) Reporting of Social Science in the National Media. New York, NY: Russell Sage Foundation.

Wynne (2006) Public engagement as a means of restoring public trust in science: Hitting the notes but missing the music? Community Genetics 9(3): 211–220.
