Abstract
This article reports a language analysis of breast cancer patients’ posts in an online support group. Using web-scraping techniques, the study analyzed 27,078 online posts contributed by 1443 users along multiple linguistic dimensions to investigate the trajectory of the patients’ psychosocial adaptation to the disease. The findings suggested that breast cancer patients’ emotional experiences and adjustment vary from one stage of the illness to another. Emotional expression, struggle and despair, and self-focus peaked at Stage III, whereas at Stage IV negative emotions receded and patients signaled a desire for connection with others.
Breast cancer is the most common cancer among women in the United States (Centers for Disease Control and Prevention, 2019). Despite its prevalence, breast cancer has a high survival rate: if diagnosed at an early stage, 99 percent of patients survive for at least 5 years (American Cancer Society, 2002). Its diagnosis and subsequent treatments are undoubtedly a painful and traumatic experience (Ganz, 2001). Research showed that treatment can be arduous and is often accompanied by loss of energy, threats of disfigurement, and reduced quality of life (Spiegel, 1997). These difficulties can last for years, even decades. The prevalence and prolonged duration of the disease make survivors’ adjustment to it and their psychosocial well-being an important issue.
A large amount of research looking into breast cancer survivors’ coping and well-being found that diagnostic and treatment phases often engender a range of emotional distress and mood disturbances, including fear, shock, anxiety, and feelings of isolation (Mitchell et al., 2017). Most previous studies primarily rely on retrospective self-reported data, which are prone to social desirability concerns and recall biases, especially in highly emotional contexts. One goal of the current study is to present a more objective picture of breast cancer survivors’ psychosocial state through the lens of their language use in an online support group (OSG). Furthermore, previous psychological data were analyzed without relating to survivors’ disease progression stage or medical background. Indeed, breast cancer patients regularly experience both physical and psychological changes as the disease progresses. As each phase presents a new set of challenges, understanding how patients’ thoughts and feelings change across different disease stages is an important factor in facilitating positive coping with this disease. Therefore, another aim of the current study is to map breast cancer patients’ psychosocial change along the illness trajectory via a linguistic approach.
Coping as a transactional process over breast cancer stages
It is well established that the course and outcome of a disease are largely shaped by the patient’s coping abilities (Lazarus, 1993; Siwik et al., 2017). The Cognitive Theory of Stress and Coping argues that the essence of coping with a stressful encounter is change. Specifically, it views coping as a transactional process of psychosocial adaptation, which varies with situational demands and concerns (Lazarus, 1993). Research showed that different concerns predominate at different points in the disease process: early-stage patients were more concerned about their family and the threat of dying (Shands et al., 2006), whereas those at a later stage worried about the side effects of treatment and about physical pain (Tapper, 2000). Similarly, the Kübler-Ross model (Kübler-Ross, 1973) suggests five stages by which an individual may respond to a terminal diagnosis: (1) shock and denial; (2) anger, resentment, and guilt; (3) bargaining; (4) depression; and (5) acceptance. Both theoretical frameworks suggest that knowledge of change is important for a better understanding of how breast cancer patients cope with this stressful encounter.
Breast cancer stage is usually expressed as a number on a scale of I through IV based on the size and location of the tumor, with the cancer cells being non-invasive at Stage I and having spread outside the breast to other parts of the body at Stage IV (American Joint Committee on Cancer, 2019). Patients at different stages experience distinct clinical situations (hospitalization, chemotherapy, etc.) and psychological changes (Schain, 1976). Some research showed that patients experience the most severe distress at the time of diagnosis (Edgar et al., 1992). In contrast, other studies observed that stress accumulates over the course of illness and peaks at the metastatic stage (Heim et al., 1997). To address this inconsistency, this study aims to compare the four stages in one dataset and understand breast cancer survivors’ coping and psychosocial changes over the whole disease course.
In total, 27,078 online posts made by 1443 users were analyzed along multiple linguistic dimensions in order to investigate the trajectory of breast cancer patients’ coping and psychosocial adaptation to the disease. An unobtrusive analysis of breast cancer survivors’ language use over the disease trajectory can provide more accurate information about their psychosocial status that goes beyond their self-reported well-being.
Language
People are able to think symbolically and use language to express emotions. An enormous amount of research shows that the language people use in their daily lives reflects their affective and cognitive states (Tausczik and Pennebaker, 2010). Researchers have systematically analyzed people’s writing and transcripts of spoken language to identify associations between language patterns and various psychological dimensions of interest (e.g. Chung and Pennebaker, 2007). For example, research showed that people exhibiting better coping and greater resilience after experiencing trauma used more words associated with causal thinking and positive emotions when narrating their traumatic experiences (Pennebaker et al., 1997). A discourse analysis of poems found that the writings of suicidal poets contained more death-related and first-person singular words than those of nonsuicidal poets, indicating that suicidal people tend to be detached from others and preoccupied with self (Stirman and Pennebaker, 2001).
The current study applied this language-based approach to breast cancer survivors’ psychosocial state by identifying linguistic cues in their online posts. Specifically, the online posts made by breast cancer survivors across four stages in an OSG were analyzed by a language analysis software called Linguistic Inquiry and Word Count (LIWC; Pennebaker et al., 2015). LIWC was initially developed to understand the emotion expression in narratives and has been solidly validated (Pennebaker and Chung, 2007). It is capable of performing automated word searches and capturing more than 76 preset categories, among which are linguistic categories (e.g. pronouns, articles), content categories (e.g. religious and emotion words), and psychological processes or styles (e.g. psychological distance). These word categories were developed based on psychological measurement scales and validated by independent judges (Pennebaker and Chung, 2007). The current study examined four critical areas of breast cancer survivors’ psychological state: emotional experience and expression, identity and social relationships, time orientation, and cognitive abilities.
Emotional expression and experience
Diagnosis and treatment of breast cancer bring shock and accompany psychological distress. Previous research showed that disclosure of one’s emotions helps to mitigate distress and benefit the physical well-being (Lazarus, 1993). However, the findings regarding cancer patients’ emotion expression behavior have been mixed. Some found that breast cancer patients tend to openly express their feelings as a way to cope with the disease (Gotay, 1984), whereas other studies found that some women are hesitant to share their psychological distress and tend to suppress or control their emotions, possibly due to the social stigma associated with the disease (Giese-Davis and Spiegel, 2003). These discrepant observations might be due to the variability in illness stages of the investigations. A research question was then proposed to detect any difference in breast cancer patients’ emotional expression in the OSGs across the four stages:
RQ1. To what extent do breast cancer survivors express their emotions in the OSGs across four cancer stages?
With regard to emotional experience, previous results have also been mixed. A group of studies showed a decline in cancer patients’ mental health status across sites of disease (e.g. Ell et al., 1989). For example, one study found that psychological distress tends to accumulate as the “magnitude” of disease unfolds (Cella et al., 1987). Another study interviewed 50 breast cancer patients and reported that disease progression brings an increased financial burden, decreased activities, and reduced quality of life (Meyerowitz et al., 1979). In addition, extensive lifespan literature suggested that life satisfaction declines as death approaches, due to declining physical condition and elevated death anxiety (e.g. Gerstorf et al., 2008; Shrira et al., 2014). Therefore, it is conceivable that breast cancer patients experience more and more distress and depression as they reach a more advanced stage of the disease or approach death. We hypothesize that this rise in negative emotion as the disease develops is reflected in decreasing positive and increasing negative emotion word use across the four stages.
Nevertheless, the Kübler-Ross’ Stages of Grief Model suggests the opposite direction such that patients tend to “accept” the mortality and exhibit less distress as they approach the end of their life (Figure 1). An empirical study confirmed the pattern of Grief Model such that when people face a traumatic loss, negative emotions (such as yearning, anger, and depression) peak at the beginning and gradually decrease to a minimum value as time passes by; by contrast, acceptance steadily increases through the study observation period. Since breast cancer is relatively easy to detect and most breast cancer patients are diagnosed at an early stage (Cancer.net, 2020), we expect that the patients experience the highest level of distress at an early stage, and that the negative emotions tend to tail off toward the final stage of the disease. In this light, we hypothesize that patients experience the highest level of struggle and distress at early stages but ultimately end with more and more acceptance. Therefore, we expect to observe an increase of positive emotion words and a decrease of negative emotion words along the disease stages. Two competing hypotheses were proposed in an attempt to map the fluctuation of breast cancer patients’ emotional experiences over the course of the disease:
H1(a). The forum members’ use of positive emotion words declines across the four stages of breast cancer and reaches the bottom at Stage IV;
H1(b). The forum members’ use of positive emotion words increases across the four stages of breast cancer and reaches the peak at Stage IV;
H2(a). The forum members’ use of negative emotion words increases across the four stages of breast cancer and reaches the peak at Stage IV;
H2(b). The forum members’ use of negative emotion words declines across the four stages of breast cancer and reaches the bottom at Stage IV.

Figure 1. Kübler-Ross’ Stages of Grief Model.
Among the discrete emotions, anxiety and fear undoubtedly are a theme that pervades the experience of patients over the entire course of the disease (Champion et al., 2004). A study interviewing patients of cervical and breast cancer found that distress accumulates in the disease course and that advanced-stage patients were more likely to discuss their fear of cancer with others compared with early-stage patients (Gotay, 1984). By contrast, several other studies found that the strongest anxiety was triggered at the time of diagnosis (e.g. Visser et al., 2006). A research question was proposed to look into breast cancer patients’ expression and experience of anxiety over the disease course:
RQ2. How does forum members’ use of anxiety-related words change across four stages of breast cancer?
Cognitive processing
Another linguistic dimension of interest in this study is breast cancer patients’ use of cognitive words. People engage in cognitive processing to understand and make meaning of a traumatic experience (Pennebaker and Harber, 1993). In the OSGs, the survivors may also use cognitive processing words to reduce uncertainty and confusion about their disease. A number of studies of expressive writing interventions found that increased use of cognitive processing words (e.g. insight and causation) is linked with positive health outcomes (Pennebaker and Seagal, 1999). One study comparing male and female cancer patients’ language use in online forums found that breast cancer patients made greater use of words related to both emotional disclosure and cognitive processing than prostate cancer patients. Given that no research has explored breast cancer patients’ cognitive processing word use across time, a research question was proposed to map this linguistic category over the four stages:
RQ3. How does forum members’ use of cognitive processing words change across four stages of breast cancer? Does it follow a pattern similar to that of the emotion words?
Time orientation
Confronted with a terminal illness, breast cancer patients’ perceptions of time are closely associated with their psychological and social coping. Presumably, as people get older and eventually approach the end of life, they tend to focus more on the past than the future (e.g. Carstensen et al., 1999). Surprisingly, a study of diaries kept by people of different ages found that although participants’ general reference to time diminished with age, older participants used more future-tense and fewer past-tense verbs (Pennebaker and Stone, 2003). One possible explanation is that when older people perceive time as limited, they tend to plan carefully for the limited future and focus more on the present. By contrast, younger people who perceive time as open-ended are less concerned about the future or the present. We expect that the same explanation applies to our subjects. As breast cancer patients approach the end of their life and foresee a limited future, they are expected to be more anchored in the present and future. Two hypotheses were proposed to reflect a similar pattern of breast cancer patients’ language use regarding time orientation:
H3. The forum members’ use of time reference words shows a general decline across the four stages of breast cancer.
H4. From Stage I to Stage IV, the use of past-tense verbs will decrease (a), and the use of present- and future-tense verbs will increase (b).
Identity and social relationships
Besides psychological well-being, social relationships and self-identity are another focus of this study. There is a robust finding that sociological factors, such as friendship network, marital, and employment status, are associated with survival rate (Waxler-Morrison et al., 1991). Therefore, the knowledge of breast cancer survivors’ social relationships and self-identity is of great significance for a more comprehensive understanding of their disease coping and adjustment.
As the disease develops, the ravages of severe illness increasingly prevent patients from regularly functioning and constrict their social network. For example, social activities with families and friends may be reduced due to the loss of energy (Gorsky and Calloway, 1983). This strain may lead to feelings of being socially isolated and focusing on self. Therefore, it is predicted that breast cancer survivors reduce their networks and focus more on themselves as the disease develops.
An accumulation of research evidence suggests that pronoun use reflects one’s relation to self and others. For example, frequent use of first-person singular form (I, me, my) predicts the tendency of being preoccupied with self and detached from others (Stirman and Pennebaker, 2001). By contrast, the use of first-person plural form (we, us, our), as well as references to other people, served as a measure of social integration (Pennebaker and Graybeal, 2001). These two first-person forms are usually inversely correlated in people’s self-reflection diaries (Pennebaker and King, 1999). Linking this linguistic dimension with breast cancer survivors’ well-being, the following hypothesis was proposed:
H5. From Stage I to Stage IV, the use of social words and first-person plural pronouns will decrease (a), whereas the use of first-person singular pronouns will increase (b).
Method
Testbed
The current study web-scraped the online posts of 1443 forum members on BreastCancer.org (https://community.breastcancer.org/). This online community was the first result returned by a search for “breast cancer forum.” BreastCancer.org is a nonprofit organization founded in 2000 by a breast cancer oncologist. As the leading patient-focused resource for breast cancer information and support, it has reached 134 million people worldwide and hosts 219,000 registered members. The forum has four distinct sections, from Stage I to Stage IV, which allows us to test the hypotheses and explore the research questions raised in this study.
Data collection and preparation
The second author developed a web crawler in Python to scrape data from BreastCancer.org and save the data to text files. Only initial threads were included in the data analysis, since replies may come from patients at an earlier or more advanced stage. The initial data collection yielded 27,933 threads, which had elicited more than 637,000 replies. Four users’ posts were excluded from analysis, as they posted in multiple sections. Six hundred ninety-five threads posted by forum moderators were also excluded, resulting in 27,078 threads made by breast cancer patients. Among them, 1160 threads are from Stage I, 274 threads from Stage II, 3693 from Stage III, and 21,951 from Stage IV.
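The exclusion rules described above can be sketched in a few lines of Python. This is only an illustration, not the crawler or cleaning script used in the study: the `Post` record and its field names are hypothetical, and the sketch assumes that "posting in multiple sections" is judged from a user's initial threads.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    """Hypothetical record for one scraped post (field names are assumptions)."""
    user: str
    stage: str          # forum section: "I", "II", "III" or "IV"
    is_initial: bool    # True for a thread-opening post, False for a reply
    is_moderator: bool  # True if the author is a forum moderator


def select_threads(posts):
    """Keep initial threads by non-moderators who post in exactly one section."""
    initial = [p for p in posts if p.is_initial]
    # Record every stage section in which each user opened a thread.
    sections = defaultdict(set)
    for p in initial:
        sections[p.user].add(p.stage)
    return [
        p for p in initial
        if not p.is_moderator and len(sections[p.user]) == 1
    ]


def count_by_stage(posts):
    """Number of retained threads per stage section."""
    return dict(Counter(p.stage for p in posts))
```

A quick consistency check on the reported figures: the per-stage counts 1160 + 274 + 3693 + 21,951 add up to the 27,078 threads analyzed.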
Each thread was then exported into LIWC 2015 software (Pennebaker et al., 2015). LIWC calculates the percentages of words of different linguistic categories in each text file. Its categories have been well validated and widely used in a variety of disciplines, including psychology (e.g. Robinson et al., 2013), advertising (e.g. Hewett et al., 2016), and education (e.g. Mehl and Pennebaker, 2003). We included linguistic measures that were relevant to our hypotheses, including emotional processing, cognitive processing, time orientations, and pronouns. The data that support the findings of this study are openly available in Dryad at https://doi.org/10.5061/dryad.b5mkkwh92.
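To make the percentage measure concrete, the following sketch counts dictionary words per category and divides by the total word count. It is a simplified stand-in, not LIWC itself: the three categories and their word lists are toy assumptions, and the tokenizer is a plain regular expression rather than LIWC's own text processing.

```python
import re

# Toy category dictionary: a hypothetical stand-in, NOT the LIWC 2015 dictionary.
TOY_CATEGORIES = {
    "i": {"i", "me", "my", "mine", "myself"},
    "we": {"we", "us", "our", "ours", "ourselves"},
    "anx": {"afraid", "scared", "worried", "anxious"},
}


def category_percentages(text, categories=TOY_CATEGORIES):
    """Return, per category, the percentage of words in `text` that belong to it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {name: 0.0 for name in categories}
    return {
        name: 100.0 * sum(tok in words for tok in tokens) / len(tokens)
        for name, words in categories.items()
    }
```

Because each score is a share of the words in one post, short posts can produce extreme values, which may be one reason such variables tend to be skewed.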
Data analysis
Descriptive analysis showed that, except for the cognitive processing category (standard error (SE) = 0.16), all the linguistic variables are skewed (1.34 ⩽ SE ⩽ 15.67), which violates the assumption of normal distribution. Therefore, we performed the nonparametric Kruskal–Wallis test on the linguistic dimensions of interest (except for the cognitive processing words category) among the four stage groups (Theodorsson-Norheim, 1986). The null hypothesis H0 was rejected at p ⩽ .05. Where a significant effect was detected, a post hoc comparison was conducted to identify the specific differences. Means and standard deviations (SDs) are reported in Table 1.
Table 1. Means and standard deviations (in parentheses) on the study variables for each of the four stages of breast cancer.
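For readers unfamiliar with the Kruskal–Wallis test, the sketch below shows how the H statistic is obtained from pooled ranks, with the usual correction for ties. It is a plain-Python illustration under our own naming, not the statistical software used in the study; in practice a statistics package would be used, and the small critical-value table only covers α = .05 for up to 3 degrees of freedom.

```python
from collections import Counter

# Chi-square critical values at alpha = .05, keyed by degrees of freedom.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815}


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return (H, degrees of freedom) for a list of samples.

    Assumes at least two groups and that not all pooled values are identical.
    """
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1 - ties / (n ** 3 - n)
    return h, len(groups) - 1
```

With four stage groups the test has 3 degrees of freedom, so H values above 7.815 are significant at the .05 level.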
All data are percentages.
Results
RQ1 examined breast cancer patients’ overall use of emotional words in their online posts. To get a sense of the trajectory of emotional word use across the four stages, we started with a visual exploration using SPSS visual tools (Figure 2(a)). The plot showed a cumulative distribution, indicating a dramatic increase in emotional expression from Stage II to Stage III, with Stages I and II at one level and Stages III and IV at another. A Kruskal–Wallis H test confirmed that there was a statistically significant difference in affective words among the four groups, χ2(3) = 132.40, p < .01, η2 = 0.005. A planned post hoc test showed that people at Stages III and IV used significantly more affective words than people at Stages I and II (t = –7.0, p < .001, d = 0.09).
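The η2 values reported alongside the Kruskal–Wallis tests can be related to the test statistic. The article does not state which estimator was used; the sketch below assumes one common rank-based estimate, η2 = (H − k + 1) / (N − k), where H is the test statistic, k the number of groups, and N the total number of observations. The function name is ours.

```python
def eta_squared_h(h, n_total, n_groups):
    """Rank-based eta-squared estimate for a Kruskal-Wallis H statistic.

    Assumed formula (not confirmed by the article): (H - k + 1) / (N - k).
    """
    return (h - n_groups + 1) / (n_total - n_groups)


# Values taken from the text: H = 132.40, N = 27,078 posts, k = 4 stages.
effect = eta_squared_h(132.40, 27078, 4)
print(round(effect, 3))
```

Under this assumption, the affective-word result (H = 132.40, N = 27,078, k = 4) gives an estimate of about 0.005, in line with the reported value, which indicates a very small effect despite the significant test.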

Figure 2. Percentage of emotion-related word use across four stages: (a) emotional words overall, (b) positive emotional words, and (c) negative emotional words.
The first two sets of competing hypotheses examined the trend of breast cancer survivors’ positive and negative word use. A Kruskal–Wallis H test showed that there was a statistically significant difference in positive-emotion words among the four groups, χ2(3) = 107.040, p < .01, η2 = 0.004, with a mean rank of 2.58 (Stage I), 2.90 (Stage II), 3.40 (Stage III), and 3.36 (Stage IV). A post hoc comparison test was conducted to determine where the differences lie among the four stages. The test showed a significant difference between every pair of stages except Stage III and Stage IV (t = 0.44, p = .97). Thus, as shown in Figure 2(b), the positive affective words showed a climbing trend. H1(a) was rejected, and H1(b) was supported.
Similarly, a Kruskal–Wallis H test was performed to determine any differences in negative emotional word use among the four stages. A statistically significant difference emerged, χ2(3) = 36.56, p < .01, η2 = 0.001, with a mean rank of 1.52 for Stage I, 1.56 for Stage II, 1.86 for Stage III, and 1.82 for Stage IV (Figure 2(c)). A post hoc comparison test revealed significant differences between Stage I and Stage III (t = –5.769, p < .01, d = 0.17) and between Stage I and Stage IV (t = –5.50, p < .01, d = 0.08), and a marginal difference between Stage II and Stage III (t = –1.87, p = .06, d = 0.06). However, Stages III and IV did not differ in their use of negative emotion words (t = –1.60, p = .11). Thus, H2(a) was partially supported, and H2(b) was rejected.
Among negative emotions, the current study particularly examined the use of anxiety-related words. As Figure 3(a) shows, the use of anxiety words peaks at Stage II and then declines through Stage IV. A Kruskal–Wallis H test showed that patients at Stage IV (M = 0.40, SD = 0.80) used significantly fewer anxiety-related words than those at Stage I (M = 0.59, SD = 0.94), Stage II (M = 0.61, SD = 0.92), and Stage III (M = 0.57, SD = 0.95) (ts > 4.03, ps < .001, ds > 0.05). Therefore, breast cancer patients use fewer anxiety words in their online posts as the disease unfolds. Interestingly, a supplemental analysis showed that anger- and sadness-related words followed the opposite trend. Unlike anxiety words, anger words (χ2(3) = 33.63, p < .01, η2 = 0.001) and sadness words (χ2(3) = 39.76, p < .01, η2 = 0.001) were used increasingly from Stage II through Stage IV (Figure 3(b) and (c)).

Figure 3. Percentage of discrete emotion word use across four stages: (a) anxiety, (b) anger, and (c) sadness.
Since the cognitive processing word variable is normally distributed, a regular analysis of variance (ANOVA) test was conducted, which showed no significant difference between the four stages (F = 0.89, p = .44). As regards lengthy words, the Kruskal–Wallis H test showed that people at different stages used different amounts of lengthy words (χ2(3) = 136.418, p < .001, η2 = 0.005). A post hoc comparison showed that people at Stages I and II used significantly more lengthy words than those at Stages III and IV (t = –7.87, p < .001, d = 0.10).
H3 examined breast cancer patients’ time reference word usage across four stages. The results of the Kruskal–Wallis H test showed that, except for Stages I and II, all pairs of stages differed significantly in their use of time reference words (χ2(3) = 79.55, p < .001, η2 = 0.003). As Figure 4(a) shows, patients at Stage III used the most time reference words (M = 6.70, SD = 3.96), followed by Stage IV (M = 6.46, SD = 3.83) and Stage II (M = 5.79, SD = 4.02) (ts > 3.05, ps < .01, ds > 0.10). Thus, H3 was rejected.

Figure 4. Percentage of time reference word use across four stages: (a) general time reference, (b) past-tense, (c) present-tense, and (d) future-tense.
A Kruskal–Wallis H test was performed to examine H4. First, a significant difference exists among the four groups, χ2(3) = 36.56, p < .01, η2 = 0.001. The post hoc comparison revealed that people in Stage IV used significantly fewer past-tense words than those in Stage III (t = 3.48, p < .01, d = 0.04) (Figure 4(b)). No significant difference was observed between other groups. Thus, H4(a) was partially supported. As regards the present tense words, a main effect exists among the four groups, χ2(3) = 8.41, p < .05, η2 = 0.003. A further comparison test showed that people at Stage III (t = –2.55, p < .05, d = 0.08) and Stage IV (t = –2.42, p < .05, d = 0.03) used significantly more present tense words than people at Stage II (Figure 4(c)). The main effect also showed up for future-tense words use, χ2(3) = 10.26, p < .05, η2 = 0.004. Specifically, people at Stage IV used significantly more future-tense words than people at Stage III (t = –2.77, p < .01, d = 0.03). In contrast, people at Stage III used significantly fewer future-tense words than people at Stage I (t = 2.30, p < .05, d = 0.06) (Figure 4(d)). Thus, H4(b) was also partially supported.
H5 examined the trajectory of both first-person singular and plural pronouns across the four stages. First, contrary to our expectation, first-person plural pronouns significantly increased from Stage I up to Stage IV, χ2(3) = 128.49, p < .01, η2 = 0.005. A break-down comparison showed that except for Stages I and II (t = –0.05, p = .96), all other stages are different from each other (ts < 2.31, ps < .04, ds > 0.06) (Figure 5(a)). Thus, H5(a) was rejected. Second, first-person singular pronoun use peaked at Stage III and dropped to its lowest level at Stage IV. There is a significant difference between the four groups, χ2(3) = 174.35, p < .01, η2 = 0.006. A post hoc comparison test showed that all stages differ significantly in first-person singular pronoun usage (ts > 2.02, ps < .05, ds > 0.04), except for Stages I and II (t = –0.91, p = .36). As Figure 5(b) shows, patients increasingly used first-person singular pronouns from Stage I through Stage III, followed by a dramatic drop at Stage IV. Thus, H5(b) was partially supported.

Figure 5. Percentage of first-person pronoun use across four stages: (a) plural first-person and (b) singular first-person.
Discussion
The diagnosis of breast cancer and its treatment bring dramatic changes to patients and their loved ones. Many patients live with the disease for extended periods, and thus coping with it becomes one aspect of their lives. By performing a linguistic analysis on large-scale web-scraped data, the current study sought to understand breast cancer patients’ psychosocial adjustment trajectories over four disease stages, as reflected in their natural language use in online posts.
Natural language analysis is widely used in psychological diagnosis and therapy. Multiple linguistic categories in people’s expressive writings are recognized as valid indicators of people’s psychological status, such as stress level (Gandino et al., 2017), suicidal proneness (Stirman and Pennebaker, 2001), and narcissism personality (Carey et al., 2015). The OSGs for breast cancer are similar to such expressive writing tasks in which patients share thoughts, express feelings, and seek support. Unlike expressive writing tasks, however, posting in such support groups happens more openly and voluntarily, with little or no guidance regarding the writing. Therefore, the patients’ online posts provide scholars with a natural and credible corpus to look into their social and psychological adjustment with the disease.
In this study, both positive and negative affective words showed similar patterns across four stages: patients at Stage I had the least level of emotion expression, regardless of positive or negative ones; both of these two categories displayed a dramatic increase from Stage II to III. This finding suggested that breast cancer patients tend to experience an emotional outburst as they reach Stage III, probably due to the quick disease progression and significant treatment transformation. It is also possible that at Stage III, patients feel an increasing need to express and vent their emotions as a way to cope with the disease. Previous research studying the effects of expressive writing found that greater expression and lower suppression would facilitate adjusting to the ongoing stressors associated with surviving cancer (Han et al., 2008; Shim et al., 2011). In this light, this study found that compared with advanced stages, Stage I and II patients tend to suppress their emotions. Thus, health practitioners and support group moderators may motivate patients at these stages to openly express their emotions to cope with the disease better.
Counter to our expectation, positive and negative emotion words display similar patterns across the four stages: climbing up through Stage I–III, and slightly decline at Stage IV. We interpret this finding as patients’ increasing need to express and manage their emotions, regardless of positive or negative ones. This speculation resonates with Socioemotional Selectivity Theory (Carstensen et al., 1999), which argues that when people anticipate a limited future, they would devote more time and effort to emotional regulation, whereas reduce social and knowledge-related goals (such as making friends, acquiring knowledge, and accomplishing tasks). This speculation awaits further empirical tests.
Although statistically insignificant, breast cancer patients at Stage IV displayed a more optimistic attitude (more positive and less negative words’ use) compared with Stage III. Along with the finding of decreasing use of anxiety words, it suggested that Stage IV patients have accepted the disease and are able to face it with an assured attitude, which is consistent with the Stages of Grief Model (Kübler-Ross, 1973). However, it is interesting to note that anger and sadness words peaked at Stage IV. The Stages of Grief Model argued that people facing a traumatic event tend to feel intense anger at an early stage. Our finding suggested the opposite way such that the patients experience the highest level of anger at the last stage of their disease. Future studies may consider a thematic analysis to identify the causes of breast cancer patients’ anger emotions toward the end of their life.
The cognitive processing words category has been used as a proxy for measuring people’s cognitive complexity abilities. Surprisingly, breast cancer survivors’ cognitive processing words’ use remained stable across four stages. This may be due to the sampling bias in the current study such that patients who are able to read and write online do not experience a dramatic deterioration of cognitive capabilities.
Our hypotheses for time reference language received mixed support. Counter to our expectations, patients increasingly used time orientation words as the disease unfolds, reaching the peak at Stage III. Nevertheless, patients used less and fewer past-tense verbs across the four stages, indicating they look to the past less and to the future more despite of facing the terminal disease. It is noteworthy that patients at Stage III used the least future-tense verbs, suggesting that survivors at Stage III possibly experience the maximum level of despair and hopelessness, which deserves health practitioners and psychological therapists’ attention.
First-person singular and plural pronouns are used as a proxy for a person’s mental state regarding the relationship with self and others. Previous research found that the use of the word I indicates self-focus and detachment from social life, whereas the use of the word we reflects connection and bonding with others (Stirman and Pennebaker, 2001). In the current study, we found that the use of the word I peaked at Stage III, whereas the use of the word we climbed steadily from Stage I to Stage IV. This may reflect that patients at Stage III are preoccupied with their own topic, the cancer, whereas Stage IV patients tend to shift their focus to family and friends when facing the last stage of their disease.
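To make the pronoun measure concrete, the following is a minimal sketch of how the rate of I-words and we-words per 100 words could be computed for each post and averaged by stage. It is an illustration only, not the procedure or dictionary used in this study: the word lists, the function names `pronoun_rates` and `mean_rates_by_stage`, and the sample posts are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately small word lists (assumption: not the
# dictionary used in the study).
I_WORDS = {"i", "me", "my", "mine", "myself"}
WE_WORDS = {"we", "us", "our", "ours", "ourselves"}


def pronoun_rates(text):
    """Return I-word and we-word rates per 100 tokens for one post."""
    # Runs of letters only, so a contraction such as "I'm" yields "i" and "m".
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return {"i": 0.0, "we": 0.0}
    total = len(tokens)
    i_count = sum(1 for t in tokens if t in I_WORDS)
    we_count = sum(1 for t in tokens if t in WE_WORDS)
    return {"i": 100.0 * i_count / total, "we": 100.0 * we_count / total}


def mean_rates_by_stage(posts):
    """Average per-post rates within each stage.

    `posts` is an iterable of (stage_label, text) pairs.
    """
    grouped = defaultdict(list)
    for stage, text in posts:
        grouped[stage].append(pronoun_rates(text))
    return {
        stage: {
            "i": sum(r["i"] for r in rates) / len(rates),
            "we": sum(r["we"] for r in rates) / len(rates),
        }
        for stage, rates in grouped.items()
    }


if __name__ == "__main__":
    # Invented sample posts, for illustration only.
    sample = [
        ("III", "I love my family"),
        ("III", "I am here"),
        ("IV", "We hold our hands"),
    ]
    print(mean_rates_by_stage(sample))
```

Expressing counts as a rate per 100 words keeps long and short posts comparable, which matters when posting length differs across users and stages.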
The aforementioned findings have practical implications. Characterizing patterns of patients’ language use in an unobtrusive environment will improve our understanding of patients’ needs and adjustments at different phases of the disease. This knowledge would assist health care providers in anticipating difficulties patients may face, sharing their perspectives, and providing effective support. Specific psychological therapy and social support, along with clinical practice, can be designed and tailored to patients at each stage. For example, given that anger and sadness words peaked at Stage IV, social support workers and health practitioners might employ interventions that help patients at this stage vent and regulate their intense emotions, such as expressive writing programs and group support interventions. The findings regarding first-person pronouns suggested that patients at Stage III tend to focus on themselves and may stay detached from others. Particular attention should be given to patients at this stage to direct their attention to other aspects of life and ease their anxiety.
This study has limitations. First, linguistic analysis, like many other methods such as self-report surveys, provides only a partial view of the topic of interest. In-depth interviews and thematic coding are needed in future research for a more comprehensive understanding of breast cancer patients’ psychosocial status and needs. Second, the findings are based only on OSG users. Therefore, caution should be exercised when generalizing to the broader population of breast cancer patients. Third, we did not have access to patients’ treatment information. As such, we were unable to examine how psychosocial coping varies across different treatment stages. Because different treatment stages and methods (such as surgery, chemotherapy, and radiation therapy) bring different challenges and concerns, it is essential for future research to examine how patients’ psychosocial coping varies as medical treatment unfolds. Fourth, the current study was not able to accurately identify breast cancer patients’ concerns and needs using linguistic markers. Since patients at different stages may experience different concerns and needs, a focus group study or in-depth interviews are suggested to identify how concerns and needs change along the disease trajectory.
In this study, we adopted language analysis as a window into the trajectories of breast cancer patients’ psychosocial well-being across the four stages. Taken together, the findings reveal that breast cancer patients’ emotional experiences and the psychosocial demands they confront in the course of illness vary from one stage to another. As the disease developed, patients reached the peak of emotional expression, struggle and despair, and self-focus at Stage III. Consistent with the Stages of Grief Model, patients toward the last stage of the disease let go of negative emotions and signaled a desire for connection and bonding with others. Overall, by adopting web-scraping and computational language analysis techniques, the current study contributes to the understanding of breast cancer survivorship by mapping how breast cancer patients’ concerns and demands covaried with the disease stages.
Research Data
Supplemental material, Breast_cancer_patients_language_use_across_four_stages for Mapping breast cancer survivors’ psychosocial coping along disease trajectory: A language approach by Meng Chen and Liang Zhao in Journal of Health Psychology
