Handling the Inconsistency between Self-Report and the Actual Behavior: Validity of Excluding Survey Participants with Insufficient Effort Responding

Abstract

In this study, we aimed to understand and reduce the discrepancy between self-reports in surveys and actual behavior. To this end, we investigated whether this discrepancy is caused by participants who engage in insufficient effort responding (IER), a phenomenon that has received increasing research attention. We collected data on actual and self-reported smartphone game usage from behavior logs and from survey responses, respectively; the survey also included items from the IER scale. The results showed a strong tendency toward overreporting and low correlations between the behavior log and survey responses among IER participants. Although the distributions of survey responses differed between IER and non-IER participants, the distributions of the behavior log did not. We conclude that excluding IER participants reduces the discrepancy between the behavior log and survey responses while leaving the distribution of actual behavior unchanged, that is, without introducing selection bias.

Keywords

behavior log; survey response; insufficient effort responding

Introduction

Because of their broad applicability, questionnaire-based surveys are widely used not only in practical marketing research and social surveys but also in academic research, mainly in the social sciences such as economics, business administration, psychology, sociology, environmental science, and pedagogy. However, prior studies have widely documented differences between self-reports and actual behavior (e.g., Araujo et al., 2017; Collopy, 1996; De Reuver & Bouwman, 2015; Junco, 2013; Lee et al., 2000; Prior, 2009; Vanden Abeele et al., 2013). When survey responses do not accurately reflect actual behavior, the survey results are biased and their validity is reduced. This can lead to erroneous conclusions, such as identifying the wrong behaviors or predictors.

Meanwhile, various studies in the marketing field have sought to improve survey accuracy, addressing topics such as response styles, data collection, and data quality (e.g., Baumgartner & Steenkamp, 2001; Brosnan et al., 2019; Goodman & Paolacci, 2017; Kees et al., 2017; Liu & Wronski, 2018; Meyvis & van Osselaer, 2018; Paas et al., 2018; Weijters & Baumgartner, 2012). In addition, as both surveys and experiments have increasingly moved online in recent years, more attention has been paid to studies that identify or explain careless, inconsistent, or random responses from participants who are not motivated or willing to answer appropriately (e.g., Bowling et al., 2016; Curran, 2016; Huang et al., 2012, 2015; Meade & Craig, 2012; Paas et al., 2018; the 2018 special issue of Applied Psychology: An International Review).

Such inappropriate responses due to insufficient motivation are called insufficient effort responding (IER), and many methods have been proposed to determine whether a participant is engaging in IER (e.g., Curran, 2016; Huang et al., 2012, 2015; Johnson, 2005; Meade & Craig, 2012; Oppenheimer et al., 2009). Although some of these studies address IER by, for example, increasing participants’ internal motivation (Maniaci & Rogge, 2014) or making surveys non-anonymous (Meade & Craig, 2012), many mention excluding or screening out IER participants as a realistic approach (e.g., DeSimone & Harms, 2018; Dunn et al., 2018; Jia et al., 2018; Meade & Craig, 2012; Ward & Pond, 2015). In fact, many empirical marketing studies also use these identification techniques to exclude defective samples (e.g., Beck et al., 2020; Bond et al., 2019; Consiglio & van Osselaer, 2019; Fernandes et al., 2016; Jung et al., 2022; Klein et al., 2019; Laran et al., 2019; Yang et al., 2018).

However, the extent to which IER actually causes the differences between survey responses and actual behavior has yet to be clarified. Therefore, (1) we investigate how strongly IER participants, identified with an IER identification technique, contribute to the difference between survey responses and actual behavior, and how the validity of their responses compares with that of other participants; and (2) because IER participants are generally excluded from analyses, we examine whether such exclusion causes selection bias and a loss of representativeness. This study aims to improve the accuracy of marketing research and its empirical methods by understanding and reducing the difference between self-reports in surveys and actual behavior.

The rest of the paper is organized as follows. In the next section, we describe the research questions of this study, based on prior studies of IER and of comparisons between self-reports and objective facts. We then present an overview of the behavior log and survey response data. Next, we compare responses with actual behavior and clarify the associations between IER and the discrepancies. In addition, we examine whether the distributions of behavior logs differ between participants with and without IER and explore the possibility of excluding IER participants from the analysis. Finally, we summarize the results and discuss the implications and limitations of this study.

Background

A Comparison Between Self-Reports and Objective Facts

As the widespread use of the Internet and digital devices has made it possible to collect large-scale data from consumers, combining survey responses with behavior log data has received attention (e.g., Groves, 2011; Kreuter et al., 2020; Stier et al., 2020). Such combinations enable, for example, studies that capture actual behavior from behavior logs and psychological attributes from survey responses (e.g., Kosinski et al., 2013; Nakano & Kondo, 2018; Stopczynski et al., 2014), as well as communication studies that compare actual behavior with self-reports.

In fact, by comparing these data, many studies have demonstrated a discrepancy between self-reports and actual behavior. Such comparative studies have been actively conducted in communication-related fields, such as web browsing (Araujo et al., 2017; Scharkow, 2016), use of mobile devices (Boase & Ling, 2013; De Reuver & Bouwman, 2015; Deng et al., 2019; Vanden Abeele et al., 2013), and TV watching (Nenycz-Thiel et al., 2013; Prior, 2009). A critical implication of these studies is that self-reports tend to overstate actual behavior (e.g., Araujo et al., 2017; Boase & Ling, 2013; Deng et al., 2019; Prior, 2009; Scharkow, 2016). For instance, Deng et al. (2019) compared self-reports with behavior logs of smartphone usage and found that usage time was widely overestimated, both for many individual categories of smartphone apps and for total usage time. De Reuver and Bouwman (2015) likewise found partial or total overestimation for categories such as games, maps, online music, and news. Further, many studies in the communication field have reported only weak-to-moderate correlations between self-reports and actual behavior (Araujo et al., 2017; Boase & Ling, 2013; De Reuver & Bouwman, 2015; Scharkow, 2016); that is, the validity of survey responses is not high.

Beyond combining behavior logs and survey responses, educational studies have long compared self-reported with actual grades (e.g., GPA, school records) or entrance exam scores, showing that self-reported scores are higher than actual ones and that the two are relatively highly correlated (e.g., Cole & Gonyea, 2010; Dunnette, 1952; Kirk & Sereda, 1969; Kuncel et al., 2005; Mayer et al., 2007; Perry, 1940). Environmental studies have also compared self-reports with objective facts (e.g., Corral-Verdugo, 1997; Corral-Verdugo & Figueredo, 1999; Corral-Verdugo et al., 2003; Delley & Brunner, 2018; Elimelech et al., 2019; Moore & Rutherfurd, 2020). A meta-analysis of studies on proenvironmental behavior by Kormos and Gifford (2014) found that self-reports left 79% of the variance in objective behavior unexplained and that some overreporting occurred.
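To make the 79% figure concrete, the short calculation below converts it into an implied correlation. It assumes that the figure denotes unexplained variance, 1 − r², for the association between self-reported and objective behavior; this reading is an interpretation, not a value taken from the meta-analysis itself.

```python
import math

# Assumption: 79% is the share of variance in objective behavior left
# unexplained by self-reports, i.e., 1 - r**2.
unexplained = 0.79
r_squared = 1 - unexplained
implied_r = math.sqrt(r_squared)
print(f"R^2 = {r_squared:.2f}, implied |r| = {implied_r:.2f}")  # R^2 = 0.21, implied |r| = 0.46
```

Under this reading, self-reports and objective behavior would correlate at roughly .46, a moderate association.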

As background to systematic biases such as overreporting, the influence of social desirability, including maintaining pride or self-esteem, is often discussed in educational and environmental studies (e.g., Cole & Gonyea, 2010; Kormos & Gifford, 2014; Kuncel et al., 2005; Moore & Rutherfurd, 2020). However, social desirability does not always lead participants to overreport. Participants distort their reported behaviors and attitudes toward what is socially desired, which can result in either overreporting or underreporting. In fact, in studies of proenvironmental behavior, Delley and Brunner (2018) found that substantial underreporting of food waste may occur. Likewise, in communication studies, social desirability should motivate participants to underreport, because excessive use of smartphones or the Internet is socially undesirable given its reported association with addiction (e.g., Chou et al., 2005; Lin et al., 2014, 2015). Therefore, the low validity and the overreporting observed in such responses may instead stem from IER, as discussed below.

Insufficient Effort Responding and Response Behavior

Huang et al. (2012, p. 100) define IER as “a response set in which the respondent answers a survey measure with low or little motivation to comply with survey instructions, correctly interpret item content, and provide accurate responses.” Participants who engage in IER may give not only truly random responses but also non-random, careless, or inconsistent ones. Non-random responses here refer to response patterns that are systematic but meaningless, such as straightlining (Herzog & Bachman, 1981), in which participants give identical responses to every item in a scale. In addition, occasional careless responses in some parts of a survey can produce inconsistencies throughout the survey (Meade & Craig, 2012); such responses lead to discrepancies between actual and reported behavior. These inappropriate responses not only degrade data quality but may also fail to capture the construct that researchers and practitioners are interested in. Regarding survey error, satisficing (Krosnick, 1991; Simon, 1956) has also been studied; it is nearly synonymous with IER, and some studies treat the two as similar concepts (e.g., Baumgartner & Weijters, 2019; Steedle et al., 2019; Ward & Pond, 2015).¹

The factors underlying IER and satisficing lead to carelessness because they fundamentally affect participants’ motivation toward a survey, that is, whether participants respond with sufficient effort, including deliberation. Inappropriate responses such as IER therefore occur when participants are insufficiently motivated. Consistent with this, such participants have been shown to complete surveys with shorter response times (Wise & Kong, 2005). Survey error due to IER thus differs from that due to socially desirable responding, in which participants read questions and items carefully and give disguised answers to make a better impression on others (Grau et al., 2019).

The measurement error stemming from IER may harm the validity of a survey (e.g., AlQuraan, 2019; Oppenheimer et al., 2009; Silber et al., 2019) and cause systematic errors, which in particular have been identified as a risk for overreporting (e.g., Merckelbach et al., 2017; Meyer et al., 2013). Meyer et al. (2013) investigated this impact by dividing participants into three groups: those who responded cooperatively, those who responded inattentively, and those who responded randomly. They found that the mean scale scores related to Internet use were significantly higher in the inattentive and random groups than in the cooperative group.

IER can therefore have an undesirable impact on survey results. Methods for detecting it can be broadly divided into two types: inserting specific scales or items into the survey, and applying post hoc analyses to the obtained responses (Huang et al., 2012). The former include the Instructional Manipulation Check (IMC; Oppenheimer et al., 2009; Paas et al., 2018), which detects inappropriate responses by examining whether participants follow specified instructions; bogus items (Meade & Craig, 2012) and the IER scale (Huang et al., 2015), which elicit responses to items whose correct answer is obvious or that are obviously untrue; and self-reports of response quality (Meade & Craig, 2012), which ask participants to evaluate the quality of their own responses.

The latter include LongString (Huang et al., 2012; Meade & Craig, 2012), which records the longest run of identical consecutive responses (straightlining) within a scale; short response time (Huang et al., 2012; Meade & Craig, 2012), which identifies participants who finish the survey quickly by measuring their response time; and consistency indices (Meade & Craig, 2012), which measure within-person consistency by correlating responses to items expected to be highly similar. Consistency indices vary with how the items are grouped, for example, the psychometric antonym index (Johnson, 2005; Meade & Craig, 2012), which computes correlations among pairs of negatively related items, and even-odd consistency (Johnson, 2005; Meade & Craig, 2012), which computes correlations between subscale scores formed by splitting each scale into even- and odd-numbered items.
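Two of the post hoc indices described above can be sketched as follows. This is a minimal illustration of the definitions summarized here, not a procedure used in this study (which relies on the IER scale); the function names are hypothetical, and reverse-keyed items are assumed to have been recoded beforehand.

```python
import numpy as np

def longstring(responses):
    """Longest run of identical consecutive responses (straightlining)."""
    if len(responses) == 0:
        return 0
    longest = current = 1
    for prev, cur in zip(responses[:-1], responses[1:]):
        current = current + 1 if cur == prev else 1
        longest = max(longest, current)
    return longest

def even_odd_consistency(responses, scales):
    """Within-person even-odd consistency, Spearman-Brown corrected.

    responses: one participant's item responses (reverse-keyed items recoded).
    scales: list of index lists, one per unidimensional scale.
    """
    responses = np.asarray(responses, dtype=float)
    odd_half = [responses[idx[0::2]].mean() for idx in scales]   # items 1, 3, 5, ...
    even_half = [responses[idx[1::2]].mean() for idx in scales]  # items 2, 4, 6, ...
    r = np.corrcoef(odd_half, even_half)[0, 1]  # correlation across scales
    return 2 * r / (1 + r)  # correct the half-length correlation to full length

print(longstring([4, 4, 4, 4, 2, 5, 5]))  # 4
```

Low even-odd values flag participants whose halves of the same scale disagree; LongString flags long runs of the same answer, so the two indices target different IER patterns.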

Several studies suggest how IER should be addressed. For instance, changing surveys from an anonymous to a non-anonymous format (Meade & Craig, 2012), increasing participants’ internal motivation (Maniaci & Rogge, 2014), presenting warning instructions (Huang et al., 2012; Ward & Pond, 2015), and combining warning instructions with the display of a survey administrator such as a virtual human (Ward & Pond, 2015) have been found to reduce IER. These methods are useful in that they all aim to increase participants’ motivation or sense of responsibility, the lack of which can cause IER; in practice, however, they are less feasible for the following reasons. First, signed questionnaires are generally avoided for privacy reasons. Second, market research companies recruit survey participants by offering external incentives, which makes it difficult to rely on internal motivation based on personal interest. Third, Paas et al. (2018) found that cautions or warnings issued at the beginning of a survey reduce the IMC failure rate (i.e., the rate at which participants fail to follow instructions) compared with no warning, but the effect may be lost over repeated surveys.

Research Questions

Based on this background, we focus on the difference between self-reports in surveys and actual behavior. This study aims to answer two main research questions (RQs), focusing on IER both as a cause of this discrepancy and as a means of reducing it.

RQ1: To what extent do survey responses for IER participants identified using the IER technique deviate from the actual behavior, compared with those for non-IER participants?

To the best of our knowledge, this RQ has not yet been answered; therefore, both the validity of responses and the extent of systematic bias should be examined separately for participants who do and do not engage in IER. IER participants give careless or inconsistent answers in the first place because they have little interest in the survey or little internal motivation to respond (e.g., Huang et al., 2012, 2015; Meade & Craig, 2012). Since they have little incentive to provide adequate and factual responses, their responses should be more random and inconsistent than those of non-IER participants. Therefore, regarding survey validity, we propose the following hypothesis.

H1:

When comparing survey responses and actual behavior, the correlation coefficient for the IER group will be lower than that for the non-IER group.

Second, although IER has already been suggested to contribute to overreporting as a systematic bias (Meyer et al., 2013), this finding rests solely on a comparison of mean scale scores between experimental conditions. It is therefore important to compare survey responses directly with actual behavior. Prior comparisons of survey responses with actual behavior show a general tendency toward overreporting (e.g., Araujo et al., 2017; Boase & Ling, 2013; Deng et al., 2019; Prior, 2009; Scharkow, 2016). Because responses are unbounded above but bounded below by zero (Lee et al., 2000), low-validity responses such as those produced by IER may contribute disproportionately to overreporting. Therefore, regarding the extent of systematic bias, we propose a second hypothesis.
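The zero-floor argument can be illustrated with a small simulation. All quantities are hypothetical and not fitted to this study's data: true daily minutes include many non-players, reports add symmetric noise with mean zero, and negative reports are truncated at zero, with a larger noise level standing in for careless reporting.

```python
import numpy as np

def mean_reporting_bias(true_minutes, noise_sd, rng):
    """Mean (reported - true) when reports = true + symmetric noise, floored at 0."""
    noise = rng.normal(0.0, noise_sd, size=true_minutes.shape)
    reported = np.maximum(true_minutes + noise, 0.0)  # nobody reports negative time
    return float(np.mean(reported - true_minutes))

rng = np.random.default_rng(42)
n = 100_000
# Hypothetical true daily game minutes: about 40% non-players, the rest exponential.
plays = rng.random(n) < 0.6
true_minutes = np.where(plays, rng.exponential(30.0, n), 0.0)

bias_careful = mean_reporting_bias(true_minutes, noise_sd=5.0, rng=rng)
bias_careless = mean_reporting_bias(true_minutes, noise_sd=30.0, rng=rng)
print(f"careful: {bias_careful:+.2f} min, careless: {bias_careless:+.2f} min")
```

Because the floor removes only negative errors, the average overreport grows with the noise level even though the noise itself is centered on zero.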

H2:

When comparing survey responses and actual behavior, the IER group will overreport more than the non-IER group.

Although marketing studies do exclude participants identified as engaging in IER, whether such participants can be excluded without consequence is questionable. Excluding participants from the data can cause selection bias (Lu et al., 2019) or violate the assumption of random sampling (Ward & Pond, 2015), resulting in the serious problem of a loss of representativeness. We therefore pose a second RQ.

RQ2: Can IER participants be excluded from the analysis without introducing selection bias?

To address these RQs and hypotheses, we collected two comparable datasets, survey responses and records of actual behavior, compared them, and examined both their relationship with IER and the feasibility of excluding IER participants. Specifically, we obtained behavior log data on smartphone game applications and survey data including self-reported time spent on game applications and the IER scale (Huang et al., 2015), and compared their correlations, means, and distributions. Moreover, we made the comparison in two different survey periods to demonstrate that the scale was robust enough to capture inadequate responses.

In addition, if IER is associated with the discrepancy between survey responses and actual behavior and its exclusion does not cause selection bias, IER should explain survey responses but not the behavior log. Lu et al. (2019) argue that exclusion carries a risk of selection bias because intentions or individual attributes differ between IER and non-IER participants. However, what they actually show is only a difference in survey responses between IER and non-IER participants, which has yet to be validated against behavior logs. In other words, the accuracy of survey responses, especially those of IER participants, remains unclear. Therefore, this study examines whether IER explains usage duration in the behavior log and in the survey responses while controlling for demographics, aiming to provide additional evidence that IER is unrelated to actual behavior.

Method and Procedures

An Overview of Data Collection

We acquired two datasets: behavior log data and survey response data. Both were collected from a single-source panel (INTAGE Single Source Panel, i-SSP) administered by INTAGE Inc., one of the largest market research firms in Japan. The panel consists of residents of Japan aged 15–69 years, and its market representativeness is ensured by sampling participants in line with the demographic statistics of Internet users in Japan (Nakano & Kondo, 2018).

It should be noted that we were not involved with the management of the survey panel. Although we acquired the common IDs to merge the behavioral log and survey responses in accordance with the company’s privacy policy, no personally identifiable information was obtained.²

Behavior Log Data

First, regarding the behavior log data, i-SSP participants who had consented in advance to data collection installed an app on their smartphones that automatically collects usage logs. When participants use their smartphones, the app records each app’s name and its launch and termination times, so that we can capture which apps participants launch, when, and for how long. In this study, as described below, we were provided with the behavior log data for all game apps for 803 participants over two three-month periods: February to April 2018 (Term I) and November 2018 to January 2019 (Term II). However, the log does not include apps running in the background, regardless of participants’ intentions.

Survey Response Data

Second, regarding the survey response data, two online surveys were conducted: May 17–21, 2018 (corresponding to Term I), and February 1–5, 2019 (Term II). The number of participants who consented, took part in the surveys, and could be merged with the behavior log was 803 (504 men and 299 women). By age group, the numbers of participants in Term I and Term II, respectively, were 153 and 124 for the twenties and under, 197 and 201 for the thirties, 197 and 201 for the forties, 170 and 177 for the fifties, and 139 and 149 for the sixties.

In the first survey, in May 2018, participants were asked to report their average daily use of smartphone game apps (in minutes, to one decimal place) over the last three months,³ corresponding to Term I. Participants who had not played any game app during the period selected “Never.” They then completed the survey by answering additional questions about game apps that are unrelated to this study. In the second survey, in February 2019, the same participants answered the same game-app questions as in Term I, together with four additional items from the IER scale (Huang et al., 2015). The IER items were interspersed among the unrelated items.

Variable Description

Average Daily Duration and the Difference Between Datasets

To investigate the difference between actual and reported usage in each term, we calculated usage durations from both the behavior log and the survey responses. First, we obtained actual usage for each participant and term by averaging, in minutes, the total daily duration of all apps categorized as games in the log data. Second, we used the responses to the respective surveys as self-reported usage for each term (“Never” was coded as zero). Finally, we calculated the difference for each term by subtracting actual usage from self-reported usage.
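The three steps above can be sketched with pandas as follows. The input layout (one row per session with hypothetical `participant`, `launch`, and `end` columns) is an assumption, as is the choice to average over all calendar days of the term with days without play counted as zero; the text does not state how such days were handled.

```python
import pandas as pd

def average_daily_minutes(sessions, term_start, term_end):
    """Mean game minutes per calendar day for each participant in one term.

    sessions: DataFrame with hypothetical columns 'participant', 'launch', 'end'
    (datetime64). term_end is exclusive. Assumption: days without play count as
    zero, so the mean is total minutes in the term divided by days in the term.
    """
    start, end = pd.Timestamp(term_start), pd.Timestamp(term_end)
    launch = sessions["launch"].where(sessions["launch"] >= start, start)
    stop = sessions["end"].where(sessions["end"] <= end, end)
    # Sessions entirely outside the window get a negative span, clipped to 0 minutes.
    minutes = ((stop - launch).dt.total_seconds() / 60).clip(lower=0)
    n_days = (end - start).days
    return minutes.groupby(sessions["participant"]).sum() / n_days

def survey_minus_log(survey, log_avg):
    """Self-reported minus logged minutes per participant ('Never' coded as 0).
    Participants without logged game sessions are assumed to have 0 minutes."""
    reported = survey.map(lambda v: 0.0 if v == "Never" else float(v))
    return reported - log_avg.reindex(reported.index, fill_value=0.0)

# Hypothetical example for Term I: February 1 to April 30, 2018 (89 days)
sessions = pd.DataFrame({
    "participant": ["a", "a", "b"],
    "launch": pd.to_datetime(["2018-02-01 10:00", "2018-02-02 23:50", "2018-03-03 08:00"]),
    "end": pd.to_datetime(["2018-02-01 10:30", "2018-02-03 00:20", "2018-03-03 09:00"]),
})
log_avg = average_daily_minutes(sessions, "2018-02-01", "2018-05-01")
survey = pd.Series({"a": "Never", "b": 2.0, "c": 1.5})
print(survey_minus_log(survey, log_avg))
```

Because the mean is taken over all calendar days, a session that crosses midnight needs no special handling; a per-day breakdown would only matter if days without play were excluded from the average.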

The IER Scale

IER is sometimes identified using multiple indices (e.g., Meade & Craig, 2012); however, employing many methods or including an overly long scale is infeasible in actual surveys. Huang et al. (2015) showed that the IER scale, which contains only a few items, is an efficient and suitable index that is associated with other indices such as psychometric antonyms, self-reports, and response time, and that it triggers fewer negative reactions.⁴ Because participants may take part in multiple surveys or experiments offered by various clients, the fact that the scale provokes few negative emotions is especially important for market research firms and crowdsourcing platforms such as Amazon MTurk. In addition, a pilot study by Huang et al. (2015) showed that the IER scale is hardly related to social desirability. Because the IER scale is thus expected to be highly feasible and largely unaffected by social desirability, we adopted it to measure IER.

The IER scale (Huang et al., 2015) assesses the degree to which participants endorse statements describing infeasible activities; that is, it evaluates whether they engage in IER based on the extent to which they agree with statements that are infeasible or untrue. The second survey included four items (“I have never used a computer”; “I work twenty-eight hours in a typical work day”; “I am interested in pursuing a degree in managemental genetics”;⁵ and “I can run 3 km in 2 min”)⁶ rated on a 7-point Likert scale (from “Strongly agree” to “Strongly disagree”).⁷ Cronbach’s $α$ for these items was .85. In this study, participants who selected an agreeing response (“Strongly agree” to “Slightly agree”) on at least one of the four items were classified as engaging in IER; all others were classified as non-IER, that is, as participants who do not engage in IER.⁸
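The classification rule and the reliability coefficient can be sketched as follows. The numeric coding (1 = “Strongly agree” through 7 = “Strongly disagree”, so that codes 1–3 are agreeing responses) and the function names are assumptions for illustration.

```python
import numpy as np

# Assumed coding: 1 = Strongly agree, 2 = Agree, 3 = Slightly agree, 4 = Neutral,
# 5 = Slightly disagree, 6 = Disagree, 7 = Strongly disagree.
AGREE_MAX = 3  # hypothetical threshold: codes 1-3 count as agreeing

def classify_ier(item_scores):
    """True for each participant who agrees with at least one IER item."""
    scores = np.asarray(item_scores)
    return (scores <= AGREE_MAX).any(axis=1)

def cronbach_alpha(item_scores):
    """Cronbach's alpha for an (n_participants, k_items) score matrix."""
    x = np.asarray(item_scores, dtype=float)
    k = x.shape[1]
    sum_item_var = x.var(axis=0, ddof=1).sum()
    total_var = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - sum_item_var / total_var)

# Hypothetical responses of three participants to the four IER items
scores = [[7, 7, 7, 7],
          [7, 6, 7, 3],
          [1, 2, 1, 1]]
print(classify_ier(scores).tolist())  # [False, True, True]
```

A single agreeing answer is enough to flag a participant, which matches the "at least one of the four items" rule; alpha summarizes how consistently the four items move together across participants.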

Demographic Variables

In this study, we used sex, age group, annual household income, and occupation as demographic variables in the analysis reported in Table 3. For each survey, the market research firm provided demographic data for the corresponding term. We excluded from the analysis three participants in the first survey and one participant in the second survey who did not report their annual household income.

Results and Discussion

Difference Between Survey Responses and Log Data

In this section, addressing RQ1, we test H1 using correlation coefficients between survey responses and the behavior log, and H2 using means and a scatter diagram with regression lines. Note that, to satisfy the normality assumption, we use log-transformed values, log(x + 1), for both actual and self-reported durations.

Table 1.

Correlation coefficients between survey responses and the behavior log.

Variable pair	non-IER (n = 704)	IER (n = 99)	z-value (non-IER vs. IER)
Survey (Term I) and Log (Term I)	.62	.39	2.89 **
Log (Term II) and Log (Term I)	.84	.79	1.26
Log (Term II) and Survey (Term I)	.55	.43	1.38
Survey (Term II) and Log (Term I)	.62	.42	2.43 *
Survey (Term II) and Survey (Term I)	.83	.71	2.79 **
Survey (Term II) and Log (Term II)	.63	.48	1.92

Note. Each pair of variables is listed once.

*p < .05. **p < .01.

First, we compare the correlation coefficients between survey responses and the behavior log for the IER and non-IER groups (Table 1). In Term I, the coefficient is r = .39 for IER participants and r = .62 for non-IER participants (z = 2.89, p < .01). Term II shows a similar pattern, with r = .48 for IER and r = .63 for non-IER participants, although the difference is not significant (z = 1.92, p > .05). The correlation of survey responses between terms is r = .71 for IER and r = .83 for non-IER participants (z = 2.79, p < .01). Thus, IER participants’ responses tend to have lower validity and to be less consistent across surveys than those of non-IER participants. These results suggest that the self-reports of IER participants are even less consistent with actual behavior than the self-reports examined in previous studies. Therefore, H1, which states that the correlation between survey responses and actual behavior is lower for the IER group than for the non-IER group, is supported.
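The z-values in Table 1 compare correlations from two independent groups. A common way to do this is Fisher's r-to-z test; the sketch below assumes that test, since the text does not name it. With the rounded Term I coefficients (.62 for n = 704 and .39 for n = 99), it gives z ≈ 2.88, in line with the reported 2.89.

```python
import math
from scipy import stats

def compare_independent_correlations(r1, n1, r2, n2):
    """Fisher r-to-z test for the difference between two independent correlations."""
    z1, z2 = math.atanh(r1), math.atanh(r2)        # Fisher transformation
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))    # SE of z1 - z2
    z = (z1 - z2) / se
    p = 2 * stats.norm.sf(abs(z))                  # two-sided p-value
    return z, p

# Term I, survey vs. log: non-IER (n = 704) vs. IER (n = 99), Table 1
z, p = compare_independent_correlations(0.62, 704, 0.39, 99)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Because the IER group is small, the standard error is dominated by the 1 / (99 − 3) term, which is why fairly large differences in r are needed to reach significance.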

Second, the scatter diagram in Figure 1 shows the relationship between survey responses and actual behavior. The figure includes censored regression lines with a lower bound of 0 for the overall sample and for the IER and non-IER groups, showing that both IER and non-IER participants tend to overreport. Whereas the slope of the regression line is almost 1 for the non-IER sample (Term I: β_nonIER = 1.01, z = 18.25, p < .01; Term II: β_nonIER = 1.06, z = 19.07, p < .01), it is substantially less than 1 for the IER sample (Term I: β_IER = .66, z = 6.84, p < .01; Term II: β_IER = .81, z = 8.07, p < .01), indicating that, among IER participants, the extent of overreporting tends to grow with longer play time.
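A censored regression with a lower bound of 0 is commonly estimated as a Tobit model. The sketch below fits one by maximum likelihood with SciPy. Which variable served as the response is an assumption here, as is the helper name, and the standard errors behind the z-values above are omitted for brevity.

```python
import numpy as np
from scipy import optimize, stats

def tobit_fit(x, y, lower=0.0):
    """Left-censored (Tobit) regression of y on x.

    Latent model: y* = a + b*x + e, e ~ N(0, sigma^2); observed y = max(y*, lower).
    Returns (a, b, sigma) estimated by maximum likelihood.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    censored = y <= lower

    def neg_log_lik(params):
        beta, sigma = params[:2], np.exp(params[2])  # log-sigma keeps sigma > 0
        mu = X @ beta
        # Uncensored points contribute the normal density ...
        ll = stats.norm.logpdf(y[~censored], loc=mu[~censored], scale=sigma).sum()
        # ... censored points contribute P(y* <= lower).
        ll += stats.norm.logcdf((lower - mu[censored]) / sigma).sum()
        return -ll

    beta0, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS start values
    start = np.append(beta0, np.log(y.std() + 1e-6))
    res = optimize.minimize(neg_log_lik, start, method="BFGS")
    a, b, log_sigma = res.x
    return a, b, np.exp(log_sigma)

# Hypothetical check on simulated data with a known slope of 0.8
rng = np.random.default_rng(0)
x_sim = rng.uniform(0.0, 4.0, 2000)
y_sim = np.maximum(-0.5 + 0.8 * x_sim + rng.normal(0.0, 1.0, 2000), 0.0)
a, b, sigma = tobit_fit(x_sim, y_sim)
print(f"intercept {a:.2f}, slope {b:.2f}, sigma {sigma:.2f}")
```

Applied to this study's setting, x and y would be the log(x + 1)-transformed durations; unlike ordinary least squares, the Tobit likelihood does not treat the pile-up of zero-minute reports as exact values of the latent response.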

Figure 1.

Scatter diagram of survey responses and the behavior log (logarithm-converted min).

Table 2.

Comparison Between Survey Responses and the Behavior Log (min).

Group	Survey, Term I	Log, Term I	Survey, Term II	Log, Term II	Diff. (Survey − Log), Term I	Diff. (Survey − Log), Term II	Paired t, Term I	Paired t, Term II
Overall (n = 803)	26.8 (58.7)	24.4 (57.9)	26.4 (52.0)	22.5 (57.0)	2.4 (54.1)	3.9 (54.9)	1.27	2.01 *
non-IER (n = 704)	25.9 (60.3)	24.7 (58.5)	26.3 (53.7)	23.4 (58.5)	1.2 (53.6)	3.0 (56.3)	.58	1.40
IER (n = 99)	33.4 (46.0)	22.1 (53.8)	26.6 (38.8)	16.1 (44.1)	11.3 (56.5)	10.5 (44.1)	1.99 *	2.37 *
Diff. (IER − non-IER)	7.6 (58.7)	–2.6 (58.0)	.2 (52.1)	–7.3 (56.9)	10.1 (54.0)	7.5 (54.9)
t-value (IER vs. non-IER)	1.47	–.41	.05	–1.48	1.75	1.54

Note. Cells show M (SD) in minutes. Some of the t-tests for the differences between survey responses and the behavior log were performed using the Satterthwaite method because the equal-variance assumption was not met. The t-values calculated with the Satterthwaite method are displayed in italics. All SD values are pooled standard deviations.

*p < .05. **p < .01.

Furthermore, we compared the means and standard deviations of the self-reported and actual durations and of their differences (Table 2, Figure 2). In the overall sample, the means indicate a slight tendency toward overreporting, which is significant only in Term II (Term I: M_survey = 26.8 min (58.7), M_log = 24.4 min (57.9), paired t(802) = 1.27, p > .05; Term II: M_survey = 26.4 min (52.0), M_log = 22.5 min (57.0), paired t(802) = 2.01, p < .05). This tendency to overreport mobile game use is consistent with previous studies comparing self-reports with actual web and mobile behavior, including studies showing overreporting for several mobile apps (e.g., Araujo et al., 2017; Boase & Ling, 2013; De Reuver & Bouwman, 2015; Deng et al., 2019; Scharkow, 2016). The results more critical to H2 are as follows. The non-IER group shows no significant difference between the behavior log and survey responses (Term I: M_survey = 25.9 min (60.3), M_log = 24.7 min (58.5), paired t(703) = .58, p > .05; Term II: M_survey = 26.3 min (53.7), M_log = 23.4 min (58.5), paired t(703) = 1.40, p > .05), whereas the IER group shows significant overreporting in both terms (Term I: M_survey = 33.4 min (46.0), M_log = 22.1 min (53.8), paired t(98) = 1.99, p < .05; Term II: M_survey = 26.6 min (38.8), M_log = 16.1 min (44.1), paired t(98) = 2.37, p < .05). The same pattern can be seen in the histograms of the differences (Figure 2, bottom). The histogram for the non-IER group, compared with that for the IER group, has a lower proportion of classes with zero difference, and its distribution is more right-skewed. Therefore, H2, which states that the IER group overreports more than the non-IER group when survey responses are compared with actual behavior, is supported. In addition, games have been pointed out to be associated with addiction and mental disorders (American Psychiatric Association, 2013; WHO, 2021), so social desirability would be expected to produce underreporting. Thus, the overreporting that is more prevalent in the IER group in this study is unlikely to reflect social desirability.
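The paired t-values in Table 2 can be reproduced from the reported mean and SD of the within-person differences, because a paired t-test is a one-sample t-test on those differences with n − 1 degrees of freedom. The helper name is hypothetical; for the IER group (n = 99, hence df = 98), it recovers t = 1.99 (Term I) and t = 2.37 (Term II). The last lines use simulated, hypothetical data to check agreement with `scipy.stats.ttest_rel`.

```python
import math
import numpy as np
from scipy import stats

def paired_t_from_summary(mean_diff, sd_diff, n):
    """Paired t statistic from the mean and SD of within-person differences."""
    t = mean_diff / (sd_diff / math.sqrt(n))
    df = n - 1
    p = 2 * stats.t.sf(abs(t), df)  # two-sided p-value
    return t, df, p

# IER group, survey minus log (Table 2: Diff. (SD), n = 99)
for term, mean_diff, sd_diff in [("Term I", 11.3, 56.5), ("Term II", 10.5, 44.1)]:
    t, df, p = paired_t_from_summary(mean_diff, sd_diff, 99)
    print(f"{term}: t({df}) = {t:.2f}, p = {p:.3f}")

# With raw per-participant data, scipy.stats.ttest_rel gives the same statistic.
rng = np.random.default_rng(0)
log_min = rng.exponential(20.0, 99)                  # hypothetical logged minutes
survey_min = log_min + rng.normal(10.0, 50.0, 99)    # hypothetical self-reports
d = survey_min - log_min
t_raw = stats.ttest_rel(survey_min, log_min).statistic
t_sum, _, _ = paired_t_from_summary(d.mean(), d.std(ddof=1), d.size)
```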

Figure 2.

Histograms of the difference (survey responses minus behavior log).

Differences Within Each Dataset: Impact of IER

First, focusing on survey responses, we examine whether they differ depending on whether participants engaged in IER. According to Table 2, the mean self-reported duration of IER participants exceeds that of non-IER participants by 7.6 min (Term I) and .2 min (Term II), neither of which is significant. However, closer examination (Figure 3, top) shows that, particularly in Term I, IER participants tend to report longer durations than non-IER participants. The Kolmogorov–Smirnov test confirms this result (Term I: D = .1701, p < .05; Term II: D = .1354, p > .05).
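As a rough check on the reported p-values, the large-sample critical value of the two-sample Kolmogorov–Smirnov statistic at α = .05 can be computed for groups of 704 and 99 participants. This asymptotic approximation is an assumption; the reported p-values may come from an exact or software-specific computation. With raw data, `scipy.stats.ks_2samp` returns D and p directly, as the toy call at the end shows.

```python
import math
from scipy import stats

def ks_critical_value(n1, n2, alpha=0.05):
    """Large-sample critical D for the two-sided two-sample KS test."""
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)  # about 1.358 for alpha = .05
    return c_alpha * math.sqrt((n1 + n2) / (n1 * n2))

d_crit = ks_critical_value(704, 99)  # non-IER vs. IER group sizes
print(f"critical D = {d_crit:.3f}")  # critical D = 0.146
for term, d in [("Term I", 0.1701), ("Term II", 0.1354)]:
    verdict = "p < .05" if d > d_crit else "p > .05"
    print(f"{term}: D = {d:.4f}, {verdict}")

# With raw arrays, SciPy returns D and p directly (toy data shown here):
res = stats.ks_2samp([0, 1, 2, 3], [2, 3, 4, 5])
print(res.statistic)  # 0.5
```

The critical value of about .146 falls between the two reported statistics, consistent with a significant difference in Term I but not in Term II.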

Figure 3.

Cumulative distributions of the behavior log and survey response by IER status.

Next, we examine whether the distributions of actual behavior differ between IER and non-IER participants. The mean difference between the groups is not significant (Table 2), and the distributions are also similar (Figure 3, bottom; Term I: D = .0795, p > .05; Term II: D = .0604, p > .05). Taken together, these two results indicate that although actual smartphone game usage differs little between participants who do and do not engage in IER, their self-reported usage can differ. This result thus further supports H2.

Exclusion of IER and Its Effect

Our findings so far strongly motivate excluding IER participants from the analysis. However, even though IER does not significantly affect the distribution of actual behavior, simply excluding IER participants without careful consideration could still lead to selection bias and shift means or distributions. Therefore, in this subsection, we address RQ2 and, if the exclusion proves unproblematic, show how far it brings survey responses closer to actual behavior.

Figure 4 shows the effect of exclusion by comparing the cumulative distributions of actual daily duration for non-IER and overall participants. The two groups have similar distributions in both terms, even though 99 IER-identified participants are excluded. Because the two samples are not independent (the non-IER group is a subset of the overall sample), we test the difference with a bootstrap Kolmogorov–Smirnov test (1,000 resamples); the difference between samples with and without IER participants is not significant (Term I: D = .0098, p > .05; Term II: D = .0075, p > .05). These results therefore provide no evidence that excluding IER participants distorts the distribution of actual behavior through selection bias.
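The text does not detail the resampling scheme, so the following is only one plausible sketch of a bootstrap KS test for a subset compared with its parent sample. It compares the observed D between the overall sample and the non-IER subset with the D values obtained when samples of the same size are drawn with replacement from the overall sample; a small p would suggest that the subset departs from the overall distribution more than a random draw would. The function names, the number of resamples, and the seed are illustrative assumptions.

```python
import random


def ks_d(a, b):
    """Largest absolute gap between the empirical CDFs of a and b (ties handled)."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] == x:
            i += 1
        while j < m and b[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def bootstrap_subset_ks(full, subset, n_resamples=1000, seed=0):
    """Compare D(full, subset) with D(full, bootstrap draw of the same size).

    Returns (observed D, bootstrap p-value).
    """
    rng = random.Random(seed)
    observed = ks_d(full, subset)
    k = len(subset)
    exceed = 0
    for _ in range(n_resamples):
        draw = rng.choices(full, k=k)  # resample with replacement
        if ks_d(full, draw) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive
    return observed, (exceed + 1) / (n_resamples + 1)
```

In this setup, `full` would hold the daily durations of all participants and `subset` those of the non-IER participants; a subset that is essentially a random slice of the sample, as Figure 4 suggests, yields a large p.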

Figure 4.

Comparison of the cumulative distributions of actual duration per day between overall and non-IER responses. Note. The two lines of overall and non-IER groups almost overlap since they have similar distributions in both terms.

Next, we quantify how much the exclusion reduces the response error. As shown in Table 2, the response error for the overall sample is overreporting by 2.4 min (Term I) and 3.9 min (Term II), whereas after excluding IER participants it is overreporting by 1.2 min (Term I) and 3.0 min (Term II). Excluding IER participants thus reduces overreporting by roughly 25%–50%, bringing survey responses closer to actual behavior.
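The size of the reduction can be checked directly from the Table 2 values quoted above. With these rounded figures, overreporting falls by 50% in Term I and by about 23% in Term II; if the non-IER Term II error is instead computed from the group means reported earlier (26.3 − 23.4 = 2.9 min), the Term II reduction is about 26%, so the 25%–50% range should be read as approximate. The function name is illustrative.

```python
def percent_reduction(before, after):
    """Relative reduction (%) in mean overreporting after excluding IER."""
    return 100.0 * (before - after) / before


# Mean response error in min/day (survey minus log), as quoted from Table 2
response_error = {
    "Term I": {"overall": 2.4, "non_ier": 1.2},
    "Term II": {"overall": 3.9, "non_ier": 3.0},
}

for term, err in response_error.items():
    pct = percent_reduction(err["overall"], err["non_ier"])
    print(f"{term}: overreporting reduced by {pct:.0f}%")
# Term I: overreporting reduced by 50%
# Term II: overreporting reduced by 23%
```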

Additional Verifications on the Exclusion of IER Participants

Furthermore, we verify whether the finding that IER affects only the self-reported duration, and not the actual duration, holds when demographic attributes are controlled for. In this verification, we conduct censored regressions with a lower bound of 0, using log-transformed durations (the logarithm of duration plus 1) of both actual and self-reported usage as the response variables. In the regression on survey responses, we additionally include the actual duration as a control variable, since self-reported durations are recalled on the basis of actual behavior. The results (Table 3) show that IER is not a significant explanatory factor for the actual duration in either term. That is, IER is not associated with actual behavior even when demographics are controlled for. In contrast, IER is a significant positive factor for the survey responses in both terms. These results indicate that, contrary to prior work suggesting that excluding IER participants risks selection bias (Lu et al., 2019), the exclusion does not result in selection bias here, because actual behavior is unrelated to IER even though the distribution of survey responses differs by IER status. Thus, regarding RQ2, both the comparison of distributions before and after the exclusion and the regression analysis indicate that excluding participants who engage in IER does not lead to selection bias.⁹

Table 3.

Comparison Between Survey Responses and the Behavior Log (logarithm-converted min).

Note. Coefficients of the categorical variables (age, income, and occupation) are relative to the baseline categories “Over 60,” “Over 900,” and “Not employed,” respectively. *p < .05. **p < .01.
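The censored regressions can be illustrated with the log-likelihood of a regression left-censored at 0 on log(x + 1)-transformed minutes. The text does not state the error distribution; the sketch below assumes normal errors, as in the standard Tobit model, and shows only the objective function. In practice it would be maximized numerically (for example, by minimizing its negative with `scipy.optimize.minimize`), with each row of `X` holding an intercept, the IER indicator, demographic dummies and, for the survey-response model, the log-transformed actual duration; that layout and the function names are illustrative assumptions.

```python
import math


def log1p_minutes(minutes):
    """Response transform: log(minutes + 1), so 0 min maps to the bound 0."""
    return math.log1p(minutes)


def norm_logpdf(z):
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def norm_logcdf(z):
    # log Phi(z) via erfc; the floor avoids log(0) far in the lower tail
    return math.log(max(0.5 * math.erfc(-z / math.sqrt(2.0)), 1e-300))


def tobit_loglik(beta, sigma, X, y, lower=0.0):
    """Log-likelihood of a linear regression left-censored at `lower`.

    beta: coefficients; sigma: error SD (> 0); X: rows of covariates;
    y: transformed responses (values <= lower are treated as censored).
    """
    ll = 0.0
    for row, yi in zip(X, y):
        mu = sum(b * x for b, x in zip(beta, row))
        if yi <= lower:
            # Censored: probability that the latent value is at or below the bound
            ll += norm_logcdf((lower - mu) / sigma)
        else:
            # Uncensored: normal density of the observed value
            ll += norm_logpdf((yi - mu) / sigma) - math.log(sigma)
    return ll
```

Treating zero durations as censored, rather than as ordinary observations, is what distinguishes this model from ordinary least squares on the transformed values.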

Conclusions

In this study, we focus on the difference between survey responses and actual behavior and its association with IER. By comparing the two data sources, we obtained several findings on IER and on the feasibility of excluding IER participants. To the best of our knowledge, this is the first study to verify whether the extent of the difference between the behavior log and survey responses changes depending on whether participants engage in IER; hence, this study bridges two research fields: discrepancies between survey and log data, and IER.

Based on the findings of this study, when behavior logs are unavailable for the survey period, we recommend identifying and excluding participants who engage in IER by using indices such as the IER scale (e.g., Huang et al., 2015). Our reasons are as follows. First, participants generally tend to overreport in survey responses, and this tendency is more pronounced among IER-identified participants; furthermore, the correlation between survey responses and actual behavior is lower for these participants than for non-IER participants, indicating a validity problem. Second, although the distribution of survey responses differs depending on IER, that of the behavior log does not. Third, the distribution of the behavior log remains unchanged before and after the exclusion, and although IER explains deviations in survey responses, it does not explain actual behavior. Together, these findings indicate that excluding IER participants does not introduce a selection bias that distorts the distribution of actual behavior.
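The recommended screening step can be sketched as follows. The item names, the "attentive" response options, and the flagging threshold are all hypothetical placeholders: the actual items and scoring rule should be taken from the IER scale itself (Huang et al., 2015), and this section does not report the cut-off used in the study. Each item is assumed to be one to which an attentive respondent gives a predictable answer, and a participant is flagged when the number of unexpected answers reaches the threshold.

```python
# Hypothetical infrequency items on a 1-7 agreement scale. For each item,
# ATTENTIVE_OPTIONS lists the answers an attentive respondent would give.
ATTENTIVE_OPTIONS = {
    "infreq_1": {1, 2, 3},  # placeholder: item almost everyone should disagree with
    "infreq_2": {5, 6, 7},  # placeholder: item almost everyone should agree with
    "infreq_3": {1, 2, 3},
}


def ier_score(responses):
    """Count items answered outside the attentive options (missing counts too)."""
    return sum(
        1 for item, ok in ATTENTIVE_OPTIONS.items() if responses.get(item) not in ok
    )


def split_by_ier(participants, threshold=1):
    """Return (kept, flagged); `threshold` is an illustrative cut-off."""
    kept, flagged = [], []
    for p in participants:
        target = flagged if ier_score(p["responses"]) >= threshold else kept
        target.append(p)
    return kept, flagged


sample = [
    {"id": "A", "responses": {"infreq_1": 1, "infreq_2": 7, "infreq_3": 2}},
    {"id": "B", "responses": {"infreq_1": 6, "infreq_2": 7, "infreq_3": 5}},
]
kept, flagged = split_by_ier(sample)
print([p["id"] for p in kept], [p["id"] for p in flagged])  # ['A'] ['B']
```

Keeping the flagged participants in a separate list, rather than discarding them, makes it possible to compare results with and without them, as done in this study.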

Theoretical Implications

This study has several academic implications. First, prior studies comparing self-reports and actual behavior in web and mobile use have mainly reported overreporting, moderate correlations, or both (e.g., Araujo et al., 2017; Boase & Ling, 2013; De Reuver & Bouwman, 2015; Deng et al., 2019; Scharkow, 2016). Our findings on mobile games are not only consistent with these results but also extend them by identifying IER as a source of bias and low validity. Regarding the possibility that social desirability influenced the results, excessive game playing has been linked to addiction and mental disorder (American Psychiatric Association, 2013; WHO, 2021), so social desirability would, if anything, be expected to produce underreporting. In addition, social desirability bias differs from IER in that participants respond after carefully reading the questions (Grau et al., 2019), so the IER scale is unlikely to capture it. Because responses are bounded at zero from below but unbounded from above (Lee et al., 2000), responses with low validity, such as those produced through IER, are more likely to result in overreporting than in underreporting. Rather than distorting responses in a specific direction based on beliefs or characteristics, IER disrupts responses more or less randomly owing to a lack of incentive to answer accurately, and it is not associated with actual behavior. These properties may explain why excluding IER participants did not cause selection bias in this study.
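The argument that a zero lower bound turns careless, direction-free noise into net overreporting can be illustrated with a small simulation. Everything in it is hypothetical (the share of non-players, the usage distribution, and the noise level are arbitrary choices, not estimates from this study): reported usage is the true usage plus symmetric, mean-zero noise, floored at 0 min. Because negative errors are truncated for light users and non-users while positive errors are not, the mean error comes out positive.

```python
import random


def mean_error_with_zero_floor(n=10_000, noise_sd=30.0, share_nonplayers=0.5, seed=1):
    """Mean (reported - true) when careless answers are floored at 0 min.

    All parameters are illustrative assumptions, not study estimates.
    """
    rng = random.Random(seed)
    total_error = 0.0
    for _ in range(n):
        if rng.random() < share_nonplayers:
            true_min = 0.0
        else:
            true_min = rng.expovariate(1 / 40)  # mean 40 min/day among players
        # Symmetric, mean-zero noise, then the response scale's lower bound of 0
        reported = max(0.0, true_min + rng.gauss(0.0, noise_sd))
        total_error += reported - true_min
    return total_error / n


print(f"mean error: {mean_error_with_zero_floor():.1f} min/day")
print(f"without noise: {mean_error_with_zero_floor(noise_sd=0.0):.1f} min/day")
```

With the noise switched off the mean error is exactly zero, so the positive error in the first line comes entirely from the interaction between random responding and the zero floor.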

Second, although prior IER-related studies have proposed or compared detection methods (e.g., Huang et al., 2015; Maniaci & Rogge, 2014; Meade & Craig, 2012), they generally have not shown how far the survey responses of IER participants deviate from actual behavior, because they did not validate responses against behavior log data. Our finding that the IER scale (Huang et al., 2015), which is associated with many other IER indices, identifies participants whose responses deviate from their actual behavior strongly supports the validity of the existing IER literature.

Third, although many methods to reduce IER have been proposed (e.g., Maniaci & Rogge, 2014; Meade & Craig, 2012; Paas et al., 2018; Ward & Pond, 2015), problems remain: some are difficult to implement, while others, as Paas et al. (2018) noted, lose their effectiveness in repeated surveys. Therefore, many studies in marketing and consumer behavior have excluded or screened out IER participants as the most practical approach (e.g., Beck et al., 2020; Bond et al., 2019; Consiglio & van Osselaer, 2019; Fernandes et al., 2016; Jung et al., 2022; Klein et al., 2019; Laran et al., 2019; Yang et al., 2018); however, it has been pointed out that such exclusion may cause the results to lose their statistical representativeness because of selection bias (e.g., Lu et al., 2019; Ward & Pond, 2015). Our findings ease this concern: the behavior log distribution remains unchanged when IER participants are excluded, and actual behavior does not differ by IER status even though survey responses do. These findings thus increase the validity of such survey studies.

Practical Implications

We also propose several implications for practical marketing research and social surveys. First, ensuring data quality is clearly one of the central concerns of any survey. However, when IER participants are present, excluding them without careful consideration risks selection bias, whereas retaining them means accepting possible data distortion. The evidence in this study that excluding them is useful and does not introduce selection bias should help resolve this dilemma.

Second, when companies or researchers conduct surveys or experiments, they often commission contractors such as market research firms or use crowdsourcing platforms. Market research firms routinely undertake various initiatives to maintain and improve data quality, for instance, to ensure market representativeness. In contrast, crowdsourcing platforms, which are widely used in academic research, recruit workers for a wide range of outsourced tasks, not only surveys and experiments. The IER scale, whose usefulness this study demonstrates, could support further quality-management efforts. For example, contractors could add the IER scale to their own independent surveys or quality-control surveys to identify IER participants in advance. Based on the scale responses, they could stop sending survey requests to such participants, provide clients with the scale responses merged with the survey data collected on their behalf, or deliver survey data from which such participants have been excluded. In addition, these efforts would spare survey participants from repeatedly answering IER-related questions across many surveys (in fact, many empirical studies using crowdsourcing include questions intended to identify inappropriate responses), thereby reducing their response burden.

Finally, when companies directly survey their own customers or members, marketers would do well to include the IER scale in the questionnaire (the same applies to organizations that conduct social surveys). If data-quality concerns arise, for example, if the results differ depending on whether participants engage in IER, they can then decide to exclude IER participants.

Limitations and Future Research

Our study has a few limitations. First, the results depend on the IER scale used. Although we chose a comprehensive scale (Huang et al., 2015) that is associated with many other IER indices, the differences between survey responses and actual behavior, or the extent of bias resulting from IER exclusion, might change if other detection methods were applied. Future studies should continue to examine these differences and the feasibility of excluding IER participants using other key IER indices.

Second, IER in this study mainly captures overreporting. We discussed the possibility that IER participants overreport because of insufficient motivation and willingness to provide accurate answers. However, it is also possible that low-motivation participants processed information with insufficient memory retrieval or relied on availability heuristics. In that case, IER may reflect “creation,” in which participants create imaginary facts (e.g., mistakenly reporting other people’s game use, or their own non-mobile game use, as their own mobile game use), or “telescoping” (Malhotra, 2009), in which participants treat behavior outside the survey period as occurring within it. In contrast, behaviors such as “forgetting” or “omission,” which lead to underreporting, might not have been captured by the IER measure in this study. The effects of memory and information-processing systems on IER and response errors require further investigation. Finally, although this study compares behavior log data with survey responses, it focuses only on smartphone game use. Future studies therefore need to explore the impact of excluding IER in other domains, for example, by comparing purchase logs, such as scanner panel data that compile entire purchase records, with survey responses.

Footnotes

Acknowledgments

We were provided with the behavioral log data (i-SSP) by INTAGE Inc.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the JSPS KAKENHI Grant Numbers JP18H03209, JP19K13826, JP20H01545, JP22K13495.

ORCID iDs

Makito Takeuchi

Junichiro Niimi

Takahiro Hoshino

Notes

References

AlQuraan (2019). The effect of insufficient effort responding on the validity of student evaluation of teaching. Journal of Applied Research in Higher Education, 11(3), 604–615. https://doi.org/10.1108/JARHE-03-2018-0034

American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders (5th ed.).

Araujo, Wonneberger, Neijens, & de Vreese (2017). How much time do you spend online? Understanding and improving the accuracy of self-reported measures of internet use. Communication Methods and Measures, 11(3), 173–190. https://doi.org/10.1080/19312458.2017.1317337

Baumgartner & Steenkamp, J.-B. E. M. (2001). Response styles in marketing research: A cross-national investigation. Journal of Marketing Research, 38(2), 143–156. https://doi.org/10.1509/jmkr.38.2.143.18840

Baumgartner & Weijters (2019). Measurement in marketing. Foundations and Trends® in Marketing, 12(4), 278–400. https://doi.org/10.1561/1700000058

Beck, J. T., Rahinel, & Bleier (2020). Company worth keeping: Personal control and preferences for brand leaders. Journal of Consumer Research, 46(5), 871–886. https://doi.org/10.1093/jcr/ucz040

Boase & Ling (2013). Measuring mobile phone use: Self-report versus log data. Journal of Computer-Mediated Communication, 18(4), 508–519. https://doi.org/10.1111/jcc4.12021

Bond, S. D., S. X., & Wen (2019). Speaking for “free”: Word of mouth in free- and paid-product settings. Journal of Marketing Research, 56(2), 276–290. https://doi.org/10.1177/0022243718821957

Bowling, N. A., Huang, J. L., Bragg, C. B., Khazon, Liu, & Blackmore, C. E. (2016). Who cares and who is careless? Insufficient effort responding as a reflection of respondent personality. Journal of Personality and Social Psychology, 111(2), 218–229. https://doi.org/10.1037/pspp0000085

Brosnan, Babakhani, & Dolnicar (2019). “I know what you’re going to ask me” why respondents don’t read survey questions. International Journal of Market Research, 61(4), 366–379. https://doi.org/10.1177/1470785318821025

Chen, R. S., & C. H. (2015). Investigating the relationship between thinking style and personal electronic device use and its implications for academic performance. Computers in Human Behavior, 52, 177–183. https://doi.org/10.1016/j.chb.2015.05.042

Chou, Condron, & Belland, J. C. (2005). A review of the research on internet addiction. Educational Psychology Review, 17(4), 363–388. https://doi.org/10.1007/s10648-005-8138-1

Cole, J. S., & Gonyea, R. M. (2010). Accuracy of self-reported SAT and ACT test scores: Implications for research. Research in Higher Education, 51(4), 305–319. https://doi.org/10.1007/s11162-009-9160-9

Collopy (1996). Biases in retrospective self-reports of time use: An empirical study of computer users. Management Science, 42(5), 758–767. https://doi.org/10.1287/mnsc.42.5.758

Consiglio & van Osselaer, S. M. J. (2019). The devil you know: Self-esteem and switching responses to poor service. Journal of Consumer Research, 46(3), 590–605. https://doi.org/10.1093/jcr/ucz001

Corral-Verdugo (1997). Dual “realities” of conservation behavior: Self-reports vs observations of re-use and recycling behavior. Journal of Environmental Psychology, 17(2), 135–145. https://doi.org/10.1006/jevp.1997.0048

Corral-Verdugo, Bechtel, R. B., & Fraijo-Sing (2003). Environmental beliefs and water conservation: An empirical study. Journal of Environmental Psychology, 23(3), 247–257. https://doi.org/10.1016/S0272-4944(02)00086-5

Corral-Verdugo & Figueredo, A. J. (1999). Convergent and divergent validity of three measures of conservation behavior: The multitrait-multimethod approach. Environment and Behavior, 31(6), 805–820. https://doi.org/10.1177/00139169921972353

Croteau, A. M., Dyer, & Miguel (2010). Employee reactions to paper and electronic surveys: An experimental comparison. IEEE Transactions on Professional Communication, 53(3), 249–259. https://doi.org/10.1109/TPC.2010.2052852

Curran, P. G. (2016). Methods for the detection of carelessly invalid responses in survey data. Journal of Experimental Social Psychology, 66, 4–19. https://doi.org/10.1016/j.jesp.2015.07.006

Delley & Brunner, T. A. (2018). Household food waste quantification: Comparison of two methods. British Food Journal, 120(7), 1504–1515. https://doi.org/10.1108/BFJ-09-2017-0486

Deng, Kanthawala, Meng, Peng, Kononova, Hao, Zhang, & David (2019). Measuring smartphone usage and task switching with log tracking and self-reports. Mobile Media & Communication, 7(1), 3–23. https://doi.org/10.1177/2050157918761491

De Reuver & Bouwman (2015). Dealing with self-report bias in mobile Internet acceptance and usage studies. Information & Management, 52(3), 287–294. https://doi.org/10.1016/j.im.2014.12.002

DeSimone, J. A., & Harms, P. D. (2018). Dirty data: The effects of screening respondents who provide low-quality data in survey research. Journal of Business and Psychology, 33(5), 559–577. https://doi.org/10.1007/s10869-017-9514-9

Dunn, A. M., Heggestad, E. D., Shanock, L. R., & Theilgard (2018). Intra-individual response variability as an indicator of insufficient effort responding: Comparison to other indicators and relationships with individual differences. Journal of Business and Psychology, 33(1), 105–121. https://doi.org/10.1007/s10869-016-9479-0

Dunnette, M. D. (1952). Accuracy of students’ reported honor point averages. Journal of Applied Psychology, 36(1), 20–22. https://doi.org/10.1037/h0054387

Elimelech, Ert, & Ayalon (2019). Exploring the drivers behind self-reported and measured food wastage. Sustainability, 11(20), 5677. https://doi.org/10.3390/su11205677

Fernandes, Puntoni, van Osselaer, S. M. J., & Cowley (2016). When and why we forget to buy. Journal of Consumer Psychology, 26(3), 363–380. https://doi.org/10.1016/j.jcps.2015.06.012

Goodman, J. K., & Paolacci (2017). Crowdsourcing consumer research. Journal of Consumer Research, 44(1), 196–210. https://doi.org/10.1093/jcr/ucx047

Grau, Ebbeler, & Banse (2019). Cultural differences in careless responding. Journal of Cross-Cultural Psychology, 50(3), 336–357. https://doi.org/10.1177/0022022119827379

Groves, R. M. (2011). Three eras of survey research. Public Opinion Quarterly, 75(5), 861–871. https://doi.org/10.1093/poq/nfr057

Herzog, A. R., & Bachman, J. G. (1981). Effects of questionnaire length on response quality. Public Opinion Quarterly, 45(4), 549–559. https://doi.org/10.1086/268687

Huang, J. L., Bowling, N. A., & Liu (2015). Detecting insufficient effort responding with an infrequency scale: Evaluating validity and participant reactions. Journal of Business and Psychology, 30(2), 299–311. https://doi.org/10.1007/s10869-014-9357-6

Huang, J. L., Curran, P. G., Keeney, Poposki, E. M., & DeShon, R. P. (2012). Detecting and deterring insufficient effort responding to surveys. Journal of Business and Psychology, 27(1), 99–114. https://doi.org/10.1007/s10869-011-9231-8

Jia, Konold, T. R., Cornell, & Huang (2018). The impact of validity screening on associations between self-reports of bullying victimization and student outcomes. Educational and Psychological Measurement, 78(1), 80–102. https://doi.org/10.1177/0013164416671767

Johnson, J. A. (2005). Ascertaining the validity of individual protocols from web-based personality inventories. Journal of Research in Personality, 39(1), 103–129. https://doi.org/10.1016/j.jrp.2004.09.009

Junco (2013). Comparing actual and self-reported measures of Facebook use. Computers in Human Behavior, 29(3), 626–631. https://doi.org/10.1016/j.chb.2012.11.007

Jung, Peck, Palmeira, & Kim (2022). An unintended consequence of product upgrades: How upgrades can make current consumers feel left behind. Journal of Marketing Research, 59(5), 1019–1039. https://doi.org/10.1177/00222437221078551

Kees, Berry, Burton, & Sheehan (2017). An analysis of data quality: Professional panels, student subject pools, and Amazon’s Mechanical Turk. Journal of Advertising, 46(1), 141–155. https://doi.org/10.1080/00913367.2016.1269304

Kirk, B. A., & Sereda (1969). Accuracy of self-reported college grade averages and characteristics of non and discrepant reporters. Educational and Psychological Measurement, 29(1), 147–155. https://doi.org/10.1177/001316446902900110

Klein, Völckner, Bruno, H. A., Sattler, & Bruno (2019). Brand positioning based on brand image–country image fit. Marketing Science, 38(3), 516–538. https://doi.org/10.1287/mksc.2019.1151

Kormos & Gifford (2014). The validity of self-report measures of proenvironmental behavior: A meta-analytic review. Journal of Environmental Psychology, 40, 359–371. https://doi.org/10.1016/j.jenvp.2014.09.003

Kosinski, Stillwell, & Graepel (2013). Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences, 110(15), 5802–5805. https://doi.org/10.1073/pnas.1218772110

Kreuter, Haas, G. C., Keusch, Bähr, & Trappmann (2020). Collecting survey and smartphone sensor data with an app: Opportunities and challenges around privacy and informed consent. Social Science Computer Review, 38(5), 533–549. https://doi.org/10.1177/0894439318816389

Krosnick, J. A. (1991). Response strategies for coping with the cognitive demands of attitude measures in surveys. Applied Cognitive Psychology, 5(3), 213–236. https://doi.org/10.1002/acp.2350050305

Kuncel, N. R., Credé, & Thomas, L. L. (2005). The validity of self-reported grade point averages, class ranks, and test scores: A meta-analysis and review of the literature. Review of Educational Research, 75(1), 63–82. https://doi.org/10.3102/00346543075001063

Laran, Janiszewski, & Salerno (2019). Nonconscious nudges: Encouraging sustained goal pursuit. Journal of Consumer Research, 46(2), 307–329. https://doi.org/10.1093/jcr/ucy071

Lee, M. Y., & Toh, R. S. (2000). Are consumer survey results distorted? Systematic impact of behavioral frequency and duration on survey response errors. Journal of Marketing Research, 37(1), 125–133. https://doi.org/10.1509/jmkr.37.1.125.18724

Lin, Y. H., Chang, L. R., Lee, Y. H., Tseng, H. W., Kuo, T. B., & Chen, S. H. (2014). Development and validation of the smartphone addiction inventory (SPAI). PLoS ONE, 9(6), e98312. https://doi.org/10.1371/journal.pone.0098312

Lin, Y. H., Lin, Y. C., Lee, Y. H., Lin, P. H., Lin, S. H., Chang, L. R., Tseng, H. W., Yen, L. Y., Yang, C. C. H., & Kuo, T. B. (2015). Time distortion associated with smartphone addiction: Identifying smartphone addiction via a mobile application (App). Journal of Psychiatric Research, 65, 139–145. https://doi.org/10.1016/j.jpsychires.2015.04.003

Liu & Wronski (2018). Trap questions in online surveys: Results from three web survey experiments. International Journal of Market Research, 60(1), 32–49. https://doi.org/10.1177/1470785317744856

Lu, Wang, Lin, Wang, & Zhou (2019). Inequalities in the health survey using validation question to filter insufficient effort responding: Reducing overestimated effects or creating selection bias? International Journal for Equity in Health, 18, 131. https://doi.org/10.1186/s12939-019-1030-2

Malhotra (2008). Completion time and response order effects in web surveys. Public Opinion Quarterly, 72(5), 914–934. https://doi.org/10.1093/poq/nfn050

Malhotra, N. K. (2009). Marketing research: An applied orientation (6th ed.). Pearson Education.

Maniaci, M. R., & Rogge, R. D. (2014). Caring about carelessness: Participant inattention and its effects on research. Journal of Research in Personality, 48, 61–83. https://doi.org/10.1016/j.jrp.2013.09.008

Mayer, R. E., Stull, A. T., Campbell, Almeroth, Bimber, Chun, & Knight (2007). Overestimation bias in self-reported SAT scores. Educational Psychology Review, 19(4), 443–454. https://doi.org/10.1007/s10648-006-9034-z

McKay, A. S., Garcia, D. M., Clapper, J. P., & Shultz, K. S. (2018). The attentive and the careless: Examining the relationship between benevolent and malevolent personality traits with careless responding in online surveys. Computers in Human Behavior, 84, 295–303. https://doi.org/10.1016/j.chb.2018.03.007

Meade, A. W., & Craig, S. B. (2012). Identifying careless responses in survey data. Psychological Methods, 17(3), 437–455. https://doi.org/10.1037/a0028085

Merckelbach, Boskovic, Pesy, Dalsklev, & Lynn, S. J. (2017). Symptom overreporting and dissociative experiences: A qualitative review. Consciousness and Cognition, 49, 132–144. https://doi.org/10.1016/j.concog.2017.01.007

Meyer, J. F., Faust, K. A., Faust, Baker, A. M., & Cook, N. E. (2013). Careless and random responding on clinical and research measures in the addictions: A concerning problem and investigation of their detection. International Journal of Mental Health and Addiction, 11(3), 292–306. https://doi.org/10.1007/s11469-012-9410-5

Meyvis & van Osselaer, S. M. J. (2018). Increasing the power of your study by increasing the effect size. Journal of Consumer Research, 44(5), 1157–1173. https://doi.org/10.1093/jcr/ucx110

Moore, H. E., & Rutherfurd, I. D. (2020). Researching agricultural environmental behaviour: Improving the reliability of self-reporting. Journal of Rural Studies, 76, 296–304. https://doi.org/10.1016/j.jrurstud.2020.04.012

Nakano & Kondo, F. N. (2018). Customer segmentation with purchase channels and media touchpoints using single source panel data. Journal of Retailing and Consumer Services, 41, 142–152. https://doi.org/10.1016/j.jretconser.2017.11.012

Nenycz-Thiel, Beal, Ludwichowska, & Romaniuk (2013). Investigating the accuracy of self-reports of brand usage behavior. Journal of Business Research, 66(2), 224–232. https://doi.org/10.1016/j.jbusres.2012.07.016

Oppenheimer, D. M., Meyvis, & Davidenko (2009). Instructional manipulation checks: Detecting satisficing to increase statistical power. Journal of Experimental Social Psychology, 45(4), 867–872. https://doi.org/10.1016/j.jesp.2009.03.009

Paas, L. J., Dolnicar, & Karlsson (2018). Instructional manipulation checks: A longitudinal analysis with implications for MTurk. International Journal of Research in Marketing, 35(2), 258–269. https://doi.org/10.1016/j.ijresmar.2018.01.003

Perry, J. D. (1940). The reliability of high school averages computed from students’ estimates of their high school grades. School and Society, 52, 63–64.

Prior (2009). The immensely inflated news audience: Assessing bias in self-reported news exposure. Public Opinion Quarterly, 73(1), 130–143. https://doi.org/10.1093/poq/nfp002

Revilla & Ochoa (2015). What are the links in a web survey among response time, quality, and auto-evaluation of the efforts done? Social Science Computer Review, 33(1), 97–114. https://doi.org/10.1177/0894439314531214

Scharkow (2016). The accuracy of self-reported internet use: A validation study using client log data. Communication Methods and Measures, 10(1), 13–27. https://doi.org/10.1080/19312458.2015.1118446

Silber, Danner, & Rammstedt (2019). The impact of respondent attentiveness on reliability and validity. International Journal of Social Research Methodology, 22(2), 153–164. https://doi.org/10.1080/13645579.2018.1507378

Simon, H. A. (1956). Rational choice and the structure of the environment. Psychological Review, 63(2), 129–138. https://doi.org/10.1037/h0042769

Steedle, J. T., Hong, & Cheng (2019). The effects of inattentive responding on construct validity evidence when measuring social–emotional learning competencies. Educational Measurement: Issues and Practice, 38(2), 101–111. https://doi.org/10.1111/emip.12256

Stier, Breuer, Siegers, & Thorson (2020). Integrating survey data and digital trace data: Key issues in developing an emerging field. Social Science Computer Review, 38(5), 503–516. https://doi.org/10.1177/0894439319843669

Stopczynski, Sekara, Sapiezynski, Cuttone, Madsen, M. M., Larsen, J. E., & Lehmann (2014). Measuring large-scale social networks with high resolution. PLoS ONE, 9(4), e95978. https://doi.org/10.1371/journal.pone.0095978

Vanden Abeele, Beullens, & Roe (2013). Measuring mobile phone use: Gender, age and real usage level in relation to the accuracy and validity of self-reported mobile phone use. Mobile Media & Communication, 1(2), 213–236. https://doi.org/10.1177/2050157913477095

Ward, M. K., Meade, A. W., Allred, C. M., Pappalardo, & Stoughton, J. W. (2017). Careless response and attrition as sources of bias in online survey assessments of personality traits and performance. Computers in Human Behavior, 76, 417–430. https://doi.org/10.1016/j.chb.2017.06.032

Ward, M. K., & Pond, S. B. (2015). Using virtual presence and survey instructions to minimize careless responding on internet-based surveys. Computers in Human Behavior, 48, 554–568. https://doi.org/10.1016/j.chb.2015.01.070

Weijters & Baumgartner (2012). Misresponse to reversed and negated items in surveys: A review. Journal of Marketing Research, 49(5), 737–747. https://doi.org/10.1509/jmr.11.0368

WHO. (2021). 6C51 gaming disorder. International Classification of Diseases, Eleventh Revision (ICD-11). World Health Organization (WHO). https://icd.who.int/browse11/l-m/en#/http%3a%2f%2fid.who.int%2ficd%2fentity%2f1448597234 (Accessed 19 Apr 2023).

Wise, S. L., & Kong (2005). Response time effort: A new measure of examinee motivation in computer-based tests. Applied Measurement in Education, 18(2), 163–183. https://doi.org/10.1207/s15324818ame1802_2

Yang, L. C., Toubia, & de Jong, M. G. (2018). Attention, information processing, and choice in incentive-aligned choice experiments. Journal of Marketing Research, 55(6), 783–800. https://doi.org/10.1177/0022243718817004