Cognitive Dyadic Measurements: A Game-Changer? Construction and First Validation of Three Cognitively Demanding Competitive Tasks

Abstract

Competition among individuals is a natural mode of determining who is fittest. While in nature, economics, and sports, it is common to infer ability or aptitude from the outcome of competitions, knowledge of its effects with regard to psychological/educational assessment is scarce. In the present pilot study, we explore a measurement approach for assessing individual differences in interpersonal, face-to-face competitions, based on a set of cognitively demanding, competitive, fast-paced, two-opponent tasks. For initial task evaluation, we conducted comprehensive reliability and construct validation analyses, considering cognitive ability, motivation, and personality measures. Moreover, using structural equation models, we conducted a simultaneous factorization of the tasks with the other validation measures. The results suggest that the newly developed tasks measure both cognitive ability (intelligence) and a competition-specific component. The competition-specific component was positively associated with experience in competitive gaming and negatively correlated with neuroticism. While the pattern of validities was promising, the measurements’ reliabilities were not yet satisfactory. Implications for future research as well as the design of competition-based measurements are discussed.

Keywords

Competitive measurement, cognitive testing, gamification, education, validation

Introduction

Interpersonal competition, in which two or more persons strive for a goal, which cannot be shared, is a natural mode of human interaction. Situations fitting this pattern occur frequently in various professional and private settings. Furthermore, competition is a common element of games as well as an effective gamification element in learning settings (e.g., Sailer & Homner, 2020).

While specific characteristics of the situation as well as the type of goal may differ, succeeding in competitive situations is often considered as an indicator of success in general. This conclusion appears natural, because––from an evolutionary perspective––asserting oneself in competitions for resources is widely considered as an essential aspect for the “success” of a species (e.g., Christiansen & Loeschcke, 1990).

Although competition is not the only way in which success may be achieved, it is, nevertheless, a dominant mode in specific situations. Particularly in professional settings––such as in the fields of military, law, economy, or sports––succeeding in interpersonal competitions can account for a large proportion of professional success in general.

Clearly, the probability of succeeding in competitive situations differs between individuals. To some extent, these differences may be attributed to factors that are independent of the competitive nature of a situation, such as abilities or traits required for a specific operation or task. However, some contributing factors may be specific to competitive settings. While there is some evidence pointing to the unique influences of competitive elements on performance in interactive dyads (Munkes & Diehl, 2003), further information on their structure, suitable measurement approaches, and their actual relevance is not available.

In general, despite some isolated results (Berge et al., 2015), little is known about competition-specific individual performance differences. Previous research has focused more on individual differences with respect to preferences for competitive situations (Balafoutas et al., 2012; Ross et al., 2003) or differences in competition motivation (Franken & Brown, 1995). In fact, neither theoretical foundations nor measurement approaches addressing competition-specific individual differences are yet well-developed. However, we consider it essential to gain a deeper understanding of the potential impact of such individual differences because competition is often a natural element inherent in situations in which human performance is assessed. This understanding is not only necessary for a more profound comprehension of the construct, but also for assessing its actual relevance, which is currently unclear as well.

The present study, therefore, aims to address this lack of knowledge by investigating an operationalization approach based on a set of competitive two-opponent tasks. Because neither structure nor measurement approaches are established, this study is intended as a pilot study that explores these two aspects simultaneously.

Competition-Specific Measurements

Clearly, a necessary step for investigating competition-specific individual differences is the development of viable measurement approaches. However, this poses particular challenges because of the absence of a well-defined theoretical foundation. For one, a competitive situation is always interwoven with specific tasks or operations that might require different sets of abilities or skills. Whether or not they might interact with competitive features is unclear. Moreover, properties of the competition situation itself might affect outcomes. Specifically, the consequence of winning or losing, whether competitions are face-to-face or not, and the number of competing individuals/parties may yield different effects, limiting the generalizability of individual settings. Also, it is important to consider the measurement’s suitability for standardization.

As guiding principles for selecting specific features for measurement development, we focused on (a) what might be relevant from the perspective of psychological/educational assessment, (b) which operationalization is economic and therefore viable, and (c) how measurements might relate to non-competitive approaches.

With regard to relevance, we consider it desirable to measure success in direct (face-to-face), dyadic, and cognitively demanding tasks, because various daily interactions comprise these components. It appears, furthermore, unlikely that these components are easily addressed by conventional tests.

With regard to operationalization, there are several options. One might, for instance, consider an observational approach, rating the success in actual competitive situations. However, objectivity and economic feasibility might be limited. An approach allowing for more standardized measurements may be found in the intersection of board games and test gamification.

In the context of board games, cognitively demanding dyadic competitions have long been a traditional mode. Chess, for instance, has (at least in public perception) been considered a suitable training and assessment tool for strategic and military thinking (Burgoyne et al., 2016). Equally important, board games usually comprise standardized instructions and scoring rules, and their materials can be similar to those of cognitive ability tests, which facilitates contrasting competitive and non-competitive measurements.

Introducing game-like attributes (such as points, leaderboards or competition) to an assessment or test context has gained some popularity in the past years and is referred to as gamification (e.g., Deterding et al., 2011; Rabah et al., 2018). As Armstrong et al. (2016) point out: “A gamified assessment is not a stand-alone game, but it is instead an existing form of assessment that has been enhanced with the addition of game elements” (p. 672). Commonly, gamification strategies are used for enhancing participants’ motivation and engagement (e.g., Berg, 2021; Cechanowicz et al., 2013; Looyestyn et al., 2017; Sailer & Homner, 2020) rather than for assessing their differential impact. Consequently, while gamification has been considered in many studies (e.g., Dicheva et al., 2015; Koivisto & Hamari, 2019; Rabah et al., 2018; Sailer & Homner, 2020), validations of individual differences due to game-like components are virtually nonexistent. However, the differential perspective is certainly important, because it is likely that measures are considerably altered by gamification (Landers et al., 2020). In addition, in application-oriented settings, little attention is paid to the psychometric properties of gamified assessments although these are the basis for any successful use (Armstrong et al., 2016; Nikolaou et al., 2019).

We conjecture that combining features of competitive board games, such as alternating moves with an opponent including an objective-based scoring-system, with cognitively demanding operations that relate to contents of standardized test procedures, could be a viable measurement approach for competition-specific individual differences (Lumsden et al., 2016). However, merely using existing games, such as Scrabble, as a measurement approach might be problematic for different reasons. For one, single matches often take a long time and are therefore uneconomical. Moreover, instructions are often complex and their presentation is not standardized between games. Also, cognitive components related to gameplay, such as figural, verbal, or numeric content domains, are usually not clearly and definably implemented and may be confounded with other features of the game. Accordingly, for the present study, we decided to create three separate tasks that are more suitable for assessment. In these tasks, two opponents compete in a face-to-face manner. Task contents were either predominantly figural, verbal, or numeric, relating to the content domains typical in intelligence tests (Jäger, 1984; Valerius & Sparfeldt, 2014). Other features, such as instructions, duration and other materials (as far as feasible) were matched.

We refer to the new tasks as cognitive dyadic measurements (CDM). Essentially, CDM are closely related to conventional dyadic board games. Their tasks are restricted by clear rules, defining potential actions, their sequences, and scoring. For obtaining multiple measurements within a limited time frame, single matches have strict time limits. Individual scores are obtained as average scores over multiple CDM matches, whereby each match is treated as the equivalent of one test item; however, instead of the item content, only the opponent changes. More specific details about these features and the reasons for selecting them are given in the “Methods” section.

While CDM may only represent one of various measurement options, we think that their relation to both board games and traditional ability tests may be particularly useful. Specifically, their close relation to cognitive operations may facilitate analyzing competition-specific individual differences, by enabling us to isolate them statistically from cognitive ability. Furthermore, CDM allow for a high degree of standardization and, because of their relation to board games, they offer a rich source of inspiration for task development.

Properties of CDM-Type Measures

While dyadic competitions are a well-established mode of assessment in other fields, they are only rarely used for cognitive/psychological measurements. In fact, competitive settings in general are scarce in this context. Exceptions are (group) discussions as part of assessment centers or, to a minor extent, gamified test materials with a competitive component.

This raises the question of what specific properties CDM might have and whether the apparent neglect of such approaches might be justified. The amount of information for answering these questions is rather limited. However, some properties may be inferred from the board game and gamification literature.

Clearly, CDM are only useful if they yield incremental validity over already established measures. Because of the CDM’s cognitive component, a substantial amount of their variance may be explained by general mental ability (GMA). However, this assumption is not supported by prior evidence. On the contrary, cognitively demanding competitive games yielded only small correlations with GMA. A meta-analysis by Burgoyne et al. (2016) reports an average correlation of 0.25 between GMA and chess scores. This result was roughly equivalent in samples of experienced and inexperienced players. A single non-chess study by Lim and Furnham (2018), which considered the verbal game Taboo, reported a similarly small correlation of 0.33 with Raven’s progressive matrices.

One might conjecture that these small correlations with GMA might result from the fact that test item content and competitive (game) tasks do not bear a strong resemblance. However, this conjecture appears unlikely because games without a human opponent usually yielded medium to large correlations with GMA (e.g., Buford & O’Leary, 2015; Quiroga et al., 2016; Sobczyk et al., 2015). Quiroga et al. (2015) even argue convincingly that typical game contents may be a viable replacement for conventional GMA measures, reporting a correlation of r = .93 between latent variables representing GMA and game scores.

So why do games with human opponents yield such poor correlations with GMA, although typical game tasks appear to represent GMA quite well? We conjecture that this may either be the result of (a) a lack of reliability or (b) differing construct validity.

Reliabilities may be comparatively lower because a computer opponent may act in a more standardized and, therefore, more reliable manner than continuously changing (and also developing) human opponents. Differences in validity, on the other hand, may be attributed to the additional social component in face-to-face competitions.

To the best of our knowledge, previous research on measures obtained in face-to-face competitions did not report reliabilities. One reason for this might be that most measures of internal consistency are not applicable to a competitive assessment situation, having opponents instead of items, including the partially dynamic way in which opponents are chosen.

With respect to validity, results, other than correlations with GMA, are rare.

Nomological Net

The forced social comparison, which is an essential component of competitions (e.g., Johnson & Johnson, 1989), suggests that opponents who are confident in social situations and who are motivated by competition, might have an advantage in competitive games (Berge et al., 2015). Accordingly, it has been suggested that personality traits associated with positive and stable emotion, such as high emotional stability or low neuroticism, might have a positive effect on competitive task performance (e.g., Lobel et al., 2014). While neuroticism may also have an impact on the performance in other tests, correlations are often small (r = −.08 to r = −.12 for measures of intelligence) and non-significant (e.g., Freund & Holling, 2011). In stressful situations, correlations are comparably increased but still largely insignificant (Dobson, 2000).

Moreover, test anxiety has been reported to reduce positive emotions and engagement during competitive games (Hong et al., 2015). Test anxiety has also repeatedly been negatively associated with performance in (cognitive) tests in general (Cheng et al., 2014; Hembree, 1988; Gaye-Valentine & Credé, 2013). Whether this relation differs between competitive and traditional test settings is uncertain.

Motivational variables may also account for some proportion of game success. Specifically, competition motivation has been regarded as an independent motivation facet (Tauer & Harackiewicz, 1999). However, its effect on success in competitive tasks is unclear. In addition, achievement motivation may also be related. For instance, achievement motivation was significantly higher in professional female chess players as compared to a non-professional collective (Vollstaedt-Klein et al., 2010).

Of course, results pertaining to gaming (particularly chess) may not apply directly to CDM. However, we conjecture that the setting and the type of measurement should be sufficiently similar to generalize the listed findings to some extent. Accordingly, we assessed the aforementioned other constructs in order to map a potential nomological net. Specifically, we assessed the Big-5 personality dimensions, test anxiety, and competition as well as achievement motivation in addition to GMA. While relations with other board games might also have been interesting for establishing the nomological net, we did not include them in our study for the same reasons that have been outlined in the “Competition-specific Measurements” section.

The Present Study

The present study investigates CDM as an approach for assessing individual performance differences in a competitive setting. Specifically, we conducted a comprehensive first evaluation, including analyses of reliability and construct validity. For the construct validation, we hypothesize that CDM scores have substantial correlations with (a) task related measures and (b), beyond that, with measures that address the competitive component. Specifically, based on the results presented in the previous subsection, we expect (a) positive correlations of CDM scores with GMA as well as (b) positive correlations with competition and achievement motivation, and negative correlations with test anxiety and neuroticism.

Note that, while finding the expected pattern of validities does not unambiguously establish that CDM measure a competition-specific component, it would substantiate the assumption’s plausibility. In turn, not finding any plausible validities beyond GMA would cast doubt on the feasibility of measuring competition-specific individual differences, at least for the present CDM.

For further analyzing a potential competition-specific component, we used structural equation models (SEM). Specifically, applying a bifactor approach (Eid et al., 2003; Johnson et al., 2004; Valerius & Sparfeldt, 2014), we separated the information of the CDM into a content-specific factor, representing the cognitive component, and a method-specific factor, addressing the competitive component. Moreover, based on results of the previous construct validation, we investigated the validity of the competitive component.

To the best of our knowledge, the present study is the first investigating performance in cognitively demanding competitive tasks, which have been specifically designed for psychological assessment. It is also the first study conducting a broad construct validation (e.g., DiStefano & Hess, 2005) for competitive success beyond measures of cognitive ability or directly related constructs. Thus, in addition to evaluating CDM as an assessment approach, the present study might provide useful information on potential predictors for competitive success.

Method

In the following, the three newly developed CDM as well as design and variables of our validation study are outlined.

General CDM Construction

Because rationales for developing CDM are not yet available, we summed up their construction process along the four steps of test development postulated by Wilson (2005) as outlined in Table 1.

Table 1.

Four Steps of Test Construction According to Wilson (2005).

Step	Implementation
Construct map	Nomological network with cognitive abilities and competition performance
‘Item’ construction	Matrix format tasks with verbal, numerical and figural content (see below)
Response coding	Defining how the winner is determined; see game descriptions below
Measurement model	see Figure 3

In addition to the general decision that CDM are conducted in a dyadic and face-to-face manner and that they relate to the three content domains (verbal, numeric, and figural) that are common in both tests and games (e.g., Jäger, 1984; Valerius & Sparfeldt, 2014), several specific decisions with respect to design were required, which are outlined in the following.

General CDM format

We decided on a matrix layout because it is common in both intelligence tests and board games. Figure 1 gives an example for each of the three CDM.

Figure 1.

Exemplary constellations for the figural, verbal, and numeric CDM (from left to right).

Time limits

Matches as well as individual moves were restricted by predefined time limits. This was necessary to allow for an organized and simultaneous CDM administration in a round-robin design as well as time-efficient testing. While the time limits automatically introduce a speed component, we tried not to overemphasize this component, providing enough time for considered moves.

Stochastic properties

Results should not be stochastically determined early within a match, and starting players should not have an advantage. Here, we relied on traditional guidelines for test construction (Eid & Schmidt, 2014).

Reward

We did not give any additional incentive beyond being declared “match winner.” We conjecture that in this way an intrinsic motivational component may contribute to individual outcomes. For the sake of standardization, we decided against additional rewards, because their impact could differ across contexts and would be hard to control for in this particular study design. Furthermore, we did not provide win-lose information beyond single matches because this was not feasible in practice.

Instructions

Instructions were generated with particular care to ensure that differences in scores were due to differences in ability and not in knowledge of rules. For each CDM, instructions took between 15 and 20 min. Materials included (i) a written sheet with rules, (ii) instruction videos, (iii) training rounds, and (iv) a chance for participants to ask final questions.

Time limits as well as stochastic properties were optimized successively during the construction process by repeatedly putting the CDM to the test. Furthermore, a small preliminary study was carried out with N = 6 participants who played the first versions of the CDM and were asked to “think aloud” while doing so. Observed misinterpretations of rules as well as the results of subsequent interviews with each of the participants provided valuable insights into their understanding of the instructions, helping us to revise some of them.

The Three CDM

Figural CDM

The figural CDM used a 6 × 6 matrix field. Next to the matrix lay an assortment of tiles of different forms, fitting the matrix field size. Each participant was given an identical set of tiles, differing in color only (red vs. blue). On these forms a selection of symbols was printed determining rules for connecting them. An example is given on the left panel of Figure 1. There were five different sets of tiles, which were rotated between rounds.

For 90 s, participants alternately placed their tiles on the matrix. Tiles of a respective color needed to be connected to each other. In addition, participants had to adhere to specific symbol matching rules. The participants were asked to try to restrict the number of tiles the opponent was able to place and to use as many of their own tiles as possible. The participant who was able to place the last tile won. To keep our presentation concise, we included the matching rules in the electronic supplementary materials of this article.

Verbal CDM

The verbal CDM used a 6 × 8 matrix field (see middle panel of Figure 1). Four letters were already given in a 2 × 2 arrangement in the middle of the matrix (a new combination for each match). Participants had to alternately add single words to the fields in a fixed 15 s time frame using pens of different color to distinguish originators. Each match had six rounds.

Valid words and combinations had to satisfy a set of specific rules. If a participant failed to add a word within 15 s, the other participant was declared the winner. If a participant was faster than 15 s, the opponent nevertheless had to wait for her/his turn. The participant who added the last valid word won. If, however, the sixth round was also completed successfully, a bonus round was used to determine the winner. In this round, both participants were allowed to insert a word. The participant who first added a valid word was declared the winner. To keep our presentation concise, we included the extended rules in the electronic supplementary materials of this article.

Numeric CDM

The numeric CDM used a 4 × 4 matrix. Participants were asked to alternately insert integers from 1 to 5 into the fields. One match took 80 s. The goal of each participant was to reach a predefined sum of integers within a certain arrangement of fields. The target value for the sum score was changed for each match, ranging from 8 to 16. The arrangement of fields for the first participant resembled a “T,” that is, three horizontal fields plus the field below the second field in line, as well as its 180-degree rotation (see right panel of Figure 1).

Participants were asked to try to reach more sum scores than the opponent by selecting strategically sensible integers. Again, to keep our presentation concise, extended rules are included in the electronic supplementary materials of this article. In contrast to the two previous CDM, the evaluation of results was performed by a program instead of the participants themselves to ensure accuracy.

Scoring procedure

For each CDM, an average person score was calculated. Specifically, the score was the mean of the single match outcomes, coding a won match with +1, a draw with 0, and a lost match with −1. Thus, a person score ranging from −1 to +1 was obtained. Using the mean instead of the sum was necessary to account for differences in the number of matches between sessions.

Clearly, similarly to grading on a curve or scores in assessment centers, this scoring approach is group normative. Thus, individual scores always depend on the average ability of the competitors.
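
The scoring rule described above can be illustrated with a minimal sketch. The function name and the outcome labels are hypothetical and chosen for illustration only; the coding (+1, 0, −1) and the use of the mean follow the description in the text.

```python
from statistics import mean

# Outcome coding as described in the text: win = +1, draw = 0, loss = -1.
# The string labels are an assumption made for this illustration.
OUTCOME_CODES = {"win": 1, "draw": 0, "loss": -1}


def person_score(outcomes):
    """Return the mean of the coded match outcomes of one person in one CDM.

    Using the mean (not the sum) keeps scores comparable when the number
    of matches differs between sessions; the result lies in [-1, +1].
    """
    return mean(OUTCOME_CODES[outcome] for outcome in outcomes)


# Example: two wins, one draw, and one loss in four matches.
example_score = person_score(["win", "win", "loss", "draw"])
print(example_score)  # 0.25
```

Because every match contributes +1 to one person and −1 to the other (or 0 to both), the scores within a session are tied to each other, which is what makes the approach group normative.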

Participants

Participants were recruited from the undergraduate students at the department of Psychology of a mid-sized German university. We based an a priori sample size estimation on detecting true correlation coefficients of 0.30 at a two-sided α level of 5% and a power of 80%, requiring a minimum of 85 complete cases. Adjusting for multiple testing was not conducted because the present study is an exploratory correlation study. We used a correlation of 0.30 as reference because prior studies investigating relations of GMA and competitive board game success seldom reported larger values (e.g., Burgoyne et al., 2016; Lim & Furnham, 2018). Anticipating dropouts, we decided to invite 100 students to participate in five sessions of 20 participants each.
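
As an illustration of how a minimum of 85 cases can arise from these inputs, the following sketch uses the common Fisher z approximation for the power of a test of a correlation coefficient. This is an assumption: the text does not state which procedure or software was used for the estimation, and the function name is hypothetical.

```python
from math import atanh, ceil
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate sample size for detecting a true correlation r.

    Fisher z approximation for a two-sided test:
        n = ((z_{1 - alpha/2} + z_{power}) / atanh(r))^2 + 3
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return ceil(((z_alpha + z_power) / atanh(r)) ** 2 + 3)


print(n_for_correlation(0.30))  # 85
```

Exact methods can differ from this approximation by a case or so, so the sketch shows only that the reported figure is consistent with the stated inputs.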

For assigning students to the five appointments, they were asked to select feasible dates from a predefined schedule. Afterward, each of the participants was randomly assigned to one of the selected dates. If the number for an appointment was larger than 20, superfluous participants were randomly excluded.

The number of actually participating students for each of the five sessions was 18, 16, 17, 19, and 21, respectively; that is, 91 students participated overall. For sessions 1 and 4, each missing participant was excused. In sessions 2 and 3, two of the missing participants in each session were not excused. One participant missing in session 2 participated in session 5 instead.

Of the 91 participants, 87 completed an additional online survey containing several validation measures. Of these participants, 72 (82.76%) were female. The mean age was 20.82 years (SD = 2.55). Eleven participants (12.64%) were left-handed. All participants were German native speakers. Demographic variables did not differ significantly between the five sessions, based on a significance level of 5%.

Procedure and Variables

The study was split into an online survey, in which participants completed a series of questionnaires and tests, and five separate group sessions in which the CDM were conducted. For matching results from the two parts, participants generated a person-specific eight-figure code (ID) to ensure anonymity of results.

Online survey

The online survey took about 45 min and was conducted a maximum of 2 days prior to the group session. It comprised a short demographic survey and self-assessments with respect to academic achievement (five-categorical scale ranging from worst to best 20% within their semester) and competitive gaming frequency as a child (between 6 and 12 years) and within the last year. Furthermore, participants stated how familiar they felt with commonly known numerical, verbal, or figural board games on a five-categorical scale. Subsequently, a series of short questionnaires was presented (for more details, see Table 2).

Table 2.

Tests and Questionnaires Used in the Online Survey.

	Construct	#Items	Cronbach’s α
CMS^a	Competition motivation	19	.798
BIP-AM^b	Achievement motivation	6	.657
TAI-G^c	Test anxiety	15	.904
GkKT^d	General mental ability	45	.817
GkKT^d	Verbal intelligence	22	.589
	Numeric intelligence	15	.777
	Figural intelligence	8	.653
BFI-K^e	Neuroticism	4	.822
	Extraversion	4	.786
	Openness	5	.729
	Agreeableness	4	.627
	Conscientiousness	4	.661

^aCompetition Motivation Scale (CMS, Franken & Brown, 1995).

^bAchievement Motivation of the Job Related Personality Inventory (BIP, Hossiep & Paschen, 1998).

^cTest Anxiety Inventory (TAI-G, Wacker et al., 2008).

^dGiessen Cognitive Competence Test (GkKT, Petri et al., 2019).

^eBig-5 Inventory (BFI-K, Rammstedt & John, 2005).

CDM session

The CDM session took about 3 hours in which participants competed in a round-robin design.¹ The time was divided into three blocks of similar duration. First, the figural, then the verbal and finally the numeric CDM were conducted.
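
For readers unfamiliar with round-robin designs, the following minimal sketch generates pairings with the standard circle method. It is an illustration under assumptions: the text shown here does not specify how pairings were actually generated or how many rounds were played, and the function name and participant IDs are hypothetical.

```python
def round_robin_rounds(participants):
    """Generate round-robin pairings with the circle method.

    Returns a list of rounds; each round is a list of (a, b) pairs.
    With an odd number of participants, one person sits out per round.
    """
    roster = list(participants)
    if len(roster) % 2 == 1:
        roster.append(None)  # placeholder: whoever meets None sits out
    n = len(roster)
    rounds = []
    for _ in range(n - 1):
        pairs = []
        for i in range(n // 2):
            a, b = roster[i], roster[n - 1 - i]
            if a is not None and b is not None:
                pairs.append((a, b))
        rounds.append(pairs)
        # Keep the first seat fixed and rotate all other seats by one.
        roster = [roster[0], roster[-1]] + roster[1:-1]
    return rounds


# Example with six hypothetical participant IDs: five rounds of three matches.
for number, pairs in enumerate(round_robin_rounds(["P1", "P2", "P3", "P4", "P5", "P6"]), 1):
    print(number, pairs)
```

In a complete schedule of this kind, every pair of participants meets exactly once, so only the opponent changes from match to match.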

During the entire study, three investigators were present.

Each of the sessions was conducted uniformly between 9:00 a.m. and 12:00 noon. They took place in seminar rooms, in which tables were spaciously arranged in an oval shape. For the very first round of every CDM, the seating arrangement was randomly selected. Opponents were seated on opposite sides of separate tables. The materials required for the first CDM were positioned in front of them. Prior to the subsequent two CDM, they were replaced while participants were allowed a 5 min break.

Opposite sides of the tables were color coded in blue and red respectively in order to facilitate further instructions. This included all materials as well as pens used for score notation. Participants on the red table side had the first move. Colors were arranged in order to balance the privilege of having the first move.

Additionally, participants were given a rating sheet for documenting CDM evaluation and their pulse rate. The latter was self-assessed prior to the CDM and in the middle of each of the CDM, within a time frame given by the instructor. The CDM acceptance evaluation questionnaire (adapted from the acceptance questionnaire by Kersting, 2005) contained five six-categorical items measuring comprehensibility of instructions and experienced strain during measurements, using two items each, as well as an overall quality rating of the CDM.

Quality Assurance

To ensure the quality of the data, we took several precautions. After data entry, 10% of the cases were randomly selected and cross-checked. Moreover, we conducted several plausibility checks. For instance, we checked whether the number of times participants had the first move was balanced, as well as whether there were no repeated ID combinations within games. We also checked whether the average game score of the three games was 0.0 for each of the five sessions, which was determined by the design of score calculation.
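
The three plausibility checks named above can be sketched as follows. The data layout (one tuple per match with the first mover, the second mover, and the outcome from the first mover's perspective) and the function name are assumptions made for illustration; the actual data format of the study is not described here.

```python
from collections import Counter, defaultdict
from statistics import mean


def check_session(matches):
    """Run three plausibility checks on the matches of one CDM in one session.

    matches: list of (first_mover, second_mover, result) tuples, where
    result is coded from the first mover's view (+1 win, 0 draw, -1 loss).
    Returns (repeated_pairs, first_move_counts, session_mean).
    """
    # Check 1: every combination of two IDs should occur at most once.
    pair_counts = Counter(frozenset((a, b)) for a, b, _ in matches)
    repeated_pairs = [tuple(sorted(p)) for p, c in pair_counts.items() if c > 1]

    # Check 2: count how often each participant had the first move.
    outcomes = defaultdict(list)
    first_moves = Counter()
    for first, second, result in matches:
        first_moves[first] += 1
        outcomes[first].append(result)
        outcomes[second].append(-result)  # the opponent gets the mirrored outcome
    first_move_counts = {person: first_moves[person] for person in outcomes}

    # Check 3: mean of the person scores across the session.
    session_mean = mean(mean(values) for values in outcomes.values())
    return repeated_pairs, first_move_counts, session_mean
```

Since each match adds mirrored outcomes, the coded outcomes always sum to zero; the mean of the person scores is therefore exactly zero whenever all participants played the same number of matches, and a clearly non-zero value points to a data entry error.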

Results

In the following, distributional properties, acceptance ratings, reliability, and validity of the CDM are considered. Moreover, results of structural equation models (SEM), analyzing a competition-specific component, are presented.

All analyses were performed using R (R Core Team, 2021). The SEM were fitted using the R-package lavaan (Rosseel, 2012).

CDM Scores

CDM scores were restricted to a range of −1 (all matches lost) to +1 (all matches won). The observed scores of the figural CDM ranged between −0.67 and 0.67, for the verbal CDM between −0.71 and 0.86, and for the numeric CDM between −0.38 and 0.54. The distribution of scores was close to normality, as can be seen from the QQ-plots displayed in Figure 2.

Figure 2.

QQ-plots of person scores of the three CDM.

Acceptance Evaluation

Results of the acceptance evaluation questionnaire, completed after each CDM, are given in Table 3. Ratings of clarity of instructions as well as the overall quality assessment were generally close to the maximum score. The figural CDM received the lowest rating with respect to instruction clarity, with an average score of 4.10. The strain caused by the tasks was rated moderate to low.

Table 3.

Mean and Standard Deviation of CDM Acceptance Evaluation Scores.

	Figural	Verbal	Numeric
Clarity of instructions	4.10 (0.89)	5.13 (0.90)	5.23 (0.94)
Strain of tasks	2.52 (1.03)	2.16 (1.12)	1.66 (0.82)
Overall quality assessment	4.69 (0.71)	5.02 (1.00)	4.71 (1.09)

Notes: Ratings ranged from 1 (not at all true) to 6 (absolutely true), and for “strain of tasks” from very low to very high.

Reliability

Table 4 gives two split-half reliabilities for each of the three CDM. Specifically, we used a first versus second half split as well as an odd/even split. Overall, reliabilities were low, ranging between 0.350 and 0.553. The numeric CDM appeared to be the least reliable, which is in accordance with its comparatively restricted range of scores shown above. The verbal CDM yielded the largest reliabilities, which, however, were still low.

Table 4.

Split-Half Reliability of CDM Scores for Two Different Split-Types.

	First/second	Odd/even
Figural	0.405	0.442
Verbal	0.553	0.500
Numeric	0.350	0.370
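The two split types in Table 4 can be illustrated with a short sketch. The coding of round outcomes as +1 (won) and −1 (lost) and the function names are assumptions for illustration. The text does not state whether the reported coefficients are raw half-score correlations or Spearman-Brown stepped-up values, so the sketch returns both.

```python
from math import sqrt

def pearson(x, y):
    """Plain Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def mean(values):
    return sum(values) / len(values)

def split_half(outcomes_by_person, split="odd_even"):
    """Return (half-score correlation, Spearman-Brown stepped-up value).

    `outcomes_by_person` holds one list of round outcomes per participant,
    hypothetically coded +1 (won) and -1 (lost), in the order played.
    """
    first, second = [], []
    for rounds in outcomes_by_person:
        if split == "odd_even":
            a, b = rounds[0::2], rounds[1::2]
        elif split == "first_second":
            mid = len(rounds) // 2
            a, b = rounds[:mid], rounds[mid:]
        else:
            raise ValueError(f"unknown split: {split}")
        first.append(mean(a))
        second.append(mean(b))
    r = pearson(first, second)
    # Step-up to full test length (two halves of equal length).
    return r, 2 * r / (1 + r)
```

A first/second split is sensitive to systematic change over the course of a measurement (for example, learning or fatigue), whereas an odd/even split largely averages such trends out, which is one reason to report both.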

Validity

Correlations among CDM scores were generally small. The correlation between verbal and numeric CDM was r = 0.26 (p = 0.012), between verbal and figural CDM it was r = 0.17 (p = 0.098), and between numeric and figural CDM it was r = 0.18 (p = 0.085).

The correlations with the remaining study variables are given in Table 5.² We did not include correlations with demographic variables, because they were consistently close to zero and non-significant.

Table 5.

Pearson Correlation of CDM Scores and Validation Measures.

			CDM
		Figural	Verbal	Numeric	Combined
GkKT	Figural intelligence	0.22*	0.19	0.18	0.30**
	Verbal intelligence	0.27*	0.18	0.04	0.25*
	Numeric intelligence	0.14	0.26*	0.20	0.30**
	General mental ability	0.26*	0.25*	0.20	0.37**
CMS	Competition motivation	−0.04	0.05	−0.14	−0.07
BIP-AM	Achievement motivation	−0.10	−0.05	−0.15	−0.15
TAI-G	Test anxiety	−0.02	−0.12	−0.08	−0.11
BFI-K	Neuroticism	−0.21*	−0.21	−0.24*	−0.34**
	Extraversion	0.09	0.18	0.01	0.14
	Openness	0.04	0.03	0.03	0.05
	Agreeableness	−0.07	0.12	0.03	0.03
	Conscientiousness	0.00	−0.07	0.02	−0.03
Gameplay	Childhood	−0.22*	−0.24*	0.05	−0.21*
	Current	0.07	−0.13	0.11	0.02
	Figural	0.12	0.03	0.22*	0.19
	Verbal	0.03	0.24*	0.38**	0.33**
	Numeric	0.05	−0.11	0.22*	0.08
Average pulse		0.12	0.05	0.01	0.09
High school GPA		−0.03	0.10	−0.02	0.02
Academic achievement		0.07	0.10	−0.01	0.08

Notes: *p < 0.05, **p < 0.01.

Correlations with intelligence were small and only partly significant. A maximum correlation of r = 0.37 (p < 0.001) was obtained for the combined CDM score and the measure of GMA. Single CDM and intelligence scores correlated homogeneously with each other, without yielding noticeably higher coefficients for scores pertaining to matching content domains.

Contrary to our expectations, neither competition motivation nor achievement motivation was related to CDM scores. However, it is notable that the correlations’ signs were almost consistently negative, implying that higher motivation scores were associated with slightly lower CDM scores. Similarly, test anxiety was not significantly related to CDM scores.

Considering correlations with the Big Five, only neuroticism yielded significant correlations. The consistently negative correlations with neuroticism, ranging between −0.34 and −0.21, show that participants with higher neuroticism scores had slightly lower CDM scores.

Interestingly, participants with a higher frequency of competitive gaming in childhood achieved slightly lower figural and verbal CDM scores, as indicated by the negative correlations of r = −0.22 (p = 0.038) and r = −0.24 (p = 0.024), respectively (see Table 5). Experience with verbal games, on the other hand, was predictive of higher scores in the verbal and numeric CDM, yielding correlations of r = 0.24 (p = 0.027) and r = 0.38 (p < 0.001), respectively. Finally, neither pulse rate nor high school GPA nor the self-assessment of academic achievement was significantly related to CDM scores. An overview of the correlations is given in the electronic supplementary materials of this article.

Latent Factor Structure

For isolating a competition-specific factor from the CDM scores, we fitted a sequence of three hierarchically nested SEM. Note that, due to model-identification constraints, we did not include specific latent variables pertaining to verbal, numeric, and figural content (e.g., Johnson et al., 2004; Valerius & Sparfeldt, 2014).

Model 1 considers a bi-factorial structure in which CDM scores are connected to general mental ability (GMA) and an orthogonal competition-specific factor. Specifically, indicators of GMA are the three intelligence scores as well as the CDM scores. The CDM scores are also indicators of the second factor denoted as “competition.” The model structure and parameter estimates are displayed in Figure 3.

Figure 3.

Model structure and standardized parameter estimates of the three SEM for extracting and explaining the competition-specific component of the CDM. Significance levels: ∗ = p ≤ .05 and ∗∗ = p ≤ .01 (GPE = gameplay-experience).

Model 1 fits the data very well, yielding a model test of χ²(6) = 3.340 (p > .05) as well as RMSEA = .000 with a 90%-confidence interval with limits of .000 and .096, CFI = 1.000, and TLI = 1.086. Loadings pertaining to CDM scores are consistently positive.
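The reported pattern (RMSEA = .000, CFI = 1.000, TLI above 1) follows directly from the model χ² being smaller than its degrees of freedom. The sketch below uses common textbook definitions of the three indices; software packages differ in details such as using N or N − 1, so it is not meant to reproduce the lavaan output exactly. The sample size and the baseline-model χ² are not reported in this section and are hypothetical placeholders.

```python
from math import sqrt

def rmsea(chi2, df, n):
    """RMSEA point estimate; conventions differ on using n or n - 1."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2, df, chi2_base, df_base):
    """Comparative fit index relative to the baseline (independence) model."""
    d_model = max(chi2 - df, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base

def tli(chi2, df, chi2_base, df_base):
    """Tucker-Lewis index; not bounded by 1, unlike the CFI."""
    ratio_base = chi2_base / df_base
    return (ratio_base - chi2 / df) / (ratio_base - 1.0)

# Reported for Model 1: chi-square = 3.340 with df = 6.
# HYPOTHETICAL placeholders (not reported in this section): sample size and
# baseline-model chi-square. df_base = 15 is what an independence model with
# six indicators would have (6 * 5 / 2).
N_ASSUMED = 100
CHI2_BASE_ASSUMED = 90.0

model1_rmsea = rmsea(3.340, 6, N_ASSUMED)
model1_cfi = cfi(3.340, 6, CHI2_BASE_ASSUMED, 15)
model1_tli = tli(3.340, 6, CHI2_BASE_ASSUMED, 15)
```

Because the χ² value is below its degrees of freedom, the RMSEA and CFI are truncated at their bounds regardless of the placeholder values, while the TLI, which is not truncated, exceeds 1.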

Accordingly, both model fit and parameter estimates support the bi-factorial solution for CDM scores. However, the loadings pertaining to the CDM scores are consistently small, which is in accordance with the generally small validities outlined before. Specifically, only the verbal CDM yields a significant association with GMA.

Model 2 extends Model 1 by adding the three competitive gameplay experience (GPE) variables as indicators of the competition factor. Including these indicators makes it possible to examine whether the systematic variance among CDM scores that is independent of GMA relates to a competition-specific component. Indeed, the updated model closely fits the data, yielding χ²(24) = 22.676 (p > .05) as well as RMSEA = .000 with 90% confidence limits of .000 and .080, CFI = 1.000, and TLI = 1.021. Thus, it is plausible to assume that the second factor measures a competition-specific component. Overall, the parameter estimates that were already contained in Model 1 do not change significantly (see Figure 3).

Model 3 introduces neuroticism as a third factor, using the four corresponding BFI-K items as indicators. By linking neuroticism to GMA and competition, the plausibility of the corresponding correlations can be considered with respect to factorial validity. Again, the fit is very good, yielding χ²(59) = 59.368 (p > .05), RMSEA = .008 with 90% confidence limits of .000 and .065, CFI = .998, and TLI = .999. The correlation between neuroticism and competition is strongly negative, yielding a significant value of −.564 matching the expected relation. In turn, the correlation between GMA and neuroticism is non-significant, yielding a small negative value of −.221.

All of the models support the assumption that CDM measure both competition-specific and GMA-related information. However, while the connection to the competition component is substantial, yielding significant loadings for all three CDM scores in at least one of the models, the connection to GMA is consistently weak for two of the CDM. Only the verbal CDM yields consistently significant standardized loadings on GMA.

Discussion

The present study was conducted to set a sensible starting point for systematic research exploring individual differences in cognitively demanding competitions. We developed three cognitive dyadic measurement (CDM) tools, combining features of cognitive ability tests with a competitive component. The new CDM were evaluated considering acceptance, reliability as well as construct validity, including structural equation models (SEM) for distinguishing content- and competition-specific components.

Overall, acceptance and understandability of instructions of CDM tasks were rated very highly by the participants. The strain of CDM tasks was rated moderate to low.

Reliabilities of CDM scores were rather low, yielding coefficients between .350 and .553. Validities partly matched our hypotheses: As expected, GMA scores yielded small to moderate positive correlations with CDM scores. The size of these validities was close to values reported in the few previous studies investigating competitive board games (e.g., Burgoyne et al., 2016; Lim & Furnham, 2018). The numeric, verbal, and figural CDM did not yield distinctly higher correlations with GMA measures of corresponding content. This might be due to the relatively large standard errors of the correlation coefficients resulting from the sample size, and to the fact that all CDM were presented in a matrix format, which may have introduced an identical figural component to all scores.

Furthermore, as hypothesized, neuroticism was negatively correlated with CDM scores. Additional positive validities were observed for competitive gameplay experience. Contrary to expectations, neither test anxiety nor competition or achievement motivation correlated significantly with CDM scores. However, the reduced statistical power resulting from the low reliabilities of CDM scores needs to be taken into account.

The SEM results suggest that systematic variances among CDM are well explained by two components: GMA and a competition-specific component. While loadings on these components were only small to moderate, the loading pattern of the other indicators and the correlations among latent variables support the substantive interpretation.

Critical Considerations

The present results––while providing useful information with respect to the CDM approach as well as personality attributes beyond GMA that appear to be relevant for succeeding in competitions––should only be considered as preliminary. Because CDM represent only one specific type of competition measurement and validation variables were restricted to psychological constructs, generalizations of the present results may be overly optimistic.

While the SEM support a bi-factorial solution for the CDM, it cannot be precluded that other constructs (such as memory, concentration, or crystallized intelligence) may explain additional CDM variance. Although previous literature does not indicate that other cognitive abilities beyond GMA are relevant (e.g., Quiroga et al., 2015), it might be worth conducting a more comprehensive validation, considering the small validity coefficients in the current sample. Moreover, validating the latent competition variable with competitive gameplay experience may be viewed critically. While this variable certainly includes competition-specific aspects, alternative interpretations might be plausible as well. Future research should include a wider selection of different competition indicators for a more comprehensive validation of this aspect.

On a different note, the small sample size, although sufficient for verifying substantial correlations, led to a reduced precision of coefficients. Particularly because of the comparatively small validity coefficients, future studies should increase sample size. This also holds true for the SEM. While the fit indices indicate that the models fit well, their reduced precision calls for caution. Thus, any specific results should be considered as preliminary until they have been validated in larger samples. However, before more extensive data collections are conducted, the measures’ limitations with respect to reliability should be resolved.

Desiderata for Further CDM Development

Clearly, the usefulness of CDM is reduced due to their limited reliabilities. While they may be attributed in part to restrictions of range due to the student sample, the adequate reliabilities of the GMA measures show that better coefficients could have been obtained. We conjecture that the dynamic features of CDM are among the central reasons. Specifically, having a human opponent who can develop during the course of the measurement and, for instance, change strategies, might inevitably result in increased measurement error.

While these dynamic features are difficult to control directly, there are options to partially improve reliabilities. One option is to increase the number of matches. According to the Spearman-Brown prediction formula, reaching an acceptable reliability of 0.80 for the verbal task, which had a reliability of approximately 0.55 based on an average of 17 matches, would require 56 matches. If one match takes 2 min, about 2 h would be required.
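The projection above can be retraced with the Spearman-Brown prediction formula solved for the lengthening factor, k = ρ*(1 − ρ) / (ρ(1 − ρ*)), where ρ is the current and ρ* the target reliability. The following lines are a small worked sketch using the values given in the text; the function and variable names are chosen for illustration only.

```python
from math import ceil

def lengthening_factor(reliability, target):
    """Spearman-Brown: factor by which a test must be lengthened."""
    return target * (1 - reliability) / (reliability * (1 - target))

current_matches = 17     # average number of matches in the verbal CDM
current_rel = 0.55       # approximate reliability reported above
target_rel = 0.80

k = lengthening_factor(current_rel, target_rel)
required_matches = ceil(current_matches * k)
total_minutes = required_matches * 2   # 2 minutes per match, as in the text
```

With these inputs the lengthening factor is about 3.27, which gives 56 matches and 112 min, in line with the roughly 2 h stated above. The formula assumes that the added matches are parallel to the existing ones, which may be optimistic if opponents keep adapting their strategies.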

Also, some of the negative effects on reliability may be lessened by an extended training phase. If all participants have a substantial amount of training, there might be less development within the actual measurement phase and, therefore, the sequence of opponents would be less influential. Moreover, a training phase would reduce the impact of gaming experience differences. However, it has to be noted that we already provided comprehensive instructions and a short training phase, which, in total, took 15–20 min per CDM.

Another option for improving reliability is to improve the CDM tasks themselves. The difference in reliability between the three CDM clearly shows that the type of task matters. While all CDM were used correctly by the participants, we observed that the time limit for each round might have had an impact. Particularly in the numerical CDM, several participants inserted the numbers rather hastily, requiring less than half of the available time. Consequently, differences in ability might not have been given the chance to manifest appropriately. The verbal CDM, having the highest reliability, decelerated the process somewhat by setting a fixed 15-second period for entering a single word. However, this remains a conjecture because multiple influencing factors are confounded.

Another difference between the three CDM was the inter-connectedness of moves. In the verbal CDM, it often appeared sufficiently challenging to insert a new word, regardless of its strategic impact on the opponent’s moves. In the figural CDM, the effects of moves were more interwoven. By placing a tile, participants were able to restrict the opponent’s options while preserving their own. In the numeric CDM, we perceived the inter-connectedness most strongly, because collecting points depended on the results of multiple rounds. We think that the larger the inter-connectedness of moves is, the more complex and/or random the scores may be. While this conjecture is supported by the pattern of reliabilities, further evidence is needed for verification.

Outlook and Recommendations

The present pilot study pursued two objectives: creating measurement approaches as well as extracting and validating a competition-specific component. With respect to the first objective, we think that while the CDM have provided valuable results, they are not yet suited for individual assessment, because of their reliabilities. While we provided some suggestions for improving them, it might be particularly challenging to improve CDM up to the quality standards of traditional test procedures. Additional studies examining the relation between the three newly developed CDM and existing board games (e.g., Scrabble) could be considered another next step of exploring the nomological net.

With respect to extracting a competition-specific component, we are confident that CDM can be useful. Analyzing the validity of the competitive component may provide new insights into the nomological network that appears to play an important role for success in competitive situations. However, further research is certainly required for substantiating our understanding of the competitive component.

In addition to the simple win-lose scores, CDM may also provide valuable behavioral information. While, for the sake of participants’ anonymity, we did not collect person-specific behavioral observations, such an addition might be valuable. Also, a more detailed assessment of previous gameplay experiences may be beneficial, allowing the analysis of more specific relations, such as the relation between “Scrabble” and the verbal CDM.

Besides the practical desiderata for CDM revision, we hope that the present study sets a starting point for systematic research on the effects that game-like attributes (such as a competitive component) have on the assessment of abilities or traits required for a specific operation or task. It appears unwarranted to ignore the potential effects such attributes can have on the assessment.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iDs

Dirk Lubbe

Pascale Stephanie Petri

Notes

References

Armstrong, M. B., Ferrell, J. Z., Collmus, A. B., & Landers, R. N. (2016). Correcting misconceptions about gamification of assessment: More than SJTs and badges. Industrial and Organizational Psychology, 9(3), 671–677. https://doi.org/10.1017/iop.2016.69

Balafoutas, Kerschbamer, & Sutter (2012). Distributional preferences and competitive behavior. Journal of Economic Behavior & Organization, 83(1), 125–135. https://doi.org/10.1016/j.jebo.2011.06.018

Berg (2021). A game-based online tool to measure cognitive functions in students. International Journal of Serious Games, 8(1), 71–87. https://doi.org/10.17083/ijsg.v8i1.410

Berge, L. I. O., Bjorvatn, Garcia Pires, A. J., & Tungodden (2015). Competitive in the lab, successful in the field? Journal of Economic Behavior & Organization, 118, 303–317. https://doi.org/10.1016/j.jebo.2014.11.014

Buford, C. C., & O’Leary, B. J. (2015). Assessment of fluid intelligence utilizing a computer simulated game. International Journal of Gaming and Computer-Mediated Simulations, 7(4), 1–18. https://doi.org/10.4018/IJGCMS.2015100101

Burgoyne, A. P., Sala, Gobet, Macnamara, B. N., Campitelli, & Hambrick, D. Z. (2016). The relationship between cognitive ability and chess skill: A comprehensive meta-analysis. Intelligence, 59, 72–83. https://doi.org/10.1016/j.intell.2016.08.002

Cechanowicz, Gutwin, Brownell, & Goodfellow (2013). Effects of gamification on participation and data quality in a real-world market research domain. In Proceedings of the First International Conference on Gameful Design, Research, and Applications. https://doi.org/10.1145/2583008.2583016

Cheng, Klinger, Fox, Doe, & Jin (2014). Motivation and test anxiety in test performance across three testing contexts: The CAEL, CET, and GEPT. TESOL Quarterly, 48(2), 300–330. https://doi.org/10.1002/tesq.105

Christiansen, F. B., & Loeschcke (1990). Evolution and competition. In Wöhrmann & Jain (Eds.), Population biology (pp. 367–394). Springer. https://doi.org/10.1007/978-3-642-74474-7_13

Deterding, Dixon, Khaled, & Nacke (2011). From game design elements to gamefulness: Defining gamification. In Proceedings of the 15th International Academic MindTrek Conference: Envisioning Future Media Environments. https://doi.org/10.1145/2181037.2181040

Dicheva, Dichev, Agre, & Angelova (2015). Gamification in education: A systematic mapping study. Educational Technology & Society, 18(3), 75–88.

DiStefano & Hess (2005). Using confirmatory factor analysis for construct validation: An empirical review. Journal of Psychoeducational Assessment, 23(3), 225–241. https://doi.org/10.1177/073428290502300303

Dobson (2000). An investigation into the relationship between neuroticism, extraversion and cognitive test performance in selection. International Journal of Selection and Assessment, 8(3), 99–109. https://doi.org/10.1111/1468-2389.00140

Eid & Schmidt (2014). Testtheorie und Testkonstruktion. Hogrefe.

Eid, Lischetzke, Nussbeck, F. W., & Trierweiler, L. I. (2003). Separating trait effects from trait-specific method effects in multitrait-multimethod models: A multiple-indicator CT-C(M-1) model. Psychological Methods, 8(1), 38–60. https://doi.org/10.1037/1082-989X.8.1.38

Franken, R. E., & Brown, D. J. (1995). Why do people like competition? The motivation for winning, putting forth effort, improving one’s performance, performing well, being instrumental, and expressing forceful/aggressive behavior. Personality and Individual Differences, 19(2), 175–184. https://doi.org/10.1016/0191-8869(95)00035-5

Freund, P. A., & Holling (2011). Who wants to take an intelligence test? Personality and achievement motivation in the context of ability testing. Personality and Individual Differences, 50(5), 723–728. https://doi.org/10.1016/j.paid.2010.12.025

Gaye-Valentine & Credé (2013). Assessing the construct validity of test anxiety: The influence of test characteristics and impact on test score criterion validity. TPM: Testing, Psychometrics, Methodology in Applied Psychology, 20(2).

Hembree (1988). Correlates, causes, effects, and treatment of test anxiety. Review of Educational Research, 58(1), 47–77. https://doi.org/10.3102/00346543058001047

Hong, J.-C., Hwang, M.-Y., Tai, K.-H., & Lin, P.-C. (2015). Self-efficacy relevant to competitive anxiety and gameplay interest in the one-on-one competition setting. Educational Technology Research and Development, 63(5), 791–807. https://doi.org/10.1007/s11423-015-9389-2

Hossiep & Paschen (1998). Bochumer Inventory for job-related personality description (Bochumer Inventar zur berufsbezogenen Persoenlichkeitsbeschreibung). Hogrefe.

Jäger, A. O. (1984). Intelligenzstrukturforschung: Konkurrierende Modelle, neue Entwicklungen, Perspektiven. Psychologische Rundschau, 35(1), 21–35.

Johnson, D. W., & Johnson, R. T. (1989). Cooperation and competition: Theory and research (2nd ed.). Interaction Book Company.

Johnson, Bouchard, T. J., Krueger, R. F., McGue, & Gottesman, I. I. (2004). Just one g: Consistent results from three test batteries. Intelligence, 32(1), 95–107. https://doi.org/10.1016/s0160-2896(03)00062-x

Kersting (2005). Akzept! - Questionnaire for measuring acceptance towards diagnostic instruments [Akzept! - Fragebogen zur Messung der Akzeptanz diagnostischer Verfahren].

Koivisto & Hamari (2019). The rise of motivational information systems: A review of gamification research. International Journal of Information Management, 45, 191–210. https://doi.org/10.1016/j.ijinfomgt.2018.10.013

Landers, R. N., Auer, E. M., & Abraham, J. D. (2020). Gamifying a situational judgment test with immersion and control game elements. Journal of Managerial Psychology, 35(4), 225–239. https://doi.org/10.1108/JMP-10-2018-0446

Lim & Furnham (2018). Can commercial games function as intelligence tests? A pilot study. The Computer Games Journal, 7(1), 27–37. https://doi.org/10.1007/s40869-018-0053-z

Lobel, Granic, & Engels, R. C. (2014). Stressful gaming, interoceptive awareness, and emotion regulation tendencies: A novel approach. Cyberpsychology, Behavior and Social Networking, 17(4), 222–227. https://doi.org/10.1089/cyber.2013.0296

Looyestyn, Kernot, Boshoff, Ryan, Edney, & Maher (2017). Does gamification increase engagement with online programs? A systematic review. PLoS ONE, 12(3), e0173403. https://doi.org/10.1371/journal.pone.0173403

Lumsden, Edwards, E. A., Lawrence, N. S., Coyle, & Munafo, M. R. (2016). Gamification of cognitive assessment and cognitive training: A systematic review of applications and efficacy. JMIR Serious Games, 4(2), Article e11. https://doi.org/10.2196/games.5888

Munkes & Diehl (2003). Matching or competition? Performance comparison processes in an idea generation task. Group Processes & Intergroup Relations, 6(3), 305–320. https://doi.org/10.1177/13684302030063006

Nikolaou, Georgiou, & Kotsasarlidou (2019). Exploring the relationship of a gamified assessment with performance. The Spanish Journal of Psychology, 22, Article E6. https://doi.org/10.1017/sjp.2019.5

Petri, P. S., Weingardt, & Kersting (2019). Let’s get to the hard facts - Erfassung von Intelligenz im Rahmen von Online Self-Assessments. Empirische Pädagogik, 33(3), 307–330.

Quiroga, M. Á., Escorial, Román, F. J., Morillo, Jarabo, Privado, Hernandez, Gallego, & Colom (2015). Can we reliably measure the general factor of intelligence (g) through commercial video games? Yes, we can! Intelligence, 53, 1–7. https://doi.org/10.1016/j.intell.2015.08.004

Quiroga, M. Á., Román, F. J., de La Fuente, Privado, & Colom (2016). The measurement of intelligence in the XXI century using video games. The Spanish Journal of Psychology, 19, Article E89. https://doi.org/10.1017/sjp.2016.84

R Core Team. (2021). R: A language and environment for statistical computing. http://www.R-project.org/

Rabah, Cassidy, & Beauchemin (2018). Gamification in education: Real benefits or edutainment? https://doi.org/10.13140/RG.2.2.28673.56162

Rammstedt & John, O. P. (2005). A short version of the Big Five Inventory BFI-K (Kurzversion des Big Five Inventory BFI-K). Diagnostica, 51(4), 195–206. https://doi.org/10.1026/0012-1924.51.4.195

Ross, S. R., Rausch, M. K., & Canada, K. E. (2003). Competition and cooperation in the five-factor model: Individual differences in achievement orientation. The Journal of Psychology, 137(4), 323–337. https://doi.org/10.1080/00223980309600617

Rosseel (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–36.

Sailer & Homner (2020). The gamification of learning: A meta-analysis. Educational Psychology Review, 32(1), 77–112. https://doi.org/10.1007/s10648-019-09498-w

Sobczyk, Dobrowolski, Skorko, Michalak, & Brzezicka (2015). Issues and advances in research methods on video games and cognitive abilities. Frontiers in Psychology, 6, Article 1451. https://doi.org/10.3389/fpsyg.2015.01451

Tauer, J. M., & Harackiewicz, J. M. (1999). Winning isn’t everything: Competition, achievement orientation, and intrinsic motivation. Journal of Experimental Social Psychology, 35(3), 209–238. https://doi.org/10.1006/jesp.1999.1383

Valerius & Sparfeldt, J. R. (2014). Consistent g- as well as consistent verbal-numerical- and figural-factors in nested factor models? Confirmatory factor analyses using three test batteries. Intelligence, 44, 120–133. https://doi.org/10.1016/j.intell.2014.04.003

Vollstaedt-Klein, Grimm, Kirsch, & Bilalic (2010). Personality of elite male and female chess players and its relation to chess skill. Learning and Individual Differences, 20(5), 517–521. https://doi.org/10.1016/j.lindif.2010.04.005

Wacker, Jaunzeme, & Jaksztat (2008). A short version of the Test Anxiety Inventory TAI-G (Eine Kurzform des Prüfungsängstlichkeitsinventars TAI-G). Zeitschrift für Pädagogische Psychologie, 22(1), 73–81. https://doi.org/10.1024/1010-0652.22.1.73

Wilson (2005). Constructing measures: An item response modeling approach. Routledge. https://doi.org/10.4324/9781410611697