Assessment of Teams in a Digital Game Environment

Abstract

Background. Despite the increasing pervasiveness of digital entertainment and serious games in organisational life, there is little evidence for the validity of game-based team training and assessment.

Aim. The authors used the game TEAMUP for a series of team training and assessment sessions, while at the same time researching the internal validity of the game for this purpose.

Method. A total of 106 sets of data on games played by teams of professionals (police officers, auditors, consultants, etc.) and undergraduates and postgraduates (in aerospace engineering, entrepreneurship, etc.) were gathered for analysis through pre- and post-game questionnaires focusing on constructs for team quality, such as psychological safety and team cohesiveness. In addition, a large quantity of such data as time to complete task, distance and avoidable mistakes were logged to measure in-game team performance. Correlation and regression analyses were conducted to find relationships between team structure factors, team quality constructs and in-game performance measures.

Results. The main finding is that the in-game performance measure ‘avoidable mistakes’ (a proxy for task quality) correlates markedly and pervasively with ‘team cohesiveness’. More important, the findings support the premise that in-game assessment can be internally valid for team research and assessment purposes.

Keywords

serious games, team assessment, team training, validation

Introduction

Global economic developments emphasise the importance of network organisations for the long-term competitiveness of organisations. People, on the other hand, collaborate in virtual and dispersed teams with non-hierarchical forms of leadership, like distributed, shared and emergent leadership (Gressick & Derry, 2010; Hertel, Geister, & Konradt, 2005; Shuffler, Wiese, Salas, & Burke, 2010). Additionally, the current generation is growing up in a networked society in which digital games (and social media) have a ubiquitous and pervasive presence (Bekebrede, Warmelink, & Mayer, 2011; Tapscott, 1999). It is therefore argued that pervasive digital experiences, including game experiences, have effects on the types of collaboration that the Net generation feels comfortable with, as team members and as team leaders (Edery & Mollick, 2008; Gressick & Derry, 2010; Lisk, Kaplancali, & Riggio, 2011; Reeves & Read, 2009; D. Williams et al., 2006; Yee, 2006). If this is the case, we need a better insight into whether digital game-mediated team processes are related to team performances (Coovert, Winner, Bennett, & Howard, 2017; Marlow, Salas, Landon, & Presnell, 2016). This is highly relevant for the current and future performance of organisations, especially when the game generation becomes entrepreneurial or reaches management levels in organisations (Edery & Mollick, 2008; Reeves & Read, 2009).

Against this background, we see an increasing interest in the use of serious games (SGs) as an alternative or complementary mode of training, and of assessing individual team behaviour and performance (Coovert et al., 2017; Marlow et al., 2016). By observing and assessing teams during digital serious game play, we can gain additional insight into the relation between team structure (the composition of a team in terms of age, gender, experience), team quality, and team performance, which will open up many new ways of researching, training and assessing teams in our networked and digital age (Bououd & Boughzala, 2013; Poy-Castro, Mendaña-Cuervo, & González, 2015; Sawaragi, Fujii, Horiguchi, & Nakanishi, 2016; Warmelink et al., 2017; Wendel et al., 2013). However, despite the increasing pervasiveness of digital entertainment and serious games in organisational life, there is little evidence for the validity of game-based team training and assessment.

Challenges in Game-Based Team Assessment

Team Performance

The questions ‘Why do some teams perform better than others?’ and ‘How can we know in advance which teams will perform better?’ have inspired a body of research so vast that it would be impossible to summarise here (Salas, Reyes, & Woods, 2017). Following Cohen and Bailey (1997, p. 241), we define a team as ‘… a collection of individuals who are interdependent in their tasks, who share responsibility for outcomes, who see themselves and who are seen by others as an intact social entity embedded in one or more larger social systems […], and who manage their relationships across organisational boundaries.’

A very diverse range of factors, or independent variables, have been put forward that may explain the dependent variable of team performance. Psychology focuses on the combination of personality traits (van Vianen & De Dreu, 2001), member ability (Barrick, Stewart, Neubert, & Mount, 1998; van Vianen & De Dreu, 2001), team familiarity, team roles (Fisher, Hunter, & Macrosson, 2002) or leadership styles (DeChurch & Marks, 2006; Gerstner & Day, 1997); small group research focuses on things like identity, conformity, psychological safety and cohesiveness (Beal, Cohen, Burke, & McLendon, 2003; Chiocchio & Essiembre, 2009; Mullen & Copper, 1994); and management sciences focus on such things as team structure, team size and composition (Bell, 2007), trust in leadership, reward structures and task-related technology.

The dependent variable – namely team performance – is difficult to assess (Guzzo & Dickson, 1996; Salas et al., 2017). Most studies now agree that team performance should be treated as a multidimensional concept that is best measured in so-called performance composites as “excellent indicators of overall team effectiveness as compared to those that only assess one aspect of performance” (Mathieu, Maynard, Rapp, & Gilson, 2008, p. 417). Team performance can be measured by the operationalisation of multiple factors like performance delivery (does the team reach its end goals?), task efficiency (in doing its tasks, does the team use its resources efficiently?) and task quality (does the team conduct its tasks without too many errors or failures?). However, from a methodological and practical perspective, it proves rather challenging to measure multiple team performance indicators in a real life, or even in an artificial team situation.

If we know which team structure and team quality factors have a marked influence upon team performance, we can design and use intervention strategies and methods to increase team performance. This can be accomplished through team member selection and team training sessions. Important leverages for team intervention strategies include motivation (Colquitt, LePine, & Noe, 2000), trust (Colquitt, Scott, & LePine, 2007), knowledge (Cooke, Salas, Cannon-Bowers, & Stout, 2000) and mental maps (DeChurch & Mesmer-Magnus, 2010). Furthermore, the effect of team training strategies or methods on team performance can now be studied in, for instance, military, surgical, emergency or business teams (Salas & Cannon-Bowers, 2001; Salas et al., 2008). Games increasingly provide an artificial environment for training and assessing team quality, thereby influencing team performance. In the study below, we focus on the relation between the aforementioned aspects: 1) team structure factors, 2) team quality constructs, and 3) team performance indicators, in a game-based environment used for team training and assessment.

Team Based Game Research

Several review articles on the efficacy of game-based learning (GBL) have been published, and such articles are now appearing with increasing frequency (Connolly, Boyle, MacArthur, Hainey, & Boyle, 2012; Girard, Ecalle, & Magnan, 2012). However, research into the efficacy of games is related but not identical to the validation of games for team training or assessment. In the first case, the objective is to determine the impact of game play upon the player or group of players. This is commonly referred to as learning or behavioural change. In the second case, however, the objective is to establish the validity of game play as a method for observing certain structural aspects of individuals, groups or organisations. In short, validation research precedes intervention research.

The main barrier to the diffusion of SGs in organisational life is their external validity (the explanatory and predictive value), especially when such matters as individual and team assessment or scientific research are concerned. The question whether games are effective training methods depends on whether team quality and team performance indicators from game play are correlated to relevant indicators outside the game context. However, external validity is affected by the internal validity of the game: the fact that in-game team performance is, or is not, related to in-game team quality. In a real-life environment, team performance – getting the job done – matters most. Team structure – e.g., gender balance, experience – and team quality – e.g., team safety, cohesion – are the means to achieve, or explain, team performance. In the artificial context of game-based training and assessment, it is the other way around. Team performance in the game – e.g., a high or low game score – is relevant only as a proxy for the team quality demonstrated during the game. Internal validity implies that teams with better in-game performance also have better team quality, and vice versa. In-game team performance scores by themselves have very little external value. However, in-game team quality may have external value, assuming that in-game team quality factors are relatively stable and transferable to the outside world. In short, internal validity is a precondition but not a guarantee for external validity. External validity might be limited for other reasons, such as the fact that the in-game team challenges do not represent a real-life team situation adequately. This study focuses on the internal validity of games for the training and assessment of teams, as a precondition for the external validity and efficacy of game-based team training and assessment.

Determining the validity of a SG for team research and assessment is a complicated matter, because experimental validation (by questionnaires, tests, etc.) is likely to disturb the team process and the game play itself. The use of questionnaires also contradicts the key principle of gaming, namely that the game play itself should be the assessment. Unfortunately, research into and the application of games for assessment are still in their infancy and we need to construct and validate their use by comparing the observed data with psychometric assessment tools. At the time of writing, a triangulation of data based upon self-reporting, personal observation and digital observation is highly necessary for various practical, theoretical and ethical reasons. The fundamental advantage of digital SGs for training and assessment is that data can be unobtrusively gathered, logged, saved and analysed for debriefing, assessment or research purposes. Stealth assessment – or non-invasive, unobtrusive, non-disturbing assessment – can potentially increase the learning efficacy of SGs, because much of the learning in SGs is rather implicit and subjective; for instance through personal debriefings (Seitlinger, Bedek, Kopeinik, & Albert, 2012; Shute, 2011; Tannenbaum & Cerasoli, 2012). It remains rather difficult for player/candidate and trainer/assessor to monitor and keep track of what happens, to objectify the observations and compare them to other sessions, and to feed back the information with authority in order to enhance learning or support a judgement.

The use of SGs for training and assessment is highly dependent on the possibility to unobtrusively gather data on serious game play – data that are valid enough to be used as indicators or predictors of team quality and team performance, the so-called game analytics. In order to develop unobtrusive methods and tools for game-based team training and assessment, knowledge and skills from the field of SGs (e.g. learning efficacy, game design and analytics) and organisational behaviour (e.g. assessment, team processes and organisational development) need to be synthesised. We therefore examined the relationship between team performance indicators unobtrusively collected during game play, and team structure and team quality factors surveyed before and after the game.

TEAMUP Game

The authors have been using the serious game TEAMUP (© The Barn) for a series of team training and assessment sessions, while also studying and validating team quality and team performance in the game with psychometric tests (see below; Mayer, van Dierendonck, van Ruijven, & Wenzler, 2014). A demo video of the game can be viewed online at [www.thebarngames.nl/teamup]. The research is leading to new scientific insights into how game players can be assessed based on their game play rather than self-reported tests.

The Game

TEAMUP is a multiplayer, three-dimensional game that is available for team training under a commercial license. Four players need to self-organise, communicate, collaborate and arrange team communication, coordination and leadership in order to solve five levels of puzzles/team challenges. This may require different and alternating forms of leadership and self-organisation.

In the game, the players find themselves stranded on a deserted island occupied by Maya ruins. They need to get to the other end of the island in order to be rescued. The game is built as a total conversion modification in the Unreal Development Kit (UDK) engine by a team of professional game designers; the graphics, user interaction and simulation models are of very high quality. The game can log any kind of game data, can communicate with a scoring, assessment and visualisation tool, and supports services like distributed play (headsets, walkie-talkies, etc.) and video recording. The 4-member teams may be homogeneous or heterogeneous on one or more dimensions like age, gender, nationality, game experience, leadership, preferred team roles, etc. They may be composed at random, be self-selected or be assigned. In our preferred setup, we put one player on each side of a square table so that the four players face each other and can see their own screen but not the other players’ screens. The players are not allowed to stand up, look at other screens, take over another player’s navigation or use any form of communication other than speech (no writing or Pictionary).

The game can best be experienced by playing it (and we do not want to give away too many clues!). The current version of the game consists of five puzzle levels.

(1). Door puzzle: Players must navigate from their arrival dock to a closed door that gives access to a cave. Entering the cave through the door requires coordinated action. Two players have to stand on two engraved tiles outside the cave to open the door. Only then can the other two enter the cave. Once inside, these players have to stand on a similar pair of engraved tiles to keep the door open so that the other two can also enter.

(2). Tile puzzle: This level is loosely based upon the well-known training game ‘the Maze’ (Sweeney & Meadows, 2001). The team needs to find the correct path across a 5 x 5 tile maze – the floor of a room that only one of the four players can enter. When a player steps on a wrong tile, he/she falls through the floor and has to go back to the entrance. One of the players can now try again. The team has to find an effective way to communicate about correct and faulty tiles, because only the active player can see what he/she is doing. Stepping on a known faulty tile once the path is known (i.e. after one of the players has found the path) is an example of an ‘avoidable mistake’ in the game (see below).

(3). Maze puzzle: One player stands on a platform from which he/she can see the three other team members as they struggle to find their way out of a maze. The player on the platform directs the three other players through the maze to the exit.

(4). Bridge puzzle: The team needs to split up into various subgroups to solve small puzzles: a) Two players enter a dark ruin where one leader holds a flaming torch and another needs to follow him/her. One person must stay behind in the ruin standing on an engraved tile. b) Two players have to position themselves on a bridge such that their weight brings the bridge into balance, allowing them to climb on to a platform. One person must remain on the platform to stand on an engraved tile. If the four players manage to stand on four dispersed engraved tiles, a bridge to the next level drops down, enabling them to ascend to that level.

(5). Pillar puzzle: Team members alternate in leadership, trying to communicate and solve a series of four communication and coordinated action puzzles. Solving one of the four puzzles opens a little bridge to the next puzzle, where another team member becomes the leader of a similar but more difficult team challenge.

Study Design

Data were gathered in a survey design with measurements at two moments, pre- and post-game, complemented by the logging of in-game performance data (in-game analytics). In the context of this study, a classic or even quasi-experiment with control groups and/or randomisation is neither desirable nor feasible because the study involves real-life training sessions commissioned by clients. The objective of the study is not to determine the efficacy of TEAMUP, but to assess a possible relation between in-game team performance measures and team quality constructs (Campbell & Stanley, 1963; Mayer, Bekebrede, Warmelink, & Zhou, 2014).

The research procedure is as follows. The game has been played regularly in a workshop mode (3–4 hours), usually with 3–6 teams in parallel. Professionals (police officers, consultants, educationalists, water managers, auditors, secretaries, etc.) from many countries have played the game, as have undergraduate and postgraduate students in such fields as engineering (e.g. aerospace), management and entrepreneurship. For the police, the game play was part of a 2-day training and assessment procedure. For the engineering students, the initial request came from their student councillor. By and large, the game sessions were very similar. The client invited participants to play ‘a team game’ with context-specific motivation (‘It’s part of your curriculum/training/assessment/etc.’). As facilitators, we tried to make sure that the game was ready to go before participants entered the room. We did not pre-arrange the composition of the teams because in most cases we did not know the players. Players usually tried to join a table with friends or colleagues, but of course this was not always possible. In many cases, players did not know each other. Early participants took a random seat when entering the room; latecomers filled the vacant seats.

After most players were seated, we gave them a pre-game questionnaire to fill out. When everybody had completed the questionnaire, we started the game session. The author facilitated the majority of sessions; the police training, however, was moderated by external trainers hired by the client. Introductions and instructions by the trainer generally took no longer than five minutes. We made it clear that in a few moments they would virtually find themselves stranded on a deserted island and needed to find their way to the other side, working together to overcome certain obstacles. Time and avoidable mistakes (i.e. repeating an action already known to be ineffective) mattered for their in-game team performance. No further instructions were given during game play. Overall, there were few technical disturbances, and those that occurred were remedied fairly quickly by the technical support staff. Facilitator and trainers observed the teams and took notes of important points for debriefing. Teams were challenged to finish the game as fast as possible, trying to beat their peers, but there were no rewards for finishing first. Fast teams were asked to wait (have a coffee) until the slower teams had finished.

When a group finished, we immediately gave them the post-game questionnaire. After all groups had finished playing, we held a 1-hour facilitated plenary discussion. Individual and team scores (time to finish the various puzzles, the distances covered and the number of avoidable mistakes) were shown on a central screen in order to trigger and deepen discussion. The game scores of all teams were saved and compiled into a limited set of in-game performance indicators (see below). The in-game performance indicators and team psychological constructs were then analysed statistically in SPSS using correlation and regression analysis (see Tables 1 and 2).
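The team-level correlation step can also be illustrated outside SPSS. The following is a minimal sketch in Python, not the procedure the authors ran: the function names are our own, and the five example teams are invented purely for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples with at least 3 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical team-level scores (NOT data from the study).
cohesiveness = [5.2, 6.1, 5.8, 4.9, 6.4]
errors = [18, 9, 11, 21, 6]

r = pearson_r(cohesiveness, errors)
print(round(r, 2), round(t_statistic(r, len(errors)), 2))
```

A negative r in this toy example would correspond to the pattern of interest here: teams that report higher cohesiveness make fewer avoidable mistakes.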

Table 1.

Descriptives and Intercorrelations at Team Level (N=106).

	M	SD	1	2	3	4	5	6	7	8	9	10	11	12
1. Team familiarity	2.00	1.02
2. All-male team¹	.62	.49	−.23*
3. Team members’ age	20.58	5.01	.25*	−.45*
4. Digital game experience	3.55	.70	−.07	.46*	−.29*
5. Individual competence	5.37	.73	.04	.40	.03	.56*
6. Team competence	5.54	.64	.20	.23	.04	.34*	.38*
7. Team efficacy	5.56	.56	.26	.07	.15	.32*	.31*	.69*
8. Team cohesiveness	5.74	.63	.05	.10	.00	.25*	.24*	.54*	.57*
9. Team psych. safety	5.45	.40	.13	−.01	.14	.11	.19*	.53*	.61*	.48*
10. Team member exchange	5.03	.39	.03	.00	.05	.14	.20*	.37*	.39*	.27*	.46*
11. Distance	7605	1620	.16	−.24*	.23*	−.37*	−.14	−.25*	−.20*	−.34*	−.04	.12
12. Time	1639	608	.20*	−.43*	.53*	−.51*	−.29*	−.29*	−.22*	−.31*	−.01	.03	.78*
13. Errors	11.50	6.28	−.14	−.07	−.05	−.27*	−.22*	−.34*	−.35*	−.40*	−.04	−.05	.56*	.50*

Note. ¹1 = all-male team, 0 = mixed or all-female team. For distance, time and errors: better performance means a lower score.

*p < .05.

Table 2.

Team Performance and Team Collaboration.

	Distance	Time	Errors
Control variables
1. Team familiarity	.12	.07	−.10
2. All-male team	.00	−.07	.01
3. Team members’ age	.11	.40***	−.08
4. Game experience	−.28**	−.31***	−.16
Team variables
5. Individual competence
6. Team members’ competence
7. Team efficacy			−.26*
8. Team cohesiveness	−.34***	−.23**	−.36***
9. Team psych. safety			.34**
10. Team member exchange	.24**
Adj R²	.25	.46	.23
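As a reminder of what the last row of Table 2 expresses, the standard adjustment of R² for the number of predictors can be sketched as follows. This is only an illustration: the number of predictors entered into each model is not stated here, so the values of r2 and k below are hypothetical, and only n = 106 is taken from the study.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a model with k predictors fitted on n cases."""
    if n - k - 1 <= 0:
        raise ValueError("need more cases than predictors plus one")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# n = 106 teams as in the study; r2 and k are made-up illustration values.
example = adjusted_r2(0.30, 106, 5)
print(round(example, 3))
```

The adjustment penalises models with many predictors, which matters when control variables and team quality constructs are entered together on a sample of about a hundred teams.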

Ethical Considerations

All respondents agreed to complete the pre- and the post-game questionnaire. The questionnaires were clear about the fact that data would be anonymised and could not be traced back to persons. The respondents were also informed that data about their individual and team performance would be logged for research purposes. In the case of an external client (e.g. the police), permission was given on behalf of the client organisation to collect and store data. In a few cases (e.g. the entrepreneurship students), the players received a feedback sheet with the results of their answers after they had filled out the post-game questionnaire. Their performance in the game was kept confidential within the research team and not reported to the external client. None of the players were paid. Students did not get a mark for their performance.

Sample

The dataset consists of 424 individuals divided over 106 teams, including professionals like police officers, consultants, educationalists, water managers, auditors, and undergraduate and postgraduate students from fields like engineering, management and entrepreneurship. The majority of the players had previous experience with computer games, 21.3% had never or almost never played computer games, and 24.5% indicated that they played them daily. They were relatively young: 77.8% were 20 years old or younger, 13.9% were between 20 and 30, 6.9% were between 30 and 40, and 1.5% were older than 40. Of the players, 80.7% were male and 19.3% were female; 58% were Dutch, 27% came from another western European country, and 15% came from one of a great variety of countries such as Ethiopia, Indonesia, Brazil, etc. Team compositions varied from homogeneous (all male, all young, all Dutch, etc.) to very heterogeneous (mixture of ages, genders and nationalities). There were no all-female teams.

Hypotheses

The first question is whether the composition of the player team influences the team’s game performance. According to Bell (2007), team structure variables, like age, gender and familiarity, can be predictors of team performance. In our case, it is very likely that factors like age and gender have an influence on in-game team performance. Following Barrick et al. (1998), individual and team member ability can influence team performance. In our case, it is plausible that in-game team performance is influenced by previous game experience and team member ability to play computer games. Such findings would support claims that younger teams, all-male teams, teams with members who are friends, team members who frequently play computer games and so on, have better in-game team performance. This is an important finding in itself, since many team tasks in organisations are becoming game-like, such as in control and operating rooms. On the other hand, in a context of validating game-based assessment, the finding would be tautological, namely that experienced gamers have better in-game team performance. Hence, team structure and member ability are important control variables. This led to the following hypotheses:

(1). All-male teams have better in-game team performance than mixed or all-female teams.

(2). Younger teams have better in-game team performance than older teams.

(3). Teams with more team familiarity have better in-game team performance than teams with lower team familiarity.

(4). Teams with more game experience and/or higher game competence have better in-game team performance than teams with less game experience or competence.

The second question is whether team quality constructs (i.e., team cohesion, psychological safety, team efficacy and team member exchange) surveyed after the game are related to in-game team performance indicators (i.e. time, distance, error). Beal et al. (2003), Chiocchio and Essiembre (2009) and Mullen and Copper (1994) found relationships between team cohesion and performance in groups. Seers (1989) found that the quality of team membership exchange contributed to team performance. According to Carmeli, Brueller, and Dutton (2009) and Edmondson (1999), team psychological safety is related to team performance. This led to the following hypotheses:

(5). Teams with higher team efficacy have better in-game team performance than teams with lower team efficacy.

(6). Teams with stronger team cohesiveness have better in-game team performance than teams with weaker team cohesiveness.

(7). Teams with higher team psychological safety have better in-game team performance than teams with lower team psychological safety.

(8). Teams with stronger team member exchange have better in-game team performance than teams with weaker team member exchange.

Measures

Pre-game questionnaire

Before the game, players indicated their age, nationality, gender, experience with playing digital games, perceived competence and familiarity with the other players. These are the team structure factors (see above).

(1). Team familiarity with other players was measured with one item focusing on the relationship with each of the team members, answered on a Likert scale ranging from 1 (‘unknown’) to 7 (‘close friend’).

(2). Digital game experience was measured with one item, ‘How often do you play computer games in private capacity?’, ranging from 1 (‘never’) to 5 (‘daily’) (Mayer, Warmelink, & Bekebrede, 2013).

(3). Individual competence was measured with the three items from the Perceived Competence Scale (G. C. Williams & Deci, 1996) reformulated for the gaming context (e.g. ‘I feel confident in my ability to play computer games’). The internal consistency was .88.
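Because the analysis is run at team level, the individual pre-game answers have to be combined per team. The sketch below shows one plausible way of doing so. It assumes that team scores are simple means of the members' answers, which is an assumption on our part, and the field names are hypothetical; only the all-male dummy follows the coding given in the note to Table 1.

```python
from statistics import mean

def team_structure(players):
    """Aggregate individual pre-game answers into team-level factors.

    Assumes simple averaging across members (an illustrative choice).
    """
    return {
        "age": mean(p["age"] for p in players),
        "game_experience": mean(p["game_experience"] for p in players),
        # each player's familiarity value is taken to be his/her mean
        # rating of the other three team members (hypothetical field)
        "familiarity": mean(p["familiarity"] for p in players),
        # 1 = all-male team, 0 = mixed or all-female team
        "all_male": int(all(p["gender"] == "m" for p in players)),
    }

# Hypothetical four-player team (not study data).
team = [
    {"age": 19, "gender": "m", "game_experience": 4, "familiarity": 1},
    {"age": 21, "gender": "f", "game_experience": 2, "familiarity": 1},
    {"age": 20, "gender": "m", "game_experience": 5, "familiarity": 2},
    {"age": 24, "gender": "m", "game_experience": 3, "familiarity": 2},
]
structure = team_structure(team)
print(structure)
```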

In-game performance

We developed three constructs for unobtrusively observing in-game team performance: time needed, distance covered and total numbers of errors. Time is a proxy for team performance delivery, distance a proxy for team task efficiency and error a proxy for task quality (see above).

(1). Time was calculated on the basis of the overall time needed by a team to complete all five puzzles. Time is a proxy for performance delivery.

(2). Distance indicated the sum total of the virtual metres that all team members covered in the game. Distance is a proxy for task efficiency.

(3). Errors indicated the number of avoidable mistakes made in the game by all team members in puzzle 2 (i.e. the number of faulty tiles stepped upon after a safe route through the maze had been found) and puzzle 5 (i.e. the number of resets in the pillar puzzle). Such errors are not pre-programmed into the game. Teams can make no or few errors if they communicate well, and numerous errors if they do not. Errors are a proxy for task quality.
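The three indicators above can be thought of as simple aggregations over a team's event log. The following sketch illustrates that idea under stated assumptions: the log format, the event kinds and all field names are hypothetical and do not describe the actual TEAMUP logging interface.

```python
from dataclasses import dataclass

@dataclass
class LogEvent:
    """One logged game event; all field names are hypothetical."""
    player: str
    kind: str                 # "move", "faulty_tile", "pillar_reset" or "finish"
    time_s: float             # seconds since the team started the game
    metres: float = 0.0       # virtual metres covered (used by "move" events)
    path_known: bool = False  # puzzle 2: had a safe route already been found?

def team_indicators(events):
    """Return (time, distance, errors) for one team's event log."""
    finish_times = [e.time_s for e in events if e.kind == "finish"]
    if not finish_times:
        raise ValueError("team did not finish the game")
    # distance: sum of the virtual metres covered by all team members
    distance = sum(e.metres for e in events if e.kind == "move")
    # errors: faulty tiles stepped on after the path was known, plus pillar resets
    errors = sum(
        1 for e in events
        if (e.kind == "faulty_tile" and e.path_known) or e.kind == "pillar_reset"
    )
    return max(finish_times), distance, errors

# Invented example log (not study data).
log = [
    LogEvent("p1", "move", 12.0, metres=40.0),
    LogEvent("p2", "faulty_tile", 300.0),                   # exploring: not counted
    LogEvent("p3", "faulty_tile", 420.0, path_known=True),  # avoidable mistake
    LogEvent("p4", "move", 500.0, metres=25.5),
    LogEvent("p1", "pillar_reset", 900.0),                  # avoidable mistake
    LogEvent("p1", "finish", 1500.0),
]
time_s, distance, errors = team_indicators(log)
print(time_s, distance, errors)
```

The point of the example is the distinction in the error count: a wrong tile during exploration is part of solving the puzzle, whereas a wrong tile after the route is known counts as an avoidable mistake.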

Post-game questionnaire

Directly after finishing the game, all players completed a questionnaire on the collaboration within the team. They did so before any feedback on their performance was given and before they could discuss their experiences with others. The following constructs were taken as measurements for team quality (see above):

(1). Team members’ competence was measured with three items from Edmondson (1999). For example: ‘Most people in this team had the ability to solve the problems that came up in the game.’, ‘Certain individuals in this team lack the special skills needed for good teamwork’. Internal consistency was .67.

(2). Team efficacy was measured with three items from Edmondson (1999). For example: ‘Achieving this team’s goals was well within our reach.’, ‘With focus and effort, this team can do anything we set out to accomplish’. Internal consistency was .5.

(3). Team cohesiveness was measured with four items suggested by Seers (1989). For example: ‘The team members generally trust each other.’, ‘The team lacks team spirit.’ Internal consistency was .67.

(4). Team psychological safety was measured with seven items from Edmondson (1999). For example: ‘If you make a mistake on this team, it is often held against you.’, ‘It is difficult to ask other members of this team for help’. Internal consistency was .60.

(5). Team member exchange was measured with a 9-item measure developed by Seers (1989). For example: ‘I often suggested ways to solve the puzzles.’, ‘Team members understood my problems’. Internal consistency was .62.
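For readers less familiar with internal consistency figures, the sketch below shows how such a coefficient is commonly computed. It assumes that the reported values are Cronbach's alpha, that negatively worded items (such as ‘The team lacks team spirit.’) are reverse-scored first, and that answers lie on a 1–7 scale; none of these details is stated above, and the answers in the example are invented.

```python
from statistics import variance

def reverse_score(score, low=1, high=7):
    """Flip a negatively worded item so that high always means 'better'."""
    return low + high - score

def cronbach_alpha(items):
    """items: one list of scores per questionnaire item, same respondent order."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # total scale score per respondent
    totals = [sum(scores) for scores in zip(*items)]
    item_variance = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))

# Invented answers of five respondents to three cohesiveness-style items.
trust = [6, 5, 7, 4, 6]
lacks_spirit = [2, 3, 2, 5, 1]  # negatively worded, so reverse-scored below
helps = [6, 6, 7, 5, 5]

items = [trust, [reverse_score(s) for s in lacks_spirit], helps]
alpha = cronbach_alpha(items)
print(round(alpha, 2))
```

Short scales of three or four items tend to yield lower alpha values than longer ones, which is worth keeping in mind when reading the coefficients reported for the constructs above.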

Results

In-Game Team Performance

Teams can make trade-offs in their attempt to achieve good performance delivery. They can rush in, try to go as fast as possible and accept that they will probably make a lot of avoidable mistakes. This strategy might work as long as the team does not make, at some point in the game, false inferences about what they should do. We have seen fast teams wasting a lot of time doing or trying to do something completely irrelevant, unnecessary or impossible (e.g. climbing a tree or trying to catch falling players). In some cases, critical team members (‘Are we sure we need to do this?’) were ignored or overruled. The opposite, more reflective strategy saves time because players walk less and make fewer mistakes. This strategy might work as long as at least one team member has a clear idea of what to do, makes the right inferences and gives directions to the others. If none of the team players has a clear idea of what to do, a strategy of reflection leads to stagnation; that is, nobody does anything, there is just talk or, worse, they simply stare at each other, sometimes in despair. Without trial and error, there is nothing to reflect upon.

Team Performance Distribution

An assessment of the data revealed variation in how teams perform. The fastest team finished the game in 15 minutes; the slowest team took 68 minutes. Most teams took around 23 minutes to finish the game; the average time is around 27 minutes. The most efficient teams walked a total of 5,000 in-game metres, while the least efficient teams walked more than 13,000 metres. Teams walked an average of 7,200 metres. The number of errors (i.e. avoidable mistakes) also differed widely from team to team. The best performing team made only two errors, while the worst performing team made 37. On average, teams made 12 errors, with most teams making around 10. Distance and errors show wide distribution not only among the teams but also among members of the same team. Some members of a team walked a lot, while others in the same team hardly moved; some made a lot of avoidable mistakes, while others made none. Leaders often seemed to walk a lot – they went first, telling the other players to wait while they reconnoitred, tried to find a path, then went back and helped the others. Walking very little can be a sign that a player has weak navigation skills, and has perhaps never played a computer game before – or that he/she has followed the other team members. Making avoidable mistakes (repeating a known mistake) is definitely a sign of weak individual and collective reasoning and communication.

Team Structure

Table 1 gives the inter-correlations between the team structure factors, namely team familiarity, gender (all-male or mixed teams), age, game experience and individual competence. They are quite easily interpreted. As shown, gender and age make a difference to team performance. As expected, younger teams have more game experience. All-male teams and younger teams are significantly faster and walk less. However, they do not make fewer avoidable mistakes. Similar results are found among teams with more game experience. Individual competence is significantly correlated with team competence, and also with the team performance indicators distance, time and error. The overall conclusion is that in-game team performance indicators are significantly and markedly influenced by team structure, such as gender, age, game experience and individual game-ability. Hypotheses 1–5 are accepted. The fact that TEAMUP game performance is significantly mediated by the structure of a team, in terms of age, gender and previous experience, affects internal validity. This needs to be taken into account when considering the relation between team quality and team performance below.

Team Quality

As shown in Table 1, team member competence, team efficacy and team cohesion are significantly correlated with the in-game team performance indicators, namely time, distance and error. Team psychological safety and team member exchange, however, are not significantly correlated with these performance indicators. Table 2 presents the results of a regression analysis for the team structure and team quality factors with the three in-game team performance indicators.
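To make the correlation step concrete, the following is a minimal sketch of how a Pearson correlation between a team quality score and an in-game indicator could be computed. It is an illustration only, not the analysis pipeline used in this study: the function name pearson_r and the variables cohesion and errors are assumptions, and the six values per variable are invented toy numbers, not TEAMUP data.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy values (one entry per team), NOT the study's data
cohesion = [3.1, 3.8, 4.2, 2.9, 4.6, 3.5]   # questionnaire scale score
errors = [18, 11, 9, 21, 4, 14]             # avoidable mistakes per team

r = pearson_r(cohesion, errors)
print(f"r(cohesion, errors) = {r:.2f}")
```

A negative r in such a set-up would correspond to the pattern reported here, in which more cohesive teams make fewer avoidable mistakes; a significance test would still be needed before drawing conclusions.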

As shown, time is significantly and markedly influenced by age and game experience, while team cohesiveness also explains a significant proportion of the difference between slow and fast teams. For the distance indicator, the results are similar, though a little stronger for team cohesion and with team member exchange as an additional factor. The in-game performance indicators time and distance are therefore not very useful as indicators of team quality. They are largely influenced by team structure, such as age and previous game experience. Most noticeably, the number of errors (avoidable mistakes) cannot be explained by any of the team structure variables, like age or gender, but is significantly and markedly explained by team cohesiveness, psychological safety and team efficacy. Overall, team cohesiveness significantly predicts all three in-game performance indicators. Whereas there is insufficient evidence to accept hypotheses 5, 7 and 8, hypothesis 6 can be accepted.
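The logic of regressing an in-game indicator on both a team structure factor and a team quality construct can be sketched as follows. This is a hedged illustration under stated assumptions, not the model reported in Table 2: the helper names solve and ols are hypothetical, ordinary least squares via the normal equations stands in for whatever statistical package was actually used, and the toy data are constructed so that errors depend on cohesion only.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ols(predictors, outcome):
    """Return [intercept, b1, b2, ...] minimising the sum of squared residuals."""
    rows = [[1.0] + list(p) for p in predictors]  # prepend intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical toy teams: (mean age, cohesion score). NOT the study's data.
teams = [(25, 4.0), (40, 3.0), (31, 4.5), (52, 2.5), (28, 3.5)]
# Constructed so that errors = 30 - 5 * cohesion, with no age effect at all
errors = [30 - 5 * cohesion for _, cohesion in teams]

intercept, b_age, b_cohesion = ols(teams, errors)
print(f"intercept={intercept:.2f}, age={b_age:.2f}, cohesion={b_cohesion:.2f}")
```

With real questionnaire and log data the coefficients would of course not separate this cleanly; the point of the sketch is only that a structure variable and a quality construct are entered together, so that the contribution of cohesion to the error count is estimated while age is held constant.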

Conclusion

Our findings indicate that in TEAMUP, team cohesiveness has an influence on team performance. This is an interesting finding, because it suggests that in-game team performance correlates with an important factor that expresses whether and, if so, to what extent a team is truly a team. In other words, more cohesive teams make fewer avoidable mistakes in the game. This means they have better task quality. We can therefore assume that TEAMUP really challenges and tests whether the players are able and willing to form a team (to do it together) and to help less able players.

The significance of this finding is not that we confirm the plausible and previously confirmed hypothesis that cohesive teams perform better. The significance is that we are able to confirm it in the context of playing a game; that is, that an in-game performance measure – in our particular case, avoidable mistakes in a team – can be used to say something about team cohesiveness, or vice versa. It supports the premise that in-game assessment can be valid for team research and assessment purposes. The rejection of this premise would have had significant consequences for practice. If the internal validity of in-game assessment cannot be proven, there is insufficient ground for feedback in a context of training, or for making judgements in a context of assessment. Partial validation of a game, on the other hand, gives guidance to trainers and assessors. In our case, for instance, we should put less emphasis on how fast a team is, and more on how many avoidable mistakes a team makes, for example how many times a known mistake is repeated.

The research and the results are explorative, and we are well aware of the limitations and the need for further research. We have not yet compared the results with some of the other digital team games discussed above and therefore cannot generalise. The data were gathered during real-life game sessions, not under experimental lab conditions. Things sometimes turned out unexpectedly as regards players, technology or questionnaires. The reason for not testing in a lab is that it is difficult to find a sufficient number of professionals (e.g. police officers) to play in a lab under experimental conditions. They are primarily interested in good professional training, and they are willing to complete questionnaires only if such training is what they receive. More important, we believe that the validity of serious games can only be tested under the conditions in which they are designed to be used, namely in real-life education, training, etc. We expected to find, and indeed did find, an influence of background variables like age, gender and game experience. We have discussed this issue above, and feel it needs to be taken seriously into account in game-based training and assessment. All measurements in this study were task-related, i.e. operating as a player team performing game tasks. We have also indicated that many tasks in organisations are becoming game-like, and it is therefore not surprising that experienced gamers tend to perform better in game-like team tasks in real organisations. The question of the external validity of TEAMUP was not an objective of this study and as yet remains undetermined.

All in all, and on the basis of the promising results, we feel that research models and constructs for game-based team assessment should be expanded and improved. This can only be done through an iterative series of explorative quasi-experiments. As a step in this direction, we have recently used the same set-up, with partially the same constructs, in a study on team assessment in mixed-reality escape rooms (Warmelink et al., 2017). The data and insights gained from these and future experiments should feed back into the design, facilitation and debriefing of the game in order to improve its power as a real-life training and assessment tool. Analytics presented in dashboards can further strengthen the effectiveness of game-based team and leadership training.

Acknowledgements

The author wishes to thank all persons and institutions who, over a period of a decade or so, have provided ideas, contributions and support to the TEAMUP game design, play-sessions and the execution of research, and, in particular, The Barn (Bas van Nuland and Arne Bezuijen), Dirk Jan Bolderheij, Dirk van Dierendonck, Rens Philipsen, Theo van Ruijven and Ivo Wenzler.

Declaration of Conflicting Interests

The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author received no financial support for the research, authorship, and/or publication of this article.

Author Biography

Igor Mayer is a professor at Breda University of Applied Sciences, The Netherlands, where he leads a serious games research group on the topic of Playful Organizations and Learning Systems. Contact: i.s.mayer@hotmail.com

References

Barrick, M. R., Stewart, G. L., Neubert, M. J., & Mount, M. K. (1998). Relating member ability and personality to work-team processes and team effectiveness. Journal of Applied Psychology, 83(3), 377-391. doi:10.1037/0021-9010.83.3.377

Beal, D. J., Cohen, R. R., Burke, M. J., & McLendon, C. L. (2003). Cohesion and performance in groups: A meta-analytic clarification of construct relations. Journal of Applied Psychology, 88(6), 989-1004. doi:10.1037/0021-9010.88.6.989

Bekebrede, Warmelink, H. J. G., & Mayer, I. S. (2011). Reviewing the need for gaming in education to accommodate the net generation. Computers & Education, 57(2), 1521-1529. doi:10.1016/j.compedu.2011.02.010

Bell, S. T. (2007). Deep-level composition variables as predictors of team performance: A meta-analysis. Journal of Applied Psychology, 92(3), 595-615. doi:10.1037/0021-9010.92.3.595

Bououd, & Boughzala (2013). A serious game supporting team collaboration. In Proceedings of the 2013 Conference on Computer Supported Cooperative Work Companion - CSCW ’13. doi:10.1145/2441955.2441986

Campbell, D. T., & Stanley, J. C. (1963). Experimental and Quasi-Experimental Designs for Research (1st ed.). Boston, MA: Houghton Mifflin Company.

Carmeli, Brueller, & Dutton, J. E. (2009). Learning behaviours in the workplace: The role of high-quality interpersonal relationships and psychological safety. Systems Research and Behavioral Science, 26(1), 81-98. doi:10.1002/sres.932

Chiocchio, & Essiembre (2009). Cohesion and performance: A meta-analytic review of disparities between project teams, production teams, and service teams. Small Group Research, 40(4), 382-420. doi:10.1177/1046496409335103

Cohen, S. G., & Bailey, D. E. (1997). What makes teams work: Group effectiveness research from the shop floor to the executive suite. Journal of Management, 23(3), 239-290. doi:10.1016/s0149-2063(97)90034-9

Colquitt, J. A., LePine, J. A., & Noe, R. A. (2000). Toward an integrative theory of training motivation: A meta-analytic path analysis of 20 years of research. Journal of Applied Psychology, 85(5), 678-707. doi:10.1037/0021-9010.85.5.678

Colquitt, J. A., Scott, B. A., & LePine, J. A. (2007). Trust, trustworthiness, and trust propensity: A meta-analytic test of their unique relationships with risk taking and job performance. Journal of Applied Psychology, 92(4), 909-927. doi:10.1037/0021-9010.92.4.909

Connolly, T. M., Boyle, E. A., MacArthur, Hainey, & Boyle, J. M. (2012). A systematic literature review of empirical evidence on computer games and serious games. Computers & Education, 59(2), 661-686. doi:10.1016/j.compedu.2012.03.004

Cooke, N. J., Salas, Cannon-Bowers, J. A., & Stout, R. J. (2000). Measuring team knowledge. Human Factors: The Journal of the Human Factors and Ergonomics Society, 42(1), 151-173. doi:10.1518/001872000779656561

Coovert, M. D., Winner, Bennett, & Howard, D. J. (2017). Serious games are a serious tool for team research. International Journal of Serious Games, 4(1), 41-55. doi:10.17083/ijsg.v4i1.141

DeChurch, L. A., & Marks, M. A. (2006). Leadership in multiteam systems. Journal of Applied Psychology, 91(2), 311-329. doi:10.1037/0021-9010.91.2.311

DeChurch, L. A., & Mesmer-Magnus, J. R. (2010). Measuring shared team mental models: A meta-analysis. Group Dynamics: Theory, Research, and Practice, 14(1), 1-14. doi:10.1037/a0017455

Edery, & Mollick (2008). Changing the game: How video games are transforming the future of business. Upper Saddle River, NJ: FT Press.

Edmondson (1999). Psychological safety and learning behavior in work teams. Administrative Science Quarterly, 44(2), 350-383. doi:10.2307/2666999

Fisher, Hunter, T. A., & Macrosson, W. D. K. (2002). Belbin’s team role theory: For non-managers also? Journal of Managerial Psychology, 17(1), 14-20. doi:10.1108/02683940210415906

Gerstner, C. R., & Day, D. V. (1997). Meta-analytic review of leader–member exchange theory: Correlates and construct issues. Journal of Applied Psychology, 82(6), 827-844. doi:10.1037/0021-9010.82.6.827

Girard, Ecalle, & Magnan (2012). Serious games as new educational tools: How effective are they? A meta-analysis of recent studies. Journal of Computer Assisted Learning, 19(3), 207-219. doi:10.1111/j.1365-2729.2012.00489.x

Gressick, & Derry, S. J. (2010). Distributed leadership in online groups. The International Journal of Computer-Supported Collaborative Learning, 5(2), 211-236. doi:10.1007/s11412-010-9086-4

Guzzo, R. A., & Dickson, M. W. (1996). Teams in organizations: Recent research on performance and effectiveness. Annual Review of Psychology, 47(1), 307-338. doi:10.1146/annurev.psych.47.1.307

Hertel, Geister, & Konradt (2005). Managing virtual teams: A review of current empirical research. Human Resource Management Review, 15(1), 69-95. doi:10.1016/j.hrmr.2005.01.002

Lisk, T. C., Kaplancali, U. T., & Riggio, R. E. (2011). Leadership in multiplayer online gaming environments. Simulation & Gaming, 43(1), 1-17. doi:10.1177/1046878110391975

Marlow, S. L., Salas, Landon, L. B., & Presnell (2016). Eliciting teamwork with game attributes: A systematic review and research agenda. Computers in Human Behavior, 55, 413-423. doi:10.1016/j.chb.2015.09.028

Mathieu, J. E., Maynard, M. T., Rapp, & Gilson (2008). Team effectiveness 1997-2007: A review of recent advancements and a glimpse into the future. Journal of Management, 34(3), 410-476. doi:10.1177/0149206308316061

Mayer, I. S., Bekebrede, Warmelink, & Zhou (2014). A brief methodology for researching and evaluating serious games and game-based learning. In Connolly, T. M., Boyle, Hainey, Baxter, & Moreno-Ger (Eds.), Psychology, pedagogy, and assessment in serious games (pp. 357-393). IGI Global. doi:10.4018/978-1-4666-4773-2.ch017

Mayer, I. S., Warmelink, H. J. G., & Bekebrede (2013). Learning in a game-based virtual environment: A comparative evaluation in higher education. European Journal of Engineering Education, 38(1), 85-106. doi:10.1080/03043797.2012.742872

Mayer, van Dierendonck, van Ruijven, & Wenzler (2014). Stealth assessment of teams in a digital game environment. In De Gloria (Ed.), Games and Learning Alliance. GALA 2013. Lecture Notes in Computer Science (Vol. 8605). Cham, Switzerland: Springer. doi:10.1007/978-3-319-12157-4_18

Mullen, & Copper (1994). The relation between group cohesiveness and performance: An integration. Psychological Bulletin, 115(2), 210-227. doi:10.1037/0033-2909.115.2.210

Poy-Castro, Mendaña-Cuervo, & González (2015). Designing and evaluating a serious game for training university students in team-working skills. RISTI - Revista Iberica de Sistemas E Tecnologias de Informacao. doi:10.17013/risti.e3.71-83

Reeves, & Read, J. L. (2009). Total engagement: Using games and virtual worlds to change the way people work and businesses compete. Boston, MA: Harvard Business Review Press.

Salas, & Cannon-Bowers, J. A. (2001). The science of training: A decade of progress. Annual Review of Psychology, 52, 471-499. doi:10.1146/annurev.psych.52.1.471

Salas, DiazGranados, Klein, Burke, C. S., Stagl, K. C., Goodwin, G. F., & Halpin, S. M. (2008). Does team training improve team performance? A meta-analysis. Human Factors: The Journal of the Human Factors and Ergonomics Society, 50(6), 903-933. doi:10.1518/001872008X375009

Salas, Reyes, D. L., & Woods, A. L. (2017). The assessment of team performance: Observations and needs. In von Davier, Zhu, & Kyllonen (Eds.), Methodology of Educational Measurement and Assessment. Innovative assessment of collaboration. Cham, Switzerland: Springer. doi:10.1007/978-3-319-33261-1_2

Sawaragi, Fujii, Horiguchi, & Nakanishi (2016). Analysis of team situation awareness using serious game and constructive model-based simulation. IFAC-PapersOnLine, 49(19), 537-542. doi:10.1016/j.ifacol.2016.10.617

Seers (1989). Team-member exchange quality: A new construct for role-making research. Organizational Behavior and Human Decision Processes, 43, 118-135.

Seitlinger, P. C., Bedek, M. A., Kopeinik, & Albert (2012). Evaluating the validity of a non-invasive assessment procedure. In Ma, Fradinho, Hauge, J. B., Duin, & Thoben, K.-D. (Eds.), Serious games development (pp. 208-218). Bremen, Germany. doi:10.1007/978-3-642-33687-4_18

Shuffler, M. L., Wiese, C. W., Salas, & Burke, C. S. (2010). Leading one another across time and space: Exploring shared leadership functions in virtual teams. Revista de Psicología Del Trabajo Y de Las Organizaciones, 26(1), 3-17. doi:10.5093/tr2010v26n1a1

Shute, V. J. (2011). Stealth assessment in computer-based games to support learning. In Tobias, & Fletcher, J. D. (Eds.), Computer games and instruction (Vol. 55, pp. 503-523). Charlotte, NC: Information Age Publishers.

Sweeney, L. B., & Meadows, D. L. (2001). The systems thinking playbook. Durham, NC: Pegasus Communication.

Tannenbaum, S. I., & Cerasoli, C. P. (2012). Do team and individual debriefs enhance performance? A meta-analysis. Human Factors: The Journal of the Human Factors and Ergonomics Society, 55(1), 231-245. doi:10.1177/0018720812448394

Tapscott (1999). Growing up digital: The rise of the net generation. New York, NY: McGraw-Hill.

van Vianen, A. E. M., & De Dreu, C. K. W. (2001). Personality in teams: Its relationship to social cohesion, task cohesion, and team performance. European Journal of Work & Organizational Psychology, 10(2), 97-120. doi:10.1080/13594320143000573

Warmelink, Mayer, Weber, Heijligers, Haggis, Peters, & Louwerse (2017). AMELIO. In Extended Abstracts Publication of the Annual Symposium on Computer-Human Interaction in Play - CHI PLAY ’17 Extended Abstracts (pp. 111-123). New York, NY: ACM Press. doi:10.1145/3130859.3131436

Wendel, Gutjahr, Battenberg, Ness, Fahnenschreiber, Göbel, & Steinmetz (2013). Designing a collaborative serious game for team building using Minecraft. In Proceedings of the European Conference on Games Based Learning. doi:10.1145/2020976.2020998

Williams, Ducheneaut, Xiong, Zhang, Yee, & Nickell (2006). From tree house to barracks: The social life of guilds in World of Warcraft. Games and Culture, 1(4), 338-361. doi:10.1177/1555412006292616

Williams, G. C., & Deci, E. L. (1996). Internalization of biopsychosocial values by medical students: A test of self-determination theory. Journal of Personality and Social Psychology, 70(4), 767-779. doi:10.1037/0022-3514.70.4.767

Yee (2006). The labor of fun: How video games blur the boundaries of work and play. Games and Culture, 1(1), 68-71. doi:10.1177/1555412005281819