Abstract
The pedagogical value of heritage education in safeguarding global traditions and practices for future generations is indisputable. This research investigates the efficacy of integrating a puzzle-based game with scaffolding strategies to enhance students’ understanding and appreciation of cultural heritage content. In the activity, students reconstruct art patterns from cultural heritage artifacts, a process designed to improve their performance in recognizing and appreciating such patterns. A control group that learned the same content from a non-interactive instructional video was established to evaluate the benefits of the game-based approach. Knowledge acquisition and transfer were evaluated through pattern tests, with the participants’ eye movements recorded during testing. The results show that participants in the game group exhibited significant advantages in efficiently recognizing fundamental content and effectively transferring pattern recognition. Furthermore, examination of visual behavior within the game reveals that integrating the jigsaw puzzle (game mechanic) with scaffolding (instructional strategy) can enhance the efficacy and cognitive strategies that participants employ in comprehending heritage content. This study develops design methodologies for cultural heritage games that enhance learning and appreciation.
Introduction
In recent years, many researchers have combined game-based approaches with learning to improve learning outcomes, experience, motivation, and even cognitive ability (Erhel & Jamet, 2013; Eseryel et al., 2014). Cultural heritage is one application area of game-based learning, and cultural heritage games share the characteristics, elements, and interaction modes of general games (Malegiannaki & Daradoumis, 2017). Unlike general games, however, games designed for cultural heritage education focus on the digital preservation, reproduction, and appreciation of cultural content, and even on virtual interaction, to improve the cognitive effect (Laamarti et al., 2014; Raptis et al., 2019). They break with the traditional heritage visit and help users take part in playful cultural heritage activities so that they explore the educational content more actively (Raptis et al., 2019). Cultural heritage education games primarily take the form of spatial exploration, integrating elements of role-play, environmental simulation, task narration, and treasure hunting. These games typically offer a high degree of freedom and openness, catering to the public’s educational and communicative needs regarding large heritage sites and historical museums. Advanced digital technologies are used in serious games to enhance immersion and overall interactivity, thereby enriching the gaming experience (Malegiannaki & Daradoumis, 2017). However, this reliance on advanced technology complicates the design methodology of serious games and limits the transferability and adaptability of game design methods to other heritage education themes. As a result, educators who serve medium- and small-scale heritage education and lack advanced technological skills are unable to adopt these methods (De Freitas, 2018). This limitation hinders the application and development of serious games in cultural heritage education.
In addition, few researchers pay attention to the impact of technology and interaction strategies on users’ cognition and behavior in the game (Raptis et al., 2018, 2019). Most studies focus only on the overall design of the game and the implementation of its major functionalities, and frequently neglect users’ authentic experience and learning process during gameplay.
Accordingly, we are interested in merging additional game mechanisms with classic game genres, such as puzzle games, to enrich gameplay and broaden the scope of cultural heritage education game design. We also aim to focus on players’ interactive behaviors and cognitive processes during game-based learning experiences. This study takes aesthetic education in pattern heritage as its starting point. To enhance gameplay and align with the instructional mode of cultural heritage, we introduce the jigsaw puzzle mechanism commonly found in puzzle-based games and combine it with scaffolding strategies to improve the game-based learning process. By guiding students to interact with and explore game elements, the game aims to enhance their acquisition and retention of heritage knowledge and improve the transfer of learning achievements. To validate the effectiveness of the proposed method, eye-tracking technology is used to evaluate learners’ mastery of pattern heritage appreciation skills, improvements in observation patterns, and changes in underlying cognitive processes. The following sections describe in detail the challenges faced in pattern heritage instruction, the selection and integration of puzzle-based game mechanisms and the instructional strategy, the main applications of eye-tracking technology, and its suitability for research on the cognitive processes of pattern aesthetics.
Invisible Concepts of Art Pattern Heritage
Art patterns can be found on various cultural heritage objects, including bronze ware, ornaments, and even architecture. These patterns not only reflect the aesthetic preferences of a particular era but also encapsulate the technological advancements, regional influences, and national cultural characteristics of that time (Leong & Leung, 2013). However, the process of appreciating art patterns involves tacit knowledge (Wei et al., 2019), which makes describing and teaching it challenging and abstract. Furthermore, art patterns are so varied that no teaching content can cover them all.
Most tacit knowledge is acquired through association, observation, imitation, and interaction (Tsoukas, 2005; Ye et al., 2023). Therefore, the ability to appreciate pattern heritage can be cultivated by guiding learners to acquire the correct observation skills for recognizing and understanding the formal semantics and layout features of patterns. Some researchers have suggested that game-based learning gives learners more content to explore and more ways to participate, so that guided learners perceive the heritage content and come to understand it sufficiently through interaction (Froschauer et al., 2013; Gil-Fuentetaja & Economou, 2019; Yang et al., 2018). Therefore, the goal of the game is to help players acquire the heritage content currently displayed and build recognition methods for various heritage patterns and structures, thereby guiding them in appreciating the art patterns in cultural heritage. This study explores how to design games that teach pattern heritage appreciation. It expands the design approach of cultural heritage education games to make them more widely applicable in the practice of cultural heritage education and dissemination.
Typical Mechanisms and Features of Puzzle-Based Games
As a particular instructional approach, puzzle-based games can be regarded as a complement to traditional approaches that helps learners complete learning tasks in the classroom or on field trips (Michalewicz et al., 2011). Puzzle-based games are universal and not tied to a specific platform or type of technology (Carnegie, 2017), which makes them easy to design and develop by integrating virtual and physical objects (as pieces or slots) to meet the needs of different educational content (Melero & Hernández-Leo, 2014). Puzzle-based games have simple rules that can be defined separately from content and clues (Crawford, 1984), allowing for flexible application across various subject matters and seamless integration with content (Melero & Hernández-Leo, 2014). Puzzle-based games are also used to engage students in subject topics while fostering their problem-solving, analytical, and memory skills (Bottino et al., 2007; Huang et al., 2007; Melero & Hernández-Leo, 2014).
Typical puzzle game mechanics include mazes, riddles, puzzles, solitaire, matching, and sorting. They are all common game types, and their gameplay is well known to most people (Ye et al., 2022). Some researchers have suggested that an interaction mode suited to the structural characteristics of the heritage content improves the acquisition of heritage knowledge and the transfer of abilities (Bradshaw, 2014; Froschauer et al., 2013; Yang et al., 2018). In summary, introducing puzzle-based game elements is valuable for broadening the design approach of cultural heritage education games. This study chooses the jigsaw puzzle, a kind of puzzle-based game, as a reference to enrich the gameplay and interaction modes of cultural heritage education games. The rules of jigsaw puzzles are simple and widely known, making it easy for players to get started and engage in the learning process. This study focuses on how the interaction mode of a digital jigsaw puzzle, used as the method for learning heritage content, helps students improve their acquisition of heritage knowledge and their recognition of structural features.
Scaffolding Strategies in Game-Based Learning
While puzzle-based games can be considered an interactive form of serious games, integrating an appropriate game model with an instructional strategy can improve the achievement of learning objectives and enhance students’ engagement and entertainment experience (Melero & Hernández-Leo, 2014). Many studies have identified factors to consider when designing serious games (Laamarti et al., 2014; Prensky, 2003; Williamson, 2009). Most of the research results show that applying constructivist learning theories in game-based learning can not only clearly define the learning goals and tasks and promote metacognition and knowledge construction, but also provide challenge mechanisms that promote active learning and gradually increase the level of difficulty (Bottino et al., 2007; Eseryel et al., 2014; Melero & Hernández-Leo, 2014). Among these approaches, scaffolding is frequently applied to game-based learning; it not only helps players complete learning goals and tasks through appropriate prompts, but also provides feedback and task-related supportive learning (Erhel & Jamet, 2013). In the game environment, scaffolding is a support mechanism (e.g., hints or supportive learning material) that challenges learners when they are correct, explains their mistakes when they are wrong, and provides prompts and supplementary information when they have difficulty following the tasks (Barzilai & Blau, 2014; Conati et al., 2013).
Scaffolding can not only help players solve the problems they encounter in the game, but also better stimulate them to explore and learn actively, improving knowledge construction when the difficulty is appropriate (Prensky, 2003). Sun et al. (2018) discussed in detail how scaffolding affects problem-solving behaviors in game-based learning and offered solutions to discourage players from becoming dependent on scaffolds. Many studies have provided sufficient evidence for the effects of scaffolding on achievement, learning experience, intrinsic motivation, and cognitive enhancement in game-based learning (Barzilai & Blau, 2014; Chen & Law, 2016). However, few studies have applied scaffolding to cultural heritage learning. This study focuses on whether scaffolding in puzzle-based games for cultural heritage education is effective and on how the scaffolding strategy affects gameplay behavior.
Eye-Movements and Cognitive Process
Eye movement data are used to measure visual behavior and reveal a great deal about underlying cognitive processes (Duchowski, 2002; Rayner, 1995). Eye-tracking technology is increasingly used in educational research (Alemdag & Cagiltay, 2018), where it records eye movements during learning to reveal underlying cognitive processes and even to evaluate learning achievements and knowledge acquisition (Sharma et al., 2020). In explaining the parameters of eye movements, She and Chen (2009) and Dzeng et al. (2016) note that Rayner (1998) identified eye-tracking metrics such as total fixation duration, fixation count, and mean fixation duration as especially relevant to learning. Fixation location and duration can indicate a person’s cognitive strategies and prior knowledge or experience (Just & Carpenter, 1975). Rayner (1998) further points out that longer fixation durations generally indicate more extensive processing. Other research indicated that a longer mean fixation duration is associated with better transfer performance (Ozcelik et al., 2009). Loftus (1972) reported that memory for a scene was related to the number of fixations made on the scene, with more fixations yielding higher recognition scores.
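To make the three metrics named above concrete, the following minimal Python sketch computes total fixation duration, fixation count, and mean fixation duration per area of interest (AOI). It is an illustration only: the `Fixation` record, the AOI labels, and the sample durations are hypothetical and are not taken from the study’s data or software.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Fixation:
    aoi: str            # hypothetical AOI label, e.g. "core" or "reference"
    duration_ms: float  # duration of one fixation in milliseconds


def fixation_metrics(fixations):
    """Return total duration, count, and mean duration of fixations per AOI."""
    durations = defaultdict(list)
    for fix in fixations:
        durations[fix.aoi].append(fix.duration_ms)
    return {
        aoi: {
            "total_fixation_duration": sum(values),
            "fixation_count": len(values),
            "mean_fixation_duration": sum(values) / len(values),
        }
        for aoi, values in durations.items()
    }


# Hypothetical fixations for illustration only.
sample = [Fixation("core", 200.0), Fixation("core", 300.0), Fixation("reference", 250.0)]
metrics = fixation_metrics(sample)
```

Mean fixation duration is simply the total divided by the count, so the three metrics carry two independent pieces of information per AOI.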
In game-based learning, how players control their cognitive resources may be the key to their achievements in game playing (Tsai et al., 2016). Eye-tracking measures of how players allocate visual attention among the elements and scenes in a game can reveal their metacognitive strategies for playing (Chen & Tsai, 2015). By recording users’ eye-movement data while they played different serious games with cultural heritage themes, Raptis et al. (2019) showed how game designers’ decisions about visual search affect players’ visual behavior and, consequently, their gameplay behavior, eventually leading to differences in knowledge acquisition. Their findings provide a reference for the design of cultural heritage games.
In summary, eye tracking can provide evidence for the effectiveness of the learning process in puzzle-based games and for their learning outcomes (Tsai et al., 2012). Especially in the field of cultural heritage, visual attention can not only identify differences in users’ perception of art (Duchowski, 2002), but also provide evidence for the impact of game content and interaction mode on cognitive processes (Ye et al., 2021). Therefore, this study uses eye-tracking technology to examine the impact of a puzzle-based game with scaffolding aid on learners’ visual attention and discusses visual behavior during the game in detail.
Purpose
Puzzle-based games are characterized by simple rules, a diverse range of types, and ease of development. The puzzle-solving elements within these games guide players to engage deeply with the main learning content, without the need for external enhancements such as a storyline, immersive experiences, or motivational challenges. Therefore, this research selects the jigsaw puzzle, a typical representative of puzzle-based games, as the focal point for investigating the design of educational games centered on cultural heritage. This study uses a digital jigsaw puzzle with scaffolding developed by Ye et al. (2021). In this game, players learn the art patterns featured on bronze mirrors from the Han Dynasty by assembling a digital jigsaw puzzle. Eye-tracking technology is employed to capture and analyze participants’ eye movements, thereby revealing their cognitive learning processes and implicit strategies. To examine the effect of a game mode that combines a digital jigsaw puzzle with a scaffolding strategy, a control group was established. This control group used a non-interactive video to learn the same content as the experimental group. The difference between learning through gameplay and learning through video lies in the presence of interactivity and enjoyment during the learning process. Game-based learning is typically more engaging and offers learners a variety of interactive options and opportunities for exploration. Knowledge acquisition and transfer were evaluated by examining scores obtained from a pattern test and by assessing visual attention. Additionally, the attention distributions of participants within the game were analyzed to explore the influence of jigsaw puzzle mechanics and scaffolding strategies on the learning process and the recognition of pattern heritage.
Based on the preceding discussion, this study aims to address the following three research questions:
1. How do puzzle-based games with scaffolding aid and traditional non-interactive instructional videos differ in supporting the acquisition of pattern heritage knowledge and the improvement of visual attention?
2. How do puzzle-based games with scaffolding aid and traditional non-interactive instructional videos differ in enhancing the transfer of pattern heritage appreciation abilities, specifically the observation skills for recognizing art pattern structures?
3. How do increased difficulty levels in the digital jigsaw puzzle game influence players’ attention, and how does acquiring information from different areas of the game interface contribute to recognition of heritage structure during the puzzle-solving process?
Methodology
Materials
Game Content and Design
A digital jigsaw puzzle game, titled “the Lost Han-dynasty bronze mirrors”, was employed in this study (Ye et al., 2021). The game teaches players to recognize art patterns through a specifically designed puzzle-fitting task. The game’s content is centered on two bronze ‘dragon’ mirrors produced in the Western Han Dynasty. The art patterns on these mirrors reflect the visual characteristics of the ancient Chinese dragon totem. The overall pattern of each bronze ‘dragon’ mirror comprises three sets of identical patterns, each obtained through a double rotation about the center. In other words, each pattern group is repeated three times on the bronze mirror. Within each pattern group, there is one core pattern or a cluster of core patterns, which serves as the foundation from which the entire set of patterns is extended and which ultimately defines the bronze mirror’s name. Figure 1 depicts the pattern structure of the two bronze mirrors, with wireframes and markings used to highlight the form, semantics, and layout characteristics of the art patterns. Different colors (red, yellow, and green) signify the composition of the art patterns, while darker hues denote the core patterns.
Figure 1. The bronze mirrors used in the game and the outlines of the art patterns on them.
In the digital jigsaw puzzle game, players use the image of the bronze mirror as a reference for fitting the fragmented pattern pieces together to form the complete design of a bronze mirror (Ye et al., 2021). The observation skill transfer process involved in appreciating the bronze mirror pattern consists of helping learners recognize the modeling characteristics and interrelationships of each pattern on the bronze mirrors, while guiding them to explore the rules governing the modeling and cultural significance of these patterns. This process is analogous to assembling a jigsaw puzzle: each core pattern and auxiliary pattern is introduced and discerned in sequence, followed by an explanation of the overall composition mode and modeling semantics of the complete pattern. The game design establishes a restoration order for the patterns based on their visual characteristics. The core pattern comes first, prompting players to prioritize its placement before proceeding to restore the auxiliary patterns. The core pattern generally dictates the composition mode and positioning of the auxiliary patterns. Consequently, when examining and searching bronze mirror patterns, attention must first be directed towards the core pattern. The restoration tasks for the two bronze mirror patterns constitute two levels within the game. Players complete their learning process while challenging these levels, and are rewarded for completing each level successfully. The order of the bronze mirrors in the game was determined by how difficult their patterns are to recognize. Compared with the second bronze mirror, the pattern structure of the first bronze mirror is more intelligible, with clear boundaries delineating each individual pattern, and is therefore easier to recognize.
Once players have completed the assembly of patterns on the first bronze mirror, they will familiarize themselves with the game mode, enabling them to take on the challenge of the second bronze mirror with its more intricate pattern structure (Ye et al., 2021).
Game Interface and Description
Two user interfaces were designed in the “the Lost Han-dynasty bronze mirrors” game: the puzzle interface and the reference interface. The puzzle interface is divided by function into two areas, Bronze Mirror (BM) and Pattern Selection (PS), marked by the light blue dashed line in Figure 2. The right side of the puzzle interface is the bronze mirror area, which is the main area of interactive activity during the game. In this area, players can place or remove target patterns. The game system provides feedback based on the accuracy of the placement. If the placement is correct, it activates the entrance animation for the target pattern element, in which the pattern model appears on the mirror body through a “3D jump-in” effect accompanied by sound. If the placement is incorrect, the error prompt panel is activated. The left side of the puzzle interface is the pattern selection area, where players find and select the target pattern. To help players learn better and overcome difficulties in finding the location of the target pattern during the game, a reference interface was added to the game environment (see Figure 3). The interactive button on the far right of the puzzle interface opens the reference interface. The reference interface for each target pattern contains two areas. The right side is the Reference Image area (RI), which displays the position information of the target pattern as a prompt. The left side is the Target Pattern area (TP), which displays the target pattern whose position is to be prompted. Clicking the square button with the lightbulb icon at the bottom of this area activates the prompt feature. The prompt information for the target pattern appears only after the prompt button is clicked.
Figure 2. Areas of interest (light blue dashed line) on the puzzle interface at the first stage of the first bronze mirror.
Figure 3. Areas of interest (light blue dashed line) on the reference interface of the first bronze mirror.

The prompt feature is the specific manifestation of the scaffolding strategy in the game. The scaffolding starts to work when the player activates it. Using the prompt feature is not mandatory; it is needed only when the player encounters difficulties and cannot identify the position of the target pattern. It assists the player in completing the pattern and facilitates game-based learning. According to the description of the game by Ye et al. (2021), the design of the scaffolding content divides the game process of each bronze mirror into two stages. In the first stage, the scaffolding gives clear prompt information: it displays the location of the current target pattern on the bronze mirror and introduces the related cultural information. This stage mainly serves to introduce the relationships among the pattern elements to the players. The second stage was designed as the fading of the scaffolding, with vague prompts that require players to infer the locations of target patterns of the same type from the position and layout characteristics of the target pattern in the first stage, and to learn through trial and error. The scaffold fading in the second stage was used to analyze the impact of instructional strategies on game-based learning. The game process of the first bronze mirror is explained in Figure 4 with screenshots of the game interface. The process of the second bronze mirror is consistent with that of the first but more complex; because of the complexity of its flow chart, it is not drawn here.
Figure 4. The process of the first level of the game and its snapshots.
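The two-stage scaffolding logic described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the implementation of the game by Ye et al. (2021): the `TargetPattern` fields, the `scaffold_prompt` function, and the prompt wording are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class TargetPattern:
    name: str           # hypothetical pattern label
    location: str       # hypothetical description of its position on the mirror
    cultural_note: str  # hypothetical cultural information shown with the prompt


def scaffold_prompt(pattern, stage, requested):
    """Return the prompt text for a target pattern, or None if no prompt was requested."""
    if not requested:
        return None  # the prompt feature is optional and player-activated
    if stage == 1:
        # Stage 1: clear scaffolding shows the location and cultural information.
        return f"{pattern.name}: place at {pattern.location}. {pattern.cultural_note}"
    if stage == 2:
        # Stage 2: faded scaffolding gives only a vague hint.
        return f"{pattern.name}: recall where the same type of pattern was placed in stage 1."
    raise ValueError("stage must be 1 or 2")


# Hypothetical pattern for illustration only.
core = TargetPattern("core dragon", "the upper sector", "The dragon is the core motif.")
clear_hint = scaffold_prompt(core, stage=1, requested=True)
vague_hint = scaffold_prompt(core, stage=2, requested=True)
```

Keeping the stage as an explicit argument makes the fading visible: the same request yields less information in the second stage, so the player must infer the location from what was learned in the first.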
Instructional Video
To test the advantages of puzzle-based games in learning cultural heritage content, a control group using a traditional non-interactive learning video was set up. The instructional video is the same as the one used in the research of Ye et al. (2021). It contains voice and subtitles, through which students can comprehensively and effectively understand the art patterns on the bronze mirrors of the Han Dynasty. The content and teaching process of the instructional video are aligned with those of the game-based learning. The page layout in the instructional video adopts the scaffolding interface, ensuring consistency in learning between the participants in the video group and those in the game group. This approach also mitigates the impact of different interface layouts on visual cognition. However, unlike game-based learning, the instructional video lacks interactive elements. The video learning process is shown in Figure 5 with screenshots.
Figure 5. Part of the instructional video and its snapshots.
Participants
A total of 48 university students (31 female) with a mean age of 19.02 years (SD = .32) and normal or corrected-to-normal vision took part in this study. Most of them were from the School of Architecture, studying subjects such as architecture and environmental design. Participants were recruited through open recruitment in their courses. Participants who reported experience with playing computer games and learning from instructional videos were randomly assigned, by drawing lots, to the experimental group or the control group (24 students in each group).
Prior to the commencement of the experiment, all participants were subjected to informal interviews to ensure that they had no prior knowledge of the bronze mirror and ‘dragon’ pattern. Therefore, for all students involved in this study, the content of the instructional videos and games in the experiment was new, with no prior knowledge or learning background to support their learning. Finally, data from all participants were valid and analyzed.
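The random assignment into two groups of 24 can be illustrated with a short Python sketch. The study drew lots by hand; the function below is only a software analogue, and the participant IDs, the `draw_lots` name, and the seed are assumptions made for the example.

```python
import random


def draw_lots(participant_ids, seed=None):
    """Randomly split participants into two equal groups (game and video)."""
    shuffled = list(participant_ids)
    if len(shuffled) % 2 != 0:
        raise ValueError("an even number of participants is required")
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"game": shuffled[:half], "video": shuffled[half:]}


# Hypothetical participant IDs 1..48.
groups = draw_lots(range(1, 49), seed=0)
```

Shuffling the full list and cutting it in half guarantees equal group sizes, which independent coin flips per participant would not.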
Instrument
The pattern test was designed as a visual task to evaluate performance in appreciating art patterns from cultural heritage artifacts. The test required participants to find a target pattern, which served as a reference for recording and evaluating their visual behavior. The purpose was to examine the differences between the two groups’ visual strategies and attentional distribution after completing the game or video learning (Yarbus, 1967). The pattern test materials used in the research of Ye et al. (2021) were modified so that the test images could serve as stimulus materials for the eye movement experiment. For a detailed description of the test content, see the appendix or Ye et al. (2021). The stimulus images displayed in the test include two or more independent or combined patterns together with the bronze mirror image containing the patterns. To avoid the influence of similar stimulus image layouts on visual behavior, the test took various forms: choosing one of two options, choosing one of several options, and selecting a region. Participants were asked to find the target pattern in the correct orientation by searching the bronze mirror images. Items 1 to 3 are from the first bronze mirror, and items 4 and 5 are from the second bronze mirror. These five images, all drawn from the bronze mirror patterns in the game or video, form the basic test. The basic test was used to assess the acquisition of the content displayed in the game or video and to further familiarize participants with the appearance and structure of bronze mirrors. The images of items 6 and 7 contain two unfamiliar bronze mirrors. The pattern structure of the two new bronze mirrors is illustrated in Figure 6. These mirrors did not appear in the learning content, but they have the same structure as the patterns in the game and video; that is, they call for the same visual strategy, which is to determine the core pattern first and then build up recognition of the overall pattern step by step.
These two images served as an extension test, which evaluated the transfer of learning by whether participants could search for target patterns on an unfamiliar bronze mirror using the ideal visual strategy. During the test, participants were asked to refer to the bronze mirror image when answering. To avoid the influence of short-term memory on the test results and visual behavior of participants, the pattern tests were conducted one week after the game or video learning to ensure visual freshness (Bisley, 2011). The stimulus images and the AOIs marked in the pattern test are shown in Figure 7.
Figure 6. The structural features of the art patterns on the bronze mirrors in the extension test.
Figure 7. Areas of interest (AOIs) on the pattern test images. The AOIs are marked on the test images using different colors, lines, and abbreviations to represent different types. Green dashed line (C): the area of the correct pattern. Grey dotted line (W): the area of the wrong pattern. Light blue or red dashed line (R): the area of the reference image (the whole bronze mirror). Yellow dashed line: the area of the target pattern (the local AOI).

Procedure
In the experiment, each participant learned with the game or video individually and at their own pace on a computer (Intel Core i5-7300HQ processor at 2.50 GHz, 16 GB installed RAM, DELL 27-inch SE2717H monitor at a screen resolution of 1920 × 1080 pixels and a refresh rate of 75 Hz). Before the experiment, the researchers assessed all participants’ prior knowledge through interviews and introduced the experimental process and learning content. The game rules and operations were shown to the participants in the game group. To capture participants’ eye-gaze behavior during game-based learning, we used Tobii Pro Glasses 2, which recorded eye movements at a gaze sampling frequency of 50 Hz. The device allows participants to move their heads freely during experiments. Both the eye movements of the participants in the game group and the computer screen during the game were recorded. After going through the setup and calibration process described in the official technical manual, participants wore the Tobii Pro Glasses 2 to play. Participants in the video group watched the instructional video described in the Instructional Video section.
One week after completing the game or video learning, all 48 participants were asked to return for the pattern tests, during which each participant’s visual behavior and computer screen were tracked and recorded by the eye-tracking system. Each test image was displayed statically on the screen. The experimenter first introduced the test content and then asked the participant to answer. The display sequence of the test images is shown in Figure 7 (Ye et al., 2021). The Tobii Pro Glasses 2 were connected to the computer through the Tobii Pro Glasses Controller software, which monitored participants’ eye movements in real time. In the basic test, some participants who could recall the outline of the target pattern could reply directly instead of viewing the bronze mirror image, because the target patterns used for the test had been shown in the previous learning content. If the answer was correct, they proceeded to the next item. Otherwise, they were asked to search the bronze mirror image and answer again.
Data Analysis
Before the eye-tracking metrics were analyzed, a Mann–Whitney U test was conducted to compare the pattern test scores of the two groups. The pattern test score is the number of items answered correctly at the first attempt; answers that became correct only after correction are not counted. Each correct choice is worth 1 point, for a maximum of 7 points. The total score and the score of each item for the two groups are shown in Figure 8. There was no significant difference in the pattern test scores between the two groups (total scores: U = 286.5, Z = −.03, p > .05; basic-test scores: U = 276, Z = −.26, p > .05; extension-test scores: U = 271.5, Z = −.39, p > .05). Although both groups made some errors, most participants found the target pattern and responded correctly, which means that the visual processing of the participants was effective (Vecera et al., 2014). This result supports the further comparison and discussion of visual behavior.
The total score of the two groups of participants in the pattern test and the scores of each item (blue for game group, orange for video group).
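As an illustration of the group comparison described above, the following minimal sketch computes a Mann–Whitney U statistic with a tie-corrected normal approximation and the effect size r = Z/√N. It is not the SPSS procedure used in the study; the function names, the absence of a continuity correction, and the "negligible" label for |r| < .1 are assumptions made for this example. The effect-size bands follow the thresholds given in the table notes.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def mann_whitney_u(group_a, group_b):
    """Return (U, Z, two-sided p, r) using the normal approximation.

    Assumption: no continuity correction is applied.
    """
    n1, n2 = len(group_a), len(group_b)
    n = n1 + n2
    pooled = list(group_a) + list(group_b)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    # Tie correction for the standard deviation of U.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    z = (u - n1 * n2 / 2) / sigma if sigma > 0 else 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    r = z / math.sqrt(n)                  # effect size r = Z / sqrt(N)
    return u, z, p, r

def effect_label(r):
    """Classify |r| with the bands from the table notes; 'negligible' is our label."""
    size = abs(r)
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "medium"
    if size >= 0.1:
        return "small"
    return "negligible"

# Made-up scores, for illustration only (not data from the study).
u, z, p, r = mann_whitney_u([1, 2, 3], [4, 5, 6])
print(u, round(z, 3), round(p, 3), effect_label(r))
```

With 24 participants per group (48 in total, with n = 24 reported for the game group), U under the null hypothesis is expected to be 24 × 24 / 2 = 288, which is why the reported total-score value of U = 286.5 corresponds to a Z close to zero.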
Fixations were extracted using a customized velocity threshold identification (I-VT) algorithm in the ErgoLAB 3.0 software (KingFar International Inc, 2016), which also output the eye-tracking metrics for each AOI for further statistical analysis. A series of Mann–Whitney U (MWU) tests was conducted on the eye-tracking metrics of each AOI on the test images to compare the visual attention and behavior of the two groups. To examine the significance of participants’ visual attention and behavior in the pattern tests, we used binary logistic regression analysis to relate the eye-tracking metrics to the final result of finding the target pattern and making the right choice. For the process of game-based learning, a series of Mann–Whitney U tests was conducted on the eye-tracking metrics of each AOI on the game interfaces to compare visual attention and strategies as the game difficulty increased. Finally, multiple linear regression analysis was used to relate the participants’ visual attention to each game interface to their recognition performance in the pattern test, in order to explore how players’ visual attention to different interfaces promotes recognition performance. All data were statistically analyzed using IBM SPSS 24.0.
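The general idea of velocity threshold identification can be sketched as follows. This is a generic I-VT illustration, not the customized ErgoLAB 3.0 implementation, whose parameters are not reported here. The function name, the input format (time in milliseconds, gaze position in degrees of visual angle), the 30°/s threshold, and the 60 ms minimum duration are all assumptions chosen as placeholders.

```python
import math

def ivt_fixations(samples, velocity_threshold=30.0, min_duration_ms=60):
    """Group gaze samples into fixations with a velocity threshold (I-VT).

    samples: list of (t_ms, x_deg, y_deg) tuples in time order.
    velocity_threshold: deg/s; slower sample-to-sample movement counts as fixation.
    min_duration_ms: shorter groups are discarded (placeholder value).
    """
    fixations, current = [], []

    def flush():
        # Close the running group and keep it only if it lasts long enough.
        if current:
            duration = current[-1][0] - current[0][0]
            if duration >= min_duration_ms:
                fixations.append({
                    "start_ms": current[0][0],
                    "duration_ms": duration,
                    "x": sum(s[1] for s in current) / len(current),
                    "y": sum(s[2] for s in current) / len(current),
                })
        current.clear()

    for prev, cur in zip(samples, samples[1:]):
        dt = (cur[0] - prev[0]) / 1000.0  # seconds between samples
        if dt <= 0:
            flush()
            continue
        velocity = math.hypot(cur[1] - prev[1], cur[2] - prev[2]) / dt
        if velocity < velocity_threshold:
            if not current:
                current.append(prev)
            current.append(cur)
        else:
            flush()  # a fast movement (saccade) ends the current fixation
    flush()
    return fixations

# Synthetic 50 Hz recording (one sample every 20 ms), for illustration only:
# gaze rests at (10, 10), then jumps to (20, 10).
samples = [(t, 10.0, 10.0) for t in range(0, 120, 20)]
samples += [(t, 20.0, 10.0) for t in range(120, 220, 20)]
for fixation in ivt_fixations(samples):
    print(fixation)
```

At a 50 Hz sampling rate, consecutive samples are 20 ms apart, so a 60 ms minimum duration corresponds to at least four consecutive slow samples in this sketch.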
Eye-Tracking Metrics Analyses
The eye-tracking metrics used in this study are all calculated based on AOIs from the test images and game interfaces. For the pattern test images, three AOIs were defined according to the test content described previously (see Figure 7 for details): Correct patterns (C), Wrong patterns (W), and Reference images (R). In addition, three local AOIs of the target patterns were defined within the AOI of reference images. The local AOIs of the basic test mark the position of the target pattern of the test content on the bronze mirror images, and the local AOIs of the extension test mark the position of the core pattern, which is the key to recognizing the structure of the bronze mirror pattern. Basic eye-tracking metrics of the three AOIs were output directly by ErgoLAB 3.0, including Total Visit Count (TVC), Total Fixation Duration (TFD), Total Fixation Count (TFC), and Mean Fixation Duration (MFD). The eye-tracking metrics of the AOI of reference images used in this study were calculated as the sum or average of the three local AOIs. Other, derived metrics were calculated from the basic metrics. The hit rate of the target patterns is the percentage of the fixation duration or fixation count in the target pattern area relative to the fixation duration or fixation count of the AOI of reference images, giving the Percentage of Fixation Duration in local AOIs (the target pattern area) (PFD) and the Percentage of Fixation Count in local AOIs (the target pattern area) (PFC). We compared all of the above metrics between the game and video groups to explore differences in visual attention and behavior.
For the game interfaces, two AOIs on the puzzle interface and two AOIs on the reference interface were defined according to the game design described previously (see Figures 2 and 3): Bronze Mirror (BM) and Pattern Selection (PS) on the puzzle interface, and Reference Image (RI) and Target Pattern (TP) on the reference interface. The assembly of each bronze mirror comprises two stages with the same interface layout, so each AOI appears twice. Basic eye-tracking metrics of the AOIs, including Total Fixation Duration (TFD), Total Fixation Count (TFC), and Mean Fixation Duration (MFD), were output to compare the distribution of attention between the first and the second bronze mirror on each game interface.
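The metrics defined in this subsection can be illustrated with a small sketch that derives them from a list of fixations. It reflects one reading of the definitions, not the ErgoLAB 3.0 output: the tuple layout, the function names, and the rule that a "visit" is an uninterrupted run of fixations inside one AOI are assumptions, and the fixation values are invented.

```python
def basic_metrics(fixations, aoi):
    """Compute TVC, TFD, TFC and MFD for one AOI.

    fixations: time-ordered list of (aoi_label, duration_ms, in_target) tuples,
    where in_target marks fixations inside the local target-pattern AOI.
    """
    durations = [d for label, d, _ in fixations if label == aoi]
    visits, previous = 0, None
    for label, _, _ in fixations:
        if label == aoi and previous != aoi:
            visits += 1  # a new visit starts when gaze enters the AOI
        previous = label
    tfc = len(durations)
    tfd = sum(durations)
    return {
        "TVC": visits,
        "TFD": tfd,
        "TFC": tfc,
        "MFD": tfd / tfc if tfc else 0.0,
    }

def hit_rate(fixations, reference_aoi="R"):
    """Compute PFD and PFC: share of reference-image fixation duration and
    fixation count that falls inside the target-pattern area, in percent."""
    in_reference = [(d, hit) for label, d, hit in fixations if label == reference_aoi]
    if not in_reference:
        return {"PFD": 0.0, "PFC": 0.0}
    total_duration = sum(d for d, _ in in_reference)
    target_duration = sum(d for d, hit in in_reference if hit)
    target_count = sum(1 for _, hit in in_reference if hit)
    return {
        "PFD": 100.0 * target_duration / total_duration,
        "PFC": 100.0 * target_count / len(in_reference),
    }

# Invented fixation sequence on one test image (C, W, R as defined in the text).
fixations = [
    ("R", 200, True),
    ("R", 100, False),
    ("C", 150, False),
    ("R", 300, True),
    ("W", 50, False),
]
print(basic_metrics(fixations, "R"))
print(hit_rate(fixations))
```

In this invented sequence the reference image is visited twice and receives three fixations, two of which land on the target pattern, so the hit rate by count is two thirds and the hit rate by duration is 500 ms out of 600 ms.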
Regression Analysis
Although eye movement data can reflect underlying visual behavior, changes in eye-tracking metrics and experimental results admit many explanations (Rayner, 1998). A rigorous discussion of the experimental results must therefore combine the experimental tasks or scenes with the participants’ eye movement data. In this study, a binary logistic regression model is used to establish the correlation between visual attention on the pattern test images and the participants’ right choices. It provides evidence for the further discussion and explanation of the differences in eye movement between the two groups of participants in the pattern test. First, all the results of the pattern tests and the corresponding eye movement data, including the basic test and the extension test, were used for a binary logistic regression analysis (a total of 336 observations, 7 per participant, of which 240 are from the basic test and 96 from the extension test). The basic eye-tracking metrics were used as predictors to determine which metrics helped participants make the right choice in the pattern tests, with the outcome a dichotomous variable coded as 1 = right choice and 0 = wrong choice. A backward stepwise method was used, with the likelihood ratio as the removal criterion, because this method fits well with a more exploratory design.
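The backward procedure with a likelihood-ratio removal criterion can be sketched as follows. This is a simplified illustration rather than the SPSS routine used in the study: the Newton-Raphson fit, the function names, the removal threshold of .10, and the predictor names `metric_a` and `metric_b` are assumptions, and the data are synthetic.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def log_likelihood(rows, y, iterations=25):
    """Fit a logistic model by Newton-Raphson and return its log-likelihood."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(X[0])
    beta = [0.0] * k

    def predict():
        return [1 / (1 + math.exp(-sum(b * v for b, v in zip(beta, x)))) for x in X]

    for _ in range(iterations):
        p = predict()
        grad = [sum(x[j] * (yi - pi) for x, yi, pi in zip(X, y, p)) for j in range(k)]
        hess = [[sum(x[i] * x[j] * pi * (1 - pi) for x, pi in zip(X, p))
                 for j in range(k)] for i in range(k)]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return sum(math.log(pi if yi else 1 - pi) for yi, pi in zip(y, predict()))

def backward_lr(data, y, names, p_remove=0.10):
    """Drop the least useful predictor while its likelihood-ratio p > p_remove."""
    keep = list(names)
    while keep:
        full = log_likelihood([[row[n] for n in keep] for row in data], y)
        worst, worst_p = None, -1.0
        for name in keep:
            rest = [n for n in keep if n != name]
            reduced = log_likelihood([[row[n] for n in rest] for row in data], y)
            stat = max(0.0, 2 * (full - reduced))  # likelihood-ratio statistic
            p = math.erfc(math.sqrt(stat / 2))     # chi-square p-value, 1 df
            if p > worst_p:
                worst, worst_p = name, p
        if worst_p <= p_remove:
            break
        keep.remove(worst)
    return keep

# Synthetic data: metric_a is related to the outcome, metric_b is not.
data, y = [], []
for a, n_right in ((0, 1), (1, 3)):
    for b in (0, 1):
        for i in range(4):
            data.append({"metric_a": a, "metric_b": b})
            y.append(1 if i < n_right else 0)
kept = backward_lr(data, y, ["metric_a", "metric_b"])
print(kept)
```

In the synthetic data the proportion of right choices depends only on `metric_a`, so removing `metric_b` leaves the log-likelihood essentially unchanged and it is the predictor that is dropped.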
To examine the influence of the game process on the participants’ recognition of bronze mirror patterns, a multiple regression model was used to analyze how the areas that received the player’s visual attention during the game affected the visual recognition performance of the target patterns in bronze mirrors. The hit rate of the target pattern (PFC) in the pattern test of the game group was used to evaluate visual recognition performance. The percentage of the player’s fixation time (PFD) for each AOI on the game interface served as the independent variables, and possible multicollinearity among these factors was analyzed. Based on the fixation duration allocated to different areas of the interfaces in different stages of the game, PFD here is the proportion of the fixation duration to the time spent focusing on the area (total visit duration). However, the sample used to fit the model is small (n = 24), so the results of the regression model are used only to indicate the factors that may affect recognition performance in the puzzle-based game.
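A minimal sketch of such an analysis is given below, assuming an ordinary least-squares fit through the normal equations and the variance inflation factor (VIF) as the multicollinearity check. The text does not state which multicollinearity diagnostic was used, so the VIF is an assumption, as are the function names and the invented predictor and outcome values.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(X, y):
    """Return (coefficients with intercept first, R^2) of a least-squares fit."""
    A = [[1.0] + list(row) for row in X]
    k = len(A[0])
    xtx = [[sum(a[i] * a[j] for a in A) for j in range(k)] for i in range(k)]
    xty = [sum(a[i] * yi for a, yi in zip(A, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, a)) for a in A]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1 - ss_res / ss_tot

def vif(X, j):
    """Variance inflation factor of predictor j: 1 / (1 - R^2) when predictor j
    is regressed on the remaining predictors (undefined if that R^2 equals 1)."""
    target = [row[j] for row in X]
    others = [[v for i, v in enumerate(row) if i != j] for row in X]
    _, r2 = ols(others, target)
    return 1 / (1 - r2)

# Invented values: two uncorrelated predictors and an outcome built from them.
X = [[1, 1], [2, -1], [3, -1], [4, 1]]
y = [2 + 3 * a - b for a, b in X]
beta, r2 = ols(X, y)
print([round(b, 6) for b in beta], round(r2, 6), round(vif(X, 0), 6))
```

A VIF close to 1 indicates that a predictor shares little variance with the others, whereas large values signal that coefficient estimates may be unstable, which matters particularly for a sample as small as the one used here.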
Results and Discussion
Eye Movement in the Basic Test Images
MWU Tests on the Eye-Tracking Measures for the AOIs of the Game and Video Groups in the Basic Test.
*p ≤ .05, **p ≤ .01, ***p ≤ .001; TVC: Total Visit Count; TFD: Total Fixation Duration; PFD: Percentage of Fixation Duration; TFC: Total Fixation Count; PFC: Percentage of Fixation Count; MFD: Mean Fixation Duration; Effect size: large: |r| ≥ .5, medium: .3 ≤ |r| < .5, small: .1 ≤ |r| < .3.

MWU tests on the eye-tracking measures for the AOIs of the Game and Video Groups in the content of Bronze Mirror I and II of the Basic test (*p ≤ .05, **p ≤ .01, ***p ≤ .001; BM-1: Bronze Mirror I; BM-2: Bronze Mirror II; Effect size: large: |r| ≥ .5, medium: .3 ≤ |r| < .5, small: .1 ≤ |r| < .3).
The differences between the two groups in TFD and TFC are concentrated on the target AOIs. Although the video group hit the AOIs of the target pattern in the bronze mirror area of the test image more frequently than the game group (see Table 1), the game group performed better in terms of hit rate (see PFD and PFC). This shows that participants in the game group searched for the target pattern more accurately and efficiently; that is, they recognized the characteristics and layout of the bronze mirror pattern better, especially in the test contents of Bronze Mirror I (Pattern tests 1–3) (see Figure 9), which is linked to the complexity of the pattern structure.
Finally, in all AOIs on the images of the basic test, the mean fixation duration of the game group is significantly higher than that of the video group (see Figure 9). In general, a higher fixation rate is associated with lower difficulty of visual tasks (Nakayama et al., 2002). Therefore, the visual recognition tasks in the basic test may have been easier for participants in the game group. The single fixation duration on the AOIs of correct and wrong patterns is shorter in both groups, so visual search was the main mode in this stage (the mean fixation duration of a visual search task is 180–275 ms (Holmqvist et al., 2011)). However, the single fixation duration on the AOI of the target pattern is significantly longer, especially for participants in the game group. Their main mode at this stage was therefore scene content recognition rather than quick visual search (the mean fixation duration of a scene perception task is 260–330 ms (Holmqvist et al., 2011)), which supports the finding that participants in the game group showed better visual search and recognition performance in the basic test.
Eye Movement in the Extension Test Images
MWU Tests on the Eye-Tracking Measures for the AOIs of the Game and Video Groups in the Extension Test.
*p ≤ .05, **p ≤ .01, ***p ≤ .001; TVC: Total Visit Count; TFD: Total Fixation Duration; PFD: Percentage of Fixation Duration; TFC: Total Fixation Count; PFC: Percentage of Fixation Count; MFD: Mean Fixation Duration; Effect size: large: |r| ≥ .5, medium: .3 ≤ |r| < .5, small: .1 ≤ |r| < .3.

MWU tests on the eye-tracking measures for the AOIs of the Game and Video Groups in the content of Bronze Mirror I and II of the Extension test (*p ≤ .05, **p ≤ .01, ***p ≤ .001; BM-1: Bronze Mirror I; BM-2: Bronze Mirror II; Effect size: large: |r| ≥ .5, medium: .3 ≤ |r| < .5, small: .1 ≤ |r| < .3).
In contrast to the basic test, there was no significant difference in visual search efficiency between the two groups in the extension test. Even so, Figure 10 shows that the visual search time of the game group was still shorter than that of the video group, while the AOI of the target pattern was scanned more frequently (see TVC). In addition, the two groups differed significantly in recognizing the target pattern on the two unfamiliar bronze mirror patterns. The TFD and PFD of the game group were significantly higher than those of the video group; that is, participants in the game group hit the core pattern on the reference image of the bronze mirrors more accurately, especially in extension test 1 (see Figure 10). Unlike in the basic test, participants in the game group paid more visual attention to the AOIs of the target pattern, which indicates that they observed and recognized the bronze mirror image in depth and put more mental effort into the recognition process. Finally, on the AOIs of the target pattern, the single fixation duration of the game group was significantly longer than that of the video group (see Table 2), which indicates that the visual search of the game group on the bronze mirror area involved stable performance and in-depth recognition (Gandini et al., 2008), especially in extension test 1 (Figure 10).
Results of Logistic Regression for Predicting Recognition Results by Eye-Tracking Metrics
Binary Logistic Regression Analysis on the Impact of Visual Attention During the Pattern Test Process on Making Right or Wrong Choice in the Pattern Test (Standardized).
*p ≤ .05, **p ≤ .01; CI: Confidence interval; TVC: Total Visit Count; TFD: Total Fixation Duration; TFC: Total Fixation Count.
After the non-significant factors were excluded, the final model contains four AOI-based eye-tracking metrics that can be used to predict the participants’ recognition results in the pattern tests. Two of them promote correct recognition, and the other two hinder it. In the process of finding target patterns on the bronze mirror image, visual recognition and recall with stable performance are the effective cognitive mode most of the time. Furthermore, sustained visual attention to the AOIs of the target patterns is conducive to a correct result, whereas fast visual search and matching, with frequent scanning and transfers between AOIs, has a negative effect. However, the model can predict the results of only most of the pattern tests. The eye-tracking metrics also differ when differences in pattern structure complexity make the recognition process more difficult.
According to the comparison of the visual attention of the two groups in the pattern tests, the visual transfers of all participants suddenly became frequent in extension test 1, which is related to the difficulty of recognizing the structure of the complex pattern. Such difficulty usually makes participants expend more mental effort and can even disrupt the recognition process, resulting in high TVC and more scanning. In the basic test, however, because the test content appeared in the game, some participants in the game group could recall it directly from long-term memory. Therefore, although they allocated less visual attention than the video group, they still recognized the patterns correctly. Finally, regardless of how much visual attention is allocated, accurately hitting the target patterns is the basis of effective visual recognition.
In addition, the results of the model show that it makes sense to allocate attention to the area of the correct pattern, because dwelling on the correct option for a long time can help participants recognize it correctly or recall it from memory (Loftus, 1972). Allocating attention to the wrong option, in contrast, has a negative effect. Because the main test form is selecting the correct target pattern from two or more options, too much attention to the wrong pattern may lead to incorrect recall or interfere with correct recognition. However, the results of the model apply only to most cases, and the process of checking and recognizing options differs between tests.
Eye Movement in the Puzzle-Based Game Interfaces
MWU Tests on the Eye-Tracking Measures for the AOIs of the Bronze Mirror I and II in the Puzzle Interface.
*p ≤ .05, **p ≤ .01, ***p ≤ .001; TFD: Total Fixation Duration; TFC: Total Fixation Count; MFD: Mean Fixation Duration; Effect size: large: |r| ≥ .5, medium: .3 ≤ |r| < .5, small: .1 ≤ |r| < .3.
MWU Tests on the Eye-Tracking Measures for the AOIs of the Bronze Mirror I and II in the Reference Interface.
*p ≤ .05, **p ≤ .01, ***p ≤ .001; TFD: Total Fixation Duration; TFC: Total Fixation Count; MFD: Mean Fixation Duration; Effect size: large: |r| ≥ .5, medium: .3 ≤ |r| < .5, small: .1 ≤ |r| < .3.
In the process of fitting the patterns of the two bronze mirrors, the difference on the puzzle interface appears in the pattern selection area in the first stage. The MFD of the two groups of participants increases significantly after entering the stage of bronze mirror II. In general, a longer single fixation duration is directly proportional to an increase in cognitive load (Buettner, 2013). This shows that the difficulty of the game increases with a more complex pattern structure, and that fitting the disrupted pattern pieces together is always difficult. Therefore, players have to make longer visual transactions on the target pattern under a higher cognitive load.
Table 5 indicates that the difference on the reference interface appears in the reference pattern area in the first stage of fitting the patterns of the two bronze mirrors. In the first stage of bronze mirror II, players allocate more visual attention and longer transaction durations to the reference image than in bronze mirror I. This is because players need to obtain the prompt information about the target position of the pattern piece from the reference image. As the pattern structure is more complex than the previous one, it is harder to recognize and transfer, so players go over the reference image repeatedly and pay more attention to matching the position of the target pattern.
Results of Multiple Linear Regression for Predicting the Improvement of Recognition Performance by Visual Attention on Game Interfaces
The Impact of Significant Predictors for Visual Transactions (PFD) in the Process of Game (Puzzle and Reference Interface) in Recognition Performance (PFC) From Multiple Linear Analysis.
PFC: Percentage of Fixation Count; PFD: Percentage of Fixation Duration; BM: the area of Bronze Mirror on the puzzle interface; RI: the area of Reference Image on the reference interface.
After the non-significant factors were excluded, the final model contains two areas on each game interface that affect the recognition performance of players through different durations of visual transactions. Among them, a longer visual transaction in the pattern selection area on the puzzle interface is conducive to improving recognition and transfer performance, whereas allocating more visual attention to the reference image area has a negative effect. The results of the regression analysis provide evidence for the change in visual attention caused by the complex structure of the bronze mirror pattern, that is, for the results of the MWU tests on the eye-tracking measures for the AOIs of players in the different bronze mirrors of the puzzle game. However, not all the mental effort spent on overcoming the cognitive load is effective. The result of the regression model (Figure 11) shows that players who check each pattern more carefully while fitting the disrupted pattern pieces together have better recognition or memory performance after the game, while those who try to search for and recognize the target pattern on the reference image do not achieve ideal results.
Linear relationship between recognition performance and visual transactions in the process of game.
The distribution of attention across different game interfaces or areas reflects the strategies used by players to achieve the goal of fitting together the art pattern pieces of the puzzle. Different strategies, however, lead to differences in players’ achievements after the game. As scaffolding in the game, the reference interface is mainly used to prompt the position of the target patterns and thus to guide players in constructing a recognition mode for the structure and modeling features of the art pattern. The retention and transfer of the art patterns, however, mainly take place while players recognize the target pattern on the puzzle interface (the area of Pattern Selection) and place it on the bronze mirror (the area of Bronze Mirror). Allocating more attention to the reference image may reflect the players’ dependence on scaffolding. The auxiliary information obtained from scaffolding, held as short-term recognition memory, may help players achieve the game goal, but it is hard to retain or transfer, so players who allocate more visual attention to the reference image cannot correctly and independently recognize the art pattern on the bronze mirror after the game.
Conclusions
This study investigated the effect of a puzzle-based game with scaffolding aid on the recognition performance and appreciation skills related to cultural heritage content. The study used as an example a serious game that introduces the form semantics and layout characteristics of bronze mirrors of the Han Dynasty and teaches the skills needed to recognize them (Ye et al., 2021). A non-interactive instructional video with the same content was used for the control group. Eye-tracking technology was used to track and observe the visual attention and transactions of the two groups of participants during the pattern tests after learning, in order to evaluate their recognition performance of art patterns. It also recorded the distribution and transfer of players’ attention during the game to explore visual behavior and the knowledge acquisition process in the puzzle-based game. The conclusions and contributions are discussed below by integrating the purpose, the research questions, and the findings obtained from the inter-group comparison of the pattern tests and from the eye movement measures during the puzzle-based game.
Regarding the first and second research questions, the comparison of the two groups’ eye-tracking metrics during search and recognition in the pattern test indicated that participants who learned through the puzzle-based game showed better recognition and recall of the art patterns on the bronze mirrors used in the experiment (results of the basic test). Furthermore, they showed strategic visual recognition and better metacognitive control of visual attention over the unfamiliar bronze mirror patterns in the extension test (results of the extension test) (Tsai et al., 2016), which reflects that they had mastered the observation skills needed to recognize art pattern structures and to appreciate the art patterns on the bronze mirrors. In contrast, participants who used the non-interactive video showed more mechanical visual search. The puzzle-based game is a meaningful and inspiring kind of serious game that is useful for improving potential thinking skills and visual modes (Hsu & Wang, 2018). The results of this study further confirm its advantages in promoting the learning of cultural heritage content.
Investigating the replicability of serious games in enhancing a wide range of cultural heritage content is of paramount importance, and it is the underlying motive for the third research question (Tsai et al., 2016). For this question, the study examined how the distribution of visual attention and the visual transactions during puzzle actions and while viewing the prompt content in the scaffolding influenced the improvement of players’ recognition performance of art patterns on bronze mirrors. The difference in attention distribution during the restoration of two bronze mirror patterns of different complexity indicates the areas into which the players put more mental effort. The results of the multiple linear regression analysis show that the area and stage to which visual attention is allocated affect the improvement in recognition performance achieved through the game. While incorporating scaffolding in serious games has been shown to enhance the effectiveness of game-based learning, the regression analysis reveals that excessive reliance on scaffolding may produce unintended adverse effects. In application, intervention measures should be introduced, or the functions of the scaffolding enriched, to keep players from relying on prompt information. In addition, the game mode based on jigsaw puzzles can effectively promote players’ learning of the art patterns on the bronze mirrors of the Han Dynasty, especially through the visual transactions made while recognizing the target pattern piece, which are effective for retention and transfer.
Overall, the puzzle-based game with scaffolding aid promotes the learning of art content in cultural heritage. The interaction mode and instructional strategy in the game can improve the visual recognition of players, teach them how to appreciate art patterns, and enhance learning performance. Building on existing research, this paper provides an in-depth study using eye-tracking technology. The research findings not only confirm the effectiveness of puzzle-based games with scaffolding aid in promoting the learning of art content within the domain of cultural heritage, but also provide evidence for the role of game-based learning in guiding players to perceive and acquire underlying skills (such as observation) through game interactions to enhance knowledge transfer. However, this study has certain limitations. First, the game cases used in the experiment are relatively simple, and the game contains only two examples of bronze mirrors. A complete game with more levels and learning content may yield better learning performance. Second, the results were limited by the small sample size. Although statistical methods suitable for small samples and rigorous tests were used and carefully discussed to improve the accuracy of the experimental results, further research with a larger sample size is needed to validate the findings. Furthermore, due to limited experimental funding and participant recruitment capability, strict control over gender balance among the participants was not achieved. Although the influence of gender on the research findings of game-based learning was minimal, strict control over participant gender would yield more rigorous results, which is an area of focus for our future studies.
Additionally, due to limitations in the experimental platform and eye-tracking equipment, we were unable to collect data on visual attention and trajectories during the learning process for the video group participants. If future upgrades to more advanced eye-tracking equipment and analysis software are feasible, we will supplement the study with an investigation into the differences in the learning process between the two groups of participants. Finally, the research was limited by a short time frame. As a result, the evaluation of knowledge transfer performance cannot provide thoroughly reliable evidence. Further research with a longer time frame is needed to understand such impact.
Footnotes
Acknowledgements
This research is supported by the 2022 Teaching and Academic Research Independent Key Topics in CUMT (number: 2022DLZD07): “Research on the Digital Transformation of Public Art courses in Engineering Universities”; and the 14th Five-Year Plan of Educational Science in Jiangsu Province (number: T-b/2021/06): “Research on New Infrastructure of Digital Resources from the Perspective of Cultural Heritage Aesthetic Education”.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the 2022 Teaching and Academic Research Independent Key Topics in CUMT (2022DLZD07), 14th Five-Year Plan of Educational Science in Jiangsu Province (T-b/2021/06).
