Abstract
Systematic evidence reviews draw together findings from multiple studies, helping researchers and decision makers to understand patterns of research and findings across varying contexts and research methodologies. They have become more popular over the last twenty years, with various guides discussing the different ways in which they can be conducted and the issues arising in this process. This case study of a systematic review of the factors shaping children’s digital skills explores the challenges, risks and potential strategies in this process, as those involved in that review reflect upon the various judgements involved in choosing inclusion criteria, filtering and coding studies and synthesising the material collected.
Introduction
In his classic book What is History?, Carr (1961/1987) compares the study of history to fishing: ‘The [historical] facts … are like fish swimming about in the vast and sometimes inaccessible ocean: and what the historian catches will depend, partly on chance, but mainly on what part of the ocean he chooses to fish in and what tackle he chooses to use – these two factors being, of course, determined by the kind of fish he wants to catch’ (1961: 19–20).
This article first outlines the nature of systematic evidence reviews. It then describes the systematic evidence review of children’s digital skills that was conducted as one part of the wider EU-funded ySKILLS project (Haddon et al., 2020). Next, it reflects upon the initial decisions involved in searching databases, and then examines the subsequent filtering and coding of the studies found, before charting the decisions that influenced how this material was organised and ‘mapped’. In so doing, it highlights the numerous choices involved in transforming an ontological object (i.e. a phenomenon out there) into an epistemic one (i.e. something worth studying) (Caronia, 2011).
Systematic evidence reviews
Systematic evidence reviews originated in health research, as documented in Oakley’s account of the early work of the EPPI-Centre 1, the founding institution in this field (Oakley, 2017). But in the last 20 years they have expanded into other disciplines, especially the social sciences. For example, and of relevance to the project documented in this article, van Laar et al. (2017) and Scheerder et al. (2017) review research on digital skills more generally. Other examples from social science research include Nef et al. (2013) on social networking sites (SNSs) and older users, and Williams (2019) on SNSs and social capital.
As the term ‘systematic’ suggests, this type of review involves a more rigorous approach than a ‘narrative’ literature review that relies on the researchers’ prior knowledge of the field (Grant and Booth, 2009; Gough et al., 2017a). It entails developing a strategy to search research databases for relevant material, and then selecting for detailed analysis those items most likely to address the research question under consideration, in order to build up a comprehensive understanding and evaluation of the field. 2 The search words, search strategies and all criteria used to filter out inappropriate material are made explicit, so that the reader can follow them and, in principle, either reproduce the exercise (re-analyse the same data) or replicate it (use the same process to analyse different data). 3 Thus transparency, rigour and comprehensiveness are often cited as virtues of the systematic review.
In describing a typology of reviews, Grant and Booth (2009) note that different types of review may have different specific goals, but that the general objectives of a systematic review are to summarise what is known in a field in order to draw recommendations for practice, and to identify gaps in the literature as directions for future research. That said, one useful distinction made by Gough et al. (2012) is between aggregative and configurative reviews. The former emphasise making empirical statements about the findings in a field of study; the authors refer to adding up and averaging findings when seeking evidence to inform decisions. Configurative reviews focus on how to interpret a field, for example, how to make sense of variation within the studies and how these fit together. Gough et al. note that some reviews contain elements of both processes, and indeed the project reported in this article covered both a review of the evidence and a mapping of the field.
The particular systematic evidence review discussed in this article was undertaken as part of the ySKILLS project funded by the EU’s Horizon 2020 programme. That broader project involved 15 partners from 13 countries and aimed to enhance and maximise the long-term positive impact of the information communications technology (ICT) environment on multiple aspects of wellbeing for children aged 12–17. The review focused both on factors that influenced the development of children’s digital skills (antecedents) and on the consequences of those skills. It was primarily intended to inform the development of a survey for that project, as well as being a research output in its own right. Although the review covered both antecedents and consequences of children’s digital skills, this article focuses on the antecedents part of the review, i.e. the mapping and evaluation of which aspects of children’s lives might have a bearing upon their digital skills. The review team consisted of native English and Italian speakers, the latter having a very high standard of English, so language was not an issue. Team members had varying degrees of familiarity with this literature, and at various points they shared their plans and intermediate findings with project colleagues who had greater expertise, as a check. Two of the team had conducted several systematic reviews before this one, and so had developed relevant practical competencies (as identified by Oliver et al., 2017).
The database search process
All systematic evidence reviews involve decisions about the criteria to be used for including and excluding material (Brunton et al., 2017; see Figure 1), and in this case study, most of those parameters were decided at the very start of the review process. The review concentrated on children aged 12–17, not all children, reflecting the focus of the wider EU project. It selected only quantitative studies, since the aim was to inform both the questions to be asked in the survey and particular conceptual models under development within that project. The review only considered published material. While this provided a form of quality control through the process of peer review, it ran the risk of missing relevant material in unpublished reports, book chapters and theses 4 (Brunton et al., 2017). One particular concern in the literature on publication bias is that statistically significant results are more likely to be published (Dwan et al., 2013).

Figure 1. The database search process.
Like many other systematic reviews, and reflecting the language competencies of the review team, this review considered only studies whose full text was available in English, even though this runs the risk of bias if, for example, researchers prefer to publish positive results in English-language journals (Brunton et al., 2017). Full details of the search process were pre-registered with PROSPERO (record ID CRD42020172272). Pre-registration provided another form of quality control, since most of the steps and precautions taken had to be described in some detail to ensure that the review met a recognised standard. 5
However, review parameters can develop even during the review process (ibid), especially in an exploratory study, where it is difficult to make all judgements in advance because the nature of the field is not yet known. In this case study, the original proposal covered the previous 30 years of research, but it became clear that there was virtually no relevant research from before 20 years ago, and that the vast majority of even these studies had taken place in the last 10 years. The review team judged that these more recent studies were also likely to be more relevant for the empirical component of the project, given how rapidly the digital world has changed. Hence, the review eventually focused only on the period from 2010 to January 2020, with the search conducted at the end of that month.
This decision also reflects the fact that the staffing and time available for conducting any review can themselves have a bearing upon its parameters (Gough and Thomas, 2017). In fact, the review described here was relatively well resourced, with six researchers working on the project after the search stage, collaborating over a total period of ten months. That is not to say that reviews with fewer resources are less systematic, but rather that they are necessarily more narrowly focused, in order to keep the review process manageable within the time and resources available for analysis.
The next step was to decide how to operationalise these search parameters. To do this, one aggregate database, Web of Science, was used as a testing ground to check the effectiveness of different search words concerning children, technologies and skills. A sample of abstracts of the items found by different experimental searches was evaluated to decide whether the search terms did indeed produce results of interest and, equally significant, the extent to which they produced irrelevant results that might be considered ‘noise’. For example, searching for the words ‘digital’ and ‘skills’ as separate terms produced an enormous amount of such noise. Hence, the best strategy for overcoming this problem was to search for phrases such as ‘digital skill*’. The value of this experimental step was that it avoided the potentially time-consuming task of later reading thousands of irrelevant abstracts, given the finite resources of even this project.
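The difference between searching for two separate terms and searching for a phrase with a truncation wildcard can be sketched as follows. This is a minimal illustration, assuming a handful of invented abstracts (not items from the review), and it only approximates database phrase search with Python regular expressions; each real database has its own query syntax.

```python
import re

# Invented abstracts for illustration only; none come from the review.
ABSTRACTS = [
    "We measured digital skills among secondary school pupils.",
    "Adolescents reported their digital skill levels in a survey.",
    "Motor skills were assessed using a digital stopwatch.",
    "Social skills training was delivered via a digital platform.",
]


def matches_separate_terms(text):
    """Both words anywhere in the text: 'digital' AND 'skill*'."""
    lowered = text.lower()
    return "digital" in lowered and re.search(r"\bskill\w*", lowered) is not None


# Approximates a phrase search for "digital skill*": the words must be adjacent
# and in order; the trailing \w* plays the role of the truncation wildcard.
PHRASE = re.compile(r"\bdigital\s+skill\w*", re.IGNORECASE)


def matches_phrase(text):
    return PHRASE.search(text) is not None


separate_hits = [a for a in ABSTRACTS if matches_separate_terms(a)]
phrase_hits = [a for a in ABSTRACTS if matches_phrase(a)]
noise = [a for a in separate_hits if a not in phrase_hits]

print(f"separate terms: {len(separate_hits)} hits, phrase: {len(phrase_hits)} hits")
print(f"noise avoided by the phrase search: {len(noise)}")
```

Here the separate-terms query also returns the abstracts about motor and social skills that merely mention a digital device, which is the kind of noise the phrase search avoids.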
Eight databases were searched both for more comprehensive coverage and to overcome the problem of disciplinary bias associated with any one database (Brunton et al., 2017). Some of these databases allowed searches of the whole text, but it became clear that this increased the chance of finding irrelevant material. One of the search terms may be present in the main text, but be mentioned just in passing rather than being a central element of that publication: e.g. ‘future research might look at’ (followed by one of the search words). Hence, it was decided to search only by title, keywords and abstract.
As indicated in Figure 2 below, that original search process produced 4,811 items. A majority of these still failed to meet the criteria of the project, and so a first step was to check (‘screen’) the titles and abstracts according to the three criteria so far discussed, plus an assessment of quality (whether the study was robust enough and had a suitable method). Before applying those criteria (e.g. about whether a study dealt with the right age range, whether a study covered children’s skills), the team trained to make sure that all team members were making judgements in the same way, and established a process for adjudicating grey areas. Intercoder reliability was checked after this training. Nevertheless, a different project with a different team could have taken all these same precautions, but made different decisions about how to apply the criteria. In principle, this same observation could apply to a content analysis of, say, newspapers. In any one study there could be agreement about the particular way to operationalise the process of classifying content, and there could be tests afterwards to check that coders coded in the same way – but a different project might have operationalised this process differently.
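The article does not say which intercoder reliability statistic was used. One common choice for two coders making include/exclude decisions is Cohen’s kappa, which corrects raw agreement for the agreement expected by chance. The sketch below assumes that statistic and uses hypothetical screening decisions, not the review’s data.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Agreement between two coders, corrected for chance agreement."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("coders must label the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: each coder labels at random with their own label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    labels = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[label] * freq_b[label] for label in labels) / n**2
    if expected == 1:
        return 1.0  # both coders used one identical label for every item
    return (observed - expected) / (1 - expected)


# Hypothetical screening decisions on ten abstracts (not data from the review).
coder_1 = ["include"] * 4 + ["exclude"] * 6
coder_2 = ["include"] * 3 + ["exclude", "include"] + ["exclude"] * 5
print(round(cohens_kappa(coder_1, coder_2), 3))  # 0.583
```

The two coders agree on 8 of 10 abstracts, but because both exclude most items, much of that agreement could arise by chance, so kappa is noticeably lower than the raw 80%.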

Figure 2. Flow diagram of the search and filtering processes in the mapping process.
This stage in the screening process produced a sample of 301 items. Here, as at every stage in the process, it became clear that further decisions about what to include and exclude were needed that had not been considered at the outset of the review. For example, small-sample experiments were excluded, in line with the review’s wider policy of excluding studies with very small samples, even though their methodology might be robust; the risk was of losing findings that may only emerge from studies of this type.
The next step involved reviewing all 301 full texts in order to score them on three criteria derived from a ‘weight of evidence’ framework (Gough, 2007). In effect, this was another quality control decision: the team developed detailed checklists of considerations to take into account when evaluating (a) the overall quality of the methods used, 6 (b) how the study discussed and measured digital skills 7 and (c) how the study discussed and measured antecedents (and consequences) of digital skills. 8 Each criterion was scored from 1 to 3 (poor, fair, good), the three scores were averaged, and studies with an overall average above 2 were retained. Given that each criterion had its own list of guidelines on how to score it, this stage involved the most complex judgements by team members in the whole filtering process. As in the previous stages, the team first trained together to make sure they scored on the same basis, set up a process of adjudication, and checked intercoder reliability. But for the same reasons noted above, there is always scope for another team to reach different evaluations of which studies to include and exclude at this stage, for instance by drawing up different guidelines for scoring the three criteria.
The final sample after this step comprised 110 publications. Within the final sample, 88 studies analysed at least one antecedent of digital skills. 9 However, even deciding which findings represented antecedents, and which represented consequences, of digital skills often involved some judgement. For instance, six studies analysed learning outcomes as an antecedent of digital skills, and nine analysed learning outcomes as a consequence of digital skills. Of these, the vast majority (85%) were based on analysis of correlations from survey data. It is therefore a matter of judgement whether to interpret these correlations as the effect of digital skills on education, the effect of education on digital skills, or some more complicated relationship. This evidence review relied on the original authors’ interpretation of whether a particular finding represented an antecedent or a consequence, but other review teams might justifiably treat these cases differently.
In sum, many decisions shape what material is found in the original search and what material is then selected for analysis through the filtering process. Clearly, these decisions partly reflect the goals of the particular project and the resulting inclusion criteria; to that extent they could be specified in advance and so be considered deductive modes of reasoning. But unanticipated situations also arose during the course of the research and required solutions, a more inductive approach. 10 Finally, in filtering the material from the search process, the team’s decisions were also influenced by the practical consideration of arriving at a ‘manageable’ number of studies, one that a team of a given size can examine within given time limits.
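The scoring rule described above (three criteria, each scored 1 to 3, averaged, with studies retained if the average is above 2) can be sketched as follows. The study identifiers and scores are hypothetical, and treating ‘above 2’ as strictly greater than 2 is an assumption, since the text does not say what happened to a study averaging exactly 2.

```python
from dataclasses import dataclass

VALID_SCORES = {1, 2, 3}  # 1 = poor, 2 = fair, 3 = good


@dataclass
class WeightOfEvidence:
    study_id: str
    methods: int      # (a) overall quality of the methods used
    skills: int       # (b) how digital skills were discussed and measured
    antecedents: int  # (c) how antecedents (and consequences) were discussed and measured

    def __post_init__(self):
        for name in ("methods", "skills", "antecedents"):
            if getattr(self, name) not in VALID_SCORES:
                raise ValueError(f"{name} must be scored 1, 2 or 3")

    def average(self):
        return (self.methods + self.skills + self.antecedents) / 3


def retained(studies, threshold=2.0):
    # Strictly "above 2": a study averaging exactly 2.0 is dropped (an assumption).
    return [s.study_id for s in studies if s.average() > threshold]


# Hypothetical scores, not taken from the review.
studies = [
    WeightOfEvidence("study-A", 3, 2, 2),  # average 2.33
    WeightOfEvidence("study-B", 2, 2, 2),  # average 2.00
    WeightOfEvidence("study-C", 3, 3, 1),  # average 2.33
    WeightOfEvidence("study-D", 1, 2, 3),  # average 2.00
]
print(retained(studies))  # ['study-A', 'study-C']
```

Because the rule averages the criteria, a ‘poor’ score on one criterion can be offset by ‘good’ scores on the others (study-C above). A review wanting to avoid this could instead require a minimum score on each criterion, which is one example of the alternative scoring guidelines another team might adopt.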
The coding and mapping process
Following on from data collection, this last section explains the processes at work in the analysis stages. First, it outlines how the original coding frame was developed, and why and how it was adjusted once a first stage of coding had taken place. It then turns to issues related to the diverse ways in which digital skills are defined and measured in the various studies: while such skills are particular to this review, equivalent situations could emerge in systematic reviews on other topics. Finally, it describes the challenges that arose, for a variety of reasons, when trying to make sense of and characterise the results for the different antecedents.
Developing and completing the coding frame
To return to Carr (1987): after offering his fishing metaphor, he addresses the point that all histories involve some interpretation of ‘the facts’. This is also reflected in the categories used by the historian, for example, dividing history into periods and then deciding their boundaries (such as when the Middle Ages start and end), or dividing histories geographically (which raises questions such as whether Russia should be included in Europe – 1987: 56). Carr implies that we need to be reflexive about such decisions, but regards categorisation as a necessary process, a ‘tool of thought’ (1987: 55) that can be illuminating. This section reflects upon the equivalent processes of categorisation in systematic evidence reviews, which can also be seen as part of the configuring process noted by Gough et al. (2012).
As a first step in reviewing the material collected, the details of each study were entered into a ‘coding frame’ (see Figure 3). This coding frame was used to record details of the study itself (e.g. the country to which it related, the method and the sample size) and details of how the study measured digital skills (e.g. were children asked for their own assessment of their skills, or were they tested?). The antecedent(s) examined in each study were also coded, 11 as were the details of any results relating to antecedents of digital skills (e.g. whether the effect was statistically significant and, if so, the direction of the effect).

Figure 3. The coding and mapping process.
This stage involved devising an initial, tentative framework based on a field of study that was relatively well developed, in what has been called a ‘framework synthesis’ (Thomas et al., 2017). It involved judgements about the broad categories of antecedents to digital skills (e.g. ‘personal attributes’, ‘social context’, ‘ICT environment’), and also decisions about which particular antecedents (age, gender, etc.) to include in these categories. At this stage, coders were given examples of these antecedents so that all team members would have some idea of what each broad category covered.
Some of these elements were straightforward to code, for example, all studies clearly stated the country of the study participants. However, there were two types of challenges involved in this coding process.
First, some papers contained incomplete information about their data and methods. For instance, many studies in our sample did not specify the ages of participants, instead providing information only on participants’ school year, or stating that the children were in ‘elementary school’, ‘middle school’ or ‘secondary school’. The review team followed a policy of estimating ages based on the typical ages of children in each school year in the relevant country. A different team might have treated this information as missing, 12 which could have made a difference to conclusions about the age of children, given that about a third of studies reported school type and year rather than age per se.
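The two policies contrasted above, estimating ages from school year versus treating them as missing, can be sketched as a single function with a switch. The lookup table is hypothetical: the country names and age ranges are placeholders, and in practice they would have to come from each country’s education system.

```python
# Hypothetical lookup of typical ages per school year. The countries and
# ages are placeholders, not real education-system data.
TYPICAL_AGES = {
    ("Country A", "Year 7"): (11, 12),
    ("Country A", "Year 8"): (12, 13),
    ("Country B", "Grade 7"): (12, 13),
}


def participant_ages(country, reported_ages=None, school_year=None, estimate=True):
    """Return (age_range, status) for a study's participants.

    Reported ages are always used when available. Otherwise the age range is
    estimated from the school year if estimate=True (the policy described in
    the text), or treated as missing if estimate=False (the alternative).
    """
    if reported_ages is not None:
        return reported_ages, "reported"
    if estimate and (country, school_year) in TYPICAL_AGES:
        return TYPICAL_AGES[(country, school_year)], "estimated"
    return None, "missing"


print(participant_ages("Country A", school_year="Year 8"))
print(participant_ages("Country A", school_year="Year 8", estimate=False))
```

Keeping a status flag alongside each age range means a review could later check whether its conclusions about age change when the estimated ages are excluded.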
Second, in coding analytical results on the associations between antecedents and digital skills, there were some practical, conceptual, and methodological challenges. For example, it was decided not to code associations between antecedents and digital skills that were evident in tables (e.g., in tables of correlation coefficients between all variables used in analysis), but which were not discussed in the actual text of the article reporting the study, although a different team may have chosen to do otherwise.
As another example, when completing the coding frame, the team chose to record the direction of an effect (‘positive’ or ‘negative’) only if the estimated effect was statistically significant. Some systematic reviews, particularly those that also include a meta-analysis, report the direction, precision and magnitude of findings that are not statistically significant (see, e.g., Gewandter et al., 2017).
These challenges are present in all types of systematic review, including in the medical sciences, where systematic reviews are more common. However, while many medical reviews would typically limit their scope to articles analysing the impact of an intervention, for which most articles will have a single preferred estimate, research articles in the social sciences may report many estimated associations within a single paper, as was the case in this review. As already noted, at the filtering stage this required the review team to make many diverse judgements when deciding which studies were relevant to the research question. Now, at the coding stage, the diversity of frameworks and methods used in the social science studies reviewed meant that the team was unable to fully pre-define the coding frame, as would be more typical in a medical systematic review (Higgins et al., 2021).
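The coding rule described above, and the alternative of keeping the sign of non-significant findings, can be sketched as two small functions. The significance threshold of 0.05 is an assumption (studies may report other thresholds or only significance stars), and a full meta-analytic approach would additionally store each effect size and its standard error, which neither function here does.

```python
def code_direction(estimate, p_value, alpha=0.05):
    """Rule described in the text: record a direction only if significant."""
    if p_value >= alpha or estimate == 0:
        return "not significant"
    return "positive" if estimate > 0 else "negative"


def code_direction_and_significance(estimate, p_value, alpha=0.05):
    """Alternative rule: always keep the sign, and flag significance separately."""
    sign = "positive" if estimate > 0 else "negative" if estimate < 0 else "null"
    return sign, p_value < alpha


# Hypothetical estimates and p-values, not results from the review.
print(code_direction(0.21, 0.01))                    # positive
print(code_direction(0.15, 0.20))                    # not significant
print(code_direction_and_significance(0.15, 0.20))   # ('positive', False)
```

Under the first rule, several small non-significant associations pointing the same way leave no trace in the coding frame; the second rule preserves that information at the cost of a larger coding frame.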
Mapping antecedents into categories
After coding each individual study, the next step in any mapping exercise involves grouping antecedents into broader categories, also known as descriptive themes (Thomas et al., 2017), in order to synthesise the studies and make statements such as ‘there are more studies about antecedent x than about antecedent y’. This was an iterative process, because it became increasingly clear that the categories used in the initial coding of the studies, as described above, did not fully align with the content of the studies actually assembled: some new categories emerged that had not been anticipated, while others became less relevant. While some antecedent labels from the original coding frame (e.g. gender) proved unproblematic, others, such as ‘educational attainment’, were emergent, added during the next stage of coding (or ‘re-coding’). This involved making further choices to produce a revised, emergent framework (ibid). For example, instead of just ‘personal attributes’, the decision was made to distinguish between ‘ascribed personal attributes’, ‘achieved personal attributes’ and ‘digital personal attributes’. There was also some re-labelling of individual antecedents to better describe the material actually collected (e.g. ‘vulnerabilities’ became ‘mental and physical health problems’).
Specifically, in the first round of coding there were quite a few studies whose topic did not fit into the pre-defined categories and which were therefore classified as ‘other’. Hence, one step in developing the emergent thematic framework was to try to reduce the number of studies labelled ‘other’. One strategy was to see whether they could fit under an existing antecedent label. Another was to see whether a new antecedent could be added to group together a few of these ‘other’ studies: for example, ‘perceptions and attitudes’ was one such new category. Yet another strategy was to see whether some of the existing antecedents could be renamed (i.e. re-characterised) to include some of these ‘other’ studies: for example, ‘frequency of use’ of digital devices became ‘frequency and amount of use’. In other words, there was a process of developing a better, revised map of antecedents, which can be seen by comparing the original coding frame (Table 1) with the final coding frame (Table 2). This was arguably the clearest case of (re-)configuring the data. Or, returning to Carr’s fishing metaphor, this was the stage of rethinking how to classify (and then re-classify) the fish found in the ocean.
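The re-coding strategies described above can be sketched as two mappings applied to first-round labels: a set of renames (the two examples are taken from the text) and a set of study-specific reassignments of ‘other’ studies after re-reading. The study identifiers, first-round labels and the single reassignment are hypothetical.

```python
from collections import Counter

# Hypothetical first-round labels per study (not the review's data).
first_round = {
    "s01": "gender",
    "s02": "frequency of use",
    "s03": "other",
    "s04": "vulnerabilities",
    "s05": "other",
    "s06": "gender",
}

# Renamed antecedents, following the two examples given in the text.
RENAME = {
    "frequency of use": "frequency and amount of use",
    "vulnerabilities": "mental and physical health problems",
}

# Hypothetical reassignment of an 'other' study to a new emergent category.
REASSIGN_OTHER = {
    "s03": "perceptions and attitudes",
}


def recode(labels):
    """Apply reassignments of 'other' studies, then renames, to every study."""
    revised = {}
    for study, label in labels.items():
        if label == "other":
            label = REASSIGN_OTHER.get(study, "other")
        revised[study] = RENAME.get(label, label)
    return revised


final = recode(first_round)
print("'other' before:", Counter(first_round.values())["other"])
print("'other' after:", Counter(final.values())["other"])
print(Counter(final.values()))
```

Keeping the mappings explicit, rather than overwriting labels by hand, leaves an audit trail from the original to the final coding frame, in keeping with the transparency that systematic reviews aim for.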
Table 1. Original coding frame.
Table 2. Final coding frame.
Skills issues
Before moving on to a more detailed description of the review process, it is worth noting a particular challenge arising from the specific nature of this field of study: the sheer diversity in how digital skills are measured, how skill levels are defined, and which types of skills are considered to be ‘digital’.
In different studies, digital skills were measured in various ways: for example, self-efficacy measures (self-confidence in one’s ability to achieve different goals), measures of particular knowledge claims (‘I can do X’) or actions taken (‘Sometimes I use an online account with a different name, so that other people believe I am a different person’), and performance tests in which children are asked to demonstrate their skills. Within each of these measures there was further variation. Self-efficacy measures ranged from narrower skills (e.g. confidence in one’s ability to remove a virus from a computer) to general ones (e.g. being comfortable using digital devices). Performance tests had diverse formats, including requiring the child to achieve a specific goal in a simulation test or to demonstrate knowledge by answering factual multiple-choice questions.
Nor could skills be neatly classified as ‘basic’, ‘intermediate’ or ‘advanced’. This reflects both the age range of the children (‘advanced’ skills for a 12-year-old may be considered ‘basic’ or ‘intermediate’ for a 17-year-old) and the skills that are tested. Instead of classifying skills into levels, some researchers focus on ‘functional skills’, such as the ability to open an email attachment or search for information online. Others focus on ‘critical skills’, such as the ability to critically evaluate information found online. While ‘functional skills’ are generally basic, some may require intermediate or advanced skills, in the sense that a beginner could not achieve the goal in question. ‘Critical skills’ are often a version of ‘advanced’ skills (in the sense of being multi-stage), but they imply that some interpretation is taking place, making them more akin to media literacy.
Meanwhile, the domains addressed by skills fell under broad headings such as ‘informational skills’ (e.g. searching for information), ‘social interaction skills’ (e.g. awareness of the conventions of social communication), ‘content creation skills’ (e.g. design and editing skills) and ‘programming/coding skills’. But some studies focused on very specific ‘skills’, such as ethical behaviour online and digital safety skills.
Sometimes, as will be demonstrated below, it is the very variation in which skills are studied, or how they are measured, that explains differences in findings, for example when certain measures lead to one conclusion but different measures lead to the opposite. That very diversity illustrates the value of a systematic review. But it also poses a challenge at the review stage, when seeking to make statements about the balance of findings relating to a particular antecedent. To use Carr’s fishing metaphor, it became clear that fish which initially looked the same were, in fact, different species.
The synthesis processes
As part of synthesising findings from the 88 studies covered in this review, the original papers for a number of antecedents were re-examined to understand why their authors had looked at a particular topic (see Figure 4). The reasons were diverse. For example, in relation to ethnicity, some studies looked at skills and ethnicity simply because ethnicity had been found to be important in other walks of life (e.g. Duarte et al., 2013). Others built on previous skills research on that particular antecedent (e.g. Hatlevik et al., 2015). The case of ethnicity also exemplifies a common distinction: some studies aim to test multiple possible antecedents of digital skills in order to determine which are more influential, 13 while others focus on just one antecedent, providing a literature review to support one or a few hypotheses. 14 Exploring the rationales of the various studies helped to show how the map of the field (here, studies of antecedents of digital skills) emerged from a combination of very different research strategies.

Figure 4. The synthesis process.
Since no research takes place in a cultural vacuum, a second step for each antecedent was to note the pattern of countries where the research took place. Realities, values, and even epistemological stances regarding which variables are of interest and how to explore them can vary across countries. It became evident that most studies in our sample were conducted in the Global North or in middle- and upper-income countries, meaning that the findings of the systematic review primarily reflect Western, or at least economically advanced, parts of the world.
Turning now to the aggregative process of analysing results, comparing studies was sometimes relatively straightforward in spite of the heterogeneity of approaches and measures employed. For example, in the case of age, the vast majority of the research indicated a positive relationship between age and digital skills, with children becoming more skilled as they grow older. Even for the minority of age studies pointing in the opposite direction, it was easy to detect common methodological details that accounted for this difference, such as the sampling strategies or the types of digital skills being measured. Other antecedents where the analysis was less problematic were ‘educational attainment’, ‘learning’ (including learning motivation and learning styles) and ‘leisure activities’, where all studies found a positive correlation.
One of the dilemmas when trying to make an overall assessment of the role of a particular antecedent was that studies originated from very different analytical frameworks. For example, in the case of the antecedent ‘health’, some studies emanated from the digital divide literature (e.g. Helsper and Eynon, 2013) while other research came from addiction studies (e.g. Williams-Diehm et al., 2018). This is in part a product of the fact that the databases searched covered such a broad range of material. In practice, all health studies found a correlation between health and digital skills, but in order to draw conclusions and identify gaps in the research, it was still necessary to consider the different frameworks and conceptualisations, as well as the different aspects of health being studied (e.g. physical vs. mental health, learning disabilities).
Another example showing the difficulty in comparing such diverse studies is provided by the antecedent ‘personality traits’. One study came from the literature on personality tests used by employers and in career guidance, 15 one came from the identity formation literature 16 and a third study came from the literature on online risky activities, looking at personality traits such as self-image and risk perception. 17 Given that all three studies found that certain traits were associated with greater digital skills, it is possible to say that overall personality is an influential antecedent. Yet the actual typologies of traits in these studies were totally different from each other and hence it was difficult to develop any further conclusions, apart from noting reasons for this diversity. In other words, apart from the different frameworks, the challenge for comparative analysis was that different research traditions operationalised ‘personality’ in incompatible ways.
The antecedent ‘gender’ showed how the way in which skills were measured could led to contradictory findings. According to popular beliefs, boys outperform girls in terms of digital skills. While it is true that boys tended to score higher than girls in self-report studies, no important differences emerged when considering studies based on performance test results, where sometimes girls even outperformed boys. This suggests that a social desirability bias may be at work in studies using self-reported skill measures 18 , reflecting broader cultural discourses on what it means to be a boy or a girl in relation to digital technology.
At times it was not the way skills were measured but the way the antecedent was measured that produced differences in results. This can be illustrated by ‘socio-economic status’ (SES). Generally speaking, when SES was measured by parents’ education, it was statistically significant and positively associated with digital skills, whereas studies using income as the measure of SES generated mixed results. It is unclear why this difference should exist, but by noting this pattern the systematic evidence review at least raised a question to be addressed in further research.
It was also necessary to focus on the specificities of different studies relating to an antecedent. For example, when looking at the effect of children’s ‘personal interests’ on their digital skills, certain interests, such as an interest in science, predicted digital skills, while others, such as an interest in politics, did not. This shows how variables that were grouped under the same umbrella in the coding process, because they seemed conceptually close, can actually lead to different and even contrasting results. A similar effect arose when trying to organise ‘other’ studies under the same antecedent headings. For example, the emergent category ‘perceptions and attitudes’ contained very diverse examples. 19 Unsurprisingly, the findings as to whether this antecedent correlated with digital skills were mixed, depending on which perceptions and which attitudes were being examined.
A similar need to look at details arose in the case of ‘ethnicity’. While some research indicated that white children (in majority-white populations) were more skilled than their non-white peers, other studies found that ethnic minority children reported more skills in some specific areas (such as social entertainment skills, games and social network access) and also in the more ‘critical’ skill of evaluating the credibility of online information. In other words, by focusing on specific skills it is possible to go beyond asking whether a particular antecedent is influential for digital skills overall, and instead to ask why some children may develop more of certain digital skills and less of others.
In sum, even before making comparisons between studies, it is possible to contextualise the research on different antecedents by exploring the origins of research interests and the geographical location of studies. While some comparisons of studies were then relatively unproblematic, others required additional analysis to take into account the bodies of literature from which they stemmed (and hence their conceptualisations of antecedents), the operationalisation of those antecedents, and the consequences of how both skills and antecedents were measured. Sometimes the insights from this process came from the team’s own judgements, based on their reading of how both antecedents and skills were treated in different studies, rather than from a direct interpretation of the results of our systematic coding framework, which recorded, more crudely, only whether a statistically significant relationship existed.
Overall, making so many comparisons in order to understand the patterns across studies, especially given the sheer range of studies returned by this search strategy, required a certain amount of reflexivity. The diversity of approaches employed in the studies meant that it was possible to go beyond charting which antecedents were important and to appreciate a more complex picture. This led to a more nuanced mapping of the field and indicated some lines for future enquiry.
Conclusion
This case study of a particular systematic evidence review provides a worked example, elaborating in some detail for a wider audience on the challenges faced and strategies used in what is itself a research process (Gough et al., 2012). Carr’s fishing metaphor, initially intended to encourage reflection upon historical research, was explored and extended here in order to stand back from and reflect upon the various steps in this exercise.
Although systematic reviews can vary in their nature (as demonstrated in the collection of Gough et al., 2017b), common guidelines were followed, at each point examining the nature of the judgement being made and its possible consequences. For example, inclusion and exclusion criteria were mainly derived from the goals of the wider ySKILLS project, but also involved decisions about language and types of publication, as well as re-setting date parameters in the light of data from the exploratory phase. And within the process of selecting the studies to be reviewed, the numerous micro-decisions may well limit the extent to which reviews can be reproduced or replicated, even when the reasons for all the main choices are made explicit.
The article explained how the initial coding frame and themes were set up and the basis on which they were then revised. It also outlined the issues that arose when trying to conduct the synthesis, which required further reflection on the variation in definitions and measurement of both the main object being studied (digital skills) and the variables affecting it. Relevant factors to consider included how the motivations behind the original research affected the resulting map, and how the construction and definition of key variables varied depending on the analytical frameworks used in the original research.
This process of reflection was not meant to imply that this particular piece of research was somehow flawed. Within the wider ySKILLS project, the sheer detail generated by this review was appreciated. Instead, the goal of this article is more general: to provide examples of the different decisions that have to be made throughout this type of research process, with a view to increasing reflexivity about all the steps that lead to a finalised product. By providing specific examples of the types of challenges, judgements required and strategies employed in this case study, this article highlights some of the factors that teams conducting a systematic review for the first time may wish to consider.
Footnotes
Acknowledgements
We thank our colleagues and our reviewers.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research reported here has received funding from the European Union’s Horizon 2020 Research & Innovation Programme under Grant Agreement no. 870612.
