Abstract
In the era of big data, research data plays a crucial role in assessing a nation's scientific and technological innovation. Effective research data management (RDM) ensures data quality, accessibility and reusability. Although progress has been made in data sharing and platform development, RDM practices in China remain under-researched. This study aims to (i) identify the key areas within the RDM policies of Chinese research institutions and (ii) investigate the specific practices that support RDM implementation. A total of 44 RDM-related policies from research institutions listed in re3data were analyzed, and semi-structured interviews were conducted with staff from 18 research data repositories to investigate the current RDM landscape in China. The results reveal four key RDM areas (IT infrastructure, metadata management, support services and financial support) covering 18 specific practice elements. Practical examples are provided and a comprehensive RDM framework is proposed, offering actionable insights for improving management strategies, refining policies and aligning Chinese RDM practices with international standards and public expectations.
Introduction
Research data is the cornerstone of national scientific and technological innovation, serving as a crucial resource in the big data era (Gupta and Rani, 2018). Globally, the academic community has increasingly emphasized the importance of effective research data management (RDM) and sharing. Leading funding agencies, such as the Arts and Humanities Research Council (AHRC), Biotechnology and Biological Sciences Research Council (BBSRC), and Engineering and Physical Sciences Research Council (EPSRC), now mandate the inclusion of detailed data management plans (DMPs) in research proposals. Prominent data centers, including the Finnish Social Science Data Archive, the Australian National Data Service (ANDS), and the UK Data Service, play a critical role in ensuring the long-term preservation and accessibility of research data. Internationally, organizations such as the Digital Curation Centre (DCC), DataONE, and OpenAIRE have established comprehensive frameworks and guidelines that advance global RDM standards.
China's open access and RDM practices can be traced back to the early 2000s, when the Ministry of Science and Technology supported the establishment of 13 national platforms for sharing scientific data in key disciplines, including agriculture, forestry, seismology, meteorology, oceanography, earth systems, and public health. In 2015, the Action Plan for Promoting the Development of Big Data officially elevated the development of scientific data to a national strategic priority, establishing research data as a core component of the national innovation system (The State Council, 2015). In 2018, the Measures for the Administration of Scientific Data were issued by the State Council, marking the institutionalization of RDM at the national level (The General Office of the State Council, 2018). This policy systematically outlines requirements for managing the entire data life cycle (data collection, organization, storage, sharing, and reuse) and establishes the principle of “openness as the standard, confidentiality as the exception,” with the aim of improving the efficiency of data use and the transparency of research (Si et al., 2023). During the same period, national research data infrastructures, including the National Science and Technology Infrastructure Platform (https://en.most.gov.cn/programmes), the National Scientific Data Centers (https://www.csdata.org), and the Science Data Bank (https://www.scidb.cn/en), have been continuously developed to support policy implementation and provide the underlying technical infrastructure. While research on RDM in China is developing rapidly, it has focused predominantly on higher education institutions, often neglecting key research institutes such as the Chinese Academy of Sciences, the Chinese Academy of Agricultural Sciences, and the Chinese Academy of Forestry.
These institutes are instrumental in China's open access movement, having established national scientific data centers and sharing platforms that play a vital role in promoting data accessibility and infrastructure development (Zhang et al., 2021).
Despite recent advancements, significant challenges persist in RDM globally, including inconsistent management processes (Cox et al., 2019), mismatches between service supply and demand, and researchers’ reluctance to share data (Joo and Peters, 2019). These challenges highlight the need to examine RDM practices across diverse countries and contexts. Studies such as Fu et al.'s (2022) investigation into RDM needs at Central Washington University, Martin-Melon et al.'s (2023) analysis of data services at Spanish universities, and Huang et al.'s (2020) review of RDM policies at Chinese universities underscore the importance of context-specific RDM solutions.
To address this research gap, this study uses the re3data platform, a global registry of research data repositories, to analyze RDM practices within Chinese research institutions. Developed by the German Research Center for Geosciences, Karlsruhe Institute of Technology (KIT) Library, and Purdue University Libraries, re3data serves as a comprehensive resource for examining RDM infrastructure. As of 10 September 2024, the registry lists 3262 repositories worldwide, including 86 in China. These repositories, hosted by universities, research institutes, government agencies, and specialized data centers, collectively offer a detailed view of China's research data ecosystem. Based on this dataset, the study investigates key areas of RDM within Chinese research institutions and identifies the specific practices that support effective RDM implementation. The research addresses two core questions:
(1) What are the key areas outlined in China's RDM policies for research institutions?
(2) What specific practices within these key areas support RDM implementation in China's research institutions?
Literature review
Research data management practices
Research data management (RDM) plays a critical role in ensuring the quality, accessibility and reusability of scientific data (Donner, 2022). While the importance of RDM is increasingly recognized globally, its implementation varies significantly across regions (Lee et al., 2024). Most developing countries lack comprehensive RDM policies (Mohammed and Ibrahim, 2019), whereas formal policies are already in place in most developed countries (Martin-Melon et al., 2023). The use of DMPs across the full research cycle remains low (Sinha and Sinha, 2023), and sharing raw data involves complex technical and ethical challenges (Joo and Peters, 2019). Surveys indicate that although researchers share data through publications and conferences, raw data sharing remains limited (Elsayed and Saleh, 2018).
The implementation of RDM faces various challenges, including data storage, copyright issues, limited organizational support, lack of specialists, financial constraints, and issues of data misinterpretation and loss (Marlina et al., 2022). In particular, shortages of personnel and skills have become major constraints on research support services (Ashiq et al., 2020), while rapid technological change poses financial challenges for managing software, hardware, and other technological facilities (Perrier and Barnes, 2018). To address these challenges, researchers recommend the active involvement of leadership and donors (Donner, 2022) and the provision of essential consulting, training, and technical support, especially in data analysis, security, long-term preservation, and the development of institutional repositories (Öztemiz and Şahin, 2024).
In-depth studies by several scholars have assessed the state of RDM in China from various perspectives. Ran et al. (2021) reassessed the implementation of RDM in Chinese academia by reviewing government documents and funding agency policies, noting that strategic and policy ambiguities have disrupted RDM at the institutional level and that RDM services and resource allocation in Chinese universities are notably imbalanced. Huang et al. (2020) highlighted that local-level services in Chinese universities are still in their infancy, primarily due to inadequate infrastructure at the national level, the low professionalization of librarians, and weak engagement with the concept of open science. Si et al. (2023) analyzed China's scientific data management policies from the perspectives of supply, environment, and demand, noting the uneven application of policy instruments, particularly in the operation, maintenance, and evaluation phases.
Research data management framework
In today's research environment, increasingly complex research conditions and rapidly growing data volumes pose many challenges for the scientific community, such as processing heterogeneous data, collaborating across institutions, and managing distributed facilities. To address these challenges, scholars and practitioners around the world have developed RDM maturity models that help institutions assess and improve their data management capabilities.
In the United States, the team led by Crowston and Qin (2010) synthesized an extensive body of literature and models, building on the traditional Capability Maturity Model (Paulk et al., 1991), to develop a “Capability Maturity Model for Research Data Management” that aims to improve the efficiency and reliability of RDM through a systematic approach. In the UK, the DCC developed the “Research Infrastructure Self-Evaluation Framework” (RISE) (DCC, 2017), drawing on its research data service model and extensive stakeholder feedback, to highlight the importance of assessing and improving data management practices.
Following the Tri-Agency RDM Policy, the Digital Research Alliance of Canada adopted an approach similar to the RISE framework to create the “RDM Maturity Assessment Model in Canada” (Fry et al., 2023), which focuses on policy compliance and inter-institutional coordination. In addition, the Australian Research Data Commons led a collaboration with 25 universities to develop the Research Data Management Framework for Australian research institutions, based on extensive research and stakeholder consultation, with the aim of establishing a consistent and efficient data management system.
Methodology
This study adopts a two-phase qualitative approach to explore RDM practices in Chinese research institutions. It integrates a content analysis of institutional policy documents with semi-structured interviews to provide a comprehensive and contextualized understanding of RDM implementation.
Content analysis
The first phase of the study focused on analyzing RDM-related policies from Chinese research institutions listed in the re3data directory to identify key areas of the RDM landscape. As of 1 October 2024, 86 repositories from China were listed in re3data, of which 78 had publicly available policy documents. Documents were included based on the following criteria: (a) issued by a Chinese institution listed in re3data; (b) contained explicit references to RDM activities or requirements; and (c) publicly accessible in either Chinese or English. After applying these criteria and removing duplicates, 44 unique policy documents (coded P1-P44) were selected for analysis (see Appendix 1).
The selected documents were imported into NVivo 12 for systematic, line-by-line inductive coding to identify recurring themes, terminology, and requirements in key RDM process areas. To ensure analytical reliability, a second researcher independently coded a subset of the documents, achieving an inter-coder agreement rate of 79.6%. Discrepancies were resolved through discussion and consensus. The finalized coding results were exported to Excel to analyze the frequency and prominence of the RDM key areas.
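The paper reports a single agreement rate without giving its formula. A common reading, assumed here, is simple percent agreement: the share of coded segments to which both coders assigned the same code. The sketch below shows that calculation; the function name and the example codes are hypothetical and are not the study's data.

```python
def percent_agreement(coder_a: list[str], coder_b: list[str]) -> float:
    """Percentage of segments on which two coders assigned the same code."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same, non-empty set of segments")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100 * matches / len(coder_a)


# Hypothetical codes for five policy-document segments (illustrative only).
coder_1 = ["IT infrastructure", "Metadata", "Support services", "Finance", "Metadata"]
coder_2 = ["IT infrastructure", "Metadata", "Finance", "Finance", "Metadata"]

print(percent_agreement(coder_1, coder_2))  # 80.0
```

Percent agreement does not correct for agreement expected by chance; a chance-corrected coefficient would be computed differently.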
To guide the interpretation of stage-specific provisions, this study referred to the research data lifecycle model developed by the University of Reading (University of Reading, 2010). The model defines seven core stages (Plan, Collect, Process, Analyse, Preserve, Share, and Reuse) that trace the complete trajectory of research data from inception to reuse.
Appendix 2 presents a descriptive overview of the 86 Chinese RDM repositories listed in re3data, categorized by repository type, institutional responsibility, organizational affiliation, subject coverage, and provider role. Most are affiliated with national-level research institutes or government agencies, with limited representation from universities. Subject coverage is concentrated in the life and natural sciences, while engineering, humanities, and social sciences are notably underrepresented. This overview provides contextual grounding for the institutional landscape from which the analyzed policies emerged.
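One way to apply the seven-stage Reading model when reading policy text is to tag each provision with the lifecycle stages it addresses. The sketch below does this with simple keyword stems; the stem lists and the function name are hypothetical and only stand in for the manual, line-by-line coding described earlier, which the study performed in NVivo rather than with code.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The seven lifecycle stages, numbered in lifecycle order."""
    PLAN = 1
    COLLECT = 2
    PROCESS = 3
    ANALYSE = 4
    PRESERVE = 5
    SHARE = 6
    REUSE = 7


# Hypothetical keyword stems used to flag which stages a provision touches.
STAGE_KEYWORDS = {
    Stage.PLAN: ("management plan",),
    Stage.COLLECT: ("collect", "submi"),
    Stage.PROCESS: ("process", "encod", "classif"),
    Stage.ANALYSE: ("analy",),
    Stage.PRESERVE: ("storage", "preserv", "backup", "archiv"),
    Stage.SHARE: ("shar", "retriev", "open access"),
    Stage.REUSE: ("reuse",),
}


def tag_provision(text: str) -> list[Stage]:
    """Return the stages whose keyword stems occur in a provision, in lifecycle order."""
    lowered = text.lower()
    return [stage for stage in Stage
            if any(stem in lowered for stem in STAGE_KEYWORDS[stage])]


p30 = ("employ modern information and network technologies to ensure efficient and "
       "secure data collection, processing, submission, integration, secure storage, "
       "and management")
print([s.name for s in tag_provision(p30)])  # ['COLLECT', 'PROCESS', 'PRESERVE']
```

Substring matching of this kind produces false positives and misses paraphrases, which is one reason manual coding remains the primary method.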
Semi-structured interviews
The second phase involved semi-structured interviews with staff affiliated with Chinese RDM repositories. Participants were selected through purposive sampling. Eligibility criteria included: (a) a minimum of five years of experience in data management, (b) familiarity with national or institutional RDM policies, and (c) willingness to participate. Initial contact was made via email, and informed consent was obtained prior to participation. Interviews were conducted between November and December 2024, either online or in person depending on participant availability. Each interview lasted 40–60 min and was audio-recorded with permission and transcribed manually. Transcripts were returned to participants for review and clarification to ensure accuracy and credibility.
A semi-structured interview guide was developed to balance probing flexibility with consistent thematic coverage. Interview questions focused on institutional practices, perceived challenges, and infrastructure support. The full list of interview questions is provided in Appendix 3. Interview data were analyzed thematically in NVivo 12, following the six-phase process proposed by Braun and Clarke (2006): familiarization, initial coding, theme identification, theme review, theme definition, and reporting. The overall analytical process is visualized in Figure 1.

Figure 1. Thematic analysis workflow for interview data (Braun and Clarke, 2006).
In total, 18 participants were interviewed, selected for their expertise across a range of disciplines and institutional contexts. As shown in Table 1, participants represented various types of repositories—disciplinary, institutional, and others—mirroring the national repository distribution recorded in re3data. Their roles included repository managers, system administrators, and department heads. While managerial participants addressed policy implementation and resource allocation, technical staff provided insights into operational workflows and infrastructure. Participants brought multidisciplinary perspectives, with academic backgrounds spanning life sciences, engineering, and information management. Over half had more than ten years of professional experience, offering strategic insights into policy and technological developments, while others contributed grounded, practice-oriented perspectives based on daily repository operations.
Table 1. Demographic information of participants.
Note: Participant codes indicate repository type: DS, disciplinary; IN, institutional; OT, other.
Results
The findings address the two research questions: the key areas of research data management (RDM) in China's research institutions and the specific practices that support its effective implementation. Themes were identified through coding of policy documents and interview data, providing an in-depth analysis and interpretation of RDM practices.
Key areas outlined in RDM policies
An analysis of China's RDM policy documents reveals four critical areas: IT infrastructure, metadata management, support services, and financial support. IT infrastructure, mentioned in 31 documents (70.5%), underscores the need for a robust technical foundation to support effective RDM implementation, including data storage, processing, and security. Metadata management, addressed in 29 documents (65.9%), highlights the importance of standardized practices to ensure data integrity, discoverability, and interoperability across systems (Cox and Pinfield, 2014). Support services, featured in 27 documents (61.4%), focus on providing researchers with technical assistance, training, and consultation to enhance their RDM capabilities. Financial support, referenced in 24 documents (54.5%), emphasizes the necessity of sustained funding to maintain infrastructure, ensure staffing, and support the continuity of RDM efforts. Table 2 provides a detailed overview of these areas, illustrating their distribution across the analyzed policy documents. Additional distributional details for each key area across the policy documents are provided in Appendix 4.
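Each percentage is the number of documents mentioning an area divided by the 44 analyzed documents, rounded to one decimal place. The short check below recomputes the shares from the counts given in the text; the function and variable names are illustrative.

```python
TOTAL_DOCUMENTS = 44

# Number of policy documents mentioning each key area, as reported in the text.
mentions = {
    "IT infrastructure": 31,
    "Metadata management": 29,
    "Support services": 27,
    "Financial support": 24,
}


def shares(counts: dict[str, int], total: int) -> dict[str, float]:
    """Percentage of documents mentioning each area, rounded to one decimal place."""
    return {area: round(100 * n / total, 1) for area, n in counts.items()}


for area, pct in shares(mentions, TOTAL_DOCUMENTS).items():
    print(f"{area}: {pct}%")
```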
Table 2. Key areas outlined in RDM policies.
IT infrastructure
In the RDM process, IT infrastructure serves as a foundational key area for maintaining data security, stability, and long-term accessibility throughout the data lifecycle. Policy documents provide comprehensive guidelines addressing all phases of this lifecycle—from submission and storage to sharing and security—emphasizing consistent technical support and robust protective measures.
During the data collection phase, IT infrastructure must support stable transmission channels and reliable technical systems to maintain data accuracy and security. The Management Measures for the National Science and Technology Resource Sharing Service Platform mandates that “scientific data centers or organizations responsible for research data management should employ modern information and network technologies to ensure efficient and secure data collection, processing, submission, integration, secure storage, and management” (P30). Similarly, the Documents for GSA highlight the importance of standardized, secure submissions, requiring that “data must be submitted through the unified BIG Sub portal for standardized GSA data submissions” (P19).
In the data preservation phase, policy guidelines require IT infrastructure to include dedicated data repositories and essential facilities, such as secure servers and backup systems, to safeguard data integrity and security. For example, the National Scientific Data Management Measures specifies that “institutions must establish a scientific data preservation system with adequate storage, management, and security facilities to ensure data integrity” (P9). Additionally, the Interim Measures for the Management of the National Marine Science Data Sharing Service Platform requires that “each sub-center establish an independent storage system to enhance storage security and stability” (P35).
During the data sharing and reuse phase, IT infrastructure must ensure continuous availability and support diverse retrieval methods to facilitate seamless access. The Regulations on the Management of Physical Specimens and Data of Rocks, Minerals, and Fossils requires platforms to “provide round-the-clock open access and offer various retrieval services to meet user needs” (P22). Additionally, the Interim Regulations on the Management of CVH Data Sharing specifies that “the sharing system should employ DiGIR or TapirLink protocols, allowing each sub-library to select an appropriate system while the central library's technical team offers support” (P7).
Furthermore, policy documents emphasize the need for a comprehensive cybersecurity framework to protect against tampering, data breaches, and cyberattacks, thereby ensuring data integrity and fostering user trust (P11). Together, these provisions underscore the critical role of IT infrastructure in supporting secure, efficient, and accessible RDM practices.
Metadata management
Metadata management is a critical area of RDM, ensuring that research data remains discoverable, identifiable, and accessible throughout its lifecycle (Bossaller and Million, 2023). Policy documents outline detailed requirements for metadata management across various phases, including data submission, encoding and classification, maintenance and updating, and quality assurance. By establishing standardized metadata practices, these policies enhance data integrity, accessibility, and compliance with regulatory standards.
During the data collection phase, metadata management mandates comprehensive dataset descriptions to ensure traceability and completeness. The Global Change Research Data Publishing and Repository Data Sharing Policy requires that “metadata include essential information, such as data producers, associated departments, and relevant funding projects, to help users understand the context and origins of the data” (P37). Similarly, the China Polar Data Management Measures specifies that “polar data submissions must include various metadata elements, such as sample origin, analysis data, and documentation in both physical and electronic formats, ensuring that the dataset can be accurately interpreted and utilized by future users” (P32).
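A repository could enforce requirements like these at submission time with a completeness check. The sketch below is an illustration only: the three field names are hypothetical stand-ins for the elements P37 lists (data producers, associated departments, funding projects), and a real repository would validate against its own metadata schema.

```python
REQUIRED_FIELDS = ("data_producer", "department", "funding_project")


def missing_fields(record: dict[str, str]) -> list[str]:
    """Return required metadata elements that are absent or blank in a submission."""
    return [name for name in REQUIRED_FIELDS
            if not (record.get(name) or "").strip()]


submission = {
    "title": "Hypothetical polar sea-ice sample set",
    "data_producer": "Example Polar Team",
    "department": "   ",  # present but blank, so still counted as missing
}
print(missing_fields(submission))  # ['department', 'funding_project']
```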
In the data processing phase, policies emphasize adherence to international and national metadata standards to ensure data compatibility and interoperability across platforms. The China National GeneBank Database User Guide stipulates that “metadata requirements must align with global standards, such as those set by the International Nucleotide Sequence Database Collaboration (INSDC) and the Global Genome Biodiversity Network (GGBN), enabling seamless data sharing and compatibility across systems” (P39). Likewise, the Regulations on the Management of Agricultural Crop Germplasm Resources mandates “a uniform coding system that prohibits modifications to national identifiers, thereby ensuring consistent and standardized data representation” (P2).
During the data preservation phase, policies stress the importance of regularly updating metadata to reflect changes in data content or structure, preserving dataset accuracy and relevance. For instance, the Regulations on the Management and Sharing of Data in the National Ecosystem Observation and Research Network requires that “metadata be revised to include new attributes, such as time-series data and data quality annotations, thereby supporting the long-term usability and integrity of the data” (P27).
In terms of quality assurance, metadata management follows rigorous standards to verify dataset reliability, accuracy, and regulatory compliance. The Chinese Academy of Sciences Data Management and Sharing Measures mandates that “each dataset be assigned a standardized national identifier and comprehensive metadata descriptions, enhancing data discoverability and ensuring alignment with national standards” (P11).
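The provisions in P2, P27 and P11 combine two ideas: metadata may be revised over time, but the assigned identifier must stay fixed. A minimal sketch of that pattern, assuming an immutable record in which each revision produces a new version, is shown below; the class, the function and the identifier value are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MetadataRecord:
    identifier: str  # assigned once; frozen=True blocks reassignment
    version: int
    attributes: tuple[tuple[str, str], ...] = ()


def revise(record: MetadataRecord, **updates: str) -> MetadataRecord:
    """Return a new version with added or updated attributes and the same identifier."""
    merged = dict(record.attributes)
    merged.update(updates)
    return replace(record, version=record.version + 1,
                   attributes=tuple(sorted(merged.items())))


v1 = MetadataRecord("CN-EXAMPLE-0001", 1, (("title", "Forest plot observations"),))
v2 = revise(v1, time_series="2010-2020", quality_note="gap-filled")
print(v2.identifier, v2.version)  # CN-EXAMPLE-0001 2
```

Keeping earlier versions unchanged also leaves an audit trail of how the description evolved.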
Support services
Support services are indispensable in RDM, offering technical assistance and procedural safeguards throughout data collection, processing, sharing, and analysis. Policy documents emphasize three core aspects of support services—consultation, training, and data analysis—to ensure accuracy, compliance, and scientific rigor in data management practices.
Consultation services provide essential guidance and compliance support, ensuring that standards and regulations are upheld across the RDM process. The Charter of the China Survey Data Repository specifies that “expert consultation services are provided to data contributors, ensuring compliance with management requirements during submission and processing” (P12). Similarly, the NSSDC Measures for Data Management and Open Sharing state that “consultation services assist data submitters in organizing and standardizing data submissions to meet regulatory requirements” (P34).
Training services systematically equip researchers with the skills needed to adhere to data processing and storage standards, while enhancing proficiency with data management tools. The Guidelines for Clinical Trial Data Management Techniques highlight that “professional training in data management should include, but is not limited to: departmental SOPs and policies; documentation and archiving rules for standardized clinical trial data; application and operational skills for data management systems and related software; regulatory and industry standards such as GCP, CFDA regulations, and ICH guidelines; and training on confidentiality, privacy, and data security” (P31).
Data analysis services are recognized as vital for maximizing the usability and research value of datasets. By providing analytical tools and services, these platforms enable researchers to use data effectively in scientific research. For example, the Fudan University Social Science Data Platform User Guide notes that “the platform offers online data analysis functions, enabling users to directly process and analyze data files, including those in SPSS format” (P29). Likewise, the Regulations on the Management of the National Aquatic Organism Germplasm Resource Repository emphasize that “the platform provides access to commonly used data processing software, along with guidance on obtaining and using these tools; where possible, online data processing services are also available” (P36).
Financial support
Financial support is fundamental to the sustainability of RDM activities, ensuring that practices remain consistent, effective, and resilient over time. Policy documents highlight two key components of financial investment, infrastructure development and human resource allocation, both of which are critical for maintaining the stability and efficiency of RDM processes.
In terms of infrastructure development, policies underscore the need for essential facilities that secure data integrity and accessibility. This includes comprehensive resources for data storage, management, and security systems. The Scientific Data Center (Network) Operations Management Regulations mandate that “data centers must have independent workspaces and machine rooms, including at least 100 square meters of dedicated space, necessary networking equipment, database and web servers, large-scale data storage facilities, and other essential infrastructure. Backup mechanisms should be in place for critical applications” (P8).
For human resource allocation, policy documents emphasize the importance of establishing a dedicated data management team to support RDM activities effectively. The Management and Open Sharing Measures for Scientific Data by the Chinese Academy of Sciences stipulate the necessity to “build an institutional scientific data talent system, establish specific RDM-related roles, and implement evaluation and promotion criteria tailored to scientific data management personnel” (P11). This ensures that qualified professionals are available to manage, curate, and secure data throughout their lifecycle, strengthening the overall reliability and effectiveness of RDM practices.
Practice elements in RDM implementation
Building on the key areas outlined in RDM policies, an analysis of semi-structured interviews identified 18 practice elements (Table 3) that facilitate RDM implementation in Chinese research institutions. Spanning various phases of the RDM process, these elements provide a comprehensive perspective for understanding the RDM landscape in China.
Table 3. Practice elements for implementing RDM.
IT infrastructure practice elements
This area encompasses a range of practice elements, from access authentication and high-performance computing to data storage solutions, which serve as the foundation for processing, storing, and safeguarding research data. Additionally, regular updates and maintenance of IT infrastructure are vital for adapting to rapidly evolving technologies and growing data demands. Each key practice element is defined and exemplified below to highlight its significance and effectiveness in supporting RDM efforts.
Access control system
The primary objective of access control systems is to ensure data security in compliance with the Cybersecurity Law of the People's Republic of China (Leibküchler, 2018). Institutions employ multi-layer security verification and differentiated user access management to safeguard data during transmission and storage. As one department head explained: “For non-public data, we use an OAuth 2.0 authentication scheme for authorized user access, along with privacy and anonymous link access modes to enhance security and control” (OT2). These practices strengthen institutional security frameworks, facilitating controlled data sharing while ensuring robust protection of sensitive research data.
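The access modes described by OT2 can be sketched as a single authorization check. This is a hedged illustration, not the institution's implementation: validating an OAuth 2.0 bearer token normally involves the authorization server (for example through token introspection), so here a set of already-validated tokens stands in for that step, the anonymous-link mode is omitted, and all class and function names are hypothetical. The private-link mode is modeled as an unguessable token embedded in a shareable URL.

```python
import hmac
import secrets
from enum import Enum
from typing import Optional


class AccessMode(Enum):
    PUBLIC = "public"
    AUTHORIZED = "authorized"      # requires a valid access token
    PRIVATE_LINK = "private_link"  # requires the dataset's secret link token


class Dataset:
    def __init__(self, name: str, mode: AccessMode) -> None:
        self.name = name
        self.mode = mode
        # Random URL-safe token that would be embedded in a shareable link.
        self.link_token = secrets.token_urlsafe(16)


def can_read(dataset: Dataset,
             active_tokens: frozenset,
             access_token: Optional[str] = None,
             link_token: Optional[str] = None) -> bool:
    """Decide whether a request may read the dataset under its access mode."""
    if dataset.mode is AccessMode.PUBLIC:
        return True
    if dataset.mode is AccessMode.AUTHORIZED:
        # Stand-in for OAuth 2.0 token validation by an authorization server.
        return access_token is not None and access_token in active_tokens
    # PRIVATE_LINK: constant-time comparison so timing does not leak the token.
    return link_token is not None and hmac.compare_digest(link_token, dataset.link_token)


issued = frozenset({"token-for-alice"})
survey = Dataset("household survey (hypothetical)", AccessMode.AUTHORIZED)
draft = Dataset("draft dataset (hypothetical)", AccessMode.PRIVATE_LINK)
print(can_read(survey, issued, access_token="token-for-alice"))  # True
print(can_read(draft, issued, link_token="guessed"))              # False
```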
High performance computing
High Performance Computing (HPC) systems are indispensable for processing large datasets and performing complex scientific computations, particularly in fields such as climate simulation and genomic research. A technical representative elaborated: “We use virtualization for data storage and processing, which not only enhances system reliability but also provides flexibility, allowing our IT infrastructure to adapt to evolving scientific research needs” (DS2). These systems ensure scalability, enabling institutions to address emerging scientific challenges effectively and support multidisciplinary research collaborations.
Active storage system
Active storage systems play a critical role in real-time data analysis, especially for data-intensive research such as genomic sequencing. By enabling rapid data access and immediate processing, these systems significantly enhance research efficiency. A department head described: “Our current data platform offers comprehensive storage and processing services, supporting extensive data interaction and real-time processing capabilities for instant access and analysis of research data” (DS7). This infrastructure ensures seamless workflows, promotes collaboration across research teams, and accommodates the increasing demands of contemporary scientific research.
Collaborative platforms
Collaborative platforms enhance institutional data sharing and interdisciplinary research by providing secure, real-time data exchange and online analytical capabilities. A department head noted: “Our platform leverages the real-time advantages of the internet, supporting extensive data exchange and online analysis while facilitating global knowledge sharing through efficient information-sharing mechanisms” (IN2). By eliminating geographical and temporal barriers, these platforms accelerate research processes and improve the quality of scientific discoveries.
Data repository
Data repositories are essential for preserving, managing, and sharing research data, ensuring data integrity, accessibility, and long-term storage (Francke et al., 2017). These centralized platforms facilitate efficient organization, retrieval, and distribution of research data. One participant emphasized: “Our repository uses a centralized data management system that flexibly supports multiple data types, offering various access levels to users, which effectively promotes data sharing and use” (DS5). By integrating advanced metadata standards, repositories meet institutional policies and funding agency mandates, fostering transparency, collaboration, and open science initiatives.
Sensitive data repositories
Sensitive data repositories require stringent security measures, including secure access controls, encryption, and tiered sharing mechanisms. A participant explained: “In our system, we handle sensitive data by implementing tiered access permissions and encryption to ensure data security and privacy throughout its lifecycle” (DS8). These repositories ensure compliance with privacy regulations while maintaining the confidentiality and integrity of sensitive data. Another manager added: “My role involves handling sensitive data indirectly, relying on analysis results to inform decision-making and research, minimizing direct interaction while safeguarding confidentiality” (IN1).
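Tiered access of the kind DS8 describes can be modeled as ordered sensitivity levels, where a user may open a dataset only if their clearance is at least the dataset's tier. The tier names and catalog entries below are hypothetical, and the sketch covers only the permission check; encryption of stored data would be handled separately.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Hypothetical sensitivity tiers, ordered from least to most restricted."""
    OPEN = 0
    REGISTERED = 1   # any signed-in user
    APPROVED = 2     # user with an approved data-use application
    RESTRICTED = 3   # named project members only


def may_access(clearance: Tier, dataset_tier: Tier) -> bool:
    return clearance >= dataset_tier


def visible_datasets(catalog: dict[str, Tier], clearance: Tier) -> list[str]:
    """Names of datasets a user with the given clearance may open, sorted."""
    return sorted(name for name, tier in catalog.items() if may_access(clearance, tier))


catalog = {
    "aggregate results": Tier.OPEN,
    "de-identified records": Tier.APPROVED,
    "identifiable records": Tier.RESTRICTED,
}
print(visible_datasets(catalog, Tier.APPROVED))  # ['aggregate results', 'de-identified records']
```

Releasing aggregate results at a lower tier than the underlying records loosely mirrors IN1's approach of working from analysis results rather than the sensitive data itself.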
Archival storage system
Archival storage systems ensure the long-term preservation, security, and accessibility of research data. A participant described: “Our platform is specifically designed to protect and archive critical research data, supporting daily use while ensuring secure long-term storage and backups compliant with current and future regulations” (DS3). These systems incorporate features such as regular integrity checks, format migration, and secure backups to address risks of data degradation and technological obsolescence, supporting sustainable research efforts.
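Regular integrity checks are commonly implemented as fixity checking. A checksum is recorded for each file at ingest and recomputed later to detect silent corruption or loss. The sketch below uses SHA-256 from Python's standard `hashlib` module. The manifest layout and the `build_manifest` and `verify_manifest` helpers are hypothetical illustrations, not the design of any system described by participants.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root, e.g. at ingest time."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose checksum has changed."""
    problems = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems


# Usage: record checksums, then detect a modified file on a later check.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "obs.csv").write_text("station,temp\nA,21.5\n")
    manifest = build_manifest(root)
    print(verify_manifest(root, manifest))  # []
    (root / "obs.csv").write_text("station,temp\nA,99.9\n")
    print(verify_manifest(root, manifest))  # ['obs.csv']
```

In practice, an archive would store the manifest separately from the data and rerun the verification on a schedule. Any reported path would then be restored from a backup copy.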
RDM-related software management
RDM-related software management involves maintaining and updating software tools essential for RDM activities. A system administrator remarked: “To maintain functionality and security, we conduct comprehensive reviews of data management software bi-monthly, ensuring compliance with the latest data protection regulations and encouraging team members to use updated versions” (DS12). Regular reviews improve software reliability, compatibility, and security, ensuring consistency across data processing workflows.
Metadata management practice elements
This area concerns how research institutions manage metadata, the descriptive information that underpins the discoverability, interoperability, and reusability of research data. Effective metadata management is pivotal for efficient data handling and encompasses three elements: metadata standards, creation and maintenance, and quality management. Together, these elements enhance data usability and research transparency.
Metadata standards
Metadata standards establish a unified framework for ensuring data consistency and interoperability (Haslhofer and Klas, 2010). Many institutions adopt internationally recognized standards, such as ISO metadata standards and Dublin Core, to facilitate international data sharing. A data administrator noted: “We follow the international ISO metadata standards to manage our research data, ensuring our processes align with global norms” (DS8). Another institution demonstrated flexibility by tailoring metadata templates: “We have developed multiple metadata upload templates to meet diverse research needs, allowing researchers to select and populate the appropriate metadata fields based on specific requirements” (DS13). These practices promote consistency and interoperability, aligning institutional workflows with international best practices.
Metadata creation and maintenance
Accurate and up-to-date metadata is vital for ensuring the long-term usability and applicability of research data. Institutions often leverage standardized data description frameworks, such as the Data Documentation Initiative (DDI), to customize and expand metadata according to specific research needs. One librarian highlighted their institution's approach: “We utilize the advanced metadata management features of the Dataverse platform, and we adjust and expand these standards as needed to meet specific research demands” (IN3). By incorporating flexible yet robust frameworks, institutions ensure metadata remains relevant and adaptable to changing research contexts.
Metadata quality management
Strict quality control measures are critical for maintaining metadata accuracy and reliability. Institutions implement systematic processes to monitor and correct metadata, ensuring consistency and interoperability across platforms and research domains. A repository librarian described: “We follow the Dublin Core standards for managing and uploading data. Most data, derived from our institute's observational activities, are organized and tagged according to predetermined categories before being uploaded to our data management system” (DS9). Such measures help keep metadata consistent across research projects, enhancing data discoverability, sharing, and reuse.
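The sketch below shows a simple automated check that could support metadata quality control. It validates records against the 15 elements of the Dublin Core Metadata Element Set. The set of required elements and the ISO 8601 date rule are hypothetical local policies, because Dublin Core itself makes every element optional and repeatable. For simplicity, each element holds a single value. `check_record` is an illustrative name and is not part of any participant's workflow.

```python
from datetime import date

# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description", "format",
    "identifier", "language", "publisher", "relation", "rights", "source",
    "subject", "title", "type",
}

# Hypothetical institutional rule: these elements must be present and non-empty.
REQUIRED = {"title", "creator", "date", "identifier"}


def check_record(record: dict[str, str]) -> list[str]:
    """Return a list of quality problems found in one metadata record."""
    problems = []
    # Flag field names that are not Dublin Core elements (e.g. local tags).
    for field in sorted(record.keys() - DC_ELEMENTS):
        problems.append(f"unknown element: {field}")
    # Flag required elements that are absent or blank.
    for field in sorted(REQUIRED):
        if not record.get(field, "").strip():
            problems.append(f"missing required element: {field}")
    # Hypothetical rule: dates must parse as ISO 8601 calendar dates.
    if record.get("date"):
        try:
            date.fromisoformat(record["date"])
        except ValueError:
            problems.append(f"date not in ISO 8601 form: {record['date']}")
    return problems


record = {
    "title": "Daily rainfall, Station A",
    "creator": "Observation team",
    "date": "2023/05/01",
    "identifier": "",
    "keywords": "rainfall",
}
for problem in check_record(record):
    print(problem)
```

A check like this can run at upload time so that researchers can correct problems before a record reaches the catalogue.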
Support services practice elements
This area provides comprehensive support for researchers across various phases of data management, including general training, the development of DMPs, data curation, and data analysis. These services ensure compliance with research data standards, promote the long-term sustainability of data, and enhance the efficiency of data utilization. Collectively, they aim to improve the quality of research outputs and broaden their societal impact.
General RDM services
This service builds researchers’ foundational knowledge and skills in RDM, enabling them to adopt effective strategies for data handling and compliance. One participant stated: “We regularly organize training sessions and lectures on fundamental data management principles. Researchers also have access to one-on-one consultation services via online systems, email, or face-to-face interactions” (DS4). Another respondent highlighted tailored tools to aid compliance: “We developed journal-specific data policy tools for research teams, helping them understand and adhere to data sharing and publication policies. Our support includes a comprehensive system that offers daily consultation and troubleshooting through phone and email” (OT2).
Data management plan services
DMP services assist researchers in designing and implementing DMPs that meet funding agency requirements and ensure adherence to data management standards throughout the research lifecycle. One participant remarked: “While our training covers data management broadly, we do not follow the strict DMP frameworks seen in Western countries like Europe and the US, primarily due to a lack of personnel with specialized expertise in DMP services” (DS11). Another respondent noted incremental improvements: “Although we do not label our services as ‘Data Management Plans,’ we provide related training and are incorporating best practices from other countries to enhance our offerings” (IN3).
Data curation services
Data curation services offer specialized support to ensure the usability, quality, and long-term value of research data. One participant shared: “We have assembled a dedicated data management team to oversee precise organization, description, preservation, and sharing of research data. Regular workshops and training courses are conducted to discuss effective organization techniques and the application of metadata standard[s]” (IN1). Data curation also includes cleaning and formatting raw data to meet institutional or disciplinary standards. Another participant explained: “Our team reviews datasets for completeness and quality before they are archived, ensuring compliance with institutional policies and funding agency mandates” (DS7). These initiatives enhance data usability across disciplines and align with broader open science goals, supporting data sharing and reuse in diverse research contexts.
Data analytics services
This service focuses on strengthening researchers’ technical competencies in data management and analysis, enabling them to efficiently process complex datasets. One participant noted: “We have incorporated professional software like UserSnap to enhance our training and consultancy services. This helps researchers effectively use data management tools and resolve technical issues via real-time problem capture and feedback mechanisms” (DS8). In addition to software integration, this service includes workshops on statistical modeling, visualization techniques, and analytical workflows. Another respondent elaborated: “Our team provides hands-on workshops on advanced analysis techniques, teaching researchers to use tools like R, Python, and Tableau to derive actionable insights from datasets” (DS5). By equipping researchers with advanced technical skills, these services improve the quality and impact of research outputs, enabling more informed and data-driven decision-making.
Financial support practice elements
This area addresses the financial requirements critical to effective RDM, focusing on infrastructure, personnel, and long-term sustainability. It encompasses three key elements: resource allocation, cost-benefit assessment, and financial sustainability planning. Together, these elements ensure that data management activities receive consistent financial support, enabling their long-term operation and development.
Resource allocation
Resource allocation focuses on distributing financial and material resources to ensure the proper functioning of data management infrastructure and personnel. One participant noted: “We invest funds to maintain, upgrade, and replace infrastructure to support data management activities” (OT1). Another added: “Annually, we allocate specific funds for information technology, which are used for the construction and maintenance of infrastructure and software systems, particularly for building, maintaining, and enhancing data platforms” (DS5). These examples highlight the importance of structured financial planning to support the foundational needs of RDM.
Cost-benefit assessment
This practice helps decision-makers evaluate the effectiveness of resource allocation by comparing costs against anticipated benefits. One participant observed: “Despite practical challenges, we consider incorporating cost-benefit analysis into our financial planning for data management” (IN1). Another participant remarked: “While the effectiveness of data management is recognized, cost-benefit analysis has not yet been systematically integrated into financial planning” (DS6). These reflections illustrate both the barriers to and the potential for integrating cost-benefit assessment into RDM financial strategies, which would enable institutions to optimize resource allocation.
Plans for financial sustainability
This element emphasizes the importance of long-term financial strategies that keep data management activities resilient to short-term funding fluctuations and adaptable to future technological advancements and institutional needs. One participant noted: “Due to a lack of sustained financial support, data management activities are limited to maintaining operations. There is a need for top-down policy support to ensure these activities receive continuous and sufficient financial resources” (DS13). Another participant highlighted the institutional challenges: “Without a well-defined financial plan, it is challenging to expand infrastructure or provide training for data managers, which hinders progress toward achieving long-term sustainability” (IN2). These insights underscore the necessity of coordinated policy and financial planning to bridge resource gaps. By addressing these challenges, institutions can invest in robust data infrastructure, staff training, and innovative practices that ensure the sustainability and long-term impact of RDM activities.
Discussion
This study systematically investigates RDM practices in Chinese research institutions, identifying key areas through policy analysis and uncovering critical practice elements based on interview findings. These key areas and practical elements are integrated into a conceptual framework, as shown in Figure 2. While progress has been made, the study reveals persistent challenges in existing practices, emphasizing the need for improved strategies to enhance data management, research quality, and efficiency. These findings provide a comprehensive overview of RDM practices in China and underscore the importance of addressing these challenges to align with global standards and evolving research demands.

Figure 2. RDM practices framework for China's research institutions.
Using the Canadian RDM Maturity Assessment Model and the Australian Research Data Management Framework for Institutions as reference frameworks, this study explores RDM practices in Chinese research institutions. In IT infrastructure, China's practices align closely with the Canadian model, and the eight practice elements identified in this area indicate comparatively well-developed practice. This area is also anchored in national policy: China's Scientific Data Management Regulations mandate that “research institutes, universities, and enterprises establish and maintain scientific data management systems with adequate hardware and software. Additionally, legal entities must comply with national cybersecurity regulations, constructing comprehensive security systems that include data control, attribute management, identity verification, and protection” (The General Office of the State Council, 2018).
For metadata management, in line with the Australian framework's emphasis on metadata standards for interoperability and discoverability, this study identifies three key elements based on respondents’ experiences: metadata standards, creation and maintenance, and quality management. These findings are consistent with Si et al. (2023), who analyzed 209 Chinese scientific data management policies and identified operational standards and timely data updates as critical components of metadata management.
In the area of support services, this research covered general RDM services, DMP services, and data curation services, with particular emphasis on data analytics services. This emphasis reflects the strong demand for data analysis services among researchers managing chemical data, as reported by Chen and Wu (2017). In the area of financial support, the study identified three practice elements essential throughout the research project lifecycle: resource allocation, cost-benefit assessment, and plans for financial sustainability. These detailed practice elements provide clearer implementation guidance and are tailored to the current funding pressures and urgent needs of Chinese research institutions.
However, significant challenges remain in implementing these practices. Commonly reported issues include limited professional capacity and training, uncertain long-term funding, and low researcher engagement in RDM (Chawinga and Zinn, 2019). These challenges are not unique to China; similar findings have been reported in Indonesia (Marlina et al., 2022) and Jordan (Hamad et al., 2019). This study also identifies data management risks, which are rarely discussed in the context of Chinese research practice, as a primary challenge. These risks span policy processes, infrastructure, and data security. Many institutions have underdeveloped data management policies, particularly regarding intellectual property protection and sensitive data handling.
As one staff member noted, “Our policies are vague on intellectual property and sensitive data handling, requiring extra caution to prevent compliance violations” (DS9). Furthermore, few institutions have adequate facilities for storing and protecting sensitive data, which poses risks to research integrity and reliability. Data security issues, such as privacy breaches, data loss, technical failures, and cyber-attacks, are particularly concerning. Another respondent stated, “Despite deploying the latest firewalls and encryption technologies, we remain concerned about defending against sophisticated cyber-attacks” (IN1). Addressing these challenges requires comprehensive improvements in policymaking, resource allocation, and technological infrastructure to ensure effective RDM and data security. These challenges are summarized visually in Figure 3.

Figure 3. Challenges in implementing RDM practices.
To address these challenges, the study proposes specific recommendations for improving current RDM practices:
Policy Development and Refinement: It is recommended to further refine and adjust policies related to data management, particularly concerning data regulation, access control, intellectual property protection, data sharing agreements, and cross-border data flows. Clear guidelines should be established for handling data of varying sensitivity levels, clarifying responsibilities, and enhancing transparency in scientific data usage.
Infrastructure for Sensitive Data: High-standard security technologies and hardware, such as security suites and advanced physical security measures, are recommended for building sensitive data infrastructure. Regular technical maintenance and security updates should be conducted to address new threats and vulnerabilities, ensuring data integrity and security.
Professional Training and Development: A comprehensive incentive mechanism is suggested to enhance the motivation and professional growth of data management personnel. This includes career development planning, continuous education opportunities, performance rewards, and research support to encourage team collaboration and innovation, thereby improving the institutions’ overall data management capabilities and efficiency.
Although this study does not explicitly assess institutional maturity levels, the fragmented development of policies, infrastructures, and support services observed across institutions suggests that national-level RDM policies have so far had a limited and uneven impact. This highlights the potential value of adopting a capability maturity perspective. Maturity frameworks such as the Capability Maturity Model (Figure 4) could provide structured benchmarks to guide future evaluations. Building on the findings of this study, future research could develop contextualized maturity evaluation tools tailored to the Chinese RDM environment, thereby advancing strategic planning and capacity-building efforts.

Figure 4. Capability maturity model (Paulk et al., 1991).
Conclusion
This study investigates RDM practices in China's research institutions, identifying four key areas outlined in RDM-related policies. Through interviews with 18 data repository staff, the study defines 18 practice elements for RDM implementation, bridging theoretical policy directives with practical case insights. Collectively, the findings provide a comprehensive view of the current RDM landscape in Chinese institutions, serving as a valuable reference for those still navigating RDM implementation challenges. The study offers both theoretical perspectives and actionable insights, particularly for institutions in the early stages of RDM development.
However, this study is limited by its broad scope, which captures the complexity of RDM policies and procedures but may limit the depth of analysis. In addition, the sample does not cover all 86 research institutions in China, and the study does not examine how policy influences the implementation of RDM practices. Future research should include a larger and more representative sample and examine specific policy dimensions to better understand their impact on RDM practices. Such work would support the development of more effective data management policies and practices, helping institutions in China and beyond to address RDM challenges more efficiently, improve data management, and ultimately increase research outputs. It would also inform policy development and contribute to the ongoing development of RDM practices in different research contexts.
Acknowledgements
The authors would like to express their sincere gratitude to the supervisors and colleagues who provided invaluable guidance and support throughout this study. We would also like to extend our heartfelt appreciation to the participants who generously contributed their time and shared their experiences. This study would not have been possible without their willingness to engage and their valuable contributions.
Credit authorship contribution statement
Ye Yuan: Writing – original draft, Investigation, Formal analysis. A. Noorhidawati: Writing – review & editing, Validation, Supervision, Methodology, Conceptualization. A.M.K. Yanti Idaya: Writing – review & editing, Validation, Supervision.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical approval and informed consent statements
This study was reviewed and approved by the Research Ethics Committee of the Universiti Malaya (Approval No: UM.TNC2/UMREC_3008). All participants provided informed consent prior to participation, and their confidentiality and anonymity were strictly maintained in accordance with ethical guidelines.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
