Abstract
This study provides insights into the evolution and conceptual framework of research data management (RDM). It also investigates the role of libraries and librarians in offering data management services and the challenges they face in this regard. The study is qualitative in nature and based on an extensive narrative review of the literature. The analysis of the reviewed literature reveals that RDM has emerged as a new addition to library research support services. The more recent literature has clearly established the pivotal role of libraries and librarians in developing and managing RDM services. However, data sharing practices and the development of RDM services in libraries are more prevalent in developed countries, whereas these practices are still lacking among researchers and libraries in developing countries. Creating awareness among researchers about the benefits of data sharing is a challenging task for libraries. Other significant challenges highlighted in the literature include institutional commitment, collaboration, academic engagement, the development of technological infrastructure, a lack of policies, funding, and storage, and the skills and competencies librarians require to offer RDM-based services. Because RDM services are difficult and complex, librarians need to master research data skills in order to offer library-based RDM services.
Keywords
Introduction
Researchers in higher education are generating an increasing amount of digital data because of the widespread use of computing technology (Vilar and Zabukovec, 2019). Storing and managing these data has always been a major concern for researchers (Borgman, 2015; Cox and Pinfield, 2014). In the twenty-first century, research is also more data-centric (Tenopir et al., 2011) and is changing rapidly in response to advances in information and communication technologies and technical infrastructure (Whitmire et al., 2015). The growing capability of information and communication technologies to support multidisciplinary research has motivated researchers across the world (Yoon et al., 2019) to find new ways of conducting research (Elsayed and Saleh, 2018). As an outcome of this technological revolution, data have emerged as the most important and dominant output of research work (Castle, 2019; Pryor, 2012). There is growing emphasis on making the data and outcomes of publicly funded research accessible for everyone to see and use (Tenopir et al., 2011). With the increasing cost of research and the growing trend of teamwork, appropriate handling and sharing of research data has become extremely important (Popkin, 2019). Moreover, cloud-based storage has increased the development and use of research data (Aydinoglu et al., 2017).
Governments, organizations, and institutions worldwide increasingly recognize the value of research data management (Flores et al., 2015). Governments around the world enthusiastically support research; however, they are also concerned with ensuring that the data generated through research are properly managed, readily searchable, and available via open access (Yoon and Schultz, 2017). When governments bear the major expense of research, the public should, subject to ethical and privacy considerations, have access to all of that research (Henty, 2008). Universities therefore need to change the way they manage the data produced through their research activities, as many governments require that data and research findings be widely available online for anyone to use and reuse (Chen and Wu, 2017). Publicly funded research agencies, such as the National Institutes of Health and the National Science Foundation, currently require researchers to include Data Management Plans (DMPs) in their funding proposals, describing how they will manage the data generated by their funded projects. The purpose is to ensure the open availability of research data to others, so that its full potential for reuse is realized and its benefit to wider society is enhanced (Jackson, 2018). Many research funders in the UK also require researchers to include a data management plan, with details of data sharing and reuse, as part of their research proposals (RCUK, 2013).
In response to this evolving research scenario, researchers are looking for support in managing their research data (Wittenberg and Elings, 2017). A decade ago, only a small fraction of data was properly managed and available for use (Cox and Pinfield, 2014). However, with the growth of research data management (RDM), this trend has gradually changed (Cox et al., 2017). With the increasing recognition of data as a stand-alone research output that can be re-used or re-purposed, academic institutions and libraries in many countries are developing infrastructures and services to help researchers manage their data better (Barbrow et al., 2017; Monastersky, 2013; Tenopir et al., 2012). Most university libraries in the United States, the UK, and Australia are actively engaged in services related to data management (Liu and Ding, 2016). By taking up the challenge of RDM, university libraries will increasingly be defined by their digital collections. These libraries can develop collaborative connections with researchers and thus perform more dynamic roles within the research process (Childs et al., 2013).
Academic libraries are already closely involved in the research and scholarly publishing process, providing researchers with the information resources they require. This engagement has recently been greatly enhanced by the rapid growth of institutional repositories (Shelly and Jackson, 2018). Many academic libraries currently use their repositories to store a wide range of outputs, including publications, images, sound files, and simulations (Flores et al., 2015). Now that the role of data in the research lifecycle is more widely recognized, institutional repositories have a more substantial role to play in the data lifecycle (Yoon et al., 2019). It is necessary to understand the research data lifecycle in order to comprehend the full variety of possible RDM services, tools, and infrastructure (Tenopir, 2013). The data lifecycle concept is frequently used in data management to help researchers understand the range and significance of data management. In this data-centric research scenario, it is important to investigate the evolution of research data management and its associated services, and to highlight the role of academic libraries and librarians in this regard.
Aim and objectives
This study aims to describe the evolution and conceptual framework of research data management in the scholarly communication process. The explicit objectives of the study are to:
(i) provide an insight into the background, concept, and benefits of research data management for the scholarly community, researchers, and society;
(ii) highlight the role of libraries and librarians in offering research data management services;
(iii) determine the skills required for librarians to offer research data management services; and
(iv) identify the challenges faced by librarians in developing RDM services.
Study design
This research is qualitative in nature and based entirely on an extensive narrative review of the literature on research data management. The narrative review method was used to better understand and integrate the body of literature on RDM. McGaghie (2015) notes that in this type of literature review, the criteria for article inclusion and exclusion are typically based on the reviewers' judgment. Rowley and Slack (2004) argue that a narrative review serves multiple purposes, an important one being to build an understanding of the theoretical concepts and terminology of a particular topic.
Search strategy
The search for related literature began with the selection of key terms related to the topic. The terms used to search for and retrieve relevant documents were ‘research data management’; ‘RDM’; ‘libraries and research data management’; ‘research data management services’; ‘research data management skills’; and ‘library challenges in research data management’. A large number of documents on the topic were retrieved, from which the researchers selected the most relevant for review. Because the published literature was vast, the review was restricted to documents published between 2006 and 2021; documents published before 2006 were considered less relevant to the objectives of this study and were therefore excluded. The literature search was conducted from December 15th to December 31st, 2021. Online journal databases, including SAGE, Emerald Insight, and ScienceDirect, were comprehensively searched to retrieve relevant literature on research data management. The scholarly search engine Google Scholar was also used to search for and retrieve relevant literature.
Inclusion criteria
Only documents published from 2006 to 2021 were selected for the review. Moreover, only documents published in English were included. The types of documents selected for review were research articles, conference papers, book chapters, and reports.
Significance of the study
This study has several significant theoretical benefits. First and foremost, it advances knowledge of research data management by highlighting its need, importance, and benefits in the scholarly communication process. The study also gives a clearer picture of the roles, opportunities, and challenges of librarians in relation to RDM services. Data sharing contributes significantly to the development of the scientific community and of society in general (Melero and Navarro-Molina, 2020). Accordingly, in developed countries, most research funders and institutions now require researchers to develop data management plans (Sherpa Juliet Directory, 2019). This study therefore highlights the importance of data sharing in research and its contribution to the development of all fields. Ultimately, the study aims to create awareness among researchers and LIS professionals about the management of research data. A better understanding of the concept, benefits, and opportunities would enable funders, publishers, institutions, and libraries to design policies that support research data management. Despite the benefits of data sharing and reuse, librarians face technical barriers in providing RDM services, and this study examines these barriers as well. This study differs from a similar study by Ashiq et al. (2022) in that it presents a more comprehensive analysis of the concept, evolution, and lifecycle of research data. Ashiq et al. conducted a systematic literature review limited to the analysis of only 19 studies, whereas the analysis in the current study is more in-depth and comprehensive. Moreover, Ashiq et al. (2022) covered only studies published between 2016 and 2020, whereas the current study covers the period from 2006 to 2021.
Literature review
Research data management: concept and definition
Before defining RDM, it is pertinent to reflect on the term “research data”. Research data are “factual recordings (numerical scores, textual records, images, and sounds) used as primary source for a scientific research and sometimes needed to validate the research findings” (OECD, 2007). Data produced as part of research take a wide range of formats, from statistics and experimental results to interview recordings and transcripts (Borgman, 2012), and may be stored as physical or digital records on personal computers or as terabytes of data on shared servers. A comprehensive definition by Opus: University of Bath Online Publication Store (2013) describes research data as “those data records, files or other evidences, irrespective of their content or form (e.g., in print, digital, physical or other forms) that comprise a research project's observations, findings, or outcomes, including primary materials and analyzed data. Research data may be created directly by researchers or derived from third parties and enhanced by researchers in the course of their work”. In other words, research data are the data collected, observed, or created in the research process, whether through laboratory experiments, social investigations, field observations, or mining the internet (Liu and Ding, 2016). “Research data management is the process of organizing, describing, storing, and sharing data. From planning the details of data collection to addressing long-term data plans, RDM can affect reuse and reproducibility” (Vasilevsky et al., 2014). Hence, the term research data management (RDM) refers to the ways in which “the data generated by and used for research work are created, captured, transferred, stored, organized, documented, disseminated, reviewed, published, discovered, re-used, exploited, retained, archived or destroyed according to agreed policies and practice” (Opus: University of Bath Online Publication Store, 2013).
The most commonly cited definitions of RDM in the literature are those provided by Whyte and Tedds (2011) and by Cox and Pinfield (2014). According to Whyte and Tedds (2011), RDM is about “the organization of data, from its entry to the research cycle through to the dissemination and archiving of valuable results” (p. 1). Cox and Pinfield (2014) elaborated further: “RDM consists of a number of different activities and processes associated with the data lifecycle, involving the design and creation of data, storage, security, preservation, retrieval, sharing, and reuse, all taking into account technical capabilities, ethical considerations, legal issues and governance frameworks” (p. 2). In short, RDM concerns the management of research data and embraces all the services, activities, tools, and infrastructure used to organize, document, store, and share data for future re-use.
A typical example used by the UK Data Service is illustrated in Figure 1. In this example, the data management requirements fall within the first three stages of the lifecycle: collecting and capturing data; processing data from its most basic form into another form for analysis (e.g., extracting numerical measurements of tumour size from an MRI image of a tumour); and analyzing the data so that the results can be disseminated as some type of academic output, such as a journal article. Data management is necessary at all three stages to ensure that researchers record how they collected the data, how they processed and analyzed it, and how they moved it from raw to processed to analyzed form. If the data are clearly documented, other researchers can use them to check the reliability of the initial findings or to reanalyze them in a completely different context.

Research data lifecycle adapted from the UK Data Service model, 2019. Source: Queensland University of Technology, Skills.
The research data lifecycle is also critical to understanding librarians' responsibilities in research data management services. The University of Queensland designed an infographic model that highlights the essential elements of a data lifecycle (see Figure 1). As Figure 1 shows, the lifecycle can be divided into the following stages: planning, followed by creating or collecting, processing, analysis, preservation, discovery, and re-use. The first step is to plan for RDM. The collection or creation phase involves gathering information, in the form of materials for background research and literature reviews, and creating any necessary materials. During the processing phase, the collected or compiled materials are read, selected, and reported on. In the analysis phase, questions are raised and conclusions are drawn from the data. The preservation phase determines the format and location of the data and how long it will remain available. The discovery phase describes how the data can be accessed, by whom, where, and for how long. The re-use phase ensures that future researchers will have access to the data for their own research.
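To make the ordering of these stages concrete, the lifecycle can be modelled as an ordered sequence of states. The following is a minimal Python sketch, assuming only the seven stages named above; the identifiers (`Stage`, `next_stage`, `GUIDING_QUESTIONS`) are hypothetical, and the loop from re-use back to planning is an assumption reflecting the cyclical form of Figure 1 rather than a detail stated in the source.

```python
from enum import Enum


class Stage(Enum):
    """Stages of the research data lifecycle, in the order given in the text."""
    PLAN = "plan"
    CREATE_COLLECT = "create/collect"
    PROCESS = "process"
    ANALYSE = "analyse"
    PRESERVE = "preserve"
    DISCOVER = "discover"
    REUSE = "re-use"


# Enum members iterate in definition order, which matches the lifecycle order.
LIFECYCLE = list(Stage)


def next_stage(stage: Stage) -> Stage:
    """Return the stage that follows `stage`.

    Assumption of this sketch: re-use wraps around to planning, so the
    sequence behaves as a cycle rather than a line.
    """
    i = LIFECYCLE.index(stage)
    return LIFECYCLE[(i + 1) % len(LIFECYCLE)]


# Hypothetical guiding questions, paraphrased from the stage descriptions above.
GUIDING_QUESTIONS = {
    Stage.PLAN: "What is the plan for managing the data?",
    Stage.CREATE_COLLECT: "Which materials are gathered or created?",
    Stage.PROCESS: "Which materials are read, selected, and reported on?",
    Stage.ANALYSE: "Which questions are raised and what conclusions are drawn?",
    Stage.PRESERVE: "In what format, where, and for how long are the data kept?",
    Stage.DISCOVER: "Who may access the data, where, how, and for how long?",
    Stage.REUSE: "How will future researchers access the data?",
}

if __name__ == "__main__":
    # Walk once around the cycle, printing each stage with its question.
    stage = Stage.PLAN
    for _ in range(len(LIFECYCLE)):
        print(f"{stage.value}: {GUIDING_QUESTIONS[stage]}")
        stage = next_stage(stage)
```

Keeping the stages in a single ordered type means any service mapped to a stage (for example, DMP advice at planning) inherits the lifecycle order without it being restated.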
In the context of data management practices, the FAIR data principles (Findable, Accessible, Interoperable, and Reusable) have been introduced jointly by stakeholders in research, including representatives from academia, industry, funding organizations, and scholarly publishers. These principles serve as a framework for anyone seeking to improve the reusability of their data holdings. The FAIR principles place special emphasis on improving the ability of machines to automatically find and use the data, as well as on supporting its reuse by individuals (Wilkinson et al., 2016).
RDM services
Tenopir et al. (2012) define RDM services as “the services which address the full data lifecycle, including the data management plan, digital curation (selection, preservation, maintenance, and archiving), metadata creation, and conversion” (p. 70). Fearon et al. (2013) elaborated further by defining RDM services as “providing information, consulting, training or active involvement in data management planning, data management guidance during research (e.g., advice on data storage or file security), research documentation and metadata, research data sharing and curation (selection, preservation, archiving, citation) of completed projects and published data” (p. 12). Both definitions describe RDM services as a series of activities associated with the data lifecycle. RDM is regarded as a general set of activities for developing policy, services, and infrastructure to manage research data (Cox et al., 2014; Koltay, 2017). Offering a wide array of RDM services around the data lifecycle requires coordination and collaboration among various stakeholders, including libraries, IT services, legal advisors, research support offices, and the research community (Andrikopoulou et al., 2021; Corrall, 2012; Cox and Pinfield, 2014; Fearon et al., 2013; Tang and Hu, 2019). Research data services (RDS) is another term used in the literature in parallel with RDM services. Koltay (2017) considers RDM services highly complex, involving RDM, data curation, stewardship and governance, data literacy, data quality, and standardized data citation, all of which require direct or indirect involvement of the library.
A number of frameworks or process models of RDS have been developed to enrich the conceptualization of RDM. Lewis (2010) proposed a pyramid model comprising nine areas of RDM activity. The role of libraries in influencing national policy was at the apex of the pyramid, with leading on local/institutional policy, developing curation capacity, and identifying required skills at the second tier. The remaining five activities, namely developing LIS professionals' confidence with data, embedding data handling into research curricula, data literacy, data advice, and data awareness among researchers, were positioned at the third tier. Later, Corrall (2012) added a new foundational layer to the pyramid, named “data collection development and access management”, mirroring part of the data lifecycle.
Another model developed by Jones et al. (2013) outlined the components of an RDM service (See Figure 2). They believe that in order to facilitate efficient data management and exchange, a comprehensive strategy and set of services are required.

Components of research data management support services. (Source: Jones et al., 2013: 5).
Drawing on this model, Whyte (2014) presented a process pathway model illustrating the steps in developing RDM services within an institution. Context, principles, inputs, outputs, and outcomes were the high-level factors in Whyte's model. Pinfield et al. (2014) developed a library-oriented model of RDM services based on four factors: what (components), why (drivers), how (influencing factors), and who (stakeholders). Funders' mandates, security concerns, open-access arguments, data storage, preservation, and sharing were identified as drivers, whereas demand, roles, resources, acceptance, and communications were the influencing factors. Institutional units such as the library, IT services, academic departments, senior university managers, the legal office, research support services, and researchers from various disciplines were listed as stakeholders. This model is aligned with the models discussed earlier; however, it more effectively addresses the complexity of the underlying drivers and influencing factors that might affect the development of particular services and what those services might look like in a given institution. The RDM maturity model by Cox et al. (2017) is another notable model, rating the level at which services exist in academic libraries on a continuum from 0 to 3, where Level 0 is “none,” Level 1 is “basic,” Level 2 is “developing,” and Level 3 is “extensive.” Federer (2016) discussed how libraries' role in RDS can be framed within the research data lifecycle model, and further explained libraries' role in supporting the writing of DMPs, data reuse, data visualization, and data sharing.
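Because the maturity model of Cox et al. (2017) is an ordinal scale, it lends itself to a simple encoding in which levels can be compared and services grouped. The Python sketch below is illustrative only: it assumes nothing beyond the four levels described above, and the service names and ratings in the usage example are hypothetical, not drawn from Cox et al.'s survey.

```python
from enum import IntEnum


class Maturity(IntEnum):
    """Service levels in the RDM maturity model of Cox et al. (2017)."""
    NONE = 0
    BASIC = 1
    DEVELOPING = 2
    EXTENSIVE = 3


def profile(ratings: dict[str, Maturity]) -> dict[str, list[str]]:
    """Group services by maturity level, from 'none' to 'extensive'."""
    grouped: dict[str, list[str]] = {level.name.lower(): [] for level in Maturity}
    for service, level in sorted(ratings.items()):
        grouped[level.name.lower()].append(service)
    return grouped


def gaps(ratings: dict[str, Maturity],
         threshold: Maturity = Maturity.DEVELOPING) -> list[str]:
    """List services rated below `threshold` (an illustrative cut-off)."""
    return sorted(s for s, level in ratings.items() if level < threshold)


if __name__ == "__main__":
    # Hypothetical self-assessment of a single library.
    ratings = {
        "DMP support": Maturity.EXTENSIVE,
        "RDM policy": Maturity.DEVELOPING,
        "data repository": Maturity.BASIC,
        "metadata creation": Maturity.NONE,
    }
    print(profile(ratings))
    print(gaps(ratings))
```

Using `IntEnum` keeps the ordinal nature of the scale explicit, so a comparison such as "below developing" is just `level < Maturity.DEVELOPING`; the choice of threshold is a judgement left to the assessing institution.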
Curdt (2019) developed another useful model to support researchers (Figure 3), in which the entire research lifecycle is supported through various research data management services.

Research data management services model (Curdt, 2019).
Role of libraries in RDM
Advances in technology and data management have revolutionized library services (Qin, 2018). Accordingly, libraries have a particular interest in managing data in line with the needs of their users (Frederick and Frederick, 2016). The need for research data services is prompting libraries to consider how they can offer services that cover the research data lifecycle (Yu, 2017). Considering these factors, many libraries have designed RDM services to support their researchers (Vasilevsky et al., 2014), and this trend is growing in academic and research libraries around the world (Johnson et al., 2015). The substantial role of academic libraries in supporting RDM is well documented (Corrall, 2012; Cox and Pinfield, 2014; Tenopir et al., 2012). By taking a leadership role, libraries have emerged as well-positioned, key strategic players in offering RDM services (Cox et al., 2017), owing to RDM's clear links with library goals (e.g., open access) and existing library services (e.g., information literacy and reference services) (Corrall, 2012). Libraries, each within its own institutional setting, have contributed to RDM in various ways. As early as 2008, recognizing the emergence of a data-intensive research environment, Swan and Brown (2008) identified the following three major roles of libraries in RDM:
(i) to generate awareness among researchers regarding the value of data;
(ii) to provide data archiving and preservation services; and
(iii) to introduce a new library position, the data librarian.
Tenopir et al. (2014) divided RDM services into two broad categories: informational or consultancy/advisory services, and technical services. Informational and advisory services consist of providing advice on data management plans and metadata standards, helping researchers find and cite data sets, and providing web guides and finding aids for data sets. Technical services, by contrast, include providing technical support for data repositories, preparing data sets for storage, and creating metadata for data sets. Lyon (2012) developed a ten-stage research lifecycle model and mapped the potential roles of libraries onto it: (i) RDM requirements gathering, (ii) RDM planning, (iii) RDM informatics, i.e., technical advice on metadata and data formats, (iv) citation of research data, (v) RDM training, (vi) licensing, (vii) appraisal, (viii) storage, (ix) access, and (x) impact of research data. Cox et al. (2012) provided a simpler version of these services, categorizing them into policy and advocacy; support and training; auditing; and data repository. Koltay (2017) believes that libraries can play an instrumental role in data literacy education, data curation, and stewardship. Erway (2013), for example, highlights the role of libraries in training researchers to retrieve, document, store, and share research data. A number of empirical studies covering multiple institutions in the developed world have highlighted the role of libraries in the development of RDS (for instance, Corrall et al., 2013; Cox and Pinfield, 2014; Cox et al., 2016; Pinfield et al., 2014; Tenopir et al., 2012, 2013, 2014, 2015). Studies published up to 2016 reported only limited and basic RDS in libraries.
However, more recent studies (Cox et al., 2017, 2019; Shelly and Jackson, 2018; Tang and Hu, 2019; Yu, 2017) reported further development in each area, going beyond support for data management plans; the focus nevertheless remained on advisory and consultancy services (such as RDM plans, data literacy, and training) rather than technical services (such as metadata creation and curation of active data). Cox and Pinfield (2014) carried out a survey to identify the RDM services offered in UK academic libraries. Their findings revealed that libraries were offering a limited number of RDM services, including ‘raising open access to data and RDM policy issues’, ‘advice on copyright issues’, ‘data citing’, and ‘awareness of reusable data sources’. Another international survey of RDM activities, services, and capabilities reports that libraries have taken the lead in RDM in countries including Australia, Canada, Germany, Ireland, the Netherlands, New Zealand, and the UK, notably in advocacy and policy formulation (Cox et al., 2017). On the other hand, studies including Ashiq et al. (2022), Mohammed and Ibrahim (2019), and Tripathi et al. (2017) report that developing countries still lag behind in formulating RDM policies and offering RDM services.
In defining the role of libraries in RDM, Latham (2017) also indicated that libraries initially relied on consultancy services, which were aligned with familiar library offerings and settings. To date, most libraries are less involved in providing technical services than informational RDM services. A possible reason is that informational services are an offshoot of traditional library reference services; they require little enhancement of the existing skill set and can be offered within existing library settings using available staff.
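The informational/technical split of Tenopir et al. (2014) can also serve as a simple classification scheme for auditing a library's service portfolio and checking whether it follows the informational-heavy pattern described above. The Python sketch below assumes only the example services listed for each category earlier in this section; the function names and the sample portfolio are hypothetical, and counting services is a deliberately crude proxy for the level of involvement.

```python
# Example services for each category, paraphrased from Tenopir et al. (2014)
# as summarized in this section.
INFORMATIONAL = frozenset({
    "advice on data management plans",
    "advice on metadata standards",
    "finding and citing data sets",
    "web guides and finding aids for data sets",
})
TECHNICAL = frozenset({
    "technical support for data repositories",
    "preparing data sets for storage",
    "creating metadata for data sets",
})


def summarize(offered: set[str]) -> dict[str, int]:
    """Count the offered services that fall into each category.

    Raises ValueError for services outside the two categories, so that
    unclassified offerings are not silently ignored.
    """
    unknown = offered - INFORMATIONAL - TECHNICAL
    if unknown:
        raise ValueError(f"unclassified services: {sorted(unknown)}")
    return {
        "informational": len(offered & INFORMATIONAL),
        "technical": len(offered & TECHNICAL),
    }


def leans_informational(offered: set[str]) -> bool:
    """True if more informational than technical services are offered."""
    counts = summarize(offered)
    return counts["informational"] > counts["technical"]


if __name__ == "__main__":
    # Hypothetical portfolio of a single library.
    portfolio = {
        "advice on data management plans",
        "finding and citing data sets",
        "web guides and finding aids for data sets",
        "preparing data sets for storage",
    }
    print(summarize(portfolio))
    print(leans_informational(portfolio))
```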
Librarians’ role in data management
Library professionals are searching for new roles in helping and collaborating with researchers, and RDM services are part of this development (Riley, 2015). It is important to note that these needs are not met simply by providing a platform for sharing data (Borgman, 2019). The whole lifecycle of data, from generation to analysis, documentation, metadata, copyright, and so on, must be properly managed so that the data are understandable and usable before researchers can open them and fulfill any data sharing requirements (Monastersky, 2013). Although libraries are well positioned to offer RDM services (Cox and Pinfield, 2014), developing and launching such services requires investment in new positions and in training library professionals (Barbrow et al., 2017). Data librarian (Koltay, 2019), embedded librarian (Auckland, 2012; McCluskey, 2013), data management librarian (Xia and Li, 2015; Xia and Wang, 2014), and research informationist (Federer, 2016) are some of the labels used in the literature to describe the new roles emerging for librarians in the realm of RDM. Tenopir (2013) has highlighted the expected role of library professionals in the research data lifecycle (see Figure 4).

Role of library professionals in research data lifecycle (Tenopir, 2013).
Li et al. (2013) identified the primary duties of librarians in RDM: (i) offering consultation and reference services for scientific research and data curation, (ii) investigating researchers' data curation requirements, and (iii) providing users with instruction and training in scientific data curation. Put simply, librarians' role is mainly to develop and design services around the whole research cycle, from planning, curation, metadata creation, and conversion to the sharing and re-use of data. Tang and Hu (2019) also believe that librarians' role is embedded in the research data lifecycle as consultants and trainers. Likewise, Brochu and Burns (2019) affirm that librarians are key stakeholders in RDM as supporters and educators, owing to their traditional role in collecting, organizing, storing, preserving, and facilitating free access to information.
Because librarians are well versed in data discovery, re-use, collection, and management in all formats, they can assist and advise researchers on RDM planning, thus streamlining data care from the beginning of the research cycle. Librarians can also provide awareness, information, and training on a variety of research tools. Data literacy and the standardization of data citation formats are further roles for librarians identified by Koltay (2016). Moreover, librarians can provide expert guidance on data management and preservation (Federer, 2013).
Core skills to offer RDM services
Offering RDM services is extremely difficult and complicated, as it involves the management, curation, stewardship, governance, literacy, and quality of research data, as well as the standardization of data citation (Sesartić et al., 2016). Tenopir et al. (2015) asserted that librarians need to master these research data competencies themselves. There is general agreement in the literature that a wide range of personal, interpersonal, managerial, and technological skills is required to offer RDM services (Cox and Pinfield, 2014; Cox et al., 2019; Kennan, 2016). Commentators are of the view that librarians' existing knowledge and skills in information management, cataloguing, information literacy, and reference services are relevant to and aligned with their new RDM roles (Brochu and Burns, 2019; Cox and Pinfield, 2014; Koltay, 2016, 2019; Tang and Hu, 2019); however, translating them to the RDM context (for example, metadata creation and good housekeeping) will be challenging. Hence, Auckland (2012), Cox and Pinfield (2014), Koltay (2016, 2019), and Lyon and Brenner (2015) have called for librarians to up-skill or re-skill. Offering RDM services also requires interpersonal skills, also labeled soft skills (Andrikopoulou et al., 2021), which include strategy, relationship management, leadership, advisory, communication, and influencing skills (Matteson et al., 2016; Pinfield et al., 2014, 2017). Hard skills, that is, technological skills (Andrikopoulou et al., 2021), are typically associated with repository management, data preservation, data curation, and metadata creation (Kennan, 2016; Tenopir et al., 2015). An international survey by Cox et al. (2019) found that, to offer RDM services, academic librarians should have data curation skills; technical and ICT skills (data storage, infrastructure, and architecture); research skills (data analysis and visualization); data description and documentation skills; and legal, policy, and advisory skills (intellectual property rights, ethics, and licensing). In discussing this skill set, Cox and Pinfield (2014) concluded that RDM is an evolving phenomenon, which implies a need for constant skill development in line with emerging roles; this in itself is a significant challenge.
Challenges in developing RDM services
Offering RDM services is highly complex and challenging. Prior literature has identified a number of organizational, infrastructural, financial and behavioral challenges, and the complexity and scale of RDM issues in institutions are themselves daunting. Corrall (2012) noted the scale of the task, describing university libraries as one of many stakeholders who must work together to address challenges of infrastructure, skills and culture change. Yu (2017) categorized the challenges of research data services (RDS) provision as institutional commitment, collaboration, academic engagement, technology infrastructure, and a lack of policies, finance and staff knowledge. Earlier, Soehner et al. (2010) labeled these challenges as tangible and social. Because RDM programs in many libraries are perceived as externally imposed by regulatory mandates, institutional commitment and support are often lacking. Hence, resources, infrastructure, governance, policy and institutional systems are still in transition (Cox and Pinfield, 2014; Cox et al., 2019). The volume (amount of data), velocity (speed of data generation), variety (diversity and complexity of data formats) and veracity (reliability and integrity) of the data itself present a significant challenge to offering RDM services (Clement et al., 2017; Corti et al., 2019; Federer, 2013; Koltay, 2016; Perrier et al., 2018). Moreover, the variety of research types and practices and marked disciplinary differences in data practices complicate data management (Cox and Pinfield, 2014; Cox et al., 2019; Whitmire et al., 2015).
Collaboration and partnership are also a big challenge due to “the diverse range of stakeholders involved” and their varied “expectations, preferences and limitations” (Pryor et al., 2014). Developing coordinated, cohesive and integrated partnerships among stakeholders within the institution (e.g., the library, IT services, academic departments and research support services) and beyond it (e.g., data centers, open access repositories and other institutions) is challenging and complicated, owing to a lack of support from senior management and uncertainty about partners' roles and duties (Cox and Pinfield, 2014; Cox et al., 2019; Fearon et al., 2013; Pryor, 2014; Tang and Hu, 2019; Tenopir et al., 2015; Yu, 2017). With regard to the technical aspects of RDM services, providing repository, access and discovery systems, preparing datasets for deposit in a repository, and creating or transforming metadata require a specialized skill set, sophisticated technological infrastructure and sustainable funding (Cox et al., 2017; Tenopir et al., 2015; Yu, 2017). Financing scalable storage is another challenge frequently discussed in the literature (Cox and Pinfield, 2014; Tang and Hu, 2019; Yu, 2017). Moreover, ensuring data security, data quality and data sharing is a serious challenge because of technological barriers, high costs and the considerable time and effort required of researchers and librarians (Koltay, 2017). Palumbo et al. (2015) also described the technical, legal and ethical issues involved in the deposit, acceptance and sharing of data. Mattern et al. (2015) explored data-related issues such as data access restrictions, a lack of familiarity with best practices for managing research data, and the need for researchers and librarians to develop a deeper grasp of post-publication impact.
With regard to library staff's expertise to offer RDS, up-skilling or re-skilling is a notable challenge (Tang and Hu, 2019). Given that librarians are inexperienced in managing research data (Barbrow et al., 2017), have little personal research experience, and already hold demanding roles (Cox and Pinfield, 2014; Koltay, 2016, 2017), the challenges of developing the skills needed to offer RDM services are exacerbated. Cox and Pinfield (2014) are of the view that even translating librarians' traditional information management skills to the RDM context is quite challenging. In addition, a lack of IT expertise, a lack of domain-specific expertise and limited personal research experience are serious challenges (Cox and Pinfield, 2014; Fearon et al., 2013; Tenopir et al., 2015). Librarians also need to be well aware of researchers' data needs (Kim and Stanton, 2016). There is a general understanding that librarians must gain new skills and a deeper understanding of the research data lifecycle to remain relevant in the realm of RDM (Cox et al., 2019; MacMillan, 2015; Yu, 2017). Considering the perspective of researchers is also very important. The biggest challenge is to change researchers' misconceptions about the role and skills of librarians (Tang and Hu, 2019). According to Surkis and Read (2015), researchers perceive that “librarians do not understand research data and have no role in managing data”. Moreover, creating awareness among researchers about the importance of data sharing is also a challenging task (Cox et al., 2019; Yu, 2017). Researchers' unwillingness to engage in the process complicates the issue (Fearon et al., 2013). This non-involvement is attributed either to a failure to recognize the need for RDM or to a lack of awareness of its importance and worth (Cox et al., 2019). Tenopir et al. (2015) identified other reasons, such as unawareness that RDM services are available in libraries, the low perceived value of the service, and resistance to data sharing.
Results and discussion
This article has highlighted the evolution and benefits of RDM and data sharing. It has also discussed the role of libraries and librarians, the skills they require and the challenges they face in this regard. As the nature of libraries and librarians' roles changes, the area of RDM is evolving with new challenges and opportunities (Brochu and Burns, 2019). Good data management practice is becoming an essential requirement for any organization (Jackson, 2018). It is therefore not surprising that RDM-related activities were reported among the top trends in libraries in 2012, 2014 and 2016 (ACRL, 2016). The literature review demonstrates that a good range of studies has been conducted in many countries on the evolution of RDM and its various aspects, such as the research data lifecycle, its management, associated services, the role of libraries and the skills required of librarians. Some of the findings of these studies, conducted by leading authors in RDM, are discussed in this article.
The benefits of data sharing demonstrate the importance of RDM in the research process. Fecher et al. (2017) listed the following benefits of research data sharing: (1) enabling statistical replications and meta-analyses (transparency), (2) enabling new research and preventing duplication (efficiency), and (3) enabling reuse in other contexts, e.g., industry (innovation). RDM is key to realizing these benefits because it lowers transaction and opportunity costs. The role of libraries and librarians in RDM has also been extensively discussed in the literature. Librarians are partners in research teams who can actively participate throughout the research data lifecycle. The literature supports the importance of libraries in providing research data management services by drawing on their experience in information and knowledge management, and there is growing certainty and unanimity regarding the crucial function of libraries and librarians in RDM services. There is, however, considerable debate in the literature about whether research data management represents another incremental step in professional practice or a true paradigm shift in collection development and service delivery. Some scholars have examined connections between RDM and established library roles and responsibilities (Corrall, 2012; Cox and Pinfield, 2014; Tenopir et al., 2013), while others argue that a fundamental rethinking of roles, responsibilities and expectations is necessary to offer well-planned and consistent RDM services. Commentators have also questioned whether library staff possess the knowledge and abilities required to carry out the recommended roles and responsibilities. The authors of this study are of the view that RDM has created new roles for librarians and has prompted libraries to develop new services for researchers. 
These roles are data-centric and revolve around assistance with data collection, curation, preservation and reuse. Corrall (2012) also highlighted the fundamental need to rethink roles, responsibilities and competencies in order to create “next-generation librarianship”; however, she advocated drawing upon the experiences and perspectives of active and experienced practitioners in the field to offer RDM services. Koltay (2016) suggested that, besides developing new expertise and making use of existing expertise, libraries should reallocate and adjust relevant positions to make the best use of their existing staff.
With the inception of the RDM idea, libraries began to play a role; however, the nature and extent of that role remained unclear and uncertain because of the varied stakeholders involved. Later, the literature clearly established that libraries are well situated to be key players. There is sizable professional debate concerning the precise nature of libraries' participation in RDM, the degree to which libraries may assume a leadership position, the kinds of services that must be offered, and the extent of infrastructure that needs to be in place. A significant question is how much libraries can contribute to and shape overall institutional RDM policy. Evidence suggests that libraries are leading institutional RDM projects, particularly those involving policy creation (Cox et al., 2017; Cox and Pinfield, 2014; Pinfield et al., 2014; Whyte, 2014). The librarian's role is not limited to preserving data or providing a platform for exchanging data with others; librarians have a crucial role across the whole research data lifecycle. They can offer services at each stage of the lifecycle and guide users in planning and managing their data for the long term and for reuse. Witt (2008) argues that data curation, in supporting the free flow of information and ideas, is similar to classification, descriptive cataloguing and metadata work, and draws on librarians' experience in selecting, deselecting and presenting information in an appropriate context. Librarians therefore have the potential to engage in RDM-related services. To offer them, a mix of interpersonal, research and technical skills is required of library staff.
The literature also discusses a range of significant organizational, infrastructural, financial and behavioral challenges associated with RDM services. A number of scholars have argued that RDM services are highly complex and challenging for libraries. These challenges include institutional commitment, collaboration, academic engagement, technology infrastructure, and a lack of policies and financing. In addition, the volume, velocity, diversity and veracity (reliability and integrity) of the data itself present a significant challenge to offering RDM services (Clement et al., 2017; Corti et al., 2019; Federer, 2013; Koltay, 2019; Perrier et al., 2018; Yu, 2017). Moreover, disciplinary differences in data practices complicate data management. Collaboration and partnership with other units (including IT services, academic departments and researchers) are also a big challenge. With regard to the technical aspects of RDM services, technological infrastructure, its funding and high costs, and the time and effort required of researchers and librarians are other major challenges (Pryor, 2014). Moreover, librarians are inexperienced and insufficiently skilled in managing research data and offering RDM services. Last but not least, creating awareness among researchers about the importance of data sharing is also a challenging task for librarians. All of these aspects pose a great challenge for librarians, especially those in developing countries. The authors believe that behavioral challenges are more critical than infrastructural and financial ones. Behavioral challenges exist at the personal level (librarians' and researchers' awareness of RDM and willingness to participate) and at the organizational level (creating a supportive environment), and libraries looking to create data services will need to address these issues at both levels (Tenopir, 2013: 17). 
Corrall (2012) argues that libraries that have not yet taken part in the research data agenda should consider their options and decide where to devote their time and energy, while still benefiting their own institution and its local and national contexts. Such libraries are advised to collaborate with other campus organizations, particularly computing/technology services, research offices and those in charge of research governance, to decide where to start.
Conclusion
This study concludes that research data management has emerged as a new addition to library research support services. However, data sharing practices and the development of RDM services in libraries are more prevalent in developed countries, while these trends are still lacking among researchers and libraries in developing countries. For libraries, developing RDM services brings both opportunities and challenges. Libraries have the opportunity to improve their existing research support services and to enhance their cooperation with other stakeholders in the institution, including researchers, academic departments, research offices and IT departments. On the other hand, these services also pose challenges; for example, creating awareness among researchers about the benefits of data sharing is a challenging task for libraries. Institutional commitment, collaboration, academic engagement, technological infrastructure development, a lack of policies and funding, time, data security, and the skills and competencies librarians need to offer RDM-based services are other significant challenges highlighted in the literature. However, none of these challenges is so severe that it cannot be addressed. In developed countries, funders, institutions and journals increasingly require researchers to share data and to submit data management plans. The scenario in developing countries is quite different: they lag behind the developed world in policy-making, funding, collaboration, skills and infrastructure development for research data management and data sharing. Nevertheless, librarians in developing countries need to recognize that they will encounter data-related demands in the near future. 
Therefore, they should increase their skills and expand cooperation with other research stakeholders to design policies for data sharing and to introduce RDM services. Libraries can take the lead by up-skilling their professionals and introducing basic RDM services, such as consultancy in data management planning, data processing and analysis, guidance on data description, and data preservation through the development of data repositories.
This study was limited to reviewing literature related to specific objectives: reviewing the background, concept and benefits of research data management for the scholarly community, researchers and society; highlighting the role of libraries and librarians in offering research data management services; determining the skills librarians require to offer these services; and identifying the challenges librarians face in developing RDM services. Moreover, the study was limited to documents published between 2006 and 2021. Further studies could therefore explore other areas of RDM, including data sharing practices among researchers, RDM infrastructures and research data privacy issues.
