A Guide to Research Data Management

What is research data management
ELNs and Research Data Management
An ELN to make your data FAIR
For Research Institutions
ELNs as part of University services for RDM
For Researchers
Why is research data management important for researchers?
Final Remarks

 

What is research data management?

 

Plan and carry out experiments, generate and analyze data, publish: this has been the typical lifecycle of research data for several years. However, in the past decade additional steps have been introduced in order to regulate how data are shared, preserved, accessed and reused. Therefore, the data lifecycle can be divided into seven different phases: Planning, Collecting, Analysing, Publishing, Preserving, Sharing and Reusing. Research Data Management (RDM) is the overarching process that guides researchers throughout the different stages of the data lifecycle, enabling scientists and all other involved stakeholders to make the most out of the generated research data.

With this article, we aim first to explain why RDM is necessary to make the research process more efficient, and why it is needed to maintain data integrity and produce replicable results. Second, we intend to provide an overview of all stakeholders involved in the data lifecycle and highlight how they can benefit from effective RDM. Third, we aim to introduce electronic lab notebooks (ELNs) as an ideal solution for both researchers and institutions to implement robust research data management plans.

 
 
 

ELNs and Research Data Management

 
Data-driven research requires data management plans
 

Since the 1980s, research output has grown exponentially, marking the beginning of the challenges in research data management. The arrival of high-throughput technologies in the 1990s shifted the emphasis from collecting data to analyzing, storing and managing them. Today, scientists are required to outline a strategy for their data management with an RDM plan. These plans have become mandatory in most grant applications, highlighting the importance of RDM tools, such as ELNs, for successful research programs.

Research data management is essential for strong scientific practice, and several stakeholders are involved in the different stages of the process, from issuing guidelines to putting RDM plans into practice.

Funders and Publishers
 

Transparent, robust and reusable research is a priority for funding bodies, which is why research data management plans have become mandatory in grant applications. In line with this, many publishers require the submission of raw data along with the manuscript, encourage direct citation of data, and have implemented data citation standards.

National and international organizations
 

Several national and international networks have produced guidelines that guide universities and research institutes in their data-driven innovation. They aim to improve research by laying the foundations for accessible and discoverable data.

Universities
 

Libraries and research data management teams help researchers from the proposal stages to the realization of RDM plans. Universities are meant to provide services, information points and infrastructures to deal with any aspect of the data lifecycle.

Researchers
 

Good management of research data means applying good scientific practices that lead to better and more efficient research. Thanks to accurate RDM plans, researchers achieve reproducible results, minimize the risk of data loss and produce more data which can be cited and re-used.

Electronic Lab Notebooks
 

ELNs are an essential asset for researchers to fulfil data management requirements, and they create direct bridges between scientists and stakeholders. By adopting an ELN, the data lifecycle can proceed smoothly: from creating and collecting data digitally in one place to one-click data archiving, ELNs empower researchers by allowing them to implement their RDM plan with little extra effort or time investment.

Repositories
 

Repositories are the storage location for data. They are essential for indexing, storing, archiving, finding and citing data. Repositories can be domain specific or general, international, national, institutional or individual collections.

All stakeholders seek good management plans to improve research efficiency by making findings accessible. Understandably, funders do not want to multiply their efforts (and grants) to support duplicate projects. RDM plans maximize the outcome of the research funders’ support by making re-usable data available. It is also in the interest of any research institute to deliver reproducible science and high quality publications, as good research means improved rankings and, ultimately, more funds for more research. Last but not least, researchers need solid RDM plans to achieve reliable results and publish more high-impact papers. Electronic lab notebooks allow all players to easily achieve their goals.


 
 

An ELN to make your data FAIR

 

The FAIR principles are the outcome of a joint declaration of a diverse set of stakeholders (including NIH, European agencies, institutions, publishers, scholars, librarians, archivists and research funders), and they define how research data should be: Findable, Accessible, Interoperable and Reusable. By implementing an ELN, scientists can follow all FAIR guidelines without hassle or losing focus on their project.

Findable
 

An ELN allows researchers to store and organize all data in one place and quickly find them. Thanks to regular backups and deletion control, an ELN also avoids the risk of losing any information. As ELNs can be connected to repositories, researchers can submit their data and metadata in the selected archive with just one click, so that datasets can be found by everyone. When depositing data, it is important to choose the right repository, discuss the data licensing and make sure to receive a persistent identifier, such as a DOI, to make data citable.

Accessible
 

ELNs guarantee long-term data preservation, easy and controlled data access from anywhere, and export functions to quickly retrieve raw and processed data. On top of this, ELNs safeguard data integrity and offer storage options and backups. Data accessibility also requires that the way to obtain the data, as well as any legal conditions and/or embargo periods, is clear when data are published and deposited. For their part, repositories, which are often integrated with ELNs, have to ensure long-term data archiving, and universities must offer resources for data storage and backups.

Interoperable
 

Data must be combinable with other datasets, for example through agreed data formats and common ontologies for each research field. An ELN allows researchers to set standardized ways to collect, annotate, structure and organize findings, making data easy to exchange, understand and use in different contexts.

Reusable
 

Thanks to templates and structured data, ELNs enable researchers to easily establish metadata standards and quickly provide all information needed to re-use their data, such as their provenance (instrument calibration settings, version of the software used and so on).
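As a minimal sketch of how a template can enforce a metadata standard, the Python snippet below checks a record for missing provenance fields. The field names in REQUIRED_FIELDS and the function missing_provenance are hypothetical and chosen only to mirror the examples above; they are not taken from any particular ELN or metadata standard.

```python
# Hypothetical provenance fields, mirroring the examples in the text
# (instrument calibration settings, software version).
REQUIRED_FIELDS = ("instrument", "calibration", "software_version")


def missing_provenance(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or blank in a record."""
    missing = []
    for field in required:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Example record with one blank field.
record = {
    "instrument": "plate reader",
    "calibration": "",
    "software_version": "2.1",
}
print(missing_provenance(record))
```

A check of this kind could run before data are deposited, so that incomplete records are flagged while the experiment is still fresh in the researcher's mind.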

ELNs ensure that all data generated over a project's lifetime fulfil the FAIR principles, meeting funders' and publishers' requirements and, most importantly, allowing researchers to generate more discoveries and share their knowledge. ELNs are therefore a key solution for universities and research organizations that strive to generate high-impact and reproducible research, and are a fundamental service to offer their researchers.

 

For Research Institutions

 

Research data management plans are rapidly becoming the core of successful research projects, and universities face a challenge to provide researchers with the right resources to create RDM plans and put them into action. In fact, a robust RDM plan is not only a good scientific practice, but it is required by most funders. Such plans bring more grants, more research projects, more knowledge to share, and they are essential to complement the mission of research institutions.

Although scientists understand the importance of RDM plans, there is widespread consensus that centralized support and incentives are lacking. Among all the services that universities can offer, ELNs are a powerful tool to meet these needs easily, quickly and efficiently: ELNs support the generation of FAIR data, while reducing the workload of librarians, IT departments and researchers.

ELNs maximize the potential of repositories

Research institutes have in place different strategies to help their researchers archive their data: some universities offer their own repository, whilst others redirect to either general or discipline-specific data repositories. An institutional repository is a good solution to document, report and showcase the research output of a university. Moreover, it can be useful to host niche research that does not find a home in other repositories. It can also overcome potential conflicts in intellectual property policies and data licensing. However, these valuable infrastructures may not reach their full potential, simply because researchers may not have the time to deposit data, or perhaps they find it too difficult. Integrating an ELN with an institutional repository can be the key to unleash its latent capability and maximize its use. An ELN makes the data transfer into a repository hassle-free, encouraging researchers to make the best use of the archive and deposit FAIR data.

From the researchers’ point of view, there is one key consideration to keep in mind: everyone has to be able to find their archived data and cite them. Thus, it is important to choose the most relevant repository, and to discuss its license options and potential embargo periods. The Digital Curation Centre (DCC) provides a detailed guide on how to select the best repository, and the Royal Society has given useful directions by defining different tiers of repositories, which reflect the scale and reach of the deposited data:

  1. Major international programs e.g. the Worldwide Protein Data Bank
  2. National data centres managed by national bodies e.g. DANS in the Netherlands
  3. Institutional repositories curated by individual universities or institutes e.g. ORA-Data of the University of Oxford
  4. Individual collections

Lastly, repositories can be classified in two main categories:

  1. General repositories e.g. Figshare, Zenodo (a joint project between OpenAIRE and CERN), Mendeley Data, Dryad
  2. Domain specific repositories e.g. GenBank, UniProtKB

Registries such as re3data or FAIRsharing can help researchers find the most suited and reputable repositories. They provide an extensive list of archives, both general and institutional, as well as filters to narrow down the search. Another interesting project that aims to make data easier to find is the bioCADDIE project, a collaboration between leading publishers and several academic institutions that plans to create a data discovery index and establish clear standards for data archiving.

Archived data must be linked to a permanent identifier, such as a DOI (Digital Object Identifier), that can be referenced in publications. Permanent identifiers allow researchers to cite their own primary datasets and cite someone else’s data, as several publishers mandate in their latest data citation standards (e.g. Elsevier, Springer Nature, PLOS ONE).
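To illustrate how a DOI makes a dataset citable, here is a small Python sketch that assembles a citation string from a dataset's basic metadata. The function name format_data_citation, the field order (creators, year, title, repository, identifier) and the DOI 10.1234/example are illustrative assumptions; publishers and repositories each define their own required citation style.

```python
def format_data_citation(creators, year, title, repository, doi):
    """Build a simple dataset citation ending in a resolvable DOI link."""
    names = "; ".join(creators)
    return f"{names} ({year}). {title}. {repository}. https://doi.org/{doi}"


# Placeholder values for illustration only; this DOI does not point to a real dataset.
citation = format_data_citation(
    creators=["Doe, J.", "Roe, R."],
    year=2020,
    title="Example dataset",
    repository="Zenodo",
    doi="10.1234/example",
)
print(citation)
```

Because the DOI is persistent, the link at the end of the citation should keep pointing to the dataset even if the repository reorganizes its website.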

 
 
 

ELNs as part of University services for RDM

 

Universities aim to create and disseminate knowledge, and generating FAIR data is crucial for these scientific breakthroughs. Institutes have to support researchers with their data administration throughout the whole data lifecycle and help them meet both funders’ and publishers’ requirements. Ultimately, good RDM services are important for universities to hit the top positions of university rankings. In fact, ranking principles include, among others, quality of publications, collaborations, and number of citations (see the Leiden Ranking indicators for example).

Today, university libraries offer different supportive measures for RDM, such as info points, assistance to generate RDM plans, guidelines to funders’ requirements and finally institutional repositories. Just to name a few, the digital library of the University of California in the US, the Max Planck Society in Germany and the University of Edinburgh in the UK offer excellent support for RDM. However, delivering all the infrastructures needed for data collection, analysis and storage is not an easy job, and libraries might struggle to provide visibility to datasets and facilitate the re-use of data. Above all, motivating scientists can prove to be a challenge, especially as researchers are short on time and might not see clear and immediate benefits. Luckily, both librarians and researchers alike can find relief by introducing ELNs that help overcome different RDM challenges:

  • Data collection: ELNs allow researchers to keep all types of data in one place and easily build experimental workflows. It is also possible to connect an ELN to other devices and collect data from different sources.
  • Data analysis & exploration: with an ELN, researchers have a clear overview of their work, making data analysis and interpretation much easier. Moreover, research teams can easily define common guidelines and standards to analyze, organize, date, title and label data, which simplifies access to and re-use of their data. Importantly, ELNs enable scientists to retrieve any experiment within seconds thanks to advanced search functionalities.
  • Data sharing: ELNs improve collaboration between team members and external collaborators, speeding up their research and channelling projects in the right direction. Researchers can define specific share settings for their research projects and easily communicate with their peers by using tools like messages and comments.
  • Data archiving & publishing: As any repository can potentially be integrated with an ELN, researchers can deposit their data and metadata with just one click. This not only ensures that data are preserved in the long term but also allows researchers to easily archive their raw data and fulfill publishers’ requirements for data replicability and reproducibility. Importantly, researchers can publish their research faster if all data are stored in one place and are already in a digital format.
  • Data storage: ELNs guarantee data integrity by retaining data, maintaining security and protecting confidentiality and authenticity. ELNs also offer multiple storage options, such as local or cloud based servers, as well as daily backups.

For Researchers

 
Funders’ requirements
 

Over the past few years, the requirements of funding bodies on research data management have increased and become more stringent, a tendency which is likely to continue in the near future. Funders see that the key to a faster, more efficient, and more productive research process is effective management of generated data. Robust RDM plans lead to reusable and accessible data, which can be a solid basis for further discoveries, integration with different datasets and new interpretations. Ultimately, RDM is not only crucial for research of higher quality, but it is also a way to maximize funders’ investments and optimize researchers’ efforts.

The overarching principle is that publicly funded research should generate openly available data, which bring benefits to the entire scientific community and beyond. Each funder has its own policy for RDM, but in general, applicants are required to plan each phase of the data lifecycle, and budget for the cost of their RDM. Some funders, such as the BBSRC, impose sanctions in case of non-compliance. Researchers are expected to plan how and where to store their data, which backup systems to use, which archive to choose, the data license, and potential embargo periods. Also, to ensure that everyone is able to replicate and reuse any raw data, the authors should state what is needed to validate their results.

Researchers can find updated RDM policies on funders’ websites, but of course, university libraries and specialized centres also provide extensive information. For instance, the Digital Curation Centre (DCC) gives a complete overview of the policies of the main UK and European funding bodies. The DCC also curates, in partnership with the University of California Curation Centre, the DMPonline service which provides templates for RDM plans. Another useful tool is the SHERPA Juliet database, which collects requirements on open access, sharing and archiving from funders worldwide.

Different international and national organizations support universities and research institutes with their research data management. Good examples are the DANS in the Netherlands, the DLCM in Switzerland, the BD2K project of the NIH and the British Royal Society’s Data Programme. Germany has also launched the Priority Initiative to coordinate the digitalization of science, and established the German Council for Scientific Information Infrastructures to support the development of infrastructures for research data management. On an international level, the Research Data Alliance (RDA), organized by European, US and Australian governmental bodies, aims to develop tools to facilitate data-driven research.

Why is research data management important for researchers?

 

Scientists may ask themselves whether RDM plans help their research. There are several reasons why researchers should have no doubts. For instance, RDM plans allow them to:

  • Follow the best research practices
  • Save time for research
  • Avoid risk of data loss
  • Ensure transparency and reproducibility
  • Increase data visibility and number of citations
  • Fulfill funders’ requirements and receive more grants
  • Produce new knowledge and make more discoveries just by re-using data
  • Archive, retrieve and re-use their own data


But what about their scientific career? Can RDM plans help? Although there is not yet an author-level metric that takes into account the number of data citations, work is in progress in exactly that direction. To begin with, both the Declaration on Research Assessment (DORA) and the FORCE11 Joint Declaration of Data Citation Principles recognize the need to make data citations count towards academic performance. Here are two model principles from the FORCE11 declaration, which has been signed by, among others, the NIH, European agencies, various institutions and several publishers:

“Data should be considered legitimate, citable products of research”

“Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications”

Publishers are also taking action and encouraging researchers to cite data. Two years ago Elsevier implemented data citation standards, and it strongly advocates the citation of primary datasets. Moreover, journals like Scientific Data by Springer Nature and Earth System Science Data by Copernicus Publications specifically focus on the publication of datasets and discoveries based on reused data. The good impact factor of these journals is just one more reason to share data.

Last but not least, it is important to mention that funders cover any costs related to RDM throughout the duration of the projects they fund (see, for example, the H2020 guidelines on FAIR data management, or our blog post on H2020 for a summarized guide).

ELN: an asset to meet funders’ requirements and maximize the benefits of RDM plans

Fulfilling the requirements for RDM in a grant application might seem a challenging task. However, there is an easy solution for this: ELNs. Thanks to features such as tags, filters, advanced search and organized data elements, ELNs offer a ready-to-use structure with which researchers can build a robust RDM proposal. So, when and how can an ELN help?

  • Ensure the generation of findable and re-usable data: With an ELN, scientists can easily tag data and experiments, making sure that naming conventions are always followed and that the right keywords are linked to their findings. Moreover, advanced search functions allow researchers to quickly retrieve any kind of data.
    The outline of metadata is simplified, as a lot of information is automatically recorded when data are generated. For instance, an ELN always records the author of the data, the size of the files, the dates of creation and last modification, and who may access and edit the data. On top of this, any modification is tracked and collected in a full audit trail. Researchers can find an extensive collection of discipline-specific metadata standards in the Metadata Standards Directory.
  • Make data accessible: Archiving data, associated metadata, and documentation is just one click away: researchers can upload any type of file to their lab notebook, which can be directly connected to any repository.
  • Guarantee inter-disciplinary interoperability: ELNs enable researchers to quickly create and re-use templates, ensuring the continuous use of standardized vocabulary and methodologies.
  • Secure research data: ELNs provide state-of-the-art security solutions to guarantee data integrity and security, such as full backups, secure storage and long-term retention of electronic records. Also, ELNs safeguard intellectual property and confidentiality thanks, for instance, to multi-level authentication processes, unique author identification and safe data transfer via data encryption.
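The automatic metadata capture described in the first point above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular ELN works: the function describe_file and its field names are hypothetical, the checksum is an added assumption that shows one common way to verify data integrity, and the creation date is left out because the standard library does not report it consistently across operating systems.

```python
import hashlib
import os
import tempfile
from datetime import datetime, timezone


def describe_file(path, author):
    """Collect basic file-level metadata of the kind an ELN might record."""
    stat = os.stat(path)
    with open(path, "rb") as handle:
        checksum = hashlib.sha256(handle.read()).hexdigest()
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    return {
        "file_name": os.path.basename(path),
        "author": author,
        "size_bytes": stat.st_size,
        "last_modified": modified.isoformat(),
        "sha256": checksum,  # lets anyone verify the file later
    }


# Usage example with a throwaway file; the author name is a placeholder.
with tempfile.TemporaryDirectory() as folder:
    example_path = os.path.join(folder, "measurement.csv")
    with open(example_path, "w") as out:
        out.write("sample,value\nA,1.0\n")
    for key, value in describe_file(example_path, "A. Researcher").items():
        print(f"{key}: {value}")
```

Recording a checksum alongside the descriptive fields means that a later copy of the file, for example one downloaded from a repository, can be compared against the original.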

 
 

Final Remarks

 

Research Data Management has become an integral part of the research process, and it is pivotal to achieving scientific breakthroughs in data-driven research.
Researchers, universities, funding bodies and publishers must join forces to encourage and promote the generation of Findable, Accessible, Interoperable and Reusable data in order to translate more discoveries into new solutions.
labfolder’s mission is to simplify the creation of effective RDM plans and enable researchers to easily put them into action for a better, reproducible, transparent and open science.

Scientific writer and Editor at editando
