What is research data management
ELNs and Research Data Management
An ELN to make your data FAIR
For Research Institutions
ELNs as part of University services for RDM
Why is research data management important for researchers?
Plan and carry out experiments, generate and analyze data, publish: this has long been the typical lifecycle of research data. In the past decade, however, additional steps have been introduced to regulate how data are shared, preserved, accessed and reused. The data lifecycle can therefore be divided into seven phases: Planning, Collecting, Analysing, Publishing, Preserving, Sharing and Reusing. Research Data Management (RDM) is the overarching process that guides researchers through the different stages of the data lifecycle, enabling scientists and all other stakeholders to make the most of the research data they generate.
With this article, we aim first to explain why RDM is necessary to make the research process more efficient, and why it is needed to maintain data integrity and produce replicable results. Second, we intend to provide an overview of all stakeholders involved in the data lifecycle and highlight how they can benefit from effective RDM. Third, we aim to introduce electronic lab notebooks (ELNs) as an ideal solution for both researchers and institutions to implement robust research data management plans.
Data-driven research requires data management plans
Since the 1980s, research output has grown exponentially, marking the beginning of the challenges in research data management. The arrival of high-throughput technologies in the 1990s shifted the emphasis from collecting data to analyzing, storing and managing them. Today, scientists are required to outline a strategy for their data management in an RDM plan. These plans have become mandatory in most grant applications, highlighting the importance of RDM tools, such as ELNs, for successful research programs.
Research data management is essential for strong scientific practice, and several stakeholders are involved in the different stages of the process, from issuing guidelines to putting RDM plans into practice.
Funders and Publishers
Transparent, robust and reusable research is a priority for funding bodies, which is why research data management plans have become mandatory in grant applications. In line with this, many publishers require the submission of raw data along with the manuscript, encourage direct citation of data, and have implemented data citation standards.
National and international organizations
Several national and international networks have produced guidelines that guide universities and research institutes in their data-driven innovation. They aim to improve research by laying the foundations for accessible and discoverable data.
Libraries and research data management teams help researchers from the proposal stages to the realization of RDM plans. Universities are meant to provide services, information points and infrastructures to deal with any aspect of the data lifecycle.
Good management of research data means applying good scientific practices that lead to better and more efficient research. Thanks to accurate RDM plans, researchers achieve reproducible results, minimize the risk of data loss and produce more data which can be cited and re-used.
Electronic Lab Notebooks
ELNs are an essential asset for researchers who need to fulfil data management requirements, and they create direct bridges between scientists and stakeholders. With an ELN, the data lifecycle can proceed smoothly: from creating and collecting data digitally in one place to one-click data archiving, ELNs empower researchers to implement their RDM plan with minimal effort and time investment.
Repositories are the storage location for data. They are essential for indexing, storing, archiving, finding and citing data. Repositories can be domain specific or general, international, national, institutional or individual collections.
All stakeholders seek good management plans to improve research efficiency by making findings accessible. Understandably, funders do not want to multiply their efforts (and grants) to support duplicate projects. RDM plans maximize the outcome of the research funders’ support by making re-usable data available. It is also in the interest of any research institute to deliver reproducible science and high quality publications, as good research means improved rankings and, ultimately, more funds for more research. Last but not least, researchers need solid RDM plans to achieve reliable results and publish more high-impact papers. Electronic lab notebooks allow all players to easily achieve their goals.
The FAIR principles are the outcome of a joint declaration of a diverse set of stakeholders (including NIH, European agencies, institutions, publishers, scholars, librarians, archivists and research funders), and they define how research data should be: Findable, Accessible, Interoperable and Reusable. By implementing an ELN, scientists can follow all FAIR guidelines without hassle or losing focus on their project.
An ELN allows researchers to store and organize all data in one place and find them quickly. Thanks to regular backups and deletion control, an ELN also reduces the risk of losing information. As ELNs can be connected to repositories, researchers can submit their data and metadata to the selected archive with just one click, so that datasets can be found by everyone. When depositing data, it is important to choose the right repository, discuss the data licensing and make sure to receive a persistent identifier, such as a DOI, to make the data citable.
ELNs guarantee long term data preservation, easy and controlled data access from anywhere, and export functions to quickly retrieve raw and processed data. On top of this, ELNs safeguard data integrity and offer storage options and backups. Data accessibility also entails that the way to get the data, as well as any legal conditions and/or embargo periods, has to be clear when data are published and deposited. On their part, repositories, which are often integrated with ELNs, have to ensure long term data archiving, and universities must offer resources for data storage and backups.
Data must be combinable with other datasets, for example through agreed data formats and common ontologies for each research field. An ELN allows researchers to set standardized ways to collect, annotate, structure and organize findings, making data easy to exchange, understand and use in different contexts.
Thanks to templates and structured data, ELNs enable researchers to easily establish metadata standards and quickly provide all information needed to re-use their data, such as their provenance (instrument calibration settings, version of the software used and so on).
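To make the idea of structured provenance metadata more concrete, here is a minimal sketch in Python. It is only an illustration: the class name, the field names and the sample values are assumptions made up for this example and do not reflect the schema of any particular ELN or metadata standard.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record; all field names are illustrative."""
    title: str
    creator: str
    instrument: str
    calibration: dict = field(default_factory=dict)  # instrument calibration settings
    software: str = ""
    software_version: str = ""
    license: str = ""

    def missing_fields(self) -> list:
        """Return the names of fields that are still empty, in declaration order."""
        return [name for name, value in asdict(self).items()
                if value in ("", {}, None)]


# Made-up sample values, used only to show the structure.
record = DatasetMetadata(
    title="Growth curves, strain A",
    creator="J. Doe",
    instrument="Plate reader",
    calibration={"wavelength_nm": 600},
    software="analysis-tool",
    software_version="1.2.0",
)

# Serialize the record so it can travel with the dataset.
print(json.dumps(asdict(record), indent=2))

# List what still has to be filled in before the data are deposited.
print(record.missing_fields())
```

A template of this kind captures the provenance details at the moment the data are recorded, and a simple completeness check flags what is still missing (here, the license) before the dataset leaves the lab.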
ELNs help ensure that all data generated over a project's lifetime fulfil the FAIR principles, meeting funders' and publishers' requirements and, most importantly, allowing researchers to generate more discoveries and share their knowledge. ELNs are therefore a key solution for universities and research organizations that strive to generate high-impact, reproducible research, and a fundamental service to offer their researchers.
Research data management plans are rapidly becoming the core of successful research projects, and universities face the challenge of providing researchers with the right resources to create RDM plans and put them into action. In fact, a robust RDM plan is not only good scientific practice but is also required by most funders. Such plans bring more grants, more research projects and more knowledge to share, and they are essential to the mission of research institutions.
Although scientists understand the importance of RDM plans, there is widespread consensus that centralized support and incentives are lacking. Among all the services that universities can offer, ELNs are a powerful tool to meet these needs easily, quickly and efficiently: ELNs support the generation of FAIR data while reducing the workload of librarians, IT departments and researchers.
ELNs maximize the potential of repositories
Research institutes have in place different strategies to help their researchers archive their data: some universities offer their own repository, whilst others redirect to either general or discipline-specific data repositories. An institutional repository is a good solution to document, report and showcase the research output of a university. Moreover, it can be useful to host niche research that does not find a home in other repositories. It can also overcome potential conflicts in intellectual property policies and data licensing. However, these valuable infrastructures may not reach their full potential, simply because researchers may not have the time to deposit data, or perhaps they find it too difficult. Integrating an ELN with an institutional repository can be the key to unleash its latent capability and maximize its use. An ELN makes the data transfer into a repository hassle-free, encouraging researchers to make the best use of the archive and deposit FAIR data.
From the researchers’ point of view, there are key considerations to keep in mind: everyone has to be able to find their archived data and cite them. It is therefore important to choose the most relevant repository and to discuss its license options and potential embargo periods. The Digital Curation Centre (DCC) provides a detailed guide on how to select the best repository, and the Royal Society has given useful directions by defining different tiers of repositories, which reflect the scale and reach of the deposited data:
Individual collections
Lastly, repositories can be classified in two main categories:
Domain-specific repositories, e.g. GenBank, UniProtKB
Registries such as re3data or FAIRsharing can help researchers find the most suited and reputable repositories. They provide an extensive list of archives, both general and institutional, as well as filters to narrow down the search. Another interesting project that aims to make data easier to find is the bioCADDIE project, a collaboration between leading publishers and several academic institutions that plans to create a data discovery index and establish clear standards for data archiving.
Archived data must be linked to a permanent identifier, such as a DOI (Digital Object Identifier), that can be referenced in publications. Permanent identifiers allow researchers to cite their own primary datasets as well as someone else’s data, as several publishers mandate in their latest data citation standards (e.g. Elsevier, Springer Nature, PLOS ONE).
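As a rough illustration of how a DOI turns a deposited dataset into something citable, the sketch below normalizes a DOI and builds a citation string that ends in the public https://doi.org/ resolver link. The helper names, the simplified DOI pattern, the citation layout (creator, year, title, repository, identifier) and the placeholder DOI are assumptions for this example; publishers and repositories define their own exact formats.

```python
import re

# Simplified check: "10.", a numeric registrant prefix, "/", and a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def normalize_doi(raw: str) -> str:
    """Strip common prefixes and validate the bare DOI (hypothetical helper)."""
    doi = raw.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {raw!r}")
    return doi


def data_citation(creator: str, year: int, title: str,
                  repository: str, doi: str) -> str:
    """Build a data citation string with a resolvable DOI link."""
    return (f"{creator} ({year}). {title}. {repository}. "
            f"https://doi.org/{normalize_doi(doi)}")


# Placeholder values only; this DOI does not point to a real dataset.
print(data_citation("Doe, J.", 2020, "Growth curves, strain A",
                    "Example Repository", "doi:10.1234/example.5678"))
```

Because the identifier stays the same even if the dataset moves to a new location, a citation built this way keeps pointing readers to the data.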
Universities aim to create and disseminate knowledge, and generating FAIR data is crucial for these scientific breakthroughs. Institutes have to support researchers with their data administration throughout the whole data lifecycle and help them meet both funders’ and publishers’ requirements. Ultimately, good RDM services are important for universities to hit the top positions of university rankings. In fact, ranking principles include, among others, quality of publications, collaborations, and number of citations (see the Leiden Ranking indicators for example).
Today, university libraries offer different supportive measures for RDM, such as info points, assistance to generate RDM plans, guidelines to funders’ requirements and finally institutional repositories. Just to name a few, the digital library of the University of California in the US, the Max Planck Society in Germany and the University of Edinburgh in the UK offer excellent support for RDM. However, delivering all the infrastructures needed for data collection, analysis and storage is not an easy job, and libraries might struggle to provide visibility to datasets and facilitate the re-use of data. Above all, motivating scientists can prove to be a challenge, especially as researchers are short on time and might not see clear and immediate benefits. Luckily, both librarians and researchers alike can find relief by introducing ELNs that help overcome different RDM challenges:
Over the past few years, the requirements of funding bodies on research data management have increased and become more stringent, a trend that is likely to continue in the near future. Funders see effective management of generated data as the key to a faster, more efficient and more productive research process. Robust RDM plans lead to reusable and accessible data, which can be a solid basis for further discoveries, combination with different datasets and new interpretations. Ultimately, RDM is not only crucial for higher-quality research, but is also a way both to maximize funders’ investments and to optimize researchers’ efforts.
The overarching principle is that publicly funded research should generate openly available data, which benefit the entire scientific community and beyond. Each funder has its own policy for RDM, but in general, applicants are required to plan each phase of the data lifecycle and to budget for the cost of their RDM. Some funders, such as the BBSRC, impose sanctions in case of non-compliance. Researchers are expected to plan how and where to store their data, which backup systems to use, which archive to deposit in, which data license to apply, and whether there will be embargo periods. Also, to ensure that everyone is able to replicate and reuse any raw data, the authors should state what is needed to validate their results.
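The planning items listed above can be treated as a simple checklist. The following sketch, again in Python, checks a draft plan against them; the item names and the sample draft are our own assumptions for illustration, and every funder's actual template will differ.

```python
# Items a plan is expected to address, taken from the paragraph above.
# The key names are hypothetical and not part of any funder's template.
REQUIRED_ITEMS = ("storage", "backup", "archive", "license", "embargo")


def unaddressed_items(plan: dict) -> list:
    """Return the checklist items the draft plan leaves empty or omits."""
    return [item for item in REQUIRED_ITEMS
            if not str(plan.get(item, "")).strip()]


# Made-up draft that has not yet settled on an archive or an embargo period.
draft = {
    "storage": "institutional server",
    "backup": "nightly",
    "license": "CC BY 4.0",
}

print(unaddressed_items(draft))
```

Note that an explicit answer such as "none" for the embargo period counts as addressed: the point of the check is that each question has been considered, not that every answer is affirmative.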
Researchers can find updated RDM policies on funders’ websites, but of course, university libraries and specialized centres also provide extensive information. For instance, the Digital Curation Centre (DCC) gives a complete overview of the policies of the main UK and European funding bodies. The DCC also curates, in partnership with the University of California Curation Centre, the DMPonline service which provides templates for RDM plans. Another useful tool is the SHERPA Juliet database, which collects requirements on open access, sharing and archiving from funders worldwide.
Different international and national organizations support universities and research institutes with their research data management. Good examples are the DANS in the Netherlands, the DLCM in Switzerland, the BD2K project of the NIH and the British Royal Society’s Data Programme. Germany has also launched the Priority Initiative to coordinate the digitalization of science, and established the German Council for Scientific Information Infrastructures to support the development of infrastructures for research data management. On an international level, the Research Data Alliance (RDA), organized by European, US and Australian governmental bodies, aims to develop tools to facilitate data-driven research.
Scientists may ask themselves whether RDM plans help their research. There are several reasons why researchers should have no doubts. For instance, RDM plans allow them to:
But what about their scientific career? Can RDM plans help? Although there is not yet an author-level metric that takes the number of data citations into account, work is in progress in exactly that direction. To begin with, both the Declaration on Research Assessment (DORA) and the FORCE11 Joint Declaration of Data Citation Principles recognize the need to make data citations count towards academic performance. Here are two model principles from the FORCE11 declaration, which has been signed by, among others, the NIH, European agencies, various institutions and several publishers:
“Data should be considered legitimate, citable products of research”
“Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications”
Publishers are also taking action and encouraging researchers to cite data. Two years ago Elsevier implemented data citation standards, and it strongly advocates the citation of primary datasets. Moreover, journals like Scientific Data by Springer Nature and Earth System Science Data by Copernicus Publications specifically focus on the publication of datasets and of discoveries based on reused data. The good impact factor of these journals is just one more reason to share data.
Last but not least, it is important to mention that funders cover any costs related to RDM throughout the duration of the projects they fund (see, for example, the H2020 guidelines on FAIR data management; alternatively, you can read our blog post on H2020 for a summarized guide).
ELN: an asset to meet funders’ requirements and maximize the benefits of RDM plans
Fulfilling the requirements for RDM in a grant application might seem a challenging task. However, there is an easy solution for this: ELNs. Thanks to features such as tags, filters, advanced search and organized data elements, ELNs offer a ready-to-use structure with which researchers can build a robust RDM proposal. So, when and how can an ELN help?
Research Data Management has become an integral part of the research process, and it is pivotal to achieving scientific breakthroughs in data-driven research.
Researchers, universities, funding bodies and publishers must join forces to encourage and promote the generation of Findable, Accessible, Interoperable and Reusable data in order to translate more discoveries into new solutions.
labfolder’s mission is to simplify the creation of effective RDM plans and enable researchers to easily put them into action for a better, reproducible, transparent and open science.