In data science, replicability and reproducibility are some of the keys to data integrity. Three main topics can be derived from the concept: data replicability, data reproducibility and research reproducibility. These may sound similar, but they are actually quite different. We will cover these three topics and their differences over the course of three articles.
To start with, let’s take a look at data replicability.
Firstly, how do you define data replicability? It simply means that it is possible for an experiment to be carried out again, either by the same scientist or another. Below we will look into why the ability to repeat experiments is important and how you can ensure this.
The first reason data needs to be replicable is that replication makes it more reliable. Repeating experiments allows you to identify mistakes, flukes and falsifications. Mistakes might include misreading a result or entering data incorrectly. These are sometimes simply inevitable, as we are only human. Most importantly, replication can identify falsifications, which can have more serious implications in the future.
On a more positive note, replication also lets you see patterns and trends in your results. This strengthens your work and makes it better able to support your claims. Reliable data is also data with integrity, which is very important and which you can read more about here.
If someone were to peer review your work thoroughly, they would carry out the experiments again themselves. This is often not practical or even possible, but it should always be considered. If someone does want to replicate an experiment, it is vital that the experiment is replicable in theory. The original scientist should also do everything possible to allow replicability.
If your work is then published, it is crucial for there to be a section on the methods of your work. This is generally required anyway, but it is all the more important for enabling others to repeat the work. Having a complete methodology also relates to reliability, since if your methods are reliable, the results are more likely to be reliable. Furthermore, it will indicate whether your data was collected in a generally accepted way, which others are able to repeat.
Being able to replicate experiments and the resulting data also allows you to check the extraneous variables. These are variables that you are not actually testing, but that may be influencing your results. Through replication, you can see whether and how any extraneous variables have affected your experiment and whether they need to be noted. You are then more likely to be able to identify the undesirable variables and decrease or control their influence where possible.
It is advisable to replicate your data yourself, and to have others replicate it, before you publish the work, if that is your intention. If the data has been replicated and confirmed before publication, it is again more likely to have integrity. In turn, the chance of your paper being retracted decreases. Making it easier for others to replicate data then makes it easier for them to support your data and claims, so it is definitely in your interest to make data replicable.
While carrying out your experiment, you should record every step you take in the process. This is not only good practice, and often required in order to track what you are doing, but it also provides a log to look back at. This log gives you something to refer to and enables you to repeat the experiment. It also makes it easier for others to follow the same steps to see if they obtain the same results, which is the whole aim of replicability.
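As a rough illustration of what such a step-by-step record could look like outside of a lab notebook, here is a minimal Python sketch. It appends each step, with a timestamp, to a plain text file in JSON Lines format. The function names `log_step` and `read_log` and the file layout are hypothetical choices made for this example, not part of any particular tool.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_step(log_path, description, **details):
    """Append one timestamped step to a JSON Lines log and return the entry."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "description": description,
        "details": details,  # free-form values, e.g. mass_g=1.25
    }
    # Open in append mode so earlier steps are never overwritten.
    with Path(log_path).open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return entry


def read_log(log_path):
    """Return all logged steps in the order they were recorded."""
    with Path(log_path).open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Calling `log_step("experiment.jsonl", "Weighed sample", mass_g=1.25)` after each action builds up a log that you, or someone replicating your work, can later read back in order with `read_log("experiment.jsonl")`.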
When you are recording your processes, make sure you are totally truthful in what you record. Sometimes it can be tempting to ignore mistakes or write results more favorably than they actually came out. This also applies when you repeat experiments: if one result is a bit of an outlier, don't brush it under the rug. That is the point of repeats: to check your methods, equipment and more for reliability. Furthermore, if your record is not truthful, others who read it and carry out experiments from it could come back with serious questions as to why their results do not match.
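One way to make sure an unusual repeat gets noticed and reported, rather than quietly dropped, is to flag it automatically. The sketch below is only an illustration: it uses the modified z-score based on the median absolute deviation, with the conventional constant 0.6745 and a cutoff of 3.5. That method, the threshold and the function name `flag_outliers` are assumptions made for this example, and other outlier checks may suit your data better.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return the indices of repeats whose modified z-score exceeds threshold."""
    center = median(values)
    # Median absolute deviation: a spread estimate that a single
    # extreme repeat cannot inflate the way it inflates the standard deviation.
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # More than half the values equal the median, so this method
        # cannot score deviations; flag nothing rather than divide by zero.
        return []
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - center) / mad > threshold
    ]
```

For repeats of `[10.1, 10.3, 9.9, 10.2, 14.8]`, the function flags index 4. The flagged value should stay in your record, together with a note on what you found when you investigated it.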
You should make your raw data available for others, so long as it does not compromise patents or the like. This should be accompanied by the step-by-step process that you went through and a description of each step. This is to further transparency, enhance reliability and of course increase the replicability of your data. Having the raw data to compare against makes it easier to repeat experiments yourself, and for others to replicate them in the future, since there is something to refer back to.
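When you share raw data files, it can help to publish a checksum for each one, so that anyone repeating the work can confirm they are comparing against exactly the same data. The following Python sketch shows one possible approach using SHA-256 hashes from the standard library. The function names and the manifest file name `MANIFEST.sha256` are hypothetical choices for this example.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read in fixed-size chunks so large raw data files
        # do not have to fit in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(data_dir, manifest_name="MANIFEST.sha256"):
    """Write one 'digest  relative/path' line per file under data_dir."""
    data_dir = Path(data_dir)
    lines = []
    for path in sorted(data_dir.rglob("*")):
        # Skip directories and the manifest itself.
        if path.is_file() and path.name != manifest_name:
            relative = path.relative_to(data_dir).as_posix()
            lines.append(f"{sha256_of_file(path)}  {relative}")
    manifest = data_dir / manifest_name
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest
```

Someone who downloads the data can run `sha256_of_file` on their copy and compare the result with the corresponding line in the manifest. A mismatch means the file has changed since it was shared.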
You are focused on your experiments, so recording each and every detail might not be at the forefront of your mind. This is where labfolder comes in. Not only can you enter data directly into your digital lab notebook, but there is also an automatic full audit trail accompanying it. This includes dates and times of creation, editing, deletion, signing and witnessing.
Having all of your data and work in one place also helps in visualizing it. This helps ensure each and every step of your work is recorded since it is easier to see when there are gaps. Within labfolder, you can also create and share protocols and templates. Thus you can provide others with the exact instructions on how to carry out your experiment, giving total replicability.