Most people would probably agree without hesitation that data sharing and the communication of scientific findings, technical expertise and knowledge have brought humanity to where it is now, and remain cornerstones of modern civilization. Thus, publishing scientific data is essential for publicly funded scientists. There is, however, considerable controversy about the details of data sharing and scientific communication, or to be more precise: when to share data, and to what extent.
Usually, academic scientists publish their data in journals. Publication in a scientific journal often marks the point when data collection is complete and conclusions have been drawn from the data to a certain level of comprehensiveness. Although the journal format helps the readability of a scientific story, data sharing through scientific publications has some disadvantages. First, the shortened technical descriptions usually make reproducing the experiments and verifying the scientific findings quite challenging. Second, journal articles are strongly biased towards positive results (1): whatever did not work is not published, so failed experiments are often repeated over and over again.
In reference to ‘modern’ data sharing, open science, open data and open access movements are also steadily growing in recognition. Their histories date back to the 1600s, “when the societal demand for access to scientific knowledge reached a point where it became necessary for groups of scientists to share resources with each other so that they could collectively do their work” (2). Several initiatives and institutions have formed in recent years, and an increasing number of renowned research institutes have promoted and influenced such data sharing initiatives for many years or are now joining in, for example the Helmholtz Association here in Germany.
Open Access is on the rise, but if you ask a random scientist whether they would like to share their data right here, right now, the answer would almost certainly be ‘no’. Why is scientific data sharing still so unpopular, despite all the technical possibilities that facilitate it?
Whether data is shared pre- or post-publication, a study by Heather Piwowar (5) – co-founder of ImpactStory – suggests that authors of journal papers benefit from data sharing: When analyzing the citation count of publications describing human cancer microarray data, she found a 69% increase in citation count for those publications where the raw (microarray) data was shared. Pre-publication can also help scientists to claim originality of their work. “Sharing my data at early stages of research is not only beneficial for science and for fellow scientists, but also for me”, says Julien Colomb, neurobiologist and founder of Drososhare. “I can also undoubtedly claim that I have been the first to publish data on my topics. Pre-publication may actually protect you from being scooped.”
There is increasing criticism of traditional ways of scientific publishing: a recent comment by Nobel Laureate Randy Schekman in The Guardian and a cover of The Economist a few weeks ago demonstrate that the topic of data sharing is getting an increasing amount of public attention. We also discussed the topic on our blog. Now, the more interesting, and as yet unanswered, question is: which alternative or complementary ways to share and publish scientific data exist?
While publishers are slowly opening up to sharing scientific raw data (see Nature’s Scientific Data, Elsevier’s Article of the Future), many scientists are already a few steps ahead. With the amount and complexity of generated scientific data ever increasing, the scientific community is exploring new possibilities of data sharing to complement scientific publications with new formats. Scientific databases for the deposition of research data have a long history, and their number continues to grow. Data submission to specific databases is mandatory for journal publication in some cases, but many databases also offer the possibility of temporarily withholding the data, making it available only a certain period after publication.
Scientific databases often store specific data in a defined structure, while pre-publication servers allow the deposition of data in a journal-like style. Pre-publication servers like arXiv are the standard for data sharing in physics, mathematics and computer science. In contrast, chemistry and other disciplines still lag behind. With the recent launch of bioRxiv, a new and rapidly growing initiative to bring the possibilities of pre-prints to wet lab scientists, they might catch up soon, though.
Scientific repositories like figshare and Dryad allow the deposition and publication of scientific raw data. The high degree of freedom in the data formats that can be submitted allows sharing, deposition and reuse of many different types of data, but may also make it difficult to find specific data sets. Platforms for the harmonization and bundling of scientific data in repositories, like biosharing and Macmillan's Scientific Data, promise to make the desired datasets reusable by providing easier ways to find them.
Finally, open lab notebooks (6) are probably the most radical way of data sharing: scientists conduct their research publicly by giving everyone access to their own lab notes. This data sharing format, however, is only an option for scientists whose PI and institutional rules do not prohibit the practice.
How can we improve scientific data sharing and accelerate science? Two major challenges have to be met:
With labfolder, we want to give scientists the technical means to handle the information used in scientific research by letting them share the protocols that led to published discoveries. However, the most important point is that scientific data sharing is voluntary and rests on a common understanding that it helps science, and that every scientist can decide what to share, and when.