Sharing is caring – Why data sharing is beneficial for Science

Most people would probably agree without hesitation that data sharing and the communication of scientific findings, technical expertise and knowledge have brought humanity to where it is now, and remain cornerstones of modern civilization. Thus, publishing scientific data is essential for publicly funded scientists. There is, however, considerable controversy about the details of data sharing and scientific communication, or to be more precise: when to share data, and to what extent.

Sharing is caring - Scientists benefit from data sharing

Data sharing in scientific publications

Usually, academic scientists publish their data in journals. The publication in scientific journals often marks a point when the data collection is complete, and conclusions have been drawn from the data to a certain level of comprehensiveness. Although it helps the readability of a scientific story, data sharing in scientific publications has some disadvantages: First, the shortened technical descriptions usually make the reproduction of the experiments and verification of the scientific findings quite challenging. Second, journal articles are strongly biased towards positive results (1) – whatever did not work is not published and thus often has to be repeated over and over again.

In reference to ‘modern’ data sharing, the open science, open data and open access movements are also steadily growing in recognition. Their histories date back to the 1600s, “when the societal demand for access to scientific knowledge reached a point where it became necessary for groups of scientists to share resources with each other so that they could collectively do their work” (2). Several initiatives and institutions have formed in recent years, and an increasing number of renowned research institutes have promoted and influenced such data sharing initiatives for many years or are joining in now, for example the Helmholtz Association here in Germany.

Data sharing today: Why isn't all data shared yet?

Open Access is on the rise, but if you ask a random scientist whether they would like to share their data right here, right now, the answer would most likely be ‘no’. Why is scientific data sharing still so unpopular, despite all the technical possibilities which facilitate it?

  • Some scientists might be afraid that once they have shared their knowledge, it is no longer possible to publish their findings in a journal. Few scientists know that pre-publication of research findings is actually not a problem for most journal publishers (3).
  • Holding back knowledge can give a technical advantage to scientists who do so. Generally, scientists who hold back knowledge also profit from the knowledge others share. Restrictive sharing behaviour might backfire, however, when less data is shared with those who share less, for example in personal communication.
  • Sharing data is sometimes quite an effort. In a study by Tenopir et al. (4), the lack of resources and infrastructure was given as a reason for not sharing data by up to half of all scientists interviewed in the study. This is especially true for data for which no standard or metadata agreements exist.

Scientific data sharing: The benefits

Whether data is shared pre- or post-publication, a study by Heather Piwowar (5) – co-founder of ImpactStory – suggests that authors of journal papers benefit from data sharing: When analyzing the citation count of publications describing human cancer microarray data, she found a 69% increase in citation count for those publications where the raw (microarray) data was shared. Pre-publication can also help scientists to claim originality of their work. “Sharing my data at early stages of research is not only beneficial for science and for fellow scientists, but also for me”, says Julien Colomb, neurobiologist and founder of Drososhare. “I can also undoubtedly claim that I have been the first to publish data on my topics. Pre-publication may actually protect you from being scooped.”

A need for new ways of data sharing

There is increasing criticism of traditional ways of scientific publishing: a recent comment by Nobel laureate Randy Schekman in The Guardian and a cover story in The Economist a few weeks ago demonstrate that the topic of data sharing is receiving an increasing amount of public attention. We also discussed the topic on our blog. Now, the more interesting – and as yet unanswered – question is: which alternative or complementary ways to share and publish scientific data exist?

Data sharing – which way works for you?

While publishers are slowly opening up to sharing scientific raw data (see Nature’s Scientific Data, Elsevier’s Article of the Future), many scientists are already a few steps ahead. With the amount and complexity of generated scientific data ever increasing, the scientific community is exploring new possibilities of data sharing to complement scientific publications with new formats. Scientific databases for the deposition of research data have a long history, and their number continues to grow. Data submission to specific databases is mandatory for journal publication in some cases, but many databases also offer the possibility of temporarily withholding the data, making it available only after a certain period following publication.

Scientific databases often store specific data in a defined structure, while pre-publication servers allow the deposition of data in a journal-like style. Pre-publication servers like arXiv are the standard for data sharing in physics, mathematics and computer science. In contrast, chemistry and other disciplines still lag behind. With the recent launch of bioRxiv, a new and rapidly growing initiative to bring the possibilities of preprints to wet lab scientists, they might catch up soon, though.

Scientific repositories like figshare and Dryad allow the deposition and publication of scientific raw data. The high degree of freedom in the data formats that can be submitted allows sharing, deposition and reuse of many different types of data, but may also make it difficult to find specific data sets. Platforms for the harmonization and bundling of scientific data in repositories, like biosharing and Macmillan's Scientific Data, promise to make the desired datasets reusable by providing easier ways to find them.

Finally, open lab notebooks (6) are probably the most radical way of data sharing: Scientists conduct their research publicly by giving everyone access to their own lab notes. This data sharing format, however, is only an option for scientists whose PI and institutional rules do not prohibit this practice.

Future challenges in data sharing

How can we improve scientific data sharing and accelerate science? Two major challenges have to be met:

  • Motivating scientists to share their data: Aside from the increasing number of platforms where scientists can share their data easily, policies of funding agencies increasingly encourage scientists to share larger amounts of their data.
  • Making data sharing easy: For many disciplines that generate heterogeneous datasets, unified sharing protocols are hard to implement. With the growing number of data repositories, and of platforms that bundle and distribute readily available data, we believe the accessibility of scientific raw data will get a strong boost soon.

Safe and secure data sharing with labfolder

With labfolder, we want to help scientists process the information that is used in scientific research by allowing them to share the protocols which have led to published discoveries. However, the most important point is that scientific data sharing is done voluntarily and based on a common understanding that it helps science, and that every scientist can decide what to share – and when.

References

  1. Publication bias. (2013, December 8). In Wikipedia, The Free Encyclopedia. Retrieved 15:52, December 10, 2013.
  2. David, P. A. (2004). “Understanding the emergence of ‘open science’ institutions: Functionalist economics in historical context”. Industrial and Corporate Change 13 (4): 571. doi:10.1093/icc/dth023
  3. List of academic journals by preprint policy. (2013, November 7). In Wikipedia, The Free Encyclopedia. Retrieved 15:48, December 10, 2013.
  4. Tenopir C, Allard S, Douglass K, Aydinoglu AU, Wu L, et al. (2011) Data Sharing by Scientists: Practices and Perceptions. PLoS ONE 6(6): e21101. doi:10.1371/journal.pone.0021101
  5. Piwowar, Heather Alyce (2010) Foundational studies for measuring the impact, prevalence, and patterns of publicly sharing biomedical research data. Doctoral Dissertation, University of Pittsburgh.
  6. Open notebook science. (2013, December 5). In Wikipedia, The Free Encyclopedia. Retrieved 15:58, December 10, 2013.
