As Vince Smith once put it [1], data are the fuel of science:
“The fabric of science is changing, driven by a revolution in digital technologies that facilitate the acquisition and communication of massive amounts of data. This is changing the nature of collaboration and expanding opportunities to participate in science. If digital technologies are the engine of this revolution, digital data are its fuel. But for many scientific disciplines, this fuel is in short supply.”
Despite the importance of data, scientists are often really bad at sharing it properly. So why don’t scientists share data?
Nature has a special issue dedicated to this topic, published today at nature.com/news/specials/datasharing [2,3,4,5], which isn’t behind a paywall (at the moment). It describes some of the technical and cultural barriers to data sharing. You should go and read it yourself if you’re interested, but here is a very brief and incomplete summary, with some extra points thrown in for good measure:
- Some funding bodies do not adequately support the research projects they sponsor in sharing data properly, both before and after publication.
- Many scientists lack the awareness, incentives and knowledge needed to share data, a problem that can be compounded by a fear of being “scooped”.
- Public databases, often a more natural home for data than traditional publications, are frequently undervalued by a publish-or-perish culture [6].
- Traditional scientific publishing is frequently (and ironically) a really inadequate method for sharing data. Important data and metadata routinely get damaged or destroyed in the process of publishing [7].
- The technical infrastructure for long-term data sharing either does not exist or is not understood by those who should be providing and using it. This can lead to “empty archive syndrome”.
These are some of the reasons that scientists don’t share data, which raises the question: how do we get out of this mess? The special issue offers some solutions [3,4] to these cultural and technical problems, including the use of Creative Commons licenses. It’s good to see these important issues given a higher profile, but we will probably be striving for better data sharing for many years to come.
References
- Vince Smith (2009). Data publication: towards a database of everything. BMC Research Notes, 2 (1). DOI: 10.1186/1756-0500-2-113
- Anonymous (2009). Data’s shameful neglect: Research cannot flourish if data are not preserved and made accessible. Nature, 461 (7261), 145. DOI: 10.1038/461145a
- Schofield, P., Bubela, T., Weaver, T., Portilla, L., Brown, S., Hancock, J., Einhorn, D., Tocchini-Valentini, G., Hrabe de Angelis, M., & Rosenthal, N. (2009). Post-publication sharing of data and tools. Nature, 461 (7261), 171-173. DOI: 10.1038/461171a
- Toronto International Data Release Workshop Authors (2009). Prepublication data sharing. Nature, 461 (7261), 168-170. DOI: 10.1038/461168a
- Bryn Nelson (2009). Data sharing: Empty archives. Nature, 461 (7261), 160-163. DOI: 10.1038/461160a
- Michael Seringhaus & Mark Gerstein (2007). Publishing perishing? Towards tomorrow’s information architecture. BMC Bioinformatics, 8 (1). DOI: 10.1186/1471-2105-8-17
- Duncan Hull, Steve Pettifer, & Douglas Kell (2008). Defrosting the Digital Library: Bibliographic Tools for the Next Generation Web. PLoS Computational Biology, 4 (10). DOI: 10.1371/journal.pcbi.1000204
[CC-licensed picture of Sharing by ryancr. More commentary on this post over at FriendFeed.]