As Vince Smith once put it [1], data are the fuel of science:
“The fabric of science is changing, driven by a revolution in digital technologies that facilitate the acquisition and communication of massive amounts of data. This is changing the nature of collaboration and expanding opportunities to participate in science. If digital technologies are the engine of this revolution, digital data are its fuel. But for many scientific disciplines, this fuel is in short supply.”
Despite the importance of data, scientists are often really bad at sharing it properly. So why don’t scientists share data?
Nature has a special issue dedicated to this topic, published today at nature.com/news/specials/datasharing [2,3,4,5], which isn’t behind a paywall (at the moment). It describes some of the technical and cultural barriers to data sharing. You should go and read it yourself if you’re interested, but here is a very brief and incomplete summary, with some extra points thrown in for good measure:
- Some funding bodies do not adequately support the research projects they sponsor in sharing data properly, both before and after publication.
- Many scientists lack the awareness, incentives and knowledge needed to share data, a problem that can be compounded by a fear of being “scooped”.
- Public databases, often a more natural home for data than traditional publications, are frequently undervalued by a publish-or-perish culture [6].
- Traditional scientific publishing is frequently (and ironically) a really inadequate method for sharing data. Important data and metadata routinely get damaged or destroyed in the process of publishing [7].
- The technical infrastructure for long-term data sharing either does not exist or is not understood by those who should be providing and using it. This can lead to empty archive syndrome.
These are some of the reasons that scientists don’t share data. That raises the question: how do we get out of this mess? The special issue offers some solutions [3,4] to these cultural and technical problems, including the use of Creative Commons licenses. It’s good to see these important issues given a higher profile, but we will probably be striving for better data sharing for many years to come.
References
1. Vince Smith (2009). Data publication: towards a database of everything. BMC Research Notes, 2 (1). DOI: 10.1186/1756-0500-2-113
2. Anonymous (2009). Data’s shameful neglect: Research cannot flourish if data are not preserved and made accessible. Nature, 461 (7261), 145-145. DOI: 10.1038/461145a
3. Schofield, P., Bubela, T., Weaver, T., Portilla, L., Brown, S., Hancock, J., Einhorn, D., Tocchini-Valentini, G., Hrabe de Angelis, M., & Rosenthal, N. (2009). Post-publication sharing of data and tools. Nature, 461 (7261), 171-173. DOI: 10.1038/461171a
4. Toronto International Data Release Workshop Authors (2009). Prepublication data sharing. Nature, 461 (7261), 168-170. DOI: 10.1038/461168a
5. Bryn Nelson (2009). Data sharing: Empty archives. Nature, 461 (7261), 160-163. DOI: 10.1038/461160a
6. Michael Seringhaus, & Mark Gerstein (2007). Publishing perishing? Towards tomorrow’s information architecture. BMC Bioinformatics, 8 (1). DOI: 10.1186/1471-2105-8-17
7. Duncan Hull, Steve Pettifer, & Douglas Kell (2008). Defrosting the Digital Library: Bibliographic Tools for the Next Generation Web. PLoS Computational Biology, 4 (10). DOI: 10.1371/journal.pcbi.1000204
[CC-licensed picture of Sharing by ryancr, more commentary on this post over at friendfeed.]
Thanks for this summary, Duncan! I’m curious if any of the authors mentioned micro-attribution as an important step towards addressing the second summary point, the lack of incentives and fear of being scooped. I’ll have a scan of the articles this weekend.
Comment by Chris Lasher — September 10, 2009 @ 3:03 pm |
Maybe I missed something, but 3 out of 4 of the articles asked me to log in or pay. Can you get to them all outside of your institution?
Comment by ChemSpiderman — September 10, 2009 @ 6:05 pm |
http://opendino.wordpress.com/
Maybe that’s changing?
Comment by Jon — September 10, 2009 @ 7:49 pm |
The two Opinion articles and the Editorial are free to access online indefinitely. You will need to register/log in to read the two Opinion articles. All Nature Editorials are free to access and do not require log in.
Anybody experiencing problems with access, please accept our apologies. There is a link on the access page to report any problems, so please use it and we’ll look into it immediately.
Comment by Maxine Clarke — September 10, 2009 @ 9:15 pm |
I sometimes wonder if there is any point in worrying about the lack of open data when we are hardly able to grok the data that is already open.
Comment by Hari Jayaram — September 10, 2009 @ 9:56 pm |
I think part of the problem is the effort required to properly publish data. I’ve put all the experiment data and code from my MPhil research online, and it took me several weekends to collate it, write some supplementary documentation and package everything up. Faced with tight deadlines (such as the final submission date for a thesis!), I doubt most researchers would be able to justify spending time preparing and sharing their data.
Comment by Paul — September 12, 2009 @ 3:54 pm |
Because it’s hard to do it properly?
Comment by Neil Swainston — September 18, 2009 @ 9:42 pm |