O'Really?

December 11, 2009

The Semantic Biochemical Journal experiment

There is an interesting review [1] (and special issue) in the Biochemical Journal today, published by Portland Press Ltd. It provides (quote) “a whirlwind tour of recent projects to transform scholarly publishing paradigms, culminating in Utopia and the Semantic Biochemical Journal experiment”. Here is a quick outline of the publishing projects the review describes and discusses:

  • Blogs for biomedical science
  • Biomedical Ontologies – OBO etc
  • Project Prospect and the Royal Society of Chemistry
  • The Chemspider Journal of Chemistry
  • The FEBS Letters experiment
  • PubMedCentral and BioLit [2]
  • Public Library of Science (PLoS) Neglected Tropical Diseases (NTD) [3]
  • The Elsevier Grand Challenge [4]
  • Liquid Publications
  • The PDF debate: Is PDF a hamburger? Or can we build more useful applications on top of it?
  • The Semantic Biochemical Journal project with Utopia Documents [5]

The review asks what advances these projects have made and what obstacles to progress still exist. It’s an entertaining tour, dotted with enlightening observations on what is broken in scientific publishing and some of the solutions involving various kinds of semantics.

One conclusion is that many of the experiments described above are expensive and difficult, but that the cost of not improving scientific publishing with various kinds of semantic markup is high. As the authors put it:

“If the cost of semantic publishing seems high, then we also need to ask, what is the price of not doing it? From the results of the experiments we have seen to date, there is clearly a need to move forward and still a great deal of scope to innovate. If we fail to move forward in a collaborative way, if we fail to engage the key players, the price will be high. We will continue to bury scientific knowledge, as we routinely do now, in static, unconnected journal articles; to sequester fragments of that knowledge in disparate databases that are largely inaccessible from journal pages; to further waste countless hours of scientists’ time either repeating experiments they didn’t know had been performed before, or worse, trying to verify facts they didn’t know had been shown to be false. In short, we will continue to fail to get the most from our literature, we will continue to fail to know what we know, and will continue to do science a considerable disservice.”

It’s well worth reading the review, and downloading the Utopia software to experience all of the interactive features demonstrated in this special issue, especially the animated molecular viewers and sequence alignments.

Enjoy… The Utopia team would be interested to know what people think. See the commentary on friendfeed, the digital curation blog and the YouTube video below for more information.

References

  1. Attwood, T., Kell, D., McDermott, P., Marsh, J., Pettifer, S., & Thorne, D. (2009). Calling International Rescue: knowledge lost in literature and data landslide! Biochemical Journal, 424 (3), 317-333. DOI: 10.1042/BJ20091474
  2. Fink, J., Kushch, S., Williams, P., & Bourne, P. (2008). BioLit: integrating biological literature with databases. Nucleic Acids Research, 36 (Web Server). DOI: 10.1093/nar/gkn317
  3. Shotton, D., Portwin, K., Klyne, G., & Miles, A. (2009). Adventures in Semantic Publishing: Exemplar Semantic Enhancements of a Research Article. PLoS Computational Biology, 5 (4). DOI: 10.1371/journal.pcbi.1000361
  4. Pafilis, E., O’Donoghue, S., Jensen, L., Horn, H., Kuhn, M., Brown, N., & Schneider, R. (2009). Reflect: augmented browsing for the life scientist. Nature Biotechnology, 27 (6), 508-510. DOI: 10.1038/nbt0609-508
  5. Pettifer, S., Thorne, D., McDermott, P., Marsh, J., Villéger, A., Kell, D., & Attwood, T. (2009). Visualising biological data: a semantic approach to tool and database integration. BMC Bioinformatics, 10 (Suppl 6). DOI: 10.1186/1471-2105-10-S6-S19

September 18, 2009

Popular, personal and public data: Article-level metrics at PLoS

The Public Library of Science (PLoS) is a non-profit organisation committed to making the world’s scientific and medical literature freely accessible to everyone via open access publishing. As recently announced, they have just published the first article-level metrics (e.g. web server logs and related information) for all articles in their library. This is novel, interesting and potentially useful data, not currently made publicly available by other publishers. Here is a selection of some of the data, taken from the full dataset here (large file), which includes the “top ten” papers by viewing statistics.

Article level metrics for some papers published in PLoS (August 2009)

Rank* | Article | Journal | Views | Citations**
1 | Why Most Published Research Findings Are False (including this one?) [1] | PLoS Medicine | 232847 | 52
2 | Initial Severity and Antidepressant Benefits: A Meta-Analysis of Data Submitted to the Food and Drug Administration [2] | PLoS Medicine | 182305 | 15
3 | Serotonin and Depression: A Disconnect between the Advertisements and the Scientific Literature [3] | PLoS Medicine | 105498 | 16
4 | The Diploid Genome Sequence of an Individual Human [4] | PLoS Biology | 88271 | 54
5 | Ultrasonic Songs of Male Mice [5] | PLoS Biology | 81331 | 8
6 | Complete Primate Skeleton from the Middle Eocene of Messel in Germany: Morphology and Paleobiology [6] | PLoS ONE | 62449 | 0
7 | The Impact Factor Game: It is time to find a better way to assess the scientific literature [7] | PLoS Medicine | 61353 | 13
8 | A Map of Recent Positive Selection in the Human Genome [8] | PLoS Biology | 59512 | 94
9 | Mapping the Structural Core of Human Cerebral Cortex [9] | PLoS Biology | 58151 | 8
10 | Ten Simple Rules for Getting Published [10] | PLoS Computational Biology | 57312 | 1
11 | Men, Women, and Ghosts in Science [11] | PLoS Biology | 56982 | 0
120 | Defrosting the Digital Library: Bibliographic Tools for the Next Generation Web [12] (w00t!) | PLoS Computational Biology | 16295 | 3
1500 | Specificity and evolvability in eukaryotic protein interaction networks [13] | PLoS Computational Biology | 4270 | 7
1632 | Comparative genomics and disorder prediction identify biologically relevant SH3 protein interactions [14] | PLoS Computational Biology | 4063 | 10
1755 | Folding Very Short Peptides Using Molecular Dynamics [15] | PLoS Computational Biology | 3876 | 2
2535 | Microblogging the ISMB: A New Approach to Conference Reporting [16] | PLoS Computational Biology | 3055 | 1
7521 | Probing the Flexibility of Large Conformational Changes in Protein Structures through Local Perturbations [17] | PLoS Computational Biology | 1024 | 0
12549 | Deciphering Proteomic Signatures of Early Diapause in Nasonia [18] | PLoS ONE | 0 | 0

*The rank is based on the 12,549 papers for which viewing data (combined usage of HTML + PDF + XML) are available.

**Citation counts are via PubMedCentral (data from CrossRef and Scopus are also provided; see Bora’s comments and commentary at Blue Lab Coats.)
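
One simple way to compare the two columns is citations per 1,000 views. The following is a minimal sketch in Python using a handful of rows copied from the table above; the function name, the shortened titles and the choice of ratio are illustrative assumptions, not something PLoS provides.

```python
# Rows copied from the table above: (rank, shortened title, views, citations).
# Titles are abbreviated here for display only.
ROWS = [
    (1, "Why Most Published Research Findings Are False", 232847, 52),
    (6, "Complete Primate Skeleton from the Middle Eocene of Messel", 62449, 0),
    (8, "A Map of Recent Positive Selection in the Human Genome", 59512, 94),
    (1632, "Comparative genomics and disorder prediction (SH3)", 4063, 10),
    (12549, "Deciphering Proteomic Signatures of Early Diapause in Nasonia", 0, 0),
]


def citations_per_thousand_views(views, citations):
    """Return citations per 1,000 views, or None when there are no views."""
    if views == 0:
        return None  # avoid dividing by zero for articles with no usage data
    return 1000 * citations / views


for rank, title, views, citations in ROWS:
    ratio = citations_per_thousand_views(views, citations)
    shown = "n/a" if ratio is None else f"{ratio:.2f}"
    print(f"{rank:>6}  {shown:>6}  {title}")
```

A ratio like this only hints at how differently views and citations behave; it says nothing about why, and it inherits every caveat of the underlying counts.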

Science is not a popularity contest but…

Analysing this data is not straightforward. Some highly viewed articles are never cited (reviews, editorials, essays, opinion pieces, etc.). Likewise, popularity and importance are not the same thing. Some articles get lots of citations but few views, which suggests that people are not actually reading the papers before citing them. As described on the PLoS website article-level-metrics.plos.org:

“When looking at Article-Level Metrics for the first time bear the following points in mind:

  • Online usage is dependent on the article type, the age of the article, and the subject area(s) it is in. Therefore you should be aware of these effects when considering the performance of any given article.
  • Older articles normally have higher usage than younger ones simply because the usage has had longer to accumulate. Articles typically have a peak in their usage in the first 3 months and usage then levels off after that.
  • Spikes of usage can be caused by media coverage, usage by large numbers of people, out of control download scripts or any number of other reasons. Without a detailed look at the raw usage logs it is often impossible to tell what the reason is and so we encourage you to regard usage data as indicative of trends, rather than as an absolute measure for any given article.
  • We currently have missing usage data for some of our articles, but we are working to fill the gaps. Primarily this affects those articles published before June 17th, 2005.
  • Newly published articles do not accumulate usage data instantaneously but require a day or two before data are shown.
  • Article citations as recorded by the Scopus database are sometimes undercounted because there are two records in the database for the same article. We’re working with Scopus to correct this issue.
  • All metrics will accrue over time (and some, such as citations, will take several years to accrue). Therefore, recent articles may not show many metrics (other than online usage, which accrues from day one). ”
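
The point about older articles having had longer to accumulate usage can be partly adjusted for by dividing views by article age. Here is a rough sketch in Python, assuming the August 2009 snapshot date from the table heading, and assuming that the issue number in each reference roughly corresponds to the month of publication (an assumption, not something stated in the dataset). The function names are hypothetical.

```python
SNAPSHOT = (2009, 8)  # (year, month): August 2009, from the table heading

# (shortened title, views, assumed (year, month) of publication).
# Months are guessed from the issue numbers in the reference list.
ARTICLES = [
    ("Why Most Published Research Findings Are False", 232847, (2005, 8)),
    ("The Diploid Genome Sequence of an Individual Human", 88271, (2007, 10)),
    ("Complete Primate Skeleton from the Middle Eocene of Messel", 62449, (2009, 5)),
]


def months_between(start, end):
    """Whole months from start to end, both given as (year, month)."""
    return (end[0] - start[0]) * 12 + (end[1] - start[1])


def views_per_month(views, published, snapshot=SNAPSHOT):
    """Average views per month since publication (age is at least one month)."""
    age = max(months_between(published, snapshot), 1)
    return views / age


# Re-rank the articles by average monthly views instead of total views.
ranked = sorted(ARTICLES, key=lambda a: views_per_month(a[1], a[2]), reverse=True)
for title, views, published in ranked:
    print(f"{views_per_month(views, published):>9.0f}  {title}")
```

An average like this ignores the early peak in usage that the PLoS notes describe, so it flatters recent articles; it is a trend indicator at best.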

So all the usual caveats apply when using this bibliometric data. Despite the limitations, it is more revealing than the useful (but simplistic) “highly accessed” papers at BioMed Central, which doesn’t always give full information on what “highly” actually means next to each published article. It will be interesting to see if other publishers now follow the lead of PLoS and BioMed Central and also publish their usage data combined with other bibliometric indicators such as blog coverage. For authors publishing with PLoS, this data has an added personal dimension too: it is handy to see how many views your paper has.

As paying customers of the services that commercial publishers provide, should scientists and their funders be demanding more of this kind of information in the future? I reckon they should. You have to wonder why these kinds of innovations have taken so long to happen, but they are a welcome addition.

[More commentary on this post over at friendfeed.]

References

  1. Ioannidis, J. (2005). Why Most Published Research Findings Are False. PLoS Medicine, 2 (8). DOI: 10.1371/journal.pmed.0020124
  2. Kirsch, I., Deacon, B., Huedo-Medina, T., Scoboria, A., Moore, T., & Johnson, B. (2008). Initial Severity and Antidepressant Benefits: A Meta-Analysis of Data Submitted to the Food and Drug Administration. PLoS Medicine, 5 (2). DOI: 10.1371/journal.pmed.0050045
  3. Lacasse, J., & Leo, J. (2005). Serotonin and Depression: A Disconnect between the Advertisements and the Scientific Literature. PLoS Medicine, 2 (12). DOI: 10.1371/journal.pmed.0020392
  4. Levy, S., Sutton, G., Ng, P., Feuk, L., Halpern, A., Walenz, B., Axelrod, N., Huang, J., Kirkness, E., Denisov, G., Lin, Y., MacDonald, J., Pang, A., Shago, M., Stockwell, T., Tsiamouri, A., Bafna, V., Bansal, V., Kravitz, S., Busam, D., Beeson, K., McIntosh, T., Remington, K., Abril, J., Gill, J., Borman, J., Rogers, Y., Frazier, M., Scherer, S., Strausberg, R., & Venter, J. (2007). The Diploid Genome Sequence of an Individual Human. PLoS Biology, 5 (10). DOI: 10.1371/journal.pbio.0050254
  5. Holy, T., & Guo, Z. (2005). Ultrasonic Songs of Male Mice. PLoS Biology, 3 (12). DOI: 10.1371/journal.pbio.0030386
  6. Franzen, J., Gingerich, P., Habersetzer, J., Hurum, J., von Koenigswald, W., & Smith, B. (2009). Complete Primate Skeleton from the Middle Eocene of Messel in Germany: Morphology and Paleobiology. PLoS ONE, 4 (5). DOI: 10.1371/journal.pone.0005723
  7. The PLoS Medicine Editors (2006). The Impact Factor Game. PLoS Medicine, 3 (6). DOI: 10.1371/journal.pmed.0030291
  8. Voight, B., Kudaravalli, S., Wen, X., & Pritchard, J. (2006). A Map of Recent Positive Selection in the Human Genome. PLoS Biology, 4 (3). DOI: 10.1371/journal.pbio.0040072
  9. Hagmann, P., Cammoun, L., Gigandet, X., Meuli, R., Honey, C., Wedeen, V., & Sporns, O. (2008). Mapping the Structural Core of Human Cerebral Cortex. PLoS Biology, 6 (7). DOI: 10.1371/journal.pbio.0060159
  10. Bourne, P. (2005). Ten Simple Rules for Getting Published. PLoS Computational Biology, 1 (5). DOI: 10.1371/journal.pcbi.0010057
  11. Lawrence, P. (2006). Men, Women, and Ghosts in Science. PLoS Biology, 4 (1). DOI: 10.1371/journal.pbio.0040019
  12. Hull, D., Pettifer, S., & Kell, D. (2008). Defrosting the Digital Library: Bibliographic Tools for the Next Generation Web. PLoS Computational Biology, 4 (10). DOI: 10.1371/journal.pcbi.1000204
  13. Beltrao, P., & Serrano, L. (2007). Specificity and Evolvability in Eukaryotic Protein Interaction Networks. PLoS Computational Biology, 3 (2). DOI: 10.1371/journal.pcbi.0030025
  14. Beltrao, P., & Serrano, L. (2005). Comparative Genomics and Disorder Prediction Identify Biologically Relevant SH3 Protein Interactions. PLoS Computational Biology, 1 (3). DOI: 10.1371/journal.pcbi.0010026
  15. Ho, B., & Dill, K. (2006). Folding Very Short Peptides Using Molecular Dynamics. PLoS Computational Biology, 2 (4). DOI: 10.1371/journal.pcbi.0020027
  16. Saunders, N., Beltrão, P., Jensen, L., Jurczak, D., Krause, R., Kuhn, M., & Wu, S. (2009). Microblogging the ISMB: A New Approach to Conference Reporting. PLoS Computational Biology, 5 (1). DOI: 10.1371/journal.pcbi.1000263
  17. Ho, B., & Agard, D. (2009). Probing the Flexibility of Large Conformational Changes in Protein Structures through Local Perturbations. PLoS Computational Biology, 5 (4). DOI: 10.1371/journal.pcbi.1000343
  18. Wolschin, F., & Gadau, J. (2009). Deciphering Proteomic Signatures of Early Diapause in Nasonia. PLoS ONE, 4 (7). DOI: 10.1371/journal.pone.0006394

April 17, 2009

The Unreasonable Effectiveness of Google

Via the Official Google Research Blog at the University of Google, Alon Halevy, Peter Norvig and Fernando Pereira have published an interesting expert opinion piece in the March/April 2009 edition of IEEE Intelligent Systems: computer.org/intelligent. The paper talks about embracing complexity and making use of “the unreasonable effectiveness of data” [1], drawing analogies with the “unreasonable effectiveness of mathematics” [2]. There is plenty to agree and disagree with in this provocative article, which makes it an entertaining read. So what can we learn from those expert Googlers in the Googleplex? (more…)

October 14, 2008

Open Access Day: Why It Matters

Today, Tuesday the 14th of October 2008, is Open Access Day. Like many others, this blog post is joining in by describing why Open Access matters – from a personal point of view. According to the Wikipedia article, Open Access (OA) is “free, immediate, permanent, full-text, online access, for any user, web-wide, to digital scientific and scholarly material, primarily research articles published in peer-reviewed journals. OA means that any individual user, anywhere, who has access to the Internet, may link, read, download, store, print-off, use, and data-mine the digital content of that article. An OA article usually has limited copyright and licensing restrictions.” What does all this mean and why does it matter? Well, in four question-and-answer points, here goes… (more…)

July 25, 2008

How to spend a £400 million Science budget

A thought experiment with lots of money

The Biotechnology and Biological Sciences Research Council (BBSRC) is the United Kingdom’s funding agency for academic research and training in the non-clinical life sciences. It supports a total of around 1600 scientists and 2000 research students in universities and institutes in the UK. The head of our laboratory, Douglas Kell, has recently been appointed Chief Executive of the BBSRC [1]. Congratulations Doug, we wish you the very best in your new job. Now, according to bbsrc.ac.uk, their annual budget is a cool £400 million (just short of $800 million or €500 million). This has left me wondering: how would you spend a £400 million Science budget for the life sciences? For the purposes of this article, imagine it was you that had been put in charge of said budget, and Prime Minister Gordon Brown (texture like sun) had given you, yes YOU, a big bag of cash to distribute as you see fit. A mouth-watering prospect, I think you’ll agree. Here is my personal opinion of how, in my dreams, I would spend the money. (more…)
