O'Really?

February 15, 2012

The Open Access Irony Awards: Naming and shaming them

Open Access (OA) publishing aims to make the results of scientific research available to the widest possible audience. Scientific papers that are published in Open Access journals are freely available for crucial data mining and for anyone or anything to read, wherever they may be.

In the last ten years, the Open Access movement has made huge progress in allowing:

“any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers.”

But there is still a long way to go, as much of the world’s scientific knowledge remains locked up behind publishers’ paywalls, unavailable for re-use by text-mining software and inaccessible to the public, who often funded the research through taxation.

Openly ironic?

Ironically, some of the papers that are inaccessible discuss or even champion the very Open Access movement itself. Sometimes the lack of access is deliberate, other times accidental – but the consequences are serious either way: restricted access to public scientific knowledge slows scientific progress [1]. Sometimes the best way to make a serious point is to have a laugh and joke about it. This is what the Open Access Irony Awards do: by gathering all the offenders in one place, we can laugh and make a serious point at the same time, naming and shaming the papers in question.

To get the ball rolling, here are some examples:

  • The Lancet, owned by Evilsevier (sorry, I mean Elsevier), recently published a paper on “the case for open data” [2] (please login to access article). Login?! Not very open…
  • Serial offender and über-journal Science has an article by Elias Zerhouni on the NIH public access policy [3] (Subscribe/Join AAAS to View Full Text), another on “making data maximally available” [4] (Subscribe/Join AAAS to View Full Text) and another on a high-profile advocate of open science [5] (Buy Access to This Article to View Full Text). Irony of ironies.
  • From Nature Publishing Group comes a fascinating paper about harnessing the wisdom of the crowds to predict protein structures [6]. Not only have members of the tax-paying public funded this work, they actually did some of the work too! But unfortunately they have to pay to see the paper describing their results. Ironic? Also, another paper, published in Nature Medicine, proclaims that the “delay in sharing research data is costing lives” [1] (instant access only $32!).
  • From the British Medical Journal (BMJ) comes the worrying news of dodgy American laws that will lock up valuable scientific data behind paywalls [7] (please subscribe or pay below). Ironic? *
  • The “green” road to Open Access publishing involves authors self-archiving their manuscripts in some kind of public repository. But there are many social, political and technical barriers to this, and they have been well documented [8]. You could find out about them in this paper [8], but it appears that the author hasn’t self-archived the paper or taken the “gold” road and published in an Open Access journal. Ironic?
  • Last, but not least, it would be interesting to know what commercial publishers make of all this text-mining magic in Science [9], but we would have to pay $24 to find out. Ironic?

These are just a small selection from amongst many. If you would like to nominate a paper for an Open Access Irony Award, simply post it to the group on Citeulike or the group on Mendeley. Please feel free to start your own group elsewhere if you’re not on Citeulike or Mendeley. The name of this award probably originated with an idea of Jonathan Eisen’s, picked up by Joe Dunckley and Matthew Cockerill at BioMed Central (see tweet below). So thanks to them for the inspiration.

For added ironic amusement, take a screenshot of the offending article and post it to the Flickr group. Sometimes the shame is too much, and articles are retrospectively made open access so a screenshot will preserve the irony.

Join us in poking fun at the crazy business of academic publishing, while making a serious point about the lack of Open Access to scientific data.

References

  1. Sommer, Josh (2010). The delay in sharing research data is costing lives Nature Medicine, 16 (7), 744-744 DOI: 10.1038/nm0710-744
  2. Boulton, G., Rawlins, M., Vallance, P., & Walport, M. (2011). Science as a public enterprise: the case for open data The Lancet, 377 (9778), 1633-1635 DOI: 10.1016/S0140-6736(11)60647-8
  3. Zerhouni, Elias (2004). Information Access: NIH Public Access Policy Science, 306 (5703), 1895-1895 DOI: 10.1126/science.1106929
  4. Hanson, B., Sugden, A., & Alberts, B. (2011). Making Data Maximally Available Science, 331 (6018), 649-649 DOI: 10.1126/science.1203354
  5. Kaiser, Jocelyn (2012). Profile of Stephen Friend at Sage Bionetworks: The Visionary Science, 335 (6069), 651-653 DOI: 10.1126/science.335.6069.651
  6. Cooper, S., Khatib, F., Treuille, A., Barbero, J., Lee, J., Beenen, M., Leaver-Fay, A., Baker, D., Popović, Z., & players, F. (2010). Predicting protein structures with a multiplayer online game Nature, 466 (7307), 756-760 DOI: 10.1038/nature09304
  7. Epstein, Keith (2012). Scientists are urged to oppose new US legislation that will put studies behind a pay wall BMJ, 344 (jan17 3) DOI: 10.1136/bmj.e452
  8. Kim, Jihyun (2010). Faculty self-archiving: Motivations and barriers Journal of the American Society for Information Science and Technology DOI: 10.1002/asi.21336
  9. Smit, Eefke, & Van Der Graaf, M. (2012). Journal article mining: the scholarly publishers’ perspective Learned Publishing, 25 (1), 35-46 DOI: 10.1087/20120106

[CC licensed picture “ask me about open access” by mollyali.]

* Please note, some research articles in BMJ are available by Open Access, but news articles like [7] are not. Thanks to Trish Groves at BMJ for bringing this to my attention after this blog post was published. Also, some “articles” here are in a grey area for open access, particularly “journalistic” stuff like news, editorials and correspondence, as pointed out by Becky Furlong. See tweets below…

June 22, 2010

Impact Factor Boxing 2010

[This post is part of an ongoing series about impact factors. See this post for the latest impact factors published in 2012.]

Roll up, roll up, ladies and gentlemen, Impact Factor Boxing is here again. As with last year (2009), the metrics used in this combat sport are already a year out of date. But this doesn’t stop many people from writing about impact factors, and it’s been an interesting year [1] for the metrics used by many to judge the relative value of scientific work. The Public Library of Science (PLoS) launched their article-level metrics within the last year, following the example of BioMedCentral’s “most viewed” articles feature. Next to these new-style metrics, the traditional impact factors live on, despite their limitations. Critics like Harold Varmus have recently pointed out that:

“The impact factor is a completely flawed metric and it’s a source of a lot of unhappiness in the scientific community. Evaluating someone’s scientific productivity by looking at the number of papers they published in journals with impact factors over a certain level is poisonous to the system. A couple of folks are acting as gatekeepers to the distribution of information, and this is a very bad system. It really slows progress by keeping ideas and experiments out of the public domain until reviewers have been satisfied and authors are allowed to get their paper into the journal that they feel will advance their career.”

To be fair though, it’s not the metric that is flawed so much as the way it is used (and abused) – a subject covered in much detail in a special issue of Nature at http://nature.com/metrics [2,3,4,5]. It’s much harder than it should be to get hold of these metrics, so I’ve reproduced some data below (fair use? I don’t know, I am not a lawyer…) to minimise the considerable frustrations of using Journal Citation Reports (JCR).

Love them, loathe them, use them, abuse them, ignore them or obsess over them … here’s a small selection of the 7347 journals that are tracked in JCR, ordered by increasing impact factor.

2009 data from isiknowledge.com/JCR, including the Eigenfactor™ Metrics:

| Journal Title | Total Cites | Impact Factor | 5-Year Impact Factor | Immediacy Index | Articles | Cited Half-life | Eigenfactor™ Score | Article Influence™ Score |
|---|---|---|---|---|---|---|---|---|
| RSC Integrative Biology | 34 | 0.596 | | | 57 | | 0.00000 | |
| Communications of the ACM | 13853 | 2.346 | 3.050 | 0.350 | 177 | >10.0 | 0.01411 | 0.866 |
| IEEE Intelligent Systems | 2214 | 3.144 | 3.594 | 0.333 | 33 | 6.5 | 0.00447 | 0.763 |
| Journal of Web Semantics | 651 | 3.412 | | 0.107 | 28 | 4.6 | 0.00222 | |
| BMC Bioinformatics | 10850 | 3.428 | 4.108 | 0.581 | 651 | 3.4 | 0.07335 | 1.516 |
| Journal of Molecular Biology | 69710 | 3.871 | 4.303 | 0.993 | 916 | 9.2 | 0.21679 | 2.051 |
| Journal of Chemical Information and Modeling | 8973 | 3.882 | 3.631 | 0.695 | 266 | 5.9 | 0.01943 | 0.772 |
| Journal of the American Medical Informatics Association (JAMIA) | 4183 | 3.974 | 5.199 | 0.705 | 105 | 5.7 | 0.01366 | 1.585 |
| PLoS ONE | 20466 | 4.351 | 4.383 | 0.582 | 4263 | 1.7 | 0.16373 | 1.918 |
| OUP Bioinformatics | 36932 | 4.926 | 6.271 | 0.733 | 677 | 5.2 | 0.16661 | 2.370 |
| Biochemical Journal | 50632 | 5.155 | 4.365 | 1.262 | 455 | >10.0 | 0.10896 | 1.787 |
| BMC Biology | 1152 | 5.636 | | 0.702 | 84 | 2.7 | 0.00997 | |
| PLoS Computational Biology | 4674 | 5.759 | 6.429 | 0.786 | 365 | 2.5 | 0.04369 | 3.080 |
| Genome Biology | 12688 | 6.626 | 7.593 | 1.075 | 186 | 4.8 | 0.08005 | 3.586 |
| Trends in Biotechnology | 8118 | 6.909 | 8.588 | 1.407 | 81 | 6.4 | 0.02402 | 2.665 |
| Briefings in Bioinformatics | 2898 | 7.329 | 16.146 | 1.109 | 55 | 5.3 | 0.01928 | 5.887 |
| Nucleic Acids Research | 95799 | 7.479 | 7.279 | 1.635 | 1070 | 6.5 | 0.37108 | 2.963 |
| PNAS | 451386 | 9.432 | 10.312 | 1.805 | 3765 | 7.6 | 1.68111 | 4.857 |
| PLoS Biology | 15699 | 12.916 | 14.798 | 2.692 | 195 | 3.5 | 0.17630 | 8.623 |
| Nature Biotechnology | 31564 | 29.495 | 27.620 | 5.408 | 103 | 5.7 | 0.14503 | 11.803 |
| Science | 444643 | 29.747 | 31.052 | 6.531 | 897 | 8.8 | 1.52580 | 16.570 |
| Cell | 153972 | 31.152 | 32.628 | 6.825 | 359 | 8.7 | 0.70117 | 20.150 |
| Nature | 483039 | 34.480 | 32.906 | 8.209 | 866 | 8.9 | 1.74951 | 18.054 |
| New England Journal of Medicine | 216752 | 47.050 | 51.410 | 14.557 | 352 | 7.5 | 0.67401 | 19.870 |
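
As a reminder of what the headline number in that table actually measures: the classic two-year impact factor is just citations received in a given year to a journal’s papers from the previous two years, divided by the number of citable items published in those two years. A minimal sketch of the arithmetic, with entirely made-up numbers:

```python
# The classic two-year impact factor, illustrated with made-up numbers.
def impact_factor(citations_to_prev_two_years, citable_items_prev_two_years):
    # e.g. a 2009 impact factor = citations received in 2009 to items
    # published in 2007-2008, divided by citable items from 2007-2008
    return citations_to_prev_two_years / citable_items_prev_two_years

# Hypothetical journal: 1500 citations in 2009 to 300 items from 2007-2008
print(impact_factor(1500, 300))  # 5.0
```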

Maybe next year Thomson Reuters, who publish this data, could start attaching large government health warnings (like those on cigarette packets) and long disclaimers? WARNING: Abusing these figures can seriously damage your Science – you have been warned!

References

  1. Rizkallah, J., & Sin, D. (2010). Integrative Approach to Quality Assessment of Medical Journals Using Impact Factor, Eigenfactor, and Article Influence Scores PLoS ONE, 5 (4) DOI: 10.1371/journal.pone.0010204
  2. Abbott, A., Cyranoski, D., Jones, N., Maher, B., Schiermeier, Q., & Van Noorden, R. (2010). Metrics: Do metrics matter? Nature, 465 (7300), 860-862 DOI: 10.1038/465860a
  3. Van Noorden, R. (2010). Metrics: A profusion of measures Nature, 465 (7300), 864-866 DOI: 10.1038/465864a
  4. Braun, T., Osterloh, M., West, J., Rohn, J., Pendlebury, D., Bergstrom, C., & Frey, B. (2010). How to improve the use of metrics Nature, 465 (7300), 870-872 DOI: 10.1038/465870a
  5. Lane, J. (2010). Let’s make science metrics more scientific Nature, 464 (7288), 488-489 DOI: 10.1038/464488a

[Creative Commons licensed picture of Golden Gloves Prelim Bouts by Kate Gardiner.]

September 18, 2009

Popular, personal and public data: Article-level metrics at PLoS

The Public Library of Science (PLoS) is a non-profit organisation committed to making the world’s scientific and medical literature freely accessible to everyone via open access publishing. As recently announced, they have just published the first article-level metrics (e.g. web server logs and related information) for all articles in their library. This is novel, interesting and potentially useful data, not currently made publicly available by other publishers. Here is a selection of some of the data, taken from the full dataset here (large file), which includes the “top ten” papers by viewing statistics.

Article-level metrics for some papers published in PLoS (August 2009)

| Rank* | Article | Journal | Views | Citations** |
|---|---|---|---|---|
| 1 | Why Most Published Research Findings Are False (including this one?) [1] | PLoS Medicine | 232847 | 52 |
| 2 | Initial Severity and Antidepressant Benefits: A Meta-Analysis of Data Submitted to the Food and Drug Administration [2] | PLoS Medicine | 182305 | 15 |
| 3 | Serotonin and Depression: A Disconnect between the Advertisements and the Scientific Literature [3] | PLoS Medicine | 105498 | 16 |
| 4 | The Diploid Genome Sequence of an Individual Human [4] | PLoS Biology | 88271 | 54 |
| 5 | Ultrasonic Songs of Male Mice [5] | PLoS Biology | 81331 | 8 |
| 6 | Complete Primate Skeleton from the Middle Eocene of Messel in Germany: Morphology and Paleobiology [6] | PLoS ONE | 62449 | 0 |
| 7 | The Impact Factor Game: It is time to find a better way to assess the scientific literature [7] | PLoS Medicine | 61353 | 13 |
| 8 | A Map of Recent Positive Selection in the Human Genome [8] | PLoS Biology | 59512 | 94 |
| 9 | Mapping the Structural Core of Human Cerebral Cortex [9] | PLoS Biology | 58151 | 8 |
| 10 | Ten Simple Rules for Getting Published [10] | PLoS Computational Biology | 57312 | 1 |
| 11 | Men, Women, and Ghosts in Science [11] | PLoS Biology | 56982 | 0 |
| 120 | Defrosting the Digital Library: Bibliographic Tools for the Next Generation Web [12] (w00t!) | PLoS Computational Biology | 16295 | 3 |
| 1500 | Specificity and evolvability in eukaryotic protein interaction networks [13] | PLoS Computational Biology | 4270 | 7 |
| 1632 | Comparative genomics and disorder prediction identify biologically relevant SH3 protein interactions [14] | PLoS Computational Biology | 4063 | 10 |
| 1755 | Folding Very Short Peptides Using Molecular Dynamics [15] | PLoS Computational Biology | 3876 | 2 |
| 2535 | Microblogging the ISMB: A New Approach to Conference Reporting [16] | PLoS Computational Biology | 3055 | 1 |
| 7521 | Probing the Flexibility of Large Conformational Changes in Protein Structures through Local Perturbations [17] | PLoS Computational Biology | 1024 | 0 |
| 12549 | Deciphering Proteomic Signatures of Early Diapause in Nasonia [18] | PLoS ONE | 0 | 0 |

*The rank is based on the 12,549 papers for which viewing data (combined usage of HTML + PDF + XML) are available.

**Citation counts are via PubMed Central (data from CrossRef and Scopus are also provided; see Bora’s comments and commentary at Blue Lab Coats).
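
If you want to slice the full dataset yourself, the rank column above is easy to re-derive. A minimal sketch, assuming the spreadsheet has been exported to CSV; the file name and column names here are illustrative, not the actual headers of the PLoS file:

```python
import csv

# A sketch of re-deriving the rank column above: combined usage is
# HTML + PDF + XML views. File name and column names are hypothetical.
with open("plos_alm.csv") as f:
    articles = list(csv.DictReader(f))

for a in articles:
    a["views"] = sum(int(a[k]) for k in ("html_views", "pdf_views", "xml_views"))

# Rank by combined usage, most-viewed first
articles.sort(key=lambda a: a["views"], reverse=True)
for rank, a in enumerate(articles, start=1):
    print(rank, a["doi"], a["views"], a["citations"])
```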

Science is not a popularity contest but…

Analysing this data is not straightforward. Some highly viewed articles are never cited (reviews, editorials, essays, opinion pieces and so on). Likewise, popularity and importance are not the same thing. Some articles get lots of citations but few views, which suggests that people are not actually reading the papers before citing them. As described on the PLoS website article-level-metrics.plos.org:

“When looking at Article-Level Metrics for the first time bear the following points in mind:

  • Online usage is dependent on the article type, the age of the article, and the subject area(s) it is in. Therefore you should be aware of these effects when considering the performance of any given article.
  • Older articles normally have higher usage than younger ones simply because the usage has had longer to accumulate. Articles typically have a peak in their usage in the first 3 months and usage then levels off after that.
  • Spikes of usage can be caused by media coverage, usage by large numbers of people, out of control download scripts or any number of other reasons. Without a detailed look at the raw usage logs it is often impossible to tell what the reason is and so we encourage you to regard usage data as indicative of trends, rather than as an absolute measure for any given article.
  • We currently have missing usage data for some of our articles, but we are working to fill the gaps. Primarily this affects those articles published before June 17th, 2005.
  • Newly published articles do not accumulate usage data instantaneously but require a day or two before data are shown.
  • Article citations as recorded by the Scopus database are sometimes undercounted because there are two records in the database for the same article. We’re working with Scopus to correct this issue.
  • All metrics will accrue over time (and some, such as citations, will take several years to accrue). Therefore, recent articles may not show many metrics (other than online usage, which accrues from day one). ”

So all the usual caveats apply when using this bibliometric data. Despite the limitations, it is more revealing than the useful (but simplistic) “highly accessed” papers at BioMedCentral, which doesn’t always give full information on what “highly” actually means next to each published article. It will be interesting to see if other publishers now follow the lead of PLoS and BioMed Central and also publish their usage data combined with other bibliometric indicators such as blog coverage. For authors publishing with PLoS, this data has an added personal dimension too: it is handy to see how many views your own paper has.
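
To make the earlier point about popularity versus importance concrete, here is a toy comparison (the numbers are invented) showing how differently views and citations can behave across article types:

```python
# Toy illustration (invented numbers): citations per 1000 views varies
# enormously between a widely-read essay and a niche methods paper.
articles = [
    ("widely-read essay", 230000, 5),
    ("niche methods paper", 4000, 120),
    ("typical research paper", 16000, 10),
]

for title, views, citations in articles:
    rate = 1000 * citations / views
    print(f"{title}: {citations} citations / {views} views "
          f"= {rate:.2f} citations per 1000 views")
```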

As paying customers of the services that commercial publishers provide, should scientists and their funders be demanding more of this kind of information in the future? I reckon they should. You have to wonder why this kind of innovation has taken so long to happen, but it is a welcome addition.

[More commentary on this post over at friendfeed.]

References

  1. Ioannidis, J. (2005). Why Most Published Research Findings Are False PLoS Medicine, 2 (8) DOI: 10.1371/journal.pmed.0020124
  2. Kirsch, I., Deacon, B., Huedo-Medina, T., Scoboria, A., Moore, T., & Johnson, B. (2008). Initial Severity and Antidepressant Benefits: A Meta-Analysis of Data Submitted to the Food and Drug Administration PLoS Medicine, 5 (2) DOI: 10.1371/journal.pmed.0050045
  3. Lacasse, J., & Leo, J. (2005). Serotonin and Depression: A Disconnect between the Advertisements and the Scientific Literature PLoS Medicine, 2 (12) DOI: 10.1371/journal.pmed.0020392
  4. Levy, S., Sutton, G., Ng, P., Feuk, L., Halpern, A., Walenz, B., Axelrod, N., Huang, J., Kirkness, E., Denisov, G., Lin, Y., MacDonald, J., Pang, A., Shago, M., Stockwell, T., Tsiamouri, A., Bafna, V., Bansal, V., Kravitz, S., Busam, D., Beeson, K., McIntosh, T., Remington, K., Abril, J., Gill, J., Borman, J., Rogers, Y., Frazier, M., Scherer, S., Strausberg, R., & Venter, J. (2007). The Diploid Genome Sequence of an Individual Human PLoS Biology, 5 (10) DOI: 10.1371/journal.pbio.0050254
  5. Holy, T., & Guo, Z. (2005). Ultrasonic Songs of Male Mice PLoS Biology, 3 (12) DOI: 10.1371/journal.pbio.0030386
  6. Franzen, J., Gingerich, P., Habersetzer, J., Hurum, J., von Koenigswald, W., & Smith, B. (2009). Complete Primate Skeleton from the Middle Eocene of Messel in Germany: Morphology and Paleobiology PLoS ONE, 4 (5) DOI: 10.1371/journal.pone.0005723
  7. The PLoS Medicine Editors (2006). The Impact Factor Game PLoS Medicine, 3 (6) DOI: 10.1371/journal.pmed.0030291
  8. Voight, B., Kudaravalli, S., Wen, X., & Pritchard, J. (2006). A Map of Recent Positive Selection in the Human Genome PLoS Biology, 4 (3) DOI: 10.1371/journal.pbio.0040072
  9. Hagmann, P., Cammoun, L., Gigandet, X., Meuli, R., Honey, C., Wedeen, V., & Sporns, O. (2008). Mapping the Structural Core of Human Cerebral Cortex PLoS Biology, 6 (7) DOI: 10.1371/journal.pbio.0060159
  10. Bourne, P. (2005). Ten Simple Rules for Getting Published PLoS Computational Biology, 1 (5) DOI: 10.1371/journal.pcbi.0010057
  11. Lawrence, P. (2006). Men, Women, and Ghosts in Science PLoS Biology, 4 (1) DOI: 10.1371/journal.pbio.0040019
  12. Hull, D., Pettifer, S., & Kell, D. (2008). Defrosting the Digital Library: Bibliographic Tools for the Next Generation Web PLoS Computational Biology, 4 (10) DOI: 10.1371/journal.pcbi.1000204
  13. Beltrao, P., & Serrano, L. (2007). Specificity and Evolvability in Eukaryotic Protein Interaction Networks PLoS Computational Biology, 3 (2) DOI: 10.1371/journal.pcbi.0030025
  14. Beltrao, P., & Serrano, L. (2005). Comparative Genomics and Disorder Prediction Identify Biologically Relevant SH3 Protein Interactions PLoS Computational Biology, 1 (3) DOI: 10.1371/journal.pcbi.0010026
  15. Ho, B., & Dill, K. (2006). Folding Very Short Peptides Using Molecular Dynamics PLoS Computational Biology, 2 (4) DOI: 10.1371/journal.pcbi.0020027
  16. Saunders, N., Beltrão, P., Jensen, L., Jurczak, D., Krause, R., Kuhn, M., & Wu, S. (2009). Microblogging the ISMB: A New Approach to Conference Reporting PLoS Computational Biology, 5 (1) DOI: 10.1371/journal.pcbi.1000263
  17. Ho, B., & Agard, D. (2009). Probing the Flexibility of Large Conformational Changes in Protein Structures through Local Perturbations PLoS Computational Biology, 5 (4) DOI: 10.1371/journal.pcbi.1000343
  18. Wolschin, F., & Gadau, J. (2009). Deciphering Proteomic Signatures of Early Diapause in Nasonia PLoS ONE, 4 (7) DOI: 10.1371/journal.pone.0006394

April 9, 2009

Upcoming Gig: The Scholarly Communication Landscape

Details of an upcoming gig, The Scholarly Communication Landscape, in Manchester on the 23rd of April 2009. If you are interested in coming, you need to register by Monday the 13th of April at the official symposium pages.

Why? To help University staff and researchers understand some of the more complex issues in the development of digital scholarly communication, and to launch Manchester eScholar, the University of Manchester’s new Institutional Repository.

How? Information will be presented by invited speakers, and views and experience exchanged via plenary sessions.

Who For? University researchers (staff and students), research support staff, librarians, research managers, and anyone with an active interest in the field will find this symposium helpful to their developing use and provision of digital research formats. The programme for the symposium currently looks like this:

Welcome and Introduction by Jan Wilkinson, University Librarian and Director of The John Rylands Library.

Session I Chaired by Jan Wilkinson

  • Is the Knowledge Society a ‘social’ Network? Robin Hunt, CIBER, University College London
  • National Perspectives, Costs and Benefits Michael Jubb, Director, Research Information Network
  • The Economics of Scholarly Communication – how open access is changing the landscape Deborah Kahn, Acting Editorial Director Biology, BioMed Central

Session II Chaired by Dr Stella Butler

  • Information wants to be free. So … ? Dr David Booton, School of Law, University of Manchester
  • Putting Repositories in Their Place – the changing landscape of scholarly communication Bill Hubbard, SHERPA, University of Nottingham
  • The Year of Blogging Dangerously – lessons from the blogosphere, by Dr Duncan Hull (errr, that’s me!), mib.ac.uk. This talk will describe how to build an institutional repository using free (or cheap) web-based and blogging tools, including flickr.com, slideshare.net, citeulike.org, wordpress.com, myexperiment.org and friendfeed.com. We will discuss some strengths and limitations of these tools and what Institutional Repositories can learn from them.

Session III Chaired by Professor Simon Gaskell

Summary and close by Professor Simon Gaskell, Vice-President for Research.
