O'Really?

September 18, 2009

Popular, personal and public data: Article-level metrics at PLoS

The Public Library of Science (PLoS) is a non-profit organisation committed to making the world’s scientific and medical literature freely accessible to everyone via open access publishing. As recently announced, they have just published the first article-level metrics (e.g. web server logs and related information) for all articles in their library. This is novel, interesting and potentially useful data, not currently made publicly available by other publishers. Here is a selection of some of the data, taken from the full dataset here (large file), which includes the “top ten” papers by viewing statistics.

Article level metrics for some papers published in PLoS (August 2009)

Rank* | Article | Journal | Views | Citations**
1 | Why Most Published Research Findings Are False (including this one?) [1] | PLoS Medicine | 232847 | 52
2 | Initial Severity and Antidepressant Benefits: A Meta-Analysis of Data Submitted to the Food and Drug Administration [2] | PLoS Medicine | 182305 | 15
3 | Serotonin and Depression: A Disconnect between the Advertisements and the Scientific Literature [3] | PLoS Medicine | 105498 | 16
4 | The Diploid Genome Sequence of an Individual Human [4] | PLoS Biology | 88271 | 54
5 | Ultrasonic Songs of Male Mice [5] | PLoS Biology | 81331 | 8
6 | Complete Primate Skeleton from the Middle Eocene of Messel in Germany: Morphology and Paleobiology [6] | PLoS ONE | 62449 | 0
7 | The Impact Factor Game: It is time to find a better way to assess the scientific literature [7] | PLoS Medicine | 61353 | 13
8 | A Map of Recent Positive Selection in the Human Genome [8] | PLoS Biology | 59512 | 94
9 | Mapping the Structural Core of Human Cerebral Cortex [9] | PLoS Biology | 58151 | 8
10 | Ten Simple Rules for Getting Published [10] | PLoS Computational Biology | 57312 | 1
11 | Men, Women, and Ghosts in Science [11] | PLoS Biology | 56982 | 0
120 | Defrosting the Digital Library: Bibliographic Tools for the Next Generation Web [12] (w00t!) | PLoS Computational Biology | 16295 | 3
1500 | Specificity and Evolvability in Eukaryotic Protein Interaction Networks [13] | PLoS Computational Biology | 4270 | 7
1632 | Comparative Genomics and Disorder Prediction Identify Biologically Relevant SH3 Protein Interactions [14] | PLoS Computational Biology | 4063 | 10
1755 | Folding Very Short Peptides Using Molecular Dynamics [15] | PLoS Computational Biology | 3876 | 2
2535 | Microblogging the ISMB: A New Approach to Conference Reporting [16] | PLoS Computational Biology | 3055 | 1
7521 | Probing the Flexibility of Large Conformational Changes in Protein Structures through Local Perturbations [17] | PLoS Computational Biology | 1024 | 0
12549 | Deciphering Proteomic Signatures of Early Diapause in Nasonia [18] | PLoS ONE | 0 | 0

*The rank is based on the 12,549 papers for which viewing data (combined usage of HTML + PDF + XML) are available.

**Citation counts are via PubMed Central (data from CrossRef and Scopus are also provided; see Bora’s comments and commentary at Blue Lab Coats).
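For anyone who wants to play with these numbers, here is a minimal Python sketch of the kind of comparison the table invites: ordering papers by views, ordering them by citations, and working out citations per thousand views. It uses only a handful of rows copied from the table above, so the orderings are within this small sample and not the ranks out of 12,549. The function names (`rank_by`, `citations_per_1000_views`) and the ratio itself are my own illustrative choices, not anything PLoS provides.

```python
# A handful of rows copied from the table above: (short title, views, citations).
papers = [
    ("Why Most Published Research Findings Are False", 232847, 52),
    ("The Diploid Genome Sequence of an Individual Human", 88271, 54),
    ("Ultrasonic Songs of Male Mice", 81331, 8),
    ("A Map of Recent Positive Selection in the Human Genome", 59512, 94),
    ("Men, Women, and Ghosts in Science", 56982, 0),
    ("Defrosting the Digital Library", 16295, 3),
]


def rank_by(rows, column):
    """Return titles ordered from the highest to the lowest value in a column."""
    ordered = sorted(rows, key=lambda row: row[column], reverse=True)
    return [row[0] for row in ordered]


def citations_per_1000_views(views, citations):
    """Hypothetical ratio: citations per thousand views (None if no views)."""
    if views == 0:
        return None
    return 1000 * citations / views


by_views = rank_by(papers, 1)       # column 1 holds views
by_citations = rank_by(papers, 2)   # column 2 holds citations

for title, views, citations in papers:
    rate = citations_per_1000_views(views, citations)
    rate_text = "n/a" if rate is None else f"{rate:.2f}"
    print(f"{title}: {views} views, {citations} citations, "
          f"{rate_text} citations per 1000 views")
```

Even in this tiny sample the two orderings disagree: the most viewed paper is not the most cited one, which is part of why the caveats in the next section matter.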

Science is not a popularity contest but…

Analysing this data is not straightforward. Some highly-viewed articles are never cited (reviews, editorials, essays, opinion, etc.). Likewise, popularity and importance are not the same thing. Some articles get lots of citations but few views, which suggests that people are not actually reading the papers before citing them. As described on the PLoS website article-level-metrics.plos.org:

“When looking at Article-Level Metrics for the first time bear the following points in mind:

  • Online usage is dependent on the article type, the age of the article, and the subject area(s) it is in. Therefore you should be aware of these effects when considering the performance of any given article.
  • Older articles normally have higher usage than younger ones simply because the usage has had longer to accumulate. Articles typically have a peak in their usage in the first 3 months and usage then levels off after that.
  • Spikes of usage can be caused by media coverage, usage by large numbers of people, out of control download scripts or any number of other reasons. Without a detailed look at the raw usage logs it is often impossible to tell what the reason is and so we encourage you to regard usage data as indicative of trends, rather than as an absolute measure for any given article.
  • We currently have missing usage data for some of our articles, but we are working to fill the gaps. Primarily this affects those articles published before June 17th, 2005.
  • Newly published articles do not accumulate usage data instantaneously but require a day or two before data are shown.
  • Article citations as recorded by the Scopus database are sometimes undercounted because there are two records in the database for the same article. We’re working with Scopus to correct this issue.
  • All metrics will accrue over time (and some, such as citations, will take several years to accrue). Therefore, recent articles may not show many metrics (other than online usage, which accrues from day one).”

So all the usual caveats apply when using this bibliometric data. Despite the limitations, it is more revealing than the useful (but simplistic) “highly accessed” label at BioMed Central, which doesn’t always give full information on what “highly” actually means next to each published article. It will be interesting to see if other publishers now follow the lead of PLoS and BioMed Central and also publish their usage data combined with other bibliometric indicators such as blog coverage. For authors publishing with PLoS, this data has an added personal dimension too: it is handy to see how many views your paper has.

As paying customers of the services that commercial publishers provide, should scientists and their funders be demanding more of this kind of information in the future? I reckon they should. You have to wonder why these kinds of innovations have taken so long to happen, but they are a welcome addition.

[More commentary on this post over at friendfeed.]

References

  1. Ioannidis, J. (2005). Why Most Published Research Findings Are False. PLoS Medicine, 2 (8). DOI: 10.1371/journal.pmed.0020124
  2. Kirsch, I., Deacon, B., Huedo-Medina, T., Scoboria, A., Moore, T., & Johnson, B. (2008). Initial Severity and Antidepressant Benefits: A Meta-Analysis of Data Submitted to the Food and Drug Administration. PLoS Medicine, 5 (2). DOI: 10.1371/journal.pmed.0050045
  3. Lacasse, J., & Leo, J. (2005). Serotonin and Depression: A Disconnect between the Advertisements and the Scientific Literature. PLoS Medicine, 2 (12). DOI: 10.1371/journal.pmed.0020392
  4. Levy, S., Sutton, G., Ng, P., Feuk, L., Halpern, A., Walenz, B., Axelrod, N., Huang, J., Kirkness, E., Denisov, G., Lin, Y., MacDonald, J., Pang, A., Shago, M., Stockwell, T., Tsiamouri, A., Bafna, V., Bansal, V., Kravitz, S., Busam, D., Beeson, K., McIntosh, T., Remington, K., Abril, J., Gill, J., Borman, J., Rogers, Y., Frazier, M., Scherer, S., Strausberg, R., & Venter, J. (2007). The Diploid Genome Sequence of an Individual Human. PLoS Biology, 5 (10). DOI: 10.1371/journal.pbio.0050254
  5. Holy, T., & Guo, Z. (2005). Ultrasonic Songs of Male Mice. PLoS Biology, 3 (12). DOI: 10.1371/journal.pbio.0030386
  6. Franzen, J., Gingerich, P., Habersetzer, J., Hurum, J., von Koenigswald, W., & Smith, B. (2009). Complete Primate Skeleton from the Middle Eocene of Messel in Germany: Morphology and Paleobiology. PLoS ONE, 4 (5). DOI: 10.1371/journal.pone.0005723
  7. The PLoS Medicine Editors (2006). The Impact Factor Game. PLoS Medicine, 3 (6). DOI: 10.1371/journal.pmed.0030291
  8. Voight, B., Kudaravalli, S., Wen, X., & Pritchard, J. (2006). A Map of Recent Positive Selection in the Human Genome. PLoS Biology, 4 (3). DOI: 10.1371/journal.pbio.0040072
  9. Hagmann, P., Cammoun, L., Gigandet, X., Meuli, R., Honey, C., Wedeen, V., & Sporns, O. (2008). Mapping the Structural Core of Human Cerebral Cortex. PLoS Biology, 6 (7). DOI: 10.1371/journal.pbio.0060159
  10. Bourne, P. (2005). Ten Simple Rules for Getting Published. PLoS Computational Biology, 1 (5). DOI: 10.1371/journal.pcbi.0010057
  11. Lawrence, P. (2006). Men, Women, and Ghosts in Science. PLoS Biology, 4 (1). DOI: 10.1371/journal.pbio.0040019
  12. Hull, D., Pettifer, S., & Kell, D. (2008). Defrosting the Digital Library: Bibliographic Tools for the Next Generation Web. PLoS Computational Biology, 4 (10). DOI: 10.1371/journal.pcbi.1000204
  13. Beltrao, P., & Serrano, L. (2007). Specificity and Evolvability in Eukaryotic Protein Interaction Networks. PLoS Computational Biology, 3 (2). DOI: 10.1371/journal.pcbi.0030025
  14. Beltrao, P., & Serrano, L. (2005). Comparative Genomics and Disorder Prediction Identify Biologically Relevant SH3 Protein Interactions. PLoS Computational Biology, 1 (3). DOI: 10.1371/journal.pcbi.0010026
  15. Ho, B., & Dill, K. (2006). Folding Very Short Peptides Using Molecular Dynamics. PLoS Computational Biology, 2 (4). DOI: 10.1371/journal.pcbi.0020027
  16. Saunders, N., Beltrão, P., Jensen, L., Jurczak, D., Krause, R., Kuhn, M., & Wu, S. (2009). Microblogging the ISMB: A New Approach to Conference Reporting. PLoS Computational Biology, 5 (1). DOI: 10.1371/journal.pcbi.1000263
  17. Ho, B., & Agard, D. (2009). Probing the Flexibility of Large Conformational Changes in Protein Structures through Local Perturbations. PLoS Computational Biology, 5 (4). DOI: 10.1371/journal.pcbi.1000343
  18. Wolschin, F., & Gadau, J. (2009). Deciphering Proteomic Signatures of Early Diapause in Nasonia. PLoS ONE, 4 (7). DOI: 10.1371/journal.pone.0006394

1 Comment »

  1. In asking researchers if they thought there was a benefit to getting reprinted or republished most of them felt that getting into an accredited peer reviewed was the main requirement and republishing or the “eyeballs” their work got wasn’t on the radar as much simply because of the status quo. Said simply getting wider exposure wasn’t as interesting as being seen by the right people. To me both need to occur and statistics and readership will just gain in importance moving forward.

    Comment by LabGrab, November 17, 2009 @ 1:13 am

