The Public Library of Science (PLoS) is a non-profit organisation committed to making the world’s scientific and medical literature freely accessible to everyone via open access publishing. As recently announced, they have just published the first article-level metrics (e.g. web server logs and related information) for all articles in their library. This is novel, interesting and potentially useful data that other publishers do not currently make publicly available. Here is a selection of some of the data, taken from the full dataset here (large file), which includes the “top ten” papers by viewing statistics.
Article level metrics for some papers published in PLoS (August 2009)
*The rank is based on the 12,549 papers for which viewing data (combined usage of HTML + PDF + XML) are available.
**Citation counts are via PubMed Central (data from CrossRef and Scopus is also provided; see Bora’s comments and commentary at Blue Lab Coats).
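As a rough illustration of how a rank like the one above could be derived, here is a minimal Python sketch that sums HTML, PDF and XML usage per article and sorts on the total. The field names, DOIs and numbers are invented for the example; they are not taken from the PLoS dataset, whose actual column layout may differ.

```python
from dataclasses import dataclass


@dataclass
class ArticleUsage:
    """Usage counts for one article (field names are hypothetical)."""
    doi: str
    html_views: int
    pdf_downloads: int
    xml_downloads: int

    @property
    def combined_usage(self) -> int:
        # "Combined usage" here means HTML + PDF + XML, as in the footnote above.
        return self.html_views + self.pdf_downloads + self.xml_downloads


def rank_by_combined_usage(articles):
    """Return (rank, doi, combined_usage) tuples, most-viewed article first."""
    ordered = sorted(articles, key=lambda a: a.combined_usage, reverse=True)
    return [(rank, a.doi, a.combined_usage)
            for rank, a in enumerate(ordered, start=1)]


# Invented sample data, for illustration only.
sample = [
    ArticleUsage("10.1371/example.0000001", 100, 20, 5),
    ArticleUsage("10.1371/example.0000002", 300, 50, 10),
    ArticleUsage("10.1371/example.0000003", 10, 200, 0),
]

for rank, doi, total in rank_by_combined_usage(sample):
    print(rank, doi, total)
```

Articles with equal totals keep their input order here; a real ranking would need an explicit rule for ties.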
Science is not a popularity contest but…
Analysing this data is not straightforward. Some highly-viewed articles are never cited (reviews, editorials, essays, opinion, etc). Likewise, popularity and importance are not the same thing. Some articles get lots of citations but few views, which suggests that people are not actually reading the papers before citing them. As described on the PLoS website article-level-metrics.plos.org:
“When looking at Article-Level Metrics for the first time bear the following points in mind:
- Online usage is dependent on the article type, the age of the article, and the subject area(s) it is in. Therefore you should be aware of these effects when considering the performance of any given article.
- Older articles normally have higher usage than younger ones simply because the usage has had longer to accumulate. Articles typically have a peak in their usage in the first 3 months and usage then levels off after that.
- Spikes of usage can be caused by media coverage, usage by large numbers of people, out of control download scripts or any number of other reasons. Without a detailed look at the raw usage logs it is often impossible to tell what the reason is and so we encourage you to regard usage data as indicative of trends, rather than as an absolute measure for any given article.
- We currently have missing usage data for some of our articles, but we are working to fill the gaps. Primarily this affects those articles published before June 17th, 2005.
- Newly published articles do not accumulate usage data instantaneously but require a day or two before data are shown.
- Article citations as recorded by the Scopus database are sometimes undercounted because there are two records in the database for the same article. We’re working with Scopus to correct this issue.
- All metrics will accrue over time (and some, such as citations, will take several years to accrue). Therefore, recent articles may not show many metrics (other than online usage, which accrues from day one).”
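One crude way to allow for the age effect described in the points above is to divide total usage by the time an article has been online. The Python sketch below does this under assumptions of my own: the function names, the snapshot date and the figures are all invented, and a simple average ignores the early peak in usage that PLoS mentions, so treat the result as indicative only.

```python
from datetime import date

# Approximate month length (roughly 365.25 / 12 days).
AVERAGE_DAYS_PER_MONTH = 30.44


def months_online(published: date, snapshot: date) -> float:
    """Approximate months between publication and the date the data was taken."""
    days = (snapshot - published).days
    if days < 0:
        raise ValueError("snapshot date is before publication date")
    # Count anything under one day as one day to avoid dividing by zero.
    return max(days, 1) / AVERAGE_DAYS_PER_MONTH


def views_per_month(total_views: int, published: date, snapshot: date) -> float:
    """Average views per month online: a crude age-normalised usage figure."""
    return total_views / months_online(published, snapshot)


# Invented figures, for illustration only.
snapshot = date(2009, 8, 1)
older = views_per_month(12000, date(2005, 8, 1), snapshot)
newer = views_per_month(6000, date(2009, 2, 1), snapshot)
print(f"older article: {older:.0f} views/month")
print(f"newer article: {newer:.0f} views/month")
```

In this made-up case the newer article has half the total views but a much higher monthly rate, which is exactly the kind of difference that raw totals hide.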
So all the usual caveats apply when using this bibliometric data. Despite the limitations, it is more revealing than the useful (but simplistic) “highly accessed” label on papers at BioMed Central, which doesn’t always give full information on what “highly” actually means next to each published article. It will be interesting to see if other publishers now follow the lead of PLoS and BioMed Central and also publish their usage data combined with other bibliometric indicators such as blog coverage. For authors publishing with PLoS, this data has an added personal dimension too: it is handy to see how many views your paper has.
As paying customers of the services that commercial publishers provide, should scientists and their funders be demanding more of this kind of information in the future? I reckon they should. You have to wonder why these kinds of innovations have taken so long to happen, but they are a welcome addition.
[More commentary on this post over at friendfeed.]
References
- Ioannidis, J. (2005). Why Most Published Research Findings Are False PLoS Medicine, 2 (8) DOI: 10.1371/journal.pmed.0020124
- Kirsch, I., Deacon, B., Huedo-Medina, T., Scoboria, A., Moore, T., & Johnson, B. (2008). Initial Severity and Antidepressant Benefits: A Meta-Analysis of Data Submitted to the Food and Drug Administration PLoS Medicine, 5 (2) DOI: 10.1371/journal.pmed.0050045
- Lacasse, J., & Leo, J. (2005). Serotonin and Depression: A Disconnect between the Advertisements and the Scientific Literature PLoS Medicine, 2 (12) DOI: 10.1371/journal.pmed.0020392
- Levy, S., Sutton, G., Ng, P., Feuk, L., Halpern, A., Walenz, B., Axelrod, N., Huang, J., Kirkness, E., Denisov, G., Lin, Y., MacDonald, J., Pang, A., Shago, M., Stockwell, T., Tsiamouri, A., Bafna, V., Bansal, V., Kravitz, S., Busam, D., Beeson, K., McIntosh, T., Remington, K., Abril, J., Gill, J., Borman, J., Rogers, Y., Frazier, M., Scherer, S., Strausberg, R., & Venter, J. (2007). The Diploid Genome Sequence of an Individual Human PLoS Biology, 5 (10) DOI: 10.1371/journal.pbio.0050254
- Holy, T., & Guo, Z. (2005). Ultrasonic Songs of Male Mice PLoS Biology, 3 (12) DOI: 10.1371/journal.pbio.0030386
- Franzen, J., Gingerich, P., Habersetzer, J., Hurum, J., von Koenigswald, W., & Smith, B. (2009). Complete Primate Skeleton from the Middle Eocene of Messel in Germany: Morphology and Paleobiology PLoS ONE, 4 (5) DOI: 10.1371/journal.pone.0005723
- The PLoS Medicine Editors (2006). The Impact Factor Game PLoS Medicine, 3 (6) DOI: 10.1371/journal.pmed.0030291
- Voight, B., Kudaravalli, S., Wen, X., & Pritchard, J. (2006). A Map of Recent Positive Selection in the Human Genome PLoS Biology, 4 (3) DOI: 10.1371/journal.pbio.0040072
- Hagmann, P., Cammoun, L., Gigandet, X., Meuli, R., Honey, C., Wedeen, V., & Sporns, O. (2008). Mapping the Structural Core of Human Cerebral Cortex PLoS Biology, 6 (7) DOI: 10.1371/journal.pbio.0060159
- Bourne, P. (2005). Ten Simple Rules for Getting Published PLoS Computational Biology, 1 (5) DOI: 10.1371/journal.pcbi.0010057
- Lawrence, P. (2006). Men, Women, and Ghosts in Science PLoS Biology, 4 (1) DOI: 10.1371/journal.pbio.0040019
- Hull, D., Pettifer, S., & Kell, D. (2008). Defrosting the Digital Library: Bibliographic Tools for the Next Generation Web PLoS Computational Biology, 4 (10) DOI: 10.1371/journal.pcbi.1000204
- Beltrao, P., & Serrano, L. (2007). Specificity and Evolvability in Eukaryotic Protein Interaction Networks PLoS Computational Biology, 3 (2) DOI: 10.1371/journal.pcbi.0030025
- Beltrao, P., & Serrano, L. (2005). Comparative Genomics and Disorder Prediction Identify Biologically Relevant SH3 Protein Interactions PLoS Computational Biology, 1 (3) DOI: 10.1371/journal.pcbi.0010026
- Ho, B., & Dill, K. (2006). Folding Very Short Peptides Using Molecular Dynamics PLoS Computational Biology, 2 (4) DOI: 10.1371/journal.pcbi.0020027
- Saunders, N., Beltrão, P., Jensen, L., Jurczak, D., Krause, R., Kuhn, M., & Wu, S. (2009). Microblogging the ISMB: A New Approach to Conference Reporting PLoS Computational Biology, 5 (1) DOI: 10.1371/journal.pcbi.1000263
- Ho, B., & Agard, D. (2009). Probing the Flexibility of Large Conformational Changes in Protein Structures through Local Perturbations PLoS Computational Biology, 5 (4) DOI: 10.1371/journal.pcbi.1000343
- Wolschin, F., & Gadau, J. (2009). Deciphering Proteomic Signatures of Early Diapause in Nasonia PLoS ONE, 4 (7) DOI: 10.1371/journal.pone.0006394