O'Really?

July 31, 2015

Wikipedia Science Conference @WellcomeTrust in London, September 2nd & 3rd 2015 #wikisci

There is growing interest in Wikipedia, Wikidata, Commons, and other Wikimedia projects as platforms for opening up the scientific process [1]. The first Wikipedia Science Conference will discuss activities in this area at the Wellcome Collection Conference Centre in London on the 2nd & 3rd September 2015. There will be keynote talks from Wendy Hall (@DameWendyDBE) and Peter Murray-Rust (@petermurrayrust) and many other presentations including:

  • Daniel Mietchen (@EvoMRI), National Institutes of Health: Wikipedia and scholarly communication
  • Alex Bateman (@AlexBateman1), European Bioinformatics Institute: Using Wikipedia to annotate scientific databases
  • Geoffrey Bilder (@GBilder), CrossRef: Using DOIs in Wikipedia
  • Richard Pinch (@IMAMaths), Institute of Mathematics and its Applications: Wikimedia versus academia: a clash of cultures
  • Andy Mabbett (@PigsOnTheWing), Royal Society of Chemistry / ORCID: Wikipedia, Wikidata and more – How Can Scientists Help?
  • Darren Logan (@DarrenLogan), Wellcome Trust Sanger Institute: Using scientific databases to annotate Wikipedia
  • Dario Taraborelli (@ReaderMeter), Wikimedia & Altmetrics: Citing as a public service
  • … and many more

I’ll be doing a talk on “Improving the troubled relationship between Scientists and Wikipedia” (see slides below) with help from John Byrne who has been a Wikipedian in Residence at the Royal Society and Cancer Research UK.

How much does finding out more about all this wiki-goodness cost? An absolute bargain at just £29 for two days – what’s not to like? Tickets are available on Eventbrite, so register now while they last.

References

  1. Misha Teplitskiy, Grace Lu, & Eamon Duede (2015). Amplifying the Impact of Open Access: Wikipedia and the Diffusion of Science. Wikipedia Workshop at 9th International Conference on Web and Social Media (ICWSM), Oxford, UK. arXiv:1506.07608v1

September 18, 2009

Popular, personal and public data: Article-level metrics at PLoS

The Public Library of Science (PLoS) is a non-profit organisation committed to making the world’s scientific and medical literature freely accessible to everyone via open access publishing. As recently announced, they have just published the first article-level metrics (e.g. web server logs and related information) for all articles in their library. This is novel, interesting and potentially useful data, not currently made publicly available by other publishers. Here is a selection of some of the data, taken from the full dataset here (large file), which includes the “top ten” papers by viewing statistics.

Article level metrics for some papers published in PLoS (August 2009)

Rank* | Article | Journal | Views | Citations**
1 | Why Most Published Research Findings Are False (including this one?) [1] | PLoS Medicine | 232847 | 52
2 | Initial Severity and Antidepressant Benefits: A Meta-Analysis of Data Submitted to the Food and Drug Administration [2] | PLoS Medicine | 182305 | 15
3 | Serotonin and Depression: A Disconnect between the Advertisements and the Scientific Literature [3] | PLoS Medicine | 105498 | 16
4 | The Diploid Genome Sequence of an Individual Human [4] | PLoS Biology | 88271 | 54
5 | Ultrasonic Songs of Male Mice [5] | PLoS Biology | 81331 | 8
6 | Complete Primate Skeleton from the Middle Eocene of Messel in Germany: Morphology and Paleobiology [6] | PLoS ONE | 62449 | 0
7 | The Impact Factor Game: It is time to find a better way to assess the scientific literature [7] | PLoS Medicine | 61353 | 13
8 | A Map of Recent Positive Selection in the Human Genome [8] | PLoS Biology | 59512 | 94
9 | Mapping the Structural Core of Human Cerebral Cortex [9] | PLoS Biology | 58151 | 8
10 | Ten Simple Rules for Getting Published [10] | PLoS Computational Biology | 57312 | 1
11 | Men, Women, and Ghosts in Science [11] | PLoS Biology | 56982 | 0
120 | Defrosting the Digital Library: Bibliographic Tools for the Next Generation Web [12] (w00t!) | PLoS Computational Biology | 16295 | 3
1500 | Specificity and evolvability in eukaryotic protein interaction networks [13] | PLoS Computational Biology | 4270 | 7
1632 | Comparative genomics and disorder prediction identify biologically relevant SH3 protein interactions [14] | PLoS Computational Biology | 4063 | 10
1755 | Folding Very Short Peptides Using Molecular Dynamics [15] | PLoS Computational Biology | 3876 | 2
2535 | Microblogging the ISMB: A New Approach to Conference Reporting [16] | PLoS Computational Biology | 3055 | 1
7521 | Probing the Flexibility of Large Conformational Changes in Protein Structures through Local Perturbations [17] | PLoS Computational Biology | 1024 | 0
12549 | Deciphering Proteomic Signatures of Early Diapause in Nasonia [18] | PLoS ONE | 0 | 0

*The rank is based on the 12,549 papers for which viewing data (combined usage of HTML + PDF + XML) are available.

**Citation counts are via PubMedCentral (data from CrossRef and Scopus is also provided, see Bora’s comments and commentary at Blue Lab Coats.)
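If you want to poke at figures like these yourself, here is a minimal Python sketch that compares views with citations for a handful of rows copied from the table above. The names `papers` and `citations_per_10k_views` are made up for this illustration and are not part of any PLoS tool or of the published dataset, and citations per 10,000 views is just one rough way of looking at the numbers, not an established metric.

```python
# (views, citations) pairs copied from the August 2009 table above.
# The dictionary and helper below are illustrative names, not a PLoS API.
papers = {
    "Why Most Published Research Findings Are False": (232847, 52),
    "The Diploid Genome Sequence of an Individual Human": (88271, 54),
    "Ultrasonic Songs of Male Mice": (81331, 8),
    "A Map of Recent Positive Selection in the Human Genome": (59512, 94),
    "Ten Simple Rules for Getting Published": (57312, 1),
    "Men, Women, and Ghosts in Science": (56982, 0),
}


def citations_per_10k_views(views, citations):
    """Return citations per 10,000 views, or None if there are no views."""
    if views == 0:
        return None
    return 10_000 * citations / views


# Sort the sample from most to fewest citations per 10,000 views.
ranked = sorted(
    papers.items(),
    key=lambda item: citations_per_10k_views(*item[1]),
    reverse=True,
)

for title, (views, citations) in ranked:
    ratio = citations_per_10k_views(views, citations)
    print(f"{ratio:6.2f}  {title}")
```

On this small sample the most-viewed paper is not the one with the most citations per view, which is one way of seeing that popularity and citation impact can diverge.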

Science is not a popularity contest but…

Analysing this data is not straightforward. Some highly-viewed articles are never cited (reviews, editorials, essays, opinion pieces, etc.). Likewise, popularity and importance are not the same thing. Some articles get lots of citations but few views, which suggests that people are not actually reading the papers before citing them. As described on the PLoS website article-level-metrics.plos.org:

“When looking at Article-Level Metrics for the first time bear the following points in mind:

  • Online usage is dependent on the article type, the age of the article, and the subject area(s) it is in. Therefore you should be aware of these effects when considering the performance of any given article.
  • Older articles normally have higher usage than younger ones simply because the usage has had longer to accumulate. Articles typically have a peak in their usage in the first 3 months and usage then levels off after that.
  • Spikes of usage can be caused by media coverage, usage by large numbers of people, out of control download scripts or any number of other reasons. Without a detailed look at the raw usage logs it is often impossible to tell what the reason is and so we encourage you to regard usage data as indicative of trends, rather than as an absolute measure for any given article.
  • We currently have missing usage data for some of our articles, but we are working to fill the gaps. Primarily this affects those articles published before June 17th, 2005.
  • Newly published articles do not accumulate usage data instantaneously but require a day or two before data are shown.
  • Article citations as recorded by the Scopus database are sometimes undercounted because there are two records in the database for the same article. We’re working with Scopus to correct this issue.
  • All metrics will accrue over time (and some, such as citations, will take several years to accrue). Therefore, recent articles may not show many metrics (other than online usage, which accrues from day one).”

So all the usual caveats apply when using this bibliometric data. Despite the limitations, it is more revealing than the useful (but simplistic) “highly accessed” label on papers at BioMed Central, which doesn’t always give full information on what “highly” actually means next to each published article. It will be interesting to see if other publishers now follow the lead of PLoS and BioMed Central and also publish their usage data combined with other bibliometric indicators such as blog coverage. For authors publishing with PLoS, this data has an added personal dimension too: it is handy to see how many views your paper has.

As paying customers of the services that commercial publishers provide, should scientists and their funders be demanding more of this kind of information in the future? I reckon they should. You have to wonder why these kinds of innovations have taken so long to happen, but they are a welcome addition.

[More commentary on this post over at friendfeed.]

References

  1. Ioannidis, J. (2005). Why Most Published Research Findings Are False PLoS Medicine, 2 (8) DOI: 10.1371/journal.pmed.0020124
  2. Kirsch, I., Deacon, B., Huedo-Medina, T., Scoboria, A., Moore, T., & Johnson, B. (2008). Initial Severity and Antidepressant Benefits: A Meta-Analysis of Data Submitted to the Food and Drug Administration PLoS Medicine, 5 (2) DOI: 10.1371/journal.pmed.0050045
  3. Lacasse, J., & Leo, J. (2005). Serotonin and Depression: A Disconnect between the Advertisements and the Scientific Literature PLoS Medicine, 2 (12) DOI: 10.1371/journal.pmed.0020392
  4. Levy, S., Sutton, G., Ng, P., Feuk, L., Halpern, A., Walenz, B., Axelrod, N., Huang, J., Kirkness, E., Denisov, G., Lin, Y., MacDonald, J., Pang, A., Shago, M., Stockwell, T., Tsiamouri, A., Bafna, V., Bansal, V., Kravitz, S., Busam, D., Beeson, K., McIntosh, T., Remington, K., Abril, J., Gill, J., Borman, J., Rogers, Y., Frazier, M., Scherer, S., Strausberg, R., & Venter, J. (2007). The Diploid Genome Sequence of an Individual Human PLoS Biology, 5 (10) DOI: 10.1371/journal.pbio.0050254
  5. Holy, T., & Guo, Z. (2005). Ultrasonic Songs of Male Mice PLoS Biology, 3 (12) DOI: 10.1371/journal.pbio.0030386
  6. Franzen, J., Gingerich, P., Habersetzer, J., Hurum, J., von Koenigswald, W., & Smith, B. (2009). Complete Primate Skeleton from the Middle Eocene of Messel in Germany: Morphology and Paleobiology PLoS ONE, 4 (5) DOI: 10.1371/journal.pone.0005723
  7. The PLoS Medicine Editors (2006). The Impact Factor Game PLoS Medicine, 3 (6) DOI: 10.1371/journal.pmed.0030291
  8. Voight, B., Kudaravalli, S., Wen, X., & Pritchard, J. (2006). A Map of Recent Positive Selection in the Human Genome PLoS Biology, 4 (3) DOI: 10.1371/journal.pbio.0040072
  9. Hagmann, P., Cammoun, L., Gigandet, X., Meuli, R., Honey, C., Wedeen, V., & Sporns, O. (2008). Mapping the Structural Core of Human Cerebral Cortex PLoS Biology, 6 (7) DOI: 10.1371/journal.pbio.0060159
  10. Bourne, P. (2005). Ten Simple Rules for Getting Published PLoS Computational Biology, 1 (5) DOI: 10.1371/journal.pcbi.0010057
  11. Lawrence, P. (2006). Men, Women, and Ghosts in Science PLoS Biology, 4 (1) DOI: 10.1371/journal.pbio.0040019
  12. Hull, D., Pettifer, S., & Kell, D. (2008). Defrosting the Digital Library: Bibliographic Tools for the Next Generation Web PLoS Computational Biology, 4 (10) DOI: 10.1371/journal.pcbi.1000204
  13. Beltrao, P., & Serrano, L. (2007). Specificity and Evolvability in Eukaryotic Protein Interaction Networks PLoS Computational Biology, 3 (2) DOI: 10.1371/journal.pcbi.0030025
  14. Beltrao, P., & Serrano, L. (2005). Comparative Genomics and Disorder Prediction Identify Biologically Relevant SH3 Protein Interactions PLoS Computational Biology, 1 (3) DOI: 10.1371/journal.pcbi.0010026
  15. Ho, B., & Dill, K. (2006). Folding Very Short Peptides Using Molecular Dynamics PLoS Computational Biology, 2 (4) DOI: 10.1371/journal.pcbi.0020027
  16. Saunders, N., Beltrão, P., Jensen, L., Jurczak, D., Krause, R., Kuhn, M., & Wu, S. (2009). Microblogging the ISMB: A New Approach to Conference Reporting PLoS Computational Biology, 5 (1) DOI: 10.1371/journal.pcbi.1000263
  17. Ho, B., & Agard, D. (2009). Probing the Flexibility of Large Conformational Changes in Protein Structures through Local Perturbations PLoS Computational Biology, 5 (4) DOI: 10.1371/journal.pcbi.1000343
  18. Wolschin, F., & Gadau, J. (2009). Deciphering Proteomic Signatures of Early Diapause in Nasonia PLoS ONE, 4 (7) DOI: 10.1371/journal.pone.0006394

June 2, 2009

Who Are You? Digital Identity in Science

The organisers of the Science Online London 2009 conference are asking people to propose their own session ideas (see some examples here), so here is a proposal:

Title: Who Are You? Digital Identity in Science

Many important decisions in Science are based on identifying scientists and their contributions. From selecting reviewers for grants and publications, to attributing published data and deciding who is funded, hired or promoted, digital identity is at the heart of Science on the Web.

Despite the importance of digital identity, identifying scientists online is an unsolved problem [1]. Consequently, a significant amount of scientific and scholarly work is not easily cited or credited, especially digital contributions: from blogs and wikis, to source code, databases and traditional peer-reviewed publications on the Web. This (proposed) session will look at current mechanisms for identifying scientists digitally including contributor-id (CrossRef), researcher-id (Thomson), Scopus Author ID (Elsevier), OpenID, Google Scholar [2], Single Sign On, PubMed, FOAF+SSL, LinkedIn, Shared Identifiers (URIs) and the rest. We will introduce and discuss each via a SWOT analysis (Strengths, Weaknesses, Opportunities and Threats). Is digital identity even possible and ethical? Besides the obvious benefits of persistent, reliable and unique identifiers, what are the privacy and security issues with personal digital identity?

If this is a successful proposal, I’ll need some help. Any offers? If you are interested in joining in the fun, more details are at scienceonlinelondon.org

References

  1. Bourne, P., & Fink, J. (2008). I Am Not a Scientist, I Am a Number PLoS Computational Biology, 4 (12) DOI: 10.1371/journal.pcbi.1000247
  2. Various Publications about unique author identifiers bookmarked in citeulike
  3. Yours Truly (2009) Google thinks I’m Maurice Wilkins
  4. The Who (1978) Who Are You? Who, who, who, who? (Thanks to Jan Aerts for the reference!)
