April 1, 2014

The Serene Scientist's Serenity Prayer via Jon Butterworth


The Church of Banksy

Whatever your religious preferences, the Serenity Prayer by Reinhold Niebuhr captures a certain wisdom about life in general. So it is good to see that physicist Jon Butterworth at UCL has adapted it [1] for scientists:

“Give me grace to accept with serenity the things that cannot be understood,

Data to investigate the things which can be understood,

And the Wisdom to know the difference.”



  1. Jon Butterworth (2014). Giles Fraser says scientists are replacing theologians. Some thoughts on that. The Gruaniad, 2014-03-31

August 3, 2012

April 2, 2012

Open Data Manchester: Twenty Four Hour Data People


Shaun Ryder of the Happy Mondays, the original twenty-four hour Manchester party person, spins the discs at the Wickerman Festival in 2008. Creative Commons licensed image via Tangerine Dream on flickr.com.

According to Francis Maude, Open Data is the raw material for the “next industrial revolution”. Now, you should obviously take everything politicians say with a large pinch of salt (especially Maude), but despite the political hyperbole, when it comes to data he is onto something.

According to Wikipedia, which is considerably more reliable than politicians, Open Data is:

“the idea that certain data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control.”

Open Data is slowly having an impact in the world of science [1] and also in wider society. Initiatives like data.gov in the U.S. and data.gov.uk in the UK, also known as e-government or government 2.0, have put huge amounts of data in the public domain, and there is plenty more data in the pipeline. All of this data makes novel applications possible, like cycling injury maps showing accident black spots, and many others like them.
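Behind a black-spot map there is usually some kind of spatial aggregation. As a minimal sketch, assuming the accident records have already been pulled out of an open data file as (latitude, longitude) pairs, one simple approach is to snap each point to a small grid cell and count the accidents per cell. The coordinates below are made up, and the 0.001-degree cell size and two-accident threshold are arbitrary choices for illustration, not how any particular map was actually built.

```python
import math
from collections import Counter

# Made-up (latitude, longitude) pairs standing in for accident records
# loaded from an open data file; three of them are deliberately close together.
ACCIDENTS = [
    (53.4794, -2.2453),
    (53.4796, -2.2455),
    (53.4791, -2.2458),
    (53.4631, -2.2913),
    (53.4832, -2.2002),
]


def grid_cell(lat: float, lon: float, cell_size: float = 0.001) -> tuple[int, int]:
    """Snap a coordinate to the integer index of a square grid cell (sides in degrees)."""
    # math.floor rather than int, so negative longitudes (west of Greenwich) bin consistently.
    return math.floor(lat / cell_size), math.floor(lon / cell_size)


def black_spots(points, cell_size: float = 0.001, min_count: int = 2):
    """Return (cell, count) pairs with at least min_count accidents, busiest first."""
    counts = Counter(grid_cell(lat, lon, cell_size) for lat, lon in points)
    return [(cell, n) for cell, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    print(black_spots(ACCIDENTS))
```

Grid binning is crude (a cluster straddling a cell boundary gets split in two), so real mapping tools may use smoothing or clustering instead.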

To discuss the current status of Open Data in Greater Manchester there were two events last week:

  1. The Open Data Manchester meetup “24 hour data people” [2] at the Manchester Digital Laboratory (“madlab”), which recently made BBC headlines with its DIY bio project
  2. The Discover Open Data event at the Cornerhouse cinema
Here is a brief and incomplete summary of what went on at these events:


February 15, 2012

The Open Access Irony Awards: Naming and shaming them

Open Access (OA) publishing aims to make the results of scientific research available to the widest possible audience. Scientific papers that are published in Open Access journals are freely available for crucial data mining and for anyone or anything to read, wherever they may be.

In the last ten years, the Open Access movement has made huge progress in allowing:

“any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers.”

But there is still a long way to go, as much of the world’s scientific knowledge remains locked up behind publishers’ paywalls, unavailable for re-use by text-mining software and inaccessible to the public, who often funded the research through taxation.

Openly ironic?

Ironically, some of the papers that are inaccessible discuss or even champion the Open Access movement itself. Sometimes the lack of access is deliberate, other times accidental, but either way the consequences are serious: restricted access to public scientific knowledge is slowing scientific progress [1]. Sometimes the best way to make a serious point is to have a laugh about it. That is what the Open Access Irony Awards do: by gathering all the offenders in one place, we can laugh and make a serious point at the same time by naming and shaming the papers in question.

To get the ball rolling, here are some examples:

  • The Lancet, owned by Evilsevier (sorry, I mean Elsevier), recently published a paper on “the case for open data” [2] (please login to access article). Login?! Not very open…
  • Serial offender and über-journal Science has an article by Elias Zerhouni on the NIH public access policy [3] (Subscribe/Join AAAS to View Full Text), another on “making data maximally available” [4] (Subscribe/Join AAAS to View Full Text) and another on a high profile advocate of open science [5] (Buy Access to This Article to View Full Text) Irony of ironies.
  • From Nature Publishing Group comes a fascinating paper about harnessing the wisdom of the crowds to predict protein structures [6]. Not only have members of the tax-paying public funded this work, they actually did some of the work too! But unfortunately they have to pay to see the paper describing their results. Ironic? Also, another paper published in Nature Medicine proclaims that the “delay in sharing research data is costing lives” [1] (instant access only $32!)
  • From the British Medical Journal (BMJ) comes the worrying news of dodgy American laws that will lock up valuable scientific data behind paywalls [7] (please subscribe or pay below). Ironic? *
  • The “green” road to Open Access publishing involves authors uploading their manuscript to self-archive it in some kind of public repository. But there are many social, political and technical barriers to this, and they have been well documented [8]. You could find out about them in this paper [8], but it appears that the author hasn’t self-archived the paper or taken the “gold” road and published in an Open Access journal. Ironic?
  • Last, but not least, it would be interesting to know what commercial publishers make of all this text-mining magic in Science [9], but we would have to pay $24 to find out. Ironic?

These are just a small selection from amongst many. If you would like to nominate a paper for an Open Access Irony Award, simply post it to the group on CiteULike or the group on Mendeley. Please feel free to start your own group elsewhere if you’re not on CiteULike or Mendeley. The name of this award probably originated from an idea of Jonathan Eisen’s, picked up by Joe Dunckley and Matthew Cockerill at BioMed Central, so thanks to them for the inspiration.

For added ironic amusement, take a screenshot of the offending article and post it to the Flickr group. Sometimes the shame is too much, and articles are retrospectively made open access, so a screenshot will preserve the irony.

Join us in poking fun at the crazy business of academic publishing, while making a serious point about the lack of Open Access to scientific data.


  1. Sommer, Josh (2010). The delay in sharing research data is costing lives Nature Medicine, 16 (7), 744-744 DOI: 10.1038/nm0710-744
  2. Boulton, G., Rawlins, M., Vallance, P., & Walport, M. (2011). Science as a public enterprise: the case for open data The Lancet, 377 (9778), 1633-1635 DOI: 10.1016/S0140-6736(11)60647-8
  3. Zerhouni, Elias (2004). Information Access: NIH Public Access Policy Science, 306 (5703), 1895-1895 DOI: 10.1126/science.1106929
  4. Hanson, B., Sugden, A., & Alberts, B. (2011). Making Data Maximally Available Science, 331 (6018), 649-649 DOI: 10.1126/science.1203354
  5. Kaiser, Jocelyn (2012). Profile of Stephen Friend at Sage Bionetworks: The Visionary Science, 335 (6069), 651-653 DOI: 10.1126/science.335.6069.651
  6. Cooper, S., Khatib, F., Treuille, A., Barbero, J., Lee, J., Beenen, M., Leaver-Fay, A., Baker, D., Popović, Z., & players, F. (2010). Predicting protein structures with a multiplayer online game Nature, 466 (7307), 756-760 DOI: 10.1038/nature09304
  7. Epstein, Keith (2012). Scientists are urged to oppose new US legislation that will put studies behind a pay wall BMJ, 344 (jan17 3) DOI: 10.1136/bmj.e452
  8. Kim, Jihyun (2010). Faculty self-archiving: Motivations and barriers Journal of the American Society for Information Science and Technology DOI: 10.1002/asi.21336
  9. Smit, Eefke, & Van Der Graaf, M. (2012). Journal article mining: the scholarly publishers’ perspective Learned Publishing, 25 (1), 35-46 DOI: 10.1087/20120106

[CC licensed picture "ask me about open access" by mollyali.]

* Please note, some research articles in BMJ are available by Open Access, but news articles like [7] are not. Thanks to Trish Groves at BMJ for bringing this to my attention after this blog post was published. Also, some “articles” here are in a grey area for open access, particularly “journalistic” stuff like news, editorials and correspondence, as pointed out by Becky Furlong.

December 17, 2010

Planet Facebook

Whatever your views on Facebook [1], you can’t deny that from space, “Planet Facebook” looks rather intriguing. The wonderful diagram below of Facebook connections has been made by Paul Butler. Even miserable Facebook refuseniks (like me) can’t help but go “ooh that’s pretty” while marvelling at the masterful use of the R language to construct this beautiful map…

Planet Facebook / Planet Earth by Paul Butler


  1. John H. Tucker (2010). Status update: “I’m so glamorous”. A study of facebook users shows how narcissism and low self-esteem can be interrelated. Scientific American, 303 (5) PMID: 21033279, see also original research by Soraya Mehdizadeh at DOI:10.1089/cyber.2009.0257

September 1, 2010

How many unique papers are there in Mendeley?

Mendeley is a handy piece of desktop and web software for managing and sharing research papers [1]. This popular tool has been getting a lot of attention lately, and with some impressive statistics it’s not difficult to see why. At the time of writing, Mendeley claims to have over 36 million papers, added by just under half a million users working at more than 10,000 research institutions around the world. That’s impressive considering the startup company behind it has only been going for a few years. The major established commercial players in the field of bibliographic databases (WoK and Scopus) currently have around 40 million documents, so if Mendeley continues to grow at this rate, they’ll be more popular than Jesus (and Elsevier and Thomson) before you can say “bibliography”. But to get a real handle on how big Mendeley is, we need to know how many of those 36 million documents are unique, because duplicated documents inflate the overall head count.
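How Mendeley itself detects duplicates isn’t covered here. One common approach, assumed purely for illustration, is to reduce each record to a key (the DOI when there is one, otherwise a normalised title) and count the distinct keys. The record format (a dict with `title` and an optional `doi`) and the sample records are hypothetical; the titles are borrowed from papers cited on this blog, the duplication is invented.

```python
import re
import unicodedata

# Hypothetical library records: one dict per document, with a title and an optional DOI.
SAMPLE_RECORDS = [
    {"title": "Defrosting the Digital Library: Bibliographic Tools for the Next Generation Web",
     "doi": "10.1371/journal.pcbi.1000204"},
    {"title": "Defrosting the digital library - bibliographic tools for the next generation web",
     "doi": "10.1371/JOURNAL.PCBI.1000204"},
    {"title": "Predicting protein structures with a multiplayer online game"},
    {"title": "Predicting Protein Structures With a Multiplayer Online Game."},
    {"title": "Making Data Maximally Available"},
]


def normalise_title(title: str) -> str:
    """Lower-case a title, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_accents.lower())
    return " ".join(cleaned.split())


def dedup_key(record: dict) -> str:
    """Use the DOI (DOIs are case-insensitive) when present, else the normalised title."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return "doi:" + doi
    return "title:" + normalise_title(record.get("title", ""))


def count_unique(records) -> int:
    """Count distinct documents by counting distinct keys."""
    return len({dedup_key(record) for record in records})


if __name__ == "__main__":
    print(count_unique(SAMPLE_RECORDS))  # 3
```

Keying on DOI or title separately means a record with a DOI and a copy of the same paper without one will still be counted twice, so a real head count would need fuzzier matching than this.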

July 27, 2010

Twenty million papers in PubMed: a triumph or a tragedy?

A quick search on pubmed.gov today reveals that the freely available American database of biomedical literature has just passed the 20 million citations mark*. Should we celebrate or commiserate passing this landmark figure? Is it a triumph or a tragedy that PubMed® is the size it is?

June 22, 2010

Impact Factor Boxing 2010

[This post is part of an ongoing series about impact factors. See this post for the latest impact factors published in 2012.]

Roll up, roll up, ladies and gentlemen, Impact Factor Boxing is here again. As with last year (2009), the metrics used in this combat sport are already a year out of date. But this doesn’t stop many people from writing about impact factors, and it’s been an interesting year [1] for the metrics used by many to judge the relative value of scientific work. Within the last year, the Public Library of Science (PLoS) launched its article-level metrics, following the example of BioMed Central’s “most viewed” articles feature. Alongside these new-style metrics, the traditional impact factors live on, despite their limitations. Critics like Harold Varmus have recently pointed out that (quote):

“The impact factor is a completely flawed metric and it’s a source of a lot of unhappiness in the scientific community. Evaluating someone’s scientific productivity by looking at the number of papers they published in journals with impact factors over a certain level is poisonous to the system. A couple of folks are acting as gatekeepers to the distribution of information, and this is a very bad system. It really slows progress by keeping ideas and experiments out of the public domain until reviewers have been satisfied and authors are allowed to get their paper into the journal that they feel will advance their career.”

To be fair though, it’s not the metric that is flawed so much as the way it is used (and abused), a subject covered in much detail in a special issue of Nature at http://nature.com/metrics [2,3,4,5]. It’s much harder than it should be to get hold of these metrics, so I’ve reproduced some data below (fair use? I don’t know, I am not a lawyer…) to minimise the considerable frustrations of using Journal Citation Reports (JCR).

Love them, loathe them, use them, abuse them, ignore them or obsess over them … here’s a small selection of the 7347 journals that are tracked in JCR, ordered by increasing impact.

2009 data from isiknowledge.com/JCR (Total Cites to Cited Half-life) and Eigenfactor™ Metrics (last two columns); – means no value was listed.

Journal Title | Total Cites | Impact Factor | 5-Year Impact Factor | Immediacy Index | Articles | Cited Half-life | Eigenfactor™ Score | Article Influence™ Score
RSC Integrative Biology | 34 | – | – | 0.596 | 57 | – | 0.00000 | –
Communications of the ACM | 13853 | 2.346 | 3.050 | 0.350 | 177 | >10.0 | 0.01411 | 0.866
IEEE Intelligent Systems | 2214 | 3.144 | 3.594 | 0.333 | 33 | 6.5 | 0.00447 | 0.763
Journal of Web Semantics | 651 | 3.412 | – | 0.107 | 28 | 4.6 | 0.00222 | –
BMC Bioinformatics | 10850 | 3.428 | 4.108 | 0.581 | 651 | 3.4 | 0.07335 | 1.516
Journal of Molecular Biology | 69710 | 3.871 | 4.303 | 0.993 | 916 | 9.2 | 0.21679 | 2.051
Journal of Chemical Information and Modeling | 8973 | 3.882 | 3.631 | 0.695 | 266 | 5.9 | 0.01943 | 0.772
Journal of the American Medical Informatics Association (JAMIA) | 4183 | 3.974 | 5.199 | 0.705 | 105 | 5.7 | 0.01366 | 1.585
PLoS ONE | 20466 | 4.351 | 4.383 | 0.582 | 4263 | 1.7 | 0.16373 | 1.918
OUP Bioinformatics | 36932 | 4.926 | 6.271 | 0.733 | 677 | 5.2 | 0.16661 | 2.370
Biochemical Journal | 50632 | 5.155 | 4.365 | 1.262 | 455 | >10.0 | 0.10896 | 1.787
BMC Biology | 1152 | 5.636 | – | 0.702 | 84 | 2.7 | 0.00997 | –
PLoS Computational Biology | 4674 | 5.759 | 6.429 | 0.786 | 365 | 2.5 | 0.04369 | 3.080
Genome Biology | 12688 | 6.626 | 7.593 | 1.075 | 186 | 4.8 | 0.08005 | 3.586
Trends in Biotechnology | 8118 | 6.909 | 8.588 | 1.407 | 81 | 6.4 | 0.02402 | 2.665
Briefings in Bioinformatics | 2898 | 7.329 | 16.146 | 1.109 | 55 | 5.3 | 0.01928 | 5.887
Nucleic Acids Research | 95799 | 7.479 | 7.279 | 1.635 | 1070 | 6.5 | 0.37108 | 2.963
PNAS | 451386 | 9.432 | 10.312 | 1.805 | 3765 | 7.6 | 1.68111 | 4.857
PLoS Biology | 15699 | 12.916 | 14.798 | 2.692 | 195 | 3.5 | 0.17630 | 8.623
Nature Biotechnology | 31564 | 29.495 | 27.620 | 5.408 | 103 | 5.7 | 0.14503 | 11.803
Science | 444643 | 29.747 | 31.052 | 6.531 | 897 | 8.8 | 1.52580 | 16.570
Cell | 153972 | 31.152 | 32.628 | 6.825 | 359 | 8.7 | 0.70117 | 20.150
Nature | 483039 | 34.480 | 32.906 | 8.209 | 866 | 8.9 | 1.74951 | 18.054
New England Journal of Medicine | 216752 | 47.050 | 51.410 | 14.557 | 352 | 7.5 | 0.67401 | 19.870
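The Impact Factor and Immediacy Index columns in the table are simple ratios. A journal’s impact factor for year Y is the number of citations received in Y by items it published in Y-1 and Y-2, divided by the number of citable items it published in those two years; its immediacy index is the number of citations received in Y by items published in Y, divided by the number of items published in Y. The sketch below just encodes those two ratios. The function names are illustrative, and the counts for the imaginary journal in the usage lines are hypothetical, because JCR doesn’t show the underlying numerators and denominators in this table.

```python
def impact_factor(cites_to_prev_two_years: int, citable_items_prev_two_years: int) -> float:
    """Two-year impact factor for year Y: citations received in Y to items published
    in Y-1 and Y-2, divided by citable items published in Y-1 and Y-2."""
    if citable_items_prev_two_years == 0:
        raise ValueError("no citable items in the two-year window")
    return cites_to_prev_two_years / citable_items_prev_two_years


def immediacy_index(cites_to_current_year: int, items_current_year: int) -> float:
    """Citations received in year Y to items published in Y, per item published in Y."""
    if items_current_year == 0:
        raise ValueError("no items published in year Y")
    return cites_to_current_year / items_current_year


if __name__ == "__main__":
    # Hypothetical: 600 citations in 2009 to the 200 citable items from 2007 and 2008.
    print(round(impact_factor(600, 200), 3))  # 3.0
    # RSC Integrative Biology in the table: 34 total cites over 57 articles.
    print(round(immediacy_index(34, 57), 3))  # 0.596
```

As a sanity check against the table: RSC Integrative Biology lists no impact factor, which is what you would expect for a journal without two earlier years of papers, and its Total Cites (34) divided by Articles (57) reproduces its listed immediacy index of 0.596, suggesting all of its 2009 citations went to 2009 papers.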

Maybe next year Thomson Reuters, who publish this data, could start attaching large government health warnings (like on cigarette packets) and long disclaimers to this data? WARNING: Abusing these figures can seriously damage your Science – you have been warned!


  1. Rizkallah, J., & Sin, D. (2010). Integrative Approach to Quality Assessment of Medical Journals Using Impact Factor, Eigenfactor, and Article Influence Scores PLoS ONE, 5 (4) DOI: 10.1371/journal.pone.0010204
  2. Abbott, A., Cyranoski, D., Jones, N., Maher, B., Schiermeier, Q., & Van Noorden, R. (2010). Metrics: Do metrics matter? Nature, 465 (7300), 860-862 DOI: 10.1038/465860a
  3. Van Noorden, R. (2010). Metrics: A profusion of measures Nature, 465 (7300), 864-866 DOI: 10.1038/465864a
  4. Braun, T., Osterloh, M., West, J., Rohn, J., Pendlebury, D., Bergstrom, C., & Frey, B. (2010). How to improve the use of metrics Nature, 465 (7300), 870-872 DOI: 10.1038/465870a
  5. Lane, J. (2010). Let’s make science metrics more scientific Nature, 464 (7288), 488-489 DOI: 10.1038/464488a

[Creative Commons licensed picture of Golden Gloves Prelim Bouts by Kate Gardiner]

April 30, 2010

Daniel Cohen on The Social Life of Digital Libraries

Daniel Cohen is giving a talk in Cambridge today on The Social Life of Digital Libraries, abstract below:

The digitization of libraries had a clear initial goal: to permit anyone to read the contents of collections anywhere and anytime. But universal access is only the beginning of what may happen to libraries and researchers in the digital age. Because machines as well as humans have access to the same online collections, a complex web of interactions is emerging. Digital libraries are now engaging in online relationships with other libraries, with scholars, and with software, often without the knowledge of those who maintain the libraries, and in unexpected ways. These digital relationships open new avenues for discovery, analysis, and collaboration.

Daniel J. Cohen is an Associate Professor at George Mason University and has been involved in the development of the Zotero extension for the Firefox browser that enables users to manage bibliographic data while doing online research. Zotero [1] is one of many new tools [2] that are attempting to add a social dimension to scholarly information on the Web, so this should be an interesting talk.

If you’d like to come, the talk starts at 6pm in Clare College, Cambridge, and you need to RSVP by email via the talks.cam.ac.uk page.


  1. Cohen, D.J. (2008). Creating scholarly tools and resources for the digital ecosystem: Building connections in the Zotero project. First Monday 13 (8)
  2. Hull, D., Pettifer, S., & Kell, D. (2008). Defrosting the Digital Library: Bibliographic Tools for the Next Generation Web PLoS Computational Biology, 4 (10) DOI: 10.1371/journal.pcbi.1000204

March 16, 2010

DNA, Diversity and You at Cambridge Science Festival

As part of the Cambridge Science Festival last weekend, I joined a group of about 40 volunteers from the Sanger and EBI at an event called “DNA, diversity and you”. This was a series of education and outreach activities designed to explore how differences in your genetic code make you different from other individuals, and what makes humans different from other living things, with a bit of computational biology thrown in for good measure. Here are some notes on a selection of the activities, in case you ever find yourself trying to explain biology, computer science or bioinformatics to anyone aged 4-18 and beyond. These resources are all tried, tested and fun to work with, for students and teachers alike:

  1. DNA origami: create your own origami DNA molecule, a hands-on way of learning about the double-helix structure of DNA.
  2. DNA sequence bracelets: thread coloured beads according to sequence sections from a range of organisms including trout, chimpanzee, butterfly, a flesh-eating microbe and the rotting corpse flower.
  3. Yummy gummy DNA (under-5s): build your own DNA helix out of sweets and cocktail sticks. Then scoff it all afterwards.
  4. What’s my name in DNA? Find out what your name is in DNA, and what the corresponding (hypothetical) protein is, using software from deCODE.
  5. Function Finders: translate DNA into a sequence of amino acids using wooden translator blocks, then find out which organism the amino acid sequence is from.
  6. Genome sizes (with seatbelts): rank organisms (inc. human, zebrafish, mosquito, sugar cane and yeast) by genome size and find out if they are in the right order. Results are often not what you would expect.
  7. Play your genes right: a card-based guessing game which compares the number of genes in the human genome with the number of genes in a range of different organisms, including the flu virus, E. coli bacteria, armadillo, rice plant and others.
  8. Genome Jigsaws: illustrating the process of finishing supposedly “finished” genomes by putting together a square sequence jigsaw, following base-pairing rules, to end up with a complete finished square.
  9. DNA Time Team: examines aspects of ancestry and evolution. The activity encourages people to work out the sequence of a common ancestor by filling in the gaps on a simple evolutionary tree.
  10. Spot the difference with proteins: comparing Heat Shock Protein (HSP) in humans and other organisms to illustrate how different regions of the protein vary between organisms and how this affects function.
  11. Ready, steady sort: a sorting network that demonstrates one technique that computers use to sort through large amounts of information like sequence data. This comes straight from Computer Science Unplugged by Tim Bell, Mike Fellows and Ian Witten. This activity can be done either as a smaller board game or as a larger floor game. Either way, it’s a lot of fun, especially if you time people for an added competitive element.
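The Function Finders activity does with wooden blocks what software does with a lookup table: read the DNA three bases (one codon) at a time and look up the amino acid each codon encodes. Here is a minimal sketch using the standard genetic code (NCBI translation table 1). The function name, the `X` placeholder for unreadable codons and the choice to stop at the first stop codon are illustrative assumptions, not a description of how the festival materials or deCODE’s software work.

```python
# Standard genetic code (NCBI table 1): amino acids listed for codons in TCAG order,
# i.e. TTT, TTC, TTA, TTG, TCT, ... GGG. '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna: str, stop_at_stop: bool = True) -> str:
    """Translate a coding sequence into one-letter amino acid codes, reading
    non-overlapping codons from the first base; a trailing partial codon is ignored."""
    dna = dna.upper().replace("U", "T")  # also accept RNA input
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[pos:pos + 3], "X")  # 'X' for non-ACGT codons
        if amino_acid == "*" and stop_at_stop:
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))  # MA
```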

There were a whole bunch of new activities at the festival this year; maybe these will appear on the Your Genome website in the future. Anyway, it was great fun to get involved. There is nothing quite like the challenge of explaining parallel computing to young kids, teenagers and their parents, and it’s actually much easier than you’d think if you’ve got access to great teaching materials.
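The Ready, steady sort game is a sorting network: a fixed pattern of comparators, each of which swaps two values if they are out of order. Comparators in the same stage touch different wires, so they can all fire at once, which is what makes it a nice way into parallel computing. The sketch below builds an odd-even transposition network, a simple kind of sorting network chosen here for illustration; it is not necessarily the layout used on the CS Unplugged board, and the function names are hypothetical.

```python
def odd_even_transposition_network(n: int) -> list[list[tuple[int, int]]]:
    """Build n stages of comparators for n wires. Within a stage the comparators
    touch disjoint wires, so in principle they can all run at the same time."""
    stages = []
    for stage in range(n):
        start = stage % 2  # even stages compare (0,1),(2,3)...; odd stages (1,2),(3,4)...
        stages.append([(i, i + 1) for i in range(start, n - 1, 2)])
    return stages


def run_network(values, stages):
    """Push values through the network; each comparator leaves the smaller value
    on the lower-numbered wire."""
    wires = list(values)
    for stage in stages:
        for low, high in stage:
            if wires[low] > wires[high]:
                wires[low], wires[high] = wires[high], wires[low]
    return wires


if __name__ == "__main__":
    network = odd_even_transposition_network(6)
    print(run_network([6, 5, 4, 3, 2, 1], network))  # [1, 2, 3, 4, 5, 6]
```

With n stages this network is guaranteed to sort any n values; cleverer networks use fewer stages, but the idea of a fixed, parallel pattern of compare-and-swap steps is the same.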

Thanks to Francesca Gale and Louisa Wright for all the hard work that went into organising this fun and successful event.
