O'Really?

December 12, 2006

Buggotea: Redundant Links in Connotea

Dear Santa, all I want for Christmas* is a better version of Connotea. Please can you sort out its duplicated, redundant links? In my book this particular bug is “buggotea” number one. Here is the problem… [update: buggotea is partially fixed, see comments from Ian Mulvany at the nodalpoint link in the references below]

There is this handy bioinformatics web application called Connotea which I like to use, built by those nice people in the web team at Nature Publishing Group. Most readers of nodalpoint probably already know about it, but because you’re Santa and you’ve been busy lately, let me explain. Connotea can help scientists (not just bioinformaticians) to organise and share their bibliographic references, whilst discovering what other people with similar interests are reading. It’s good, but it has some bugs in it. Since it’s open-source software, anyone with the time, inclination and skills can get hold of the Connotea source code and improve it. There is, however, one particularly nasty redundancy bug in Connotea that is bugging me [1]. I think it should be fixable, and that doing so would make Connotea a significantly better application than it already is. Let’s illustrate this bug with a little story…

I have five bioinformatics colleagues with Connotea usernames glycine, methionine, threonine, tyrosine and valine. They are all web-savvy researchers who use Connotea to manage and share their references. Like many bioinformaticians, they are also desperate Perl hackers and one of their favourite papers is The Bioperl Toolkit: Perl Modules for the Life Sciences. This highly-cited paper by Jason Stajich et al., published in Genome Research, describes the libraries available in Bioperl.

My first colleague, glycine, found Jason’s Bioperl paper by browsing PubMed. Using Connotea they bookmarked a PubMed link, a particular type of Uniform Resource Identifier (URI), shown below:

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed&cmd=Retrieve&dopt=AbstractPlus&list_uids=12368254

So far so good. My next colleague, tyrosine, also bookmarked a PubMed link, but a subtly different URI for the same paper. This is because the “dopt” (display format) parameter in the URI has a different value, “Abstract” instead of “AbstractPlus”, like this:

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed&cmd=Retrieve&dopt=Abstract&list_uids=12368254

It’s a small difference, but, as we shall see, it has important consequences. Another colleague, threonine, found the paper on the genome.org website and bookmarked the URI of the paper’s full content:

http://www.genome.org/cgi/content/full/12/10/1611

…while valine just bookmarked the URI of the abstract at:

http://www.genome.org/cgi/content/abstract/12/10/1611

Meanwhile methionine, who is a big fan of Digital Object Identifiers (DOI), bookmarked the paper’s DOI (doi:10.1101/gr.361602), magically transforming it into a URI by prefixing it with http://dx.doi.org like this:

http://dx.doi.org/10.1101/gr.361602
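The “magic” is really just string concatenation, and it is easy to reverse. Here is a minimal Python sketch of both directions. The function names doi_to_uri and uri_to_doi are my own invention for illustration and have nothing to do with Connotea’s actual code:

```python
from typing import Optional

RESOLVER = "http://dx.doi.org/"  # the prefix used in the story above


def doi_to_uri(doi: str) -> str:
    """Turn 'doi:10.1101/gr.361602' (or a bare DOI) into a resolvable URI."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[len("doi:"):]
    return RESOLVER + doi


def uri_to_doi(uri: str) -> Optional[str]:
    """Recover 'doi:...' from a resolver URI, or None if it is not one."""
    if uri.startswith(RESOLVER) and len(uri) > len(RESOLVER):
        return "doi:" + uri[len(RESOLVER):]
    return None


print(doi_to_uri("doi:10.1101/gr.361602"))
print(uri_to_doi("http://dx.doi.org/10.1101/gr.361602"))
```

The point of the round trip is that a bookmarked resolver URI carries the paper’s identifier inside it, so nothing has to be looked up to recover the DOI.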

Finally, duncan (that’s me) finds the paper from a PubMed search, and bookmarks the URI below from the search results like so:

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed&cmd=Retrieve&dopt=AbstractPlus&list_uids=12368254&query_hl=42&itool=pubmed_docsum
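All three PubMed URIs in this story differ only in parameters that do not identify the paper. As a rough Python sketch (the helper name pubmed_id is hypothetical, and I am assuming that list_uids is the only parameter that names the record), the identifier can be pulled out of the query string like so:

```python
from urllib.parse import urlparse, parse_qs


def pubmed_id(uri):
    """Return the PubMed identifier carried by an Entrez query URI, or None."""
    parsed = urlparse(uri)
    if parsed.netloc != "www.ncbi.nlm.nih.gov":
        return None
    params = parse_qs(parsed.query)
    if params.get("db") != ["pubmed"]:
        return None
    # Assumption: dopt, query_hl and itool do not change which paper is meant.
    uids = params.get("list_uids")
    return uids[0] if uids else None


base = "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed&cmd=Retrieve"
uris = [
    base + "&dopt=AbstractPlus&list_uids=12368254",  # glycine
    base + "&dopt=Abstract&list_uids=12368254",  # tyrosine
    base + "&dopt=AbstractPlus&list_uids=12368254&query_hl=42&itool=pubmed_docsum",  # duncan
]
print({pubmed_id(u) for u in uris})
```

Three different strings collapse to a single identifier, which is exactly the kind of equivalence a bookmarking service would need to notice.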

It won’t take a brain surgeon to realise from the above story that six different people bookmarked six different, redundant URIs for the same paper. With many URIs representing any given paper, this is very common. Now, one of the useful features of Connotea is that while bookmarking all these different URIs it uses CrossRef to retrieve any relevant metadata. The author, journal, publication date and any unique identifier(s) are automagically retrieved to save us the hassle of typing them in. So, despite the fact that we’ve all bookmarked different URIs for the same paper, Connotea shows that three users have actually bookmarked the paper identified by the PubMed identifier PMID:12368254 and five users have bookmarked the object identified by doi:10.1101/gr.361602; see [2] for examples.

The trouble is, Connotea doesn’t currently use this metadata intelligently to reason that these URIs all represent the same paper. Because they are different URIs, it naïvely treats them as if they are completely different papers, even though they share DOI and PubMed identifiers. Of course, this redundancy isn’t really the fault of Connotea, it’s an inherent part of the web, but the result is unfortunate and avoidable fragmentation. Instead of Connotea showing Posted by glycine and 5 others to bioperl underneath each bookmark (which is what should ideally happen), it displays Posted by glycine (and 0 others) to buggotea.
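To make the wish concrete, here is one way the reasoning could go, sketched in Python rather than as a description of how Connotea works internally. Bookmarks that share any identifier are merged into one paper using a small union-find. The function name group_bookmarks is hypothetical, and the assignment of identifiers to users below is my own guess, chosen only to match the counts in the story (three with a PMID, five with a DOI):

```python
from collections import defaultdict


def group_bookmarks(bookmarks):
    """Group (user, identifiers) pairs that share at least one identifier.

    Returns a list of sets of usernames, one set per inferred paper.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Identifiers seen together on one bookmark must name the same paper.
    for _user, ids in bookmarks:
        ordered = sorted(ids)
        for other in ordered[1:]:
            parent[find(other)] = find(ordered[0])

    papers = defaultdict(set)
    for user, ids in bookmarks:
        if ids:  # a bookmark with no identifiers cannot be merged this way
            papers[find(sorted(ids)[0])].add(user)
    return list(papers.values())


PMID = "pmid:12368254"
DOI = "doi:10.1101/gr.361602"
# Hypothetical metadata: which user got which identifier is assumed.
bookmarks = [
    ("glycine", {PMID, DOI}),
    ("tyrosine", {PMID, DOI}),
    ("duncan", {PMID}),
    ("threonine", {DOI}),
    ("valine", {DOI}),
    ("methionine", {DOI}),
]

for users in group_bookmarks(bookmarks):
    first, *rest = sorted(users)
    print(f"Posted by {first} and {len(rest)} others")
```

Merging on any shared identifier, rather than on the DOI alone, is what lets a PMID-only bookmark join the group: it is linked through bookmarks that carry both identifiers.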

With different URIs bookmarked, we can’t see accurately how many people have bookmarked a given paper, as most incorrectly appear to have been bookmarked only once or perhaps twice. Neither can we see who has bookmarked any given paper, unless they happened to use exactly the same URI, which is pretty unlikely. Now you can obviously look this kind of popularity data up in various citation databases, but each of these has its own unique flaws, and social bookmarking is supposed to be what Connotea is all about. The result of all this is that the shared tagging and web 2.0 goodness of Connotea is mostly lost, mysteriously disappearing just like you do when you’ve finished delivering your presents on Christmas Eve.

So Santa, if you’re reading this blog, and you know any talented Perl-hacking elves with database expertise, could you please ask them to sort out this ugly redundancy bug [3]? I hope they won’t be too busy helping you wrap presents, and I look forward to seeing a better version of Connotea sometime soon. Have a very Webby Christmas!

[Posted by santaclaus and 10 others (mostly elves) to xmas wishes]

References

  1. Possible Problems with Connotea: Redundant URIs
  2. Buggotea: The Redundancy Bug in Connotea
  3. Hull, D. (2006) Buggotea: The original post on nodalpoint.org with comments
  4. You’ve read about redundancy, now buy the T-shirts: Department of Redundancy Department and I love ❤ redundancy
  5. *I should really have said, “All I want for Christmas apart from world peace and an end to poverty and that book I mentioned and a Dukla Prague Away Kit [6] and …” etc.
  6. Half Man Half Biscuit (1986) ♫ All I want for Christmas is a Dukla Prague Away Kit ♫
  7. Citeulike feature requests: Shared bibliographic info for a paper across all of citeulike
  8. Examples of Buggotea in Citeulike
  9. Connotea or Citeulike?

Creative Commons License

This work is licensed under a Creative Commons Attribution 3.0 Unported License.
