December 11, 2009

The Semantic Biochemical Journal experiment

There is an interesting review [1] (and special issue) in the Biochemical Journal today, published by Portland Press Ltd. It provides (quote) “a whirlwind tour of recent projects to transform scholarly publishing paradigms, culminating in Utopia and the Semantic Biochemical Journal experiment”. Here is a quick outline of the publishing projects the review describes and discusses:

  • Blogs for biomedical science
  • Biomedical Ontologies – OBO etc
  • Project Prospect and the Royal Society of Chemistry
  • The Chemspider Journal of Chemistry
  • The FEBS Letters experiment
  • PubMedCentral and BioLit [2]
  • Public Library of Science (PLoS) Neglected Tropical Diseases (NTD) [3]
  • The Elsevier Grand Challenge [4]
  • Liquid Publications
  • The PDF debate: Is PDF a hamburger? Or can we build more useful applications on top of it?
  • The Semantic Biochemical Journal project with Utopia Documents [5]

The review asks what advances these projects have made and what obstacles to progress still exist. It’s an entertaining tour, dotted with enlightening observations on what is broken in scientific publishing and some of the solutions involving various kinds of semantics.

One conclusion is that many of the experiments described above are expensive and difficult, but that the cost of not improving scientific publishing with various kinds of semantic markup is high. As the authors put it:

“If the cost of semantic publishing seems high, then we also need to ask, what is the price of not doing it? From the results of the experiments we have seen to date, there is clearly a need to move forward and still a great deal of scope to innovate. If we fail to move forward in a collaborative way, if we fail to engage the key players, the price will be high. We will continue to bury scientific knowledge, as we routinely do now, in static, unconnected journal articles; to sequester fragments of that knowledge in disparate databases that are largely inaccessible from journal pages; to further waste countless hours of scientists’ time either repeating experiments they didn’t know had been performed before, or worse, trying to verify facts they didn’t know had been shown to be false. In short, we will continue to fail to get the most from our literature, we will continue to fail to know what we know, and will continue to do science a considerable disservice.”

It’s well worth reading the review, and downloading the Utopia software to experience all of the interactive features demonstrated in this special issue, especially the animated molecular viewers and sequence alignments.

Enjoy… the Utopia team would be interested to know what people think. See the commentary on FriendFeed, the Digital Curation Blog and the YouTube video below for more information.


  1. Attwood, T., Kell, D., McDermott, P., Marsh, J., Pettifer, S., & Thorne, D. (2009). Calling International Rescue: knowledge lost in literature and data landslide! Biochemical Journal, 424 (3), 317-333 DOI: 10.1042/BJ20091474
  2. Fink, J., Kushch, S., Williams, P., & Bourne, P. (2008). BioLit: integrating biological literature with databases Nucleic Acids Research, 36 (Web Server) DOI: 10.1093/nar/gkn317
  3. Shotton, D., Portwin, K., Klyne, G., & Miles, A. (2009). Adventures in Semantic Publishing: Exemplar Semantic Enhancements of a Research Article PLoS Computational Biology, 5 (4) DOI: 10.1371/journal.pcbi.1000361
  4. Pafilis, E., O’Donoghue, S., Jensen, L., Horn, H., Kuhn, M., Brown, N., & Schneider, R. (2009). Reflect: augmented browsing for the life scientist Nature Biotechnology, 27 (6), 508-510 DOI: 10.1038/nbt0609-508
  5. Pettifer, S., Thorne, D., McDermott, P., Marsh, J., Villéger, A., Kell, D., & Attwood, T. (2009). Visualising biological data: a semantic approach to tool and database integration BMC Bioinformatics, 10 (Suppl 6) DOI: 10.1186/1471-2105-10-S6-S19

November 24, 2009

Semantic Web Applications and Tools for the Life Sciences (SWAT4LS) 2009, Amsterdam

Last Friday, the Centrum Wiskunde & Informatica (CWI) in Amsterdam hosted a workshop called Semantic Web Applications and Tools for the Life Sciences (SWAT4LS) 2009.

Following on from last year [1], the workshop proceedings will be published at ceur-ws.org and in a special issue of the Journal of Biomedical Semantics. If you want to find out what happened in the meantime, take a look at the #swat4ls2009 hashtag on Twitter. Twitter makes bloggers lazy (they blog less but tweet more), but thankfully Nico Adams has blogged the workshop very extensively.

Disruptive Technologies Director (cool job title!) Anita de Waard from Elsevier asked what the conclusions of the workshop were. So here is an incomplete summary: roughly speaking, people agreed to disagree (again). In his entertaining but controversial keynote, Barend Mons argued that redundant data should be eliminated through the use of “nano-publications” and micro-attribution. Some people in the audience disagreed with this. Greg Tyrelle thinks that redundancy is a feature of the Web, not a bug, and that we have to deal with it. Alan Ruttenberg argued that semantic web reasoners are required to clean up and sanity check all the messy and noisy biological data, but emphasised the importance of computer scientists learning to speak the biologists’ language.

The good thing about this workshop is its size: small and friendly, but internationally attended. Thanks to M. Scott Marshall, Albert Burger, Adrian Paschke, Paolo Romano and Andrea Splendiani for organising another good workshop. Hope to see you again next year (if not before).


  1. Burger, A., Romano, P., Paschke, A., & Splendiani, A. (2009). Semantic Web Applications and Tools for Life Sciences, 2008 – Introduction BMC Bioinformatics, 10 (Suppl 10) DOI: 10.1186/1471-2105-10-S10-S1 part of the special issue on SWAT4LS 2008

[CC-licensed picture of Amsterdam in the snow by Bas van Gaalen]

June 4, 2009

Improving the OBO Foundry Principles

The Open Biomedical Ontologies (OBO) are a set of reference ontologies for describing all kinds of biomedical data; see [1-5] for examples. Every year, users and developers of these ontologies gather from around the globe for a workshop at the EBI near Cambridge, UK. Following on from the first workshop last year, the 2nd OBO workshop 2009 is fast approaching.

In preparation, I’ve been revisiting the OBO Foundry documentation, part of which establishes a set of principles for ontology development. I’m wondering how they could be improved because these principles are fundamental to the whole effort. We’ve been using one of the OBO ontologies, Chemical Entities of Biological Interest (ChEBI), in the REFINE project to mine data from the PubMed database. OBO ontologies like ChEBI and the Gene Ontology are really crucial to making sense of the massive data which are now common in biology and medicine, so this is stuff that matters.

The OBO Foundry Principles, a sort of Ten Commandments of Ontology (or Obology if you prefer), currently look something like this (copied directly from obofoundry.org/crit.shtml):

  1. The ontology must be open and available to be used by all without any constraint other than (a) its origin must be acknowledged and (b) it is not to be altered and subsequently redistributed under the original name or with the same identifiers. The OBO ontologies are for sharing and are resources for the entire community. For this reason, they must be available to all without any constraint or license on their use or redistribution. However, it is proper that their original source is always credited and that after any external alterations, they must never be redistributed under the same name or with the same identifiers.
  2. The ontology is in, or can be expressed in, a common shared syntax. This may be either the OBO syntax, extensions of this syntax, or OWL. The reason for this is that the same tools can then be usefully applied. This facilitates shared software implementations. This criterion is not met in all of the ontologies currently listed, but we are working with the ontology developers to have them available in a common OBO syntax.
  3. The ontologies possesses a unique identifier space within the OBO Foundry. The source of a term (i.e. class) from any ontology can be immediately identified by the prefix of the identifier of each term. It is, therefore, important that this prefix be unique.
  4. The ontology provider has procedures for identifying distinct successive versions.
  5. The ontology has a clearly specified and clearly delineated content. The ontology must be orthogonal to other ontologies already lodged within OBO. The major reason for this principle is to allow two different ontologies, for example anatomy and process, to be combined through additional relationships. These relationships could then be used to constrain when terms could be jointly applied to describe complementary (but distinguishable) perspectives on the same biological or medical entity. As a corollary to this, we would strive for community acceptance of a single ontology for one domain, rather than encouraging rivalry between ontologies.
  6. The ontologies include textual definitions for all terms. Many biological and medical terms may be ambiguous, so terms should be defined so that their precise meaning within the context of a particular ontology is clear to a human reader.
  7. The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the OBO Relation Ontology.
  8. The ontology is well documented.
  9. The ontology has a plurality of independent users.
  10. The ontology will be developed collaboratively with other OBO Foundry members.
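
To make principle 3 a bit more concrete, here is a minimal Python sketch of how the prefix of an identifier can be used to work out which ontology a term comes from. The `PREFIXES` table and the `source_ontology` function are my own illustrative names, not part of any OBO tool, and the table only lists the two ontologies mentioned above.

```python
# Illustrative (hypothetical) registry mapping identifier prefixes to ontologies.
# A real registry would list every ontology in the OBO Foundry.
PREFIXES = {
    "CHEBI": "Chemical Entities of Biological Interest",
    "GO": "Gene Ontology",
}


def source_ontology(term_id):
    """Return the ontology name for an identifier of the form PREFIX:LOCAL_ID."""
    prefix, separator, local_id = term_id.partition(":")
    if not separator or not local_id:
        raise ValueError(f"not a PREFIX:LOCAL_ID identifier: {term_id!r}")
    if prefix not in PREFIXES:
        raise ValueError(f"unknown prefix: {prefix!r}")
    return PREFIXES[prefix]


print(source_ontology("CHEBI:15377"))
print(source_ontology("GO:0008150"))
```

The lookup only works because each prefix maps to exactly one ontology. If two ontologies shared a prefix, the source of a term could no longer be read off its identifier.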

I’ve been asking all my frolleagues what they think of these principles and have got some lively responses, including some here from Allyson Lister, Mélanie Courtot, Michel Dumontier and Frank Gibson. So what do you think? How could these guidelines be improved? Do you have any specific (and preferably constructive) criticisms of these ambitious (and worthy) goals? Be bold, be brave and be polite. For anything controversial or “off the record”, you can email me… I’m all ears.

CC-licensed picture above of the Old Smithy (pub) by Loop Oh. Inspired by Michael Ashburner’s standing OBO joke (Ontolojoke) which goes something like this: Because Barry Smith is one of the leaders of OBO, should the project be called the OBO Smithy or the OBO Foundry? 🙂


  1. Noy, N., Shah, N., Whetzel, P., Dai, B., Dorf, M., Griffith, N., Jonquet, C., Rubin, D., Storey, M., Chute, C., & Musen, M. (2009). BioPortal: ontologies and integrated data resources at the click of a mouse Nucleic Acids Research DOI: 10.1093/nar/gkp440
  2. Côté, R., Jones, P., Apweiler, R., & Hermjakob, H. (2006). The Ontology Lookup Service, a lightweight cross-platform tool for controlled vocabulary queries BMC Bioinformatics, 7 (1) DOI: 10.1186/1471-2105-7-97
  3. Smith, B., Ashburner, M., Rosse, C., Bard, J., Bug, W., Ceusters, W., Goldberg, L., Eilbeck, K., Ireland, A., Mungall, C., Leontis, N., Rocca-Serra, P., Ruttenberg, A., Sansone, S., Scheuermann, R., Shah, N., Whetzel, P., & Lewis, S. (2007). The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration Nature Biotechnology, 25 (11), 1251-1255 DOI: 10.1038/nbt1346
  4. Smith, B., Ceusters, W., Klagges, B., Köhler, J., Kumar, A., Lomax, J., Mungall, C., Neuhaus, F., Rector, A., & Rosse, C. (2005). Relations in biomedical ontologies Genome Biology, 6 (5) DOI: 10.1186/gb-2005-6-5-r46
  5. Bada, M., & Hunter, L. (2008). Identification of OBO nonalignments and its implications for OBO enrichment Bioinformatics, 24 (12), 1448-1455 DOI: 10.1093/bioinformatics/btn194

June 1, 2009

Scott Marshall on Interoperability

Scott Marshall is visiting Manchester this week and will be giving a seminar on Friday 5th June. Here are some details for anyone who is interested in attending:

Speaker: Dr. M. Scott Marshall, The University of Amsterdam

Date/Time: 5th June 2009, 11:00

Location: Room MLG.001 (Lecture Theatre), MIB building, (number 16 on campus map)

Title: Standards Enabled Interoperability: W3C Semantic Web for Health Care and Life Sciences Interest Group

Abstract: The W3C Semantic Web for Health Care and Life Sciences Interest Group (HCLS) has the mission of developing, advocating for, and supporting the use of Semantic Web technologies for biological science, translational medicine and health care. HCLS covers hot topics including data integration and federation, bridging commonly used domain standards such as CDISC and HL7, and the applications of medical terminologies. This talk will introduce the HCLS, as well as provide an overview of the activities that are currently ongoing within the task forces, as well as new developments and the recent Face2Face meeting. The role of information extraction and the current interest in Shared Identifiers will also be discussed.


  1. Ruttenberg, A., Rees, J., Samwald, M., & Marshall, M. (2009). Life sciences on the Semantic Web: the Neurocommons and beyond Briefings in Bioinformatics, 10 (2), 193-204 DOI: 10.1093/bib/bbp004

May 21, 2009

Upcoming Gig: The Italian Job at NETTAB

Network Tools and Applications in Biology (NETTAB) is a series of workshops in Bioinformatics. It focuses on the most promising and innovative ICT tools and their utility in Bioinformatics. These workshops aim to introduce participants to the evolving network standards and technologies that are being applied to the field of biology.

Since 2001, the NETTAB workshops have been doing a Giro d’Italia or Grand Tour of Italy; Genova, Bologna, Naples, Sardinia, Lake Como and Pisa have all played host to the workshop. This year, NETTAB 2009 is in Catania at the Università degli Studi di Catania in Sicily, close to Mount Etna.

There is a special theme for this year’s workshop, held on June 10-13, on Technologies, Tools and Applications for Collaborative and Social Bioinformatics Research and Development. So I’m very pleased that Paolo Romano asked me to do a keynote presentation (w00t!) on the work we have been doing in the REFINE project and myExperiment. Grazie Paolo, grazie. And thanks Carole Goble too for the recommendation.

If you’re going to NETTAB this year, see you there. If you’d like to come, today is the last day for the early bird discount, so sign up at the registration page. The scientific programme looks interesting, and it will be good to meet Alex Bateman, Tim Clark and the rest of this year’s speakers.

Now, if my keynote presentation is going to (as Michael Caine once famously said [1]) “blow the bl**dy doors off” [2], it needs loads more work. So I’d better get back to it. Ciao!

[Update: See reports from day one, day two and day three of NETTAB 2009.]


  1. Peter Collinson and Troy Kennedy-Martin (1969) The Italian Job
  2. Michael Caine (1969) “You’re only supposed to blow the bl**dy doors off!”
  3. Cannata, N., Schröder, M., Marangoni, R., & Romano, P. (2008). A Semantic Web for bioinformatics: goals, tools, systems, applications BMC Bioinformatics, 9 (Suppl 4) DOI: 10.1186/1471-2105-9-S4-S1

May 6, 2009

Michel Dumontier on Representing Biochemistry

Michel Dumontier is visiting Manchester this week and will be giving a seminar on Monday 11th of May. Here are some details for anyone who is interested in attending:

Title: Increasingly Accurate Representation of Biochemistry

Speaker: Michel Dumontier, dumontierlab.com

Time: 14.00, Monday 11th May 2009
Venue: Atlas 1, Kilburn Building, University of Manchester, number 39 on the Google Campus Map

Abstract: Biochemical ontologies aim to capture and represent biochemical entities and the relations that exist between them in an accurate manner. A fundamental starting point is biochemical identity, but our current approach for generating identifiers is haphazard and consequently integrating data is error-prone. I will discuss plausible structure-based strategies for biochemical identity whether it be at molecular level or some part thereof (e.g. residues, collection of residues, atoms, collection of atoms, functional groups) such that identifiers may be generated in an automatic and curator/database independent manner. With structure-based identifiers in hand, we will be in a position to more accurately capture context-specific biochemical knowledge, such as how a set of residues in a binding site are involved in a chemical reaction including the fact that a key nitrogen atom must first be de-protonated. Thus, our current representation of biochemical knowledge may improve such that manual and automatic methods of biocuration are substantially more accurate.

Update: Slides are now available via SlideShare.

[Creative Commons licensed picture of Michel in action at ISWC 2008 from Tom Heath]


  1. Michel Dumontier and Natalia Villanueva-Rosales (2009) Towards pharmacogenomics knowledge discovery with the semantic web Briefings in Bioinformatics DOI:10.1093/bib/bbn056
  2. Doug Howe et al (2008) Big data: The future of biocuration Nature 455, 47-50 doi:10.1038/455047a

December 2, 2008

SWAT4LS: The Semantic Web in Scotland

Last Friday, the UK National e-Science Centre in Edinburgh hosted a workshop, Semantic Web Applications and Tools for the Life Sciences (see SWAT4LS.org for the full details). Here are some incomplete and abbreviated notes from the workshop where there were some interesting people, paperware and software.

People and Paperware

70 people registered to attend SWAT4LS in total, many familiar names and faces, plus some new people I’ve never met before: (more…)

October 30, 2008

Congratulations Matthew Horridge!

So, congratulations are due to Matthew Horridge, Bijan Parsia and Ulrike Sattler from The University of Manchester for winning the keenly fought best paper prize at the International Semantic Web Conference [ISWC 2008] in Karlsruhe for their paper “Laconic and Precise Justifications in OWL”. An abstract of the paper is reproduced below:

“A justification for an entailment in an OWL ontology is a minimal subset of the ontology that is sufficient for that entailment to hold. Since justifications respect the syntactic form of axioms in an ontology, they are usually neither syntactically nor semantically minimal. This paper presents two new subclasses of justifications—laconic justifications and precise justifications. Laconic justifications only consist of axioms that do not contain any superfluous “parts”. Precise justifications can be derived from laconic justifications and are characterised by the fact that they consist of flat, small axioms, which facilitate the generation of semantically minimal repairs. Formal definitions for both types of justification are presented. In contrast to previous work in this area, these definitions make it clear as to what exactly “parts of axioms” are. In order to demonstrate the practicability of computing laconic, and hence precise justifications, an algorithm is provided and results from an empirical evaluation carried out on several published ontologies are presented. The evaluation showed that laconic/precise justifications can be computed in a reasonable time for entailments in a range of ontologies that vary in size and complexity. It was found that in half of the ontologies sampled there were entailments that had more laconic/precise justifications than regular justifications. More surprisingly it was observed that for some ontologies there were fewer laconic justifications than regular justifications.”

But what does it all mean? One of the results of this research project has been an explanations plug-in for the Protégé ontology editor; see explanation in OWL at http://owl.cs.manchester.ac.uk. This helps users to understand when and why the reasoning goes all pear-shaped, through better explanations than have previously been possible. So this is another step toward making it easier and less confusing to build better ontologies with the Web Ontology Language (OWL). Yay!
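
To give a flavour of what a justification is, here is a toy Python sketch. It is emphatically not the algorithm from the paper and it does no real OWL reasoning: I am assuming an "ontology" is just a list of (subclass, superclass) pairs and that entailment is plain transitivity. The function names and the example classes are made up for illustration. The sketch finds one minimal subset of axioms that is still enough for the entailment to hold, by trying to drop each axiom in turn.

```python
def entails(axioms, goal):
    """True if goal = (sub, sup) follows from subclass axioms by transitivity."""
    sub, sup = goal
    reachable, frontier = {sub}, [sub]
    while frontier:
        current = frontier.pop()
        for a, b in axioms:
            if a == current and b not in reachable:
                reachable.add(b)
                frontier.append(b)
    return sup in reachable


def one_justification(axioms, goal):
    """Return one minimal subset of axioms that still entails goal, or None."""
    if not entails(axioms, goal):
        return None
    kept = list(axioms)
    for axiom in axioms:
        trial = [a for a in kept if a != axiom]
        # Drop the axiom only if the entailment survives without it.
        if entails(trial, goal):
            kept = trial
    return kept


toy_ontology = [
    ("Cat", "Mammal"),
    ("Mammal", "Animal"),
    ("Cat", "Pet"),
    ("Dog", "Mammal"),
]
print(one_justification(toy_ontology, ("Cat", "Animal")))
```

In this toy ontology only the first two axioms are needed to conclude that every Cat is an Animal, so the other two are dropped. Real OWL axioms can be much richer than these pairs, which is where the "parts of axioms" discussed in the abstract come in.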



  1. Matthew Horridge, Bijan Parsia and Ulrike Sattler (2008). Laconic and Precise Justifications in OWL Lecture Notes in Computer Science, LNCS Volume 5318/-1 The Semantic Web – ISWC 2008 DOI:10.1007/978-3-540-88564-1_21

[Picture of Manchester United player George Best by Sammy, Best paper prize picture by guitarfish]

October 27, 2008

OWL Experiences and Directions (OWLED) 2008

The Web Ontology Language (OWL) is a language for creating ontologies on the Web. It does exactly what it says on the tin. But what is an ontology? One way to think of it is as a better way of storing data and knowledge. Instead of just capturing and describing data in a database, ontology languages like OWL provide ways to capture and describe knowledge in a knowledge base. Ontologies can allow more intelligent querying, integration and understanding of data than is possible using a plain old relational database.

[Picture of a Great Grey Owl by Brian Scott]

Since 2003, developers and users of the Web Ontology Language, abbreviated to OWL (not WOL), have been gathering at a two-day workshop called OWLED (OWL Experiences and Directions). This year the workshop is in Karlsruhe in Germany. The full list of accepted papers is available. As with previous years, this year’s workshop has a distinctly biological flavour to the proceedings: (more…)

May 15, 2008

BBC: Building a Better ChEBI

Chemical Entities of Biological Interest, ChEBI, is a freely available dictionary [1] of molecular entities, especially small chemical compounds. Like all big dictionaries and ontologies, it has its own unique challenges. Fortunately, those nice people at the EBI are holding a workshop to discuss future developments in ChEBI. In preparation for the workshop, here are some brief notes on how ChEBI could be made better. [Disclaimer: I’m fairly new to ChEBI and “thinking out loud” here, add comments below if I’ve said anything stupid or wrong]

[Picture of a molecule by vabellon, on Flickr]


