February 12, 2010

The 3rd OBO Foundry Workshop 2010, Cambridge, UK

The Open Biomedical Ontologies (OBO) [1] are a set of reference ontologies for describing all kinds of biomedical data, shared in a centralised repository called The OBO Foundry. Every year, users and developers of these ontologies gather from around the globe for a workshop at the EBI near Cambridge, UK. Following on from the first workshop two years ago and the second workshop last year, it’s already time for the third workshop on February 15th-16th. All the details and agenda are here if you’re interested. This workshop is possible thanks to sponsorship from the BBSRC funds for Workshop on Data Standards and the EU ELIXIR ‘Data Integration & Interoperability’ Package 7.

[Update: outcomes from the workshop are available here, along with a summary of discussion from Monday and a summary of discussion from Tuesday.]


  1. Smith, B., Ashburner, M., Rosse, C., Bard, J., Bug, W., Ceusters, W., Goldberg, L., Eilbeck, K., Ireland, A., Mungall, C., Leontis, N., Rocca-Serra, P., Ruttenberg, A., Sansone, S., Scheuermann, R., Shah, N., Whetzel, P., & Lewis, S. (2007). The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration Nature Biotechnology, 25 (11), 1251-1255 DOI: 10.1038/nbt1346

[Ultrawide panoramic picture of the Wellcome Trust Genome Campus by Tim Nugent, as featured on the cover of the EMBL-EBI Annual Scientific Report 2009. Making those pictures looks like a lot of fun.]

February 5, 2010

Classic paper: Montagues and Capulets in Science

In preparation for a joint seminar I’ll be doing with Midori Harris here at the EBI, here’s a classic paper [1,2] on the social problems of building biomedical ontologies. This paper is worth reading (or re-reading) because it makes lots of relevant points about the use and abuse of research and how people misunderstand each other [3]. It’s funny (and available Open Access too). Besides, how many papers do you read with an abstract written in the style of Big Bard Bill Shakespeare?

ABSTRACT:
Two households, both alike in dignity,
In fair Genomics, where we lay our scene,
(One, comforted by its logic’s rigour,
Claims ontology for the realm of pure,
The other, with blessed scientist’s vigour,
Acts hastily on models that endure),
From ancient grudge break to new mutiny,
When ‘being’ drives a fly-man to blaspheme.
From forth the fatal loins of these two foes,
Researchers to unlock the book of life;
Whole misadventured piteous overthrows,
Can with their work bury their clans’ strife.
The fruitful passage of their GO-mark’d love,
And the continuance of their studies sage,
Which, united, yield ontologies undreamed-of,
Is now the hour’s traffic of our stage;
The which if you with patient ears attend,
What here shall miss, our toil shall strive to mend.

So if you read the paper, you have to ask yourself, are you a Montague or a Capulet?


  1. Carole Goble and Chris Wroe (2004). The Montagues and the Capulets Comparative and Functional Genomics, 5 (8), 623-632 DOI: 10.1002/cfg.442
  2. Carole Goble (2004) The Capulets and Montagues: A plague on both your houses?, SOFG: Standards and Ontologies for Functional Genomics
  3. William Shakespeare (1596) Romeo and Juliet

[Romeo and Juliet picture via Happy Hippo Snacks]

January 21, 2010

Blogging a Book about Bio-Ontologies

If you wanted to write a guide to Biomedical and Biological Ontologies [1], especially the what, why, when, how, where and who, there are at least three choices for publishing your work:

  1. Journal publishing in your favourite scientific journal.
  2. Book publishing with your favourite academic or technical publisher.
  3. Self publishing on a web blog with your favourite blogging software.

Each of these has its own unique problems:

  • The trouble with journals is that they typically don’t publish “how to” guides, although you might be able to publish some kind of review.
  • The trouble with books, and academic books in particular, is that people (and machines) often don’t read them. Also, academic books can be prohibitively expensive to buy and this can make the data inside them less visible and accessible to the widest audience. Unfortunately all that lovely knowledge gets locked up behind publishers’ paywalls. To add insult to injury, most academic books take a very long time to publish, often several years. By the time of printing, the content of many academic books is often very dated.
  • The trouble with blogs is that they aren’t peer-reviewed in the traditional way and they tend to be written by a single person from a not very neutral point of view. Or as Dave once put it: “vanity publishing for arrogant people with an inflated ego”. Ouch.

So the people behind the Ontogenesis network (Robert Stevens and Phillip Lord with funding from the EPSRC grant ref: EP/E021352/1) had an idea. Why not blog a book about Ontology? As a publishing experiment, it might just work by combining the merits of books and blogs to overcome their shortcomings. This will involve getting a small group of about twenty people (mostly bio-ontologists) together, and writing about what an ontology is, why you would want a biomedical ontology, how to build one and so on. We will be doing some of the peer-review online too.

As part of an ongoing experiment, we are posting all this information on a blog at http://ontogenesis.knowledgeblog.org. If you’d like to follow along, subscribe to the feed and read the manifesto.


  1. Yu, A. (2006). Methods in biomedical ontology Journal of Biomedical Informatics, 39 (3), 252-266 DOI: 10.1016/j.jbi.2005.11.006

[Ultrawide panoramic picture of Waterloo station by Tim Nugent]

November 24, 2009

Semantic Web Applications and Tools for the Life Sciences (SWAT4LS) 2009, Amsterdam

Last Friday, the Centrum Wiskunde & Informatica (CWI) in Amsterdam hosted a workshop called Semantic Web Applications and Tools for the Life Sciences (SWAT4LS) 2009.

Following on from last year [1], the workshop proceedings will be published at ceur-ws.org and in a special issue of the Journal of Biomedical Semantics, but if you want to find out what happened in the meantime, take a look at the #swat4ls2009 hashtag on twitter. Twitter makes bloggers lazy (they blog less but tweet more), but thankfully Nico Adams has studiously blogged the workshop very extensively.

Disruptive Technologies Director (cool job title!) Anita de Waard from Elsevier was asking what the conclusions of the workshop were. So here is an incomplete summary: roughly speaking, people agreed to disagree (again). In his entertaining but controversial keynote, Barend Mons argued that redundant data should be eliminated through the use of “nano-publications” and micro-attribution. Some people in the audience disagreed with this. Greg Tyrelle thinks that redundancy is a feature, not a bug, in the Web and we have to deal with it. Alan Ruttenberg argued that semantic web reasoners are required to clean up and sanity check all the messy and noisy biological data, but emphasised the importance of Computer Scientists learning to speak Biologists’ language.

The good thing about this workshop is its size: small, friendly but internationally attended. Thanks to M. Scott Marshall, Albert Burger, Adrian Paschke, Paolo Romano and Andrea Splendiani for organising another good workshop. Hope to see you again next year (if not before).


  1. Burger, A., Romano, P., Paschke, A., & Splendiani, A. (2009). Semantic Web Applications and Tools for Life Sciences, 2008 – Introduction BMC Bioinformatics, 10 (Suppl 10) DOI: 10.1186/1471-2105-10-S10-S1 part of the special issue on SWAT4LS 2008

[CC-licensed picture of Amsterdam in the snow by Bas van Gaalen]

September 4, 2009

XML training in Oxford

The XML Summer School returns this year at St. Edmund Hall, Oxford from 20th-25th September 2009. As always, it’s packed with high quality technical training for every level of expertise, from the Hands-on Introduction for beginners through to special classes devoted to XQuery and XSLT, Semantic Technologies, Open Source Applications, Web 2.0, Web Services and Identity. The Summer School is also a rare opportunity to experience what life is like as a student in one of the world’s oldest university cities while enjoying a range of social events that are a part of the unique summer school experience.

This year, classes and sessions are taught and chaired by:

The Extensible Markup Language (XML) has been around for just over ten years, quickly and quietly finding its niche in many different areas of science and technology. It has been used in everything from modelling biochemical networks in systems biology [1], to electronic health records [2], scientific publishing, the provision of the PubMed service (which talks XML) [3] and many other areas. As a crude measure of its importance in biomedical science, PubMed currently has no fewer than 800 peer-reviewed publications on XML. It’s hard to imagine life without it. So whether you’re a complete novice looking to learn more about XML or a seasoned veteran wanting to improve your knowledge, register your place and find out more by visiting xmlsummerschool.com. I hope to see you there…


  1. Hucka, M. (2003). The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models Bioinformatics, 19 (4), 524-531 DOI: 10.1093/bioinformatics/btg015
  2. Bunduchi R, Williams R, Graham I, & Smart A (2006). XML-based clinical data standardisation in the National Health Service Scotland. Informatics in primary care, 14 (4) PMID: 17504574
  3. Sayers, E., Barrett, T., Benson, D., Bryant, S., Canese, K., Chetvernin, V., Church, D., DiCuccio, M., Edgar, R., Federhen, S., Feolo, M., Geer, L., Helmberg, W., Kapustin, Y., Landsman, D., Lipman, D., Madden, T., Maglott, D., Miller, V., Mizrachi, I., Ostell, J., Pruitt, K., Schuler, G., Sequeira, E., Sherry, S., Shumway, M., Sirotkin, K., Souvorov, A., Starchenko, G., Tatusova, T., Wagner, L., Yaschenko, E., & Ye, J. (2009). Database resources of the National Center for Biotechnology Information Nucleic Acids Research, 37 (Database) DOI: 10.1093/nar/gkn741

June 4, 2009

Improving the OBO Foundry Principles

The Open Biomedical Ontologies (OBO) are a set of reference ontologies for describing all kinds of biomedical data, see [1-5] for examples. Every year, users and developers of these ontologies gather from around the globe for a workshop at the EBI near Cambridge, UK. Following on from the first workshop last year, the 2nd OBO workshop 2009 is fast approaching.

In preparation, I’ve been revisiting the OBO Foundry documentation, part of which establishes a set of principles for ontology development. I’m wondering how they could be improved because these principles are fundamental to the whole effort. We’ve been using one of the OBO ontologies (called Chemical Entities of Biological Interest (ChEBI)) in the REFINE project to mine data from the PubMed database. OBO Ontologies like ChEBI and the Gene Ontology are really crucial to making sense of the massive data which are now common in biology and medicine – so this is stuff that matters.

The OBO Foundry Principles, a sort of Ten Commandments of Ontology (or Obology if you prefer) currently look something like this (copied directly from obofoundry.org/crit.shtml):

  1. The ontology must be open and available to be used by all without any constraint other than (a) its origin must be acknowledged and (b) it is not to be altered and subsequently redistributed under the original name or with the same identifiers. The OBO ontologies are for sharing and are resources for the entire community. For this reason, they must be available to all without any constraint or license on their use or redistribution. However, it is proper that their original source is always credited and that after any external alterations, they must never be redistributed under the same name or with the same identifiers.
  2. The ontology is in, or can be expressed in, a common shared syntax. This may be either the OBO syntax, extensions of this syntax, or OWL. The reason for this is that the same tools can then be usefully applied. This facilitates shared software implementations. This criterion is not met in all of the ontologies currently listed, but we are working with the ontology developers to have them available in a common OBO syntax.
  3. The ontologies possesses a unique identifier space within the OBO Foundry. The source of a term (i.e. class) from any ontology can be immediately identified by the prefix of the identifier of each term. It is, therefore, important that this prefix be unique.
  4. The ontology provider has procedures for identifying distinct successive versions.
  5. The ontology has a clearly specified and clearly delineated content. The ontology must be orthogonal to other ontologies already lodged within OBO. The major reason for this principle is to allow two different ontologies, for example anatomy and process, to be combined through additional relationships. These relationships could then be used to constrain when terms could be jointly applied to describe complementary (but distinguishable) perspectives on the same biological or medical entity. As a corollary to this, we would strive for community acceptance of a single ontology for one domain, rather than encouraging rivalry between ontologies.
  6. The ontologies include textual definitions for all terms. Many biological and medical terms may be ambiguous, so terms should be defined so that their precise meaning within the context of a particular ontology is clear to a human reader.
  7. The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the OBO Relation Ontology.
  8. The ontology is well documented.
  9. The ontology has a plurality of independent users.
  10. The ontology will be developed collaboratively with other OBO Foundry members.
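
Principle 3 is the easiest one to show in code. Here is a minimal Python sketch, assuming identifiers take the familiar `PREFIX:local` shape (as in `GO:0008150` or `CHEBI:15377`). The function names `split_obo_id` and `group_by_prefix` are made up for illustration and are not part of any OBO tool.

```python
def split_obo_id(term_id):
    """Split an identifier such as 'CHEBI:15377' into (prefix, local_id).

    Assumes the 'PREFIX:local' shape; anything else raises ValueError.
    """
    prefix, sep, local_id = term_id.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a prefixed identifier: {term_id!r}")
    return prefix, local_id


def group_by_prefix(term_ids):
    """Group term identifiers by the ontology prefix they carry."""
    groups = {}
    for term_id in term_ids:
        prefix, _ = split_obo_id(term_id)
        groups.setdefault(prefix, []).append(term_id)
    return groups


if __name__ == "__main__":
    # Example identifiers; the source ontology is read straight off the prefix.
    terms = ["GO:0008150", "CHEBI:15377", "GO:0003674"]
    for prefix, ids in group_by_prefix(terms).items():
        print(prefix, ids)
```

The point of the principle is that this lookup only works if no two ontologies share a prefix, which is why the prefix has to be unique across the Foundry.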

I’ve been asking all my frolleagues what they think of these principles and have got some lively responses, including some here from Allyson Lister, Mélanie Courtot, Michel Dumontier and Frank Gibson. So what do you think? How could these guidelines be improved? Do you have any specific (and preferably constructive) criticisms of these ambitious (and worthy) goals? Be bold, be brave and be polite. You can email me anything controversial or “off the record”… I’m all ears.

CC-licensed picture above of the Old Smithy (pub) by Loop Oh. Inspired by Michael Ashburner’s standing OBO joke (Ontolojoke) which goes something like this: Because Barry Smith is one of the leaders of OBO, should the project be called the OBO Smithy or the OBO Foundry? 🙂


  1. Noy, N., Shah, N., Whetzel, P., Dai, B., Dorf, M., Griffith, N., Jonquet, C., Rubin, D., Storey, M., Chute, C., & Musen, M. (2009). BioPortal: ontologies and integrated data resources at the click of a mouse Nucleic Acids Research DOI: 10.1093/nar/gkp440
  2. Côté, R., Jones, P., Apweiler, R., & Hermjakob, H. (2006). The Ontology Lookup Service, a lightweight cross-platform tool for controlled vocabulary queries BMC Bioinformatics, 7 (1) DOI: 10.1186/1471-2105-7-97
  3. Smith, B., Ashburner, M., Rosse, C., Bard, J., Bug, W., Ceusters, W., Goldberg, L., Eilbeck, K., Ireland, A., Mungall, C., Leontis, N., Rocca-Serra, P., Ruttenberg, A., Sansone, S., Scheuermann, R., Shah, N., Whetzel, P., & Lewis, S. (2007). The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration Nature Biotechnology, 25 (11), 1251-1255 DOI: 10.1038/nbt1346
  4. Smith, B., Ceusters, W., Klagges, B., Köhler, J., Kumar, A., Lomax, J., Mungall, C., Neuhaus, F., Rector, A., & Rosse, C. (2005). Relations in biomedical ontologies Genome Biology, 6 (5) DOI: 10.1186/gb-2005-6-5-r46
  5. Bada, M., & Hunter, L. (2008). Identification of OBO nonalignments and its implications for OBO enrichment Bioinformatics, 24 (12), 1448-1455 DOI: 10.1093/bioinformatics/btn194

May 13, 2009

XML Summer School, Oxford

After a brief absence, it is good to see the XML Summer School is back again this September (20th-25th) at St. Edmund Hall, Oxford. This is “a unique event for everyone using, designing or implementing solutions using XML and related technologies.” I’ve been both a delegate and a speaker here over the years; back in 2005, Nick Drummond and I presented the Protégé and OWL tutorial, which was good fun. So here is what I.M.H.O. makes the XML summer school worth a look: (more…)

May 6, 2009

Michel Dumontier on Representing Biochemistry

Michel Dumontier is visiting Manchester this week and will be giving a seminar on Monday 11th of May. Here are some details for anyone who is interested in attending:

Title: Increasingly Accurate Representation of Biochemistry

Speaker: Michel Dumontier, dumontierlab.com

Time: 14.00, Monday 11th May 2009
Venue: Atlas 1, Kilburn Building, University of Manchester, number 39 on the Google Campus Map

Abstract: Biochemical ontologies aim to capture and represent biochemical entities and the relations that exist between them in an accurate manner. A fundamental starting point is biochemical identity, but our current approach for generating identifiers is haphazard and consequently integrating data is error-prone. I will discuss plausible structure-based strategies for biochemical identity whether it be at molecular level or some part thereof (e.g. residues, collection of residues, atoms, collection of atoms, functional groups) such that identifiers may be generated in an automatic and curator/database independent manner. With structure-based identifiers in hand, we will be in a position to more accurately capture context-specific biochemical knowledge, such as how a set of residues in a binding site are involved in a chemical reaction including the fact that a key nitrogen atom must first be de-protonated. Thus, our current representation of biochemical knowledge may improve such that manual and automatic methods of biocuration are substantially more accurate.

Update: Slides are now available via SlideShare.

[Creative Commons licensed picture of Michel in action at ISWC 2008 from Tom Heath]


  1. Michel Dumontier and Natalia Villanueva-Rosales (2009) Towards pharmacogenomics knowledge discovery with the semantic web Briefings in Bioinformatics DOI:10.1093/bib/bbn056
  2. Doug Howe et al (2008) Big data: The future of biocuration Nature 455, 47-50 doi:10.1038/455047a

December 2, 2008

SWAT4LS: The Semantic Web in Scotland

Last Friday, the UK National e-Science Centre in Edinburgh hosted a workshop, Semantic Web Applications and Tools for the Life Sciences (see SWAT4LS.org for the full details). Here are some incomplete and abbreviated notes from the workshop where there were some interesting people, paperware and software.

People and Paperware

70 people registered to attend SWAT4LS in total, many familiar names and faces, plus some new people I’ve never met before: (more…)

October 30, 2008

Congratulations Matthew Horridge!

So, congratulations are due to Matthew Horridge, Bijan Parsia and Ulrike Sattler from The University of Manchester for winning the keenly fought best paper prize at the International Semantic Web Conference [ISWC 2008] in Karlsruhe for their paper “Laconic and Precise Justifications in OWL”. An abstract of the paper is reproduced below:

“A justification for an entailment in an OWL ontology is a minimal subset of the ontology that is sufficient for that entailment to hold. Since justifications respect the syntactic form of axioms in an ontology, they are usually neither syntactically nor semantically minimal. This paper presents two new subclasses of justifications—laconic justifications and precise justifications. Laconic justifications only consist of axioms that do not contain any superfluous “parts”. Precise justifications can be derived from laconic justifications and are characterised by the fact that they consist of flat, small axioms, which facilitate the generation of semantically minimal repairs. Formal definitions for both types of justification are presented. In contrast to previous work in this area, these definitions make it clear as to what exactly “parts of axioms” are. In order to demonstrate the practicability of computing laconic, and hence precise justifications, an algorithm is provided and results from an empirical evaluation carried out on several published ontologies are presented. The evaluation showed that laconic/precise justifications can be computed in a reasonable time for entailments in a range of ontologies that vary in size and complexity. It was found that in half of the ontologies sampled there were entailments that had more laconic/precise justifications than regular justifications. More surprisingly it was observed that for some ontologies there were fewer laconic justifications than regular justifications.”

But what does it all mean? One of the results of this research project has been an explanations plug-in for the Protégé ontology editor, see explanation in OWL at http://owl.cs.manchester.ac.uk. This helps users to understand when and why the reasoning goes all pear-shaped through better explanations than have previously been possible. So this is another step toward making it easier and less confusing to build better ontologies with the Web Ontology Language (OWL). Yay!
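
To make the abstract’s definition of a justification (a minimal subset of the ontology that is sufficient for an entailment to hold) a bit more concrete, here is a toy Python sketch. It is not the algorithm from the paper and it does no OWL reasoning: it assumes an “ontology” is just a list of (subclass, superclass) pairs, that entailment means reachability along those pairs, and it finds one justification by trying to delete each axiom in turn. The names `entails` and `one_justification` are made up for illustration.

```python
def entails(axioms, sub, sup):
    """True if 'sub is a subclass of sup' follows from the axioms.

    Toy semantics: each axiom is a (subclass, superclass) pair and
    entailment is reflexive-transitive reachability along those pairs.
    """
    seen, frontier = {sub}, [sub]
    while frontier:
        current = frontier.pop()
        if current == sup:
            return True
        for a, b in axioms:
            if a == current and b not in seen:
                seen.add(b)
                frontier.append(b)
    return False


def one_justification(axioms, sub, sup):
    """Return one minimal subset of axioms that still entails sub -> sup.

    Deletion-based contraction: drop an axiom whenever the entailment
    survives without it. Returns None if the entailment does not hold.
    """
    if not entails(axioms, sub, sup):
        return None
    kept = list(axioms)
    for axiom in list(axioms):
        trial = [x for x in kept if x != axiom]
        if entails(trial, sub, sup):
            kept = trial  # this axiom was not needed
    return kept


if __name__ == "__main__":
    # Hypothetical class names, purely for illustration.
    ontology = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("X", "Y")]
    print(one_justification(ontology, "A", "D"))
```

Note that this toy ontology has two justifications for “A is a subclass of D” (one via B, one going straight from A to C), and the sketch only returns whichever one survives the deletion order. Finding all of them, and trimming superfluous parts inside individual axioms, is where the real work in the paper lies.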



  1. Matthew Horridge, Bijan Parsia and Ulrike Sattler (2008). Laconic and Precise Justifications in OWL Lecture Notes in Computer Science, LNCS Volume 5318/-1 The Semantic Web – ISWC 2008 DOI:10.1007/978-3-540-88564-1_21

[Picture of Manchester United player George Best by Sammy, Best paper prize picture by guitarfish]
