February 25, 2010

Apache Maven: A Misbehavin’ Build Tool?

Filed under: ChEBI,programming,technology — Duncan Hull @ 11:00 am

One of the many tools we use in our team to manage the development of the ChEBI software is an automated build tool called Apache Maven. Opinions are often divided on whether Maven is a good or a bad thing. Most of them are very subjective, argumentative and often very lengthy. See “Why does Maven have such a bad reputation?” and “25 things* I hate about Maven” for examples.

All this is fairly predictable, and I could add a few tales of Maven woe to the pile myself. But wondering if Maven is any good reminded me of something Bjarne Stroustrup [1,2,3] (one of the people behind the C++ programming language) once said in an article on the problem with programming:

“There are just two kinds of [programming] languages: the ones everybody complains about and the ones nobody uses.”

Actually, when you think about it, this applies to build systems too: there are two kinds. It also applies to just about any technology you care to name; you can crudely classify them all into two categories:

  1. Those technologies everybody complains about…
  2. … and the rest, that nobody uses.

So is Maven any good? Worth using? Worth the pain? Depends on who you ask. What we can say for sure is that, like many technologies, everybody complains about it.


  1. Bjarne Stroustrup (2010). Viewpoint: What should we teach new software developers? Why? Communications of the ACM, 53 (1) DOI: 10.1145/1629175.1629192
  2. Bjarne Stroustrup (2007). Evolving a language in and for the real world: C++ 1991-2006 Proceedings of the third ACM SIGPLAN conference on History of programming languages DOI: 10.1145/1238844.1238848
  3. Bjarne Stroustrup (1993). A history of C++: 1979–1991 The second ACM SIGPLAN conference on History of programming languages DOI: 10.1145/154766.155375

* Only 25? That seems like quite a short list to me.

[CC-licensed Chocolate Tools image by JanneM, some commentary on this post over at friendfeed.]

February 12, 2010

The 3rd OBO Foundry Workshop 2010, Cambridge, UK

The Open Biomedical Ontologies (OBO) [1] are a set of reference ontologies for describing all kinds of biomedical data, shared in a centralised repository called The OBO Foundry. Every year, users and developers of these ontologies gather from around the globe for a workshop at the EBI near Cambridge, UK. Following on from the first workshop two years ago and the second workshop last year, it’s already time for the third workshop on February 15th-16th. All the details and agenda are here if you’re interested. This workshop is possible thanks to sponsorship from the BBSRC funds for Workshop on Data Standards and the EU ELIXIR ‘Data Integration & Interoperability’ Package 7.

[Update: outcomes from the workshop are available here, along with a summary of discussion from Monday and a summary of discussion from Tuesday.]


  1. Smith, B., Ashburner, M., Rosse, C., Bard, J., Bug, W., Ceusters, W., Goldberg, L., Eilbeck, K., Ireland, A., Mungall, C., Leontis, N., Rocca-Serra, P., Ruttenberg, A., Sansone, S., Scheuermann, R., Shah, N., Whetzel, P., & Lewis, S. (2007). The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration Nature Biotechnology, 25 (11), 1251-1255 DOI: 10.1038/nbt1346

[Ultrawide panoramic picture of the Wellcome Trust Genome Campus by Tim Nugent, as featured on the cover of the EMBL-EBI Annual Scientific Report 2009. Making those pictures looks like a lot of fun.]

February 5, 2010

Classic paper: Montagues and Capulets in Science

In preparation for a joint seminar I’ll be doing with Midori Harris here at the EBI, here’s a classic paper [1,2] on the social problems of building biomedical ontologies. This paper is worth reading (or re-reading) because it makes lots of relevant points about the use and abuse of research and how people misunderstand each other [3]. It’s funny (and available Open Access too). Besides, how many papers do you read with an abstract written in the style of Big Bard Bill Shakespeare?

ABSTRACT: Two households, both alike in dignity, In fair Genomics, where we lay our scene, (One, comforted by its logic’s rigour, Claims ontology for the realm of pure, The other, with blessed scientist’s vigour, Acts hastily on models that endure), From ancient grudge break to new mutiny, When ‘being’ drives a fly-man to blaspheme. From forth the fatal loins of these two foes, Researchers to unlock the book of life; Whole misadventured piteous overthrows, Can with their work bury their clans’ strife. The fruitful passage of their GO-mark’d love, And the continuance of their studies sage, Which, united, yield ontologies undreamed-of, Is now the hour’s traffic of our stage; The which if you with patient ears attend, What here shall miss, our toil shall strive to mend.

So if you read the paper, you have to ask yourself, are you a Montague or a Capulet?


  1. Carole Goble and Chris Wroe (2004). The Montagues and the Capulets Comparative and Functional Genomics, 5 (8), 623-632 DOI: 10.1002/cfg.442
  2. Carole Goble (2004) The Capulets and Montagues: A plague on both your houses?, SOFG: Standards and Ontologies for Functional Genomics
  3. William Shakespeare (1596) Romeo and Juliet

[Romeo and Juliet picture via Happy Hippo Snacks]

January 21, 2010

Blogging a Book about Bio-Ontologies

If you wanted to write a guide to Biomedical and Biological Ontologies [1], especially the what, why, when, how, where and who, there are at least three choices for publishing your work:

  1. Journal publishing in your favourite scientific journal.
  2. Book publishing with your favourite academic or technical publisher.
  3. Self publishing on a web blog with your favourite blogging software.

Each of these has its own unique problems:

  • The trouble with journals is that they typically don’t publish “how to” guides, although you might be able to publish some kind of review.
  • The trouble with books, and academic books in particular, is that people (and machines) often don’t read them. Also, academic books can be prohibitively expensive to buy and this can make the data inside them less visible and accessible to the widest audience. Unfortunately all that lovely knowledge gets locked up behind publishers’ paywalls. To add insult to injury, most academic books take a very long time to publish, often several years. By the time of printing, the content of many academic books is often very dated.
  • The trouble with blogs is that they aren’t peer-reviewed in the traditional way and they tend to be written by a single person from a not very neutral point of view. Or as Dave once put it, “vanity publishing for arrogant people with an inflated ego”. Ouch.

So the people behind the Ontogenesis network (Robert Stevens and Phillip Lord with funding from the EPSRC grant ref: EP/E021352/1) had an idea. Why not blog a book about Ontology? As a publishing experiment – it might just work by combining the merits of books and blogs together in order to overcome their shortcomings. This will involve getting a small group of about twenty people (mostly bio-ontologists) together, and writing about what an ontology is, why you would want a biomedical ontology, how to build one and so on. We will be doing some of the peer-review online too.

As part of an ongoing experiment, we are posting all this information on a blog at http://ontogenesis.knowledgeblog.org. If you’d like to follow along, subscribe to the feed and read the manifesto.


  1. Yu, A. (2006). Methods in biomedical ontology Journal of Biomedical Informatics, 39 (3), 252-266 DOI: 10.1016/j.jbi.2005.11.006

[Ultrawide panoramic picture of Waterloo station by Tim Nugent]

January 15, 2010

Bio2RDF: Large Scale, Distributed Biological Knowledge Discovery

Filed under: ChEBI — Duncan Hull @ 2:11 pm

Michel Dumontier was visiting the EBI this week. Here are the details of his seminar, Bio2RDF and Beyond! Large Scale, Distributed Biological Knowledge Discovery (slides embedded below), for anyone interested who missed it:

Abstract: The Bio2RDF.org [1] project aims to transform silos of bioinformatics data into a distributed platform for biological knowledge discovery. Initial work focused on building a public database of open-linked data with web-resolvable identifiers that provides information about named entities. This involved a syntactic normalization to convert open data represented in a variety of formats (flatfile, tab, xml, web services) to RDF-based linked data with normalized names (HTTP URIs) and basic typing from source databases. Bio2RDF entities also make reference to other open linked data networks (e.g. dbPedia) thus facilitating traversal across information spaces. However, a significant problem arises when attempting to undertake more sophisticated knowledge discovery approaches such as question answering or symbolic data mining. This is because knowledge is represented in a fundamentally different manner, requiring one to know the underlying data model and reconcile the artefactual differences when they arise. In this talk, we describe our data integration strategy that makes use of both syntactic and semantic normalization to consistently marshal knowledge to a common data model while leveraging explicit logic-based mappings with community ontologies to further enhance the biological knowledgescope. Coupled with the web-service based Semantic Automated Discovery and Integration (SADI) framework, Bio2RDF is well placed to serve up biological data for prediction and analysis.

Some quick notes: Bio2RDF is currently indexing around 5 billion triples, and is built with the open source Virtuoso database. There are some scalability issues in making the system cope with up to a total of 15+ billion triples currently required. There is nothing in Bio2RDF yet that deals with the redundancy problem, e.g. “buggotea” and its friends.
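To make the “syntactic normalization” step in the abstract a bit more concrete, here is a minimal sketch of turning one tab-delimited record into RDF-style triples with normalized HTTP URIs. It is only an illustration under stated assumptions: the `http://example.org/` base, the `vocab:` predicate naming, and the function names `toUri` and `recordToTriples` are all made up for this example and are not Bio2RDF’s actual URI scheme or code.

```javascript
// Placeholder base URI (an assumption for this sketch, not Bio2RDF's real namespace).
const BASE = "http://example.org/";

// Build a normalized HTTP URI from a source database name and a local identifier.
function toUri(namespace, id) {
  return BASE + namespace.toLowerCase() + ":" + encodeURIComponent(id);
}

// Escape backslashes and double quotes so the value is safe inside a quoted literal.
function escapeLiteral(text) {
  return text.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
}

// Convert one tab-delimited record into N-Triples-style lines.
// The first column is treated as the identifier; every other column
// becomes one triple whose predicate is derived from the column name.
function recordToTriples(line, namespace, columns) {
  const fields = line.split("\t");
  const subject = "<" + toUri(namespace, fields[0]) + ">";
  const triples = [];
  for (let i = 1; i < columns.length && i < fields.length; i++) {
    const predicate = "<" + BASE + "vocab:" + columns[i] + ">";
    triples.push(subject + " " + predicate + ' "' + escapeLiteral(fields[i]) + '" .');
  }
  return triples;
}

// Illustrative flat-file record (hypothetical layout: id, name, formula).
const triples = recordToTriples(
  "2365\tabscisic acid\tC15H20O4",
  "CHEBI",
  ["id", "name", "formula"]
);
console.log(triples.join("\n"));
```

The point of the sketch is simply that every source format ends up as the same subject, predicate, object shape with web-resolvable names. The harder semantic normalization the talk describes (mapping to a common data model and community ontologies) is not attempted here.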


  1. Belleau, F., Nolin, M., Tourigny, N., Rigault, P., & Morissette, J. (2008). Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Journal of Biomedical Informatics, 41 (5), 706-716 DOI: 10.1016/j.jbi.2008.03.004

January 11, 2010

Abscisic Acid: Entity of the Month

Happy New Year from the ChEBI team, where release 64 is now available, containing 534,142 total entities, of which 19,645 are annotated entities and 693 were submitted via the ChEBI submission tool. This month’s entity of the month is Abscisic acid.

(+)-Abscisic acid (CHEBI:2365), known commonly just as abscisic acid or ABA, is a ubiquitous isoprenoid plant hormone which is synthesized in the methylerythritol phosphate (MEP) pathway (also known as the non-mevalonate pathway) by cleavage of C40 carotenoids.

First identified and characterised in 1963 by Frederick Addicott and his associates at the University of California, Davis [1], ABA was originally believed to play a major role in abscission of fruits (hence its early name of ‘abscisin II’). This is now known to be true for only a small number of plants, a wider role being to act as a regulator of plant responses to a variety of environmental stresses such as drought, extremes of temperature, and high salinity. Such responses include stimulating the closure of stomata, inhibiting shoot growth while not affecting root growth, and inducing seeds to synthesise storage proteins.

Because of its essential function in plant physiology, targeting the ABA signalling pathway holds considerable promise for future applications in agriculture. Now, in a recent issue of Nature, Ning Zheng and his co-worker Laura Sheard from the University of Washington summarise recent converging studies which reveal the details of how ABA transmits its message [2]. In particular, an article by an international team led by Eric Xu of the Van Andel Research Institute describes how their crystallographic work on unbound ABA and ABA bound to some of its receptors, together with extensive biochemical studies from elsewhere, identify a conserved gate–latch–lock mechanism underlying ABA signalling [3].


  1. Ohkuma, K., Lyon, J., Addicott, F., & Smith, O. (1963). Abscisin II, an Abscission-Accelerating Substance from Young Cotton Fruit Science, 142 (3599), 1592-1593 DOI: 10.1126/science.142.3599.1592
  2. Sheard, L., & Zheng, N. (2009). Plant biology: Signal advance for abscisic acid Nature, 462 (7273), 575-576 DOI: 10.1038/462575a
  3. Melcher, K., Ng, L., Zhou, X., Soon, F., Xu, Y., Suino-Powell, K., Park, S., Weiner, J., Fujii, H., Chinnusamy, V., Kovach, A., Li, J., Wang, Y., Li, J., Peterson, F., Jensen, D., Yong, E., Volkman, B., Cutler, S., Zhu, J., & Xu, H. (2009). A gate–latch–lock mechanism for hormone signalling by abscisic acid receptors Nature, 462 (7273), 602-608 DOI: 10.1038/nature08613

[CC-licensed picture of sweetgum bud by Martin LaBar]

December 21, 2009

Happy Christmas Lectures 2009

If you weren’t able to attend this year’s Christmas lectures in person, they are being televised tonight in the UK on More4 from 7pm. This year, they are given by Professor Sue Hartley [1] (pictured right) from the University of Sussex. Here is some blurb from the Royal Institution on the series, called “The 300 million year war”.

Plants might seem passive, defenceless and almost helpless. But they are most definitely not! Thanks to a war with animals that’s lasted over 300 million years, they’ve developed many terrifying and devious ways to defend themselves and attack their enemies. Vicious poisons, lethal materials and even cunning forms of communicating with unlikely allies are just some of the weapons in their armoury. Using these and other tactics, plants have seen off everything from dinosaurs to caterpillars.

In the 2009 Royal Institution Christmas Lectures, Professor Sue Hartley will show you plants as you’ve never seen them before. They are complicated, cunning, beautiful and with plenty of tricks up their sleeve. And what’s more, we humans are dependent on them in ways you’d never imagine. As well as much of our food, our drugs, medicines and materials are all by-products of this epic 300 million year war.

So if you’re festively feasting this holiday, those Brussels sprouts, carrots and potatoes won’t look so innocent now. The lectures are aimed at children, but can be enjoyed by kids of all ages (including grown ups). You can follow some of the action on twitter: hashtag #xmaslectures and @rigb_science. Speaking of Brussels sprouts, the related Royal Institution video How Much Methane Does A Cow Produce In An Hour? might also be of interest.

Since it’s the end of the year, happy holidays to you all (thanks for visiting O’Really?), and hope to see you again in 2010.


  1. Hartley, S., & Gange, A. (2009). Impacts of Plant Symbiotic Fungi on Insect Herbivores: Mutualism in a Multitrophic Context Annual Review of Entomology, 54 (1), 323-342 DOI: 10.1146/annurev.ento.54.110807.090614

December 11, 2009

The Semantic Biochemical Journal experiment

There is an interesting review [1] (and special issue) in the Biochemical Journal today, published by Portland Press Ltd. It provides (quote) “a whirlwind tour of recent projects to transform scholarly publishing paradigms, culminating in Utopia and the Semantic Biochemical Journal experiment”. Here is a quick outline of the publishing projects the review describes and discusses:

  • Blogs for biomedical science
  • Biomedical Ontologies – OBO etc
  • Project Prospect and the Royal Society of Chemistry
  • The Chemspider Journal of Chemistry
  • The FEBS Letters experiment
  • PubMedCentral and BioLit [2]
  • Public Library of Science (PLoS) Neglected Tropical Diseases (NTD) [3]
  • The Elsevier Grand Challenge [4]
  • Liquid Publications
  • The PDF debate: Is PDF a hamburger? Or can we build more useful applications on top of it?
  • The Semantic Biochemical Journal project with Utopia Documents [5]

The review asks what advances these projects have made and what obstacles to progress still exist. It’s an entertaining tour, dotted with enlightening observations on what is broken in scientific publishing and some of the solutions involving various kinds of semantics.

One conclusion is that many of the experiments described above are expensive and difficult, but that the cost of not improving scientific publishing with various kinds of semantic markup is high, or as the authors put it:

“If the cost of semantic publishing seems high, then we also need to ask, what is the price of not doing it? From the results of the experiments we have seen to date, there is clearly a need to move forward and still a great deal of scope to innovate. If we fail to move forward in a collaborative way, if we fail to engage the key players, the price will be high. We will continue to bury scientific knowledge, as we routinely do now, in static, unconnected journal articles; to sequester fragments of that knowledge in disparate databases that are largely inaccessible from journal pages; to further waste countless hours of scientists’ time either repeating experiments they didn’t know had been performed before, or worse, trying to verify facts they didn’t know had been shown to be false. In short, we will continue to fail to get the most from our literature, we will continue to fail to know what we know, and will continue to do science a considerable disservice.”

It’s well worth reading the review, and downloading the Utopia software to experience all of the interactive features demonstrated in this special issue, especially the animated molecular viewers and sequence alignments.

Enjoy… the Utopia team would be interested to know what people think; see the commentary on friendfeed, the digital curation blog and the YouTube video below for more information.


  1. Attwood, T., Kell, D., McDermott, P., Marsh, J., Pettifer, S., & Thorne, D. (2009). Calling International Rescue: knowledge lost in literature and data landslide! Biochemical Journal, 424 (3), 317-333 DOI: 10.1042/BJ20091474
  2. Fink, J., Kushch, S., Williams, P., & Bourne, P. (2008). BioLit: integrating biological literature with databases Nucleic Acids Research, 36 (Web Server) DOI: 10.1093/nar/gkn317
  3. Shotton, D., Portwin, K., Klyne, G., & Miles, A. (2009). Adventures in Semantic Publishing: Exemplar Semantic Enhancements of a Research Article PLoS Computational Biology, 5 (4) DOI: 10.1371/journal.pcbi.1000361
  4. Pafilis, E., O’Donoghue, S., Jensen, L., Horn, H., Kuhn, M., Brown, N., & Schneider, R. (2009). Reflect: augmented browsing for the life scientist Nature Biotechnology, 27 (6), 508-510 DOI: 10.1038/nbt0609-508
  5. Pettifer, S., Thorne, D., McDermott, P., Marsh, J., Villéger, A., Kell, D., & Attwood, T. (2009). Visualising biological data: a semantic approach to tool and database integration BMC Bioinformatics, 10 (Suppl 6) DOI: 10.1186/1471-2105-10-S6-S19

December 5, 2009

Adrenaline: Entity of the Month

December’s entity of the month at ChEBI is Adrenaline, for all the adrenaline junkies out there. This accompanies ChEBI release 63, containing 536,978 total entities, of which 19,501 are annotated entities and 678 were submitted via the ChEBI submission tool. Text reproduced below from the ChEBI website:

Adrenaline (CHEBI:33568), also known as epinephrine, is a catecholamine that acts as a hormone and neurotransmitter.

It was first isolated from an extract of the suprarenal (adrenal) gland as its mono-benzoyl derivative by the American biochemist and pharmacologist John Jacob Abel in 1889 [1] who later also crystallised it as a hydrate. The pure compound was produced in 1901 by the Japanese industrial chemist Jokichi Takamine [2] and patented as ‘Adrenalin’. Two chemists, Stolz and Dakin, independently reported the synthesis of the compound in 1904 [3,4].

Adrenaline is a potent ‘fight-or-flight’ hormone, which is produced in stress situations. When produced in the body, it leads to an increase in heart rate, vasodilation and the supply of both glucose and oxygen to the muscles and the brain, thus preparing the body for rapid action if needed. The increase in glucose supply is achieved through the binding of adrenaline to β-adrenergic receptors in the liver. This triggers the adenylate cyclase pathway, which, in turn, leads to increased glycogenolysis activity. On the other hand, adrenaline suppresses both digestive processes and immune responses. As such, it can be used in the treatment of anaphylactic shock [5] as well as for the treatment of cardiac arrest and cardiac dysrhythmias [6].

The biosynthesis of adrenaline is regulated by the central nervous system. It is ultimately derived from L-tyrosine, which is converted into L-dihydroxyphenylalanine (L-DOPA) by the action of tyrosine 3-monooxygenase (EC Adrenaline is produced through the conversion of L-DOPA into dopamine into noradrenaline into adrenaline itself.


  1. Abel, J.J. (1899) Ueber den blutdruckerregenden Bestandtheil der Nebenniere, das Epinephrin. Z. Physiol. Chem. 18, 318–324.
  2. Takamine, J., (1902) The isolation of the active principle of the suprarenal gland. J. Physiol. 27 (Suppl), xxix–xxx.
  3. Stolz, F. (1904) Ueber Adrenalin und Alkylaminoacetobrenzkatechin. Ber. Dtsch. Chem. Ges. 37, 4149–4154.
  4. Dakin, H.D. (1905) The synthesis of a substance allied to noradrenaline. Proc. Roy. Soc. Lon. Ser. B 76, 491–497.
  5. Anchor, J. (2004). Appropriate use of epinephrine in anaphylaxis The American Journal of Emergency Medicine, 22 (6), 488-490 DOI: 10.1016/j.ajem.2004.07.016
  6. Rainer TH, & Robertson CE (1996). Adrenaline, cardiac arrest, and evidence based medicine. Journal of accident & emergency medicine, 13 (4), 234-7 PMID: 8832338

[CC licensed picture of dan wakeham pipe by jeffcapeshop]

December 3, 2009

It’s Snowing (JavaScript)!

You know it’s December when it starts snowing in your web browser. Let it snow, let it snow, let it snow!

Or programmatically:

snowStorm = new SnowStorm();

There was a time, not so very long ago, when JavaScript snow would have been “best viewed in browser x”. Thankfully JavaScript is now much more reliable, and the JBrowse [1] Genome Browser provides a nice example of this in bioinformatics. JBrowse is one of many proofs that JavaScript can be used to take some of the computing load off the server, and do it in the client (web browser) instead, while providing more sophisticated applications for users – not just gimmicks like snow.


  1. Skinner, M., Uzilov, A., Stein, L., Mungall, C., & Holmes, I. (2009). JBrowse: A next-generation genome browser Genome Research, 19 (9), 1630-1638 DOI: 10.1101/gr.094607.109

[Creative Commons licensed snowstorm picture by Atli Harðarson, JavaScript SnowStorm code by Scott Schiller, move your mouse around to guide the snowstorm.]
