Christoph Steinbeck has already written some ChEBI notes; these add a little more detail from day one of the first ChEBI workshop, 19th May 2008. There were four talks, from Colin Batchelor (Royal Society of Chemistry), Ulrike Wittig (EML Research GmbH, Heidelberg), Giles Weaver (Unilever) and Paula de Matos (EBI).
1. Colin Batchelor: text mining and ontological best practice
Colin Batchelor discussed ambiguities in the ChEBI ontology. Currently, one of the most important is the overloaded “is-a” relation (the Notorious ISA), which in most biomedical ontologies should be transitive. For example, if we state that an
Elephant ISA Animal and that an
African Elephant ISA Elephant, then we also imply that an
African Elephant ISA Animal.
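The elephant example can be sketched in a few lines of Python. This is only an illustration of what transitivity means for ISA links; the `IS_A` table and the `ancestors` function are hypothetical names invented here, not anything from ChEBI itself.

```python
# Direct ISA assertions (child -> set of parents), from the elephant example.
# IS_A and ancestors() are illustrative names, not part of any ChEBI tool.
IS_A = {
    "African Elephant": {"Elephant"},
    "Elephant": {"Animal"},
}


def ancestors(term, is_a=IS_A):
    """Return every class reachable from `term` by following ISA links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in is_a.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# "African Elephant ISA Animal" was never stated directly, but it follows
# from the two direct assertions if ISA is transitive.
print(sorted(ancestors("African Elephant")))  # ['Animal', 'Elephant']
```

A reasoner that treats ISA as transitive will infer links in exactly this way, which is why an ISA that means different things in different places causes trouble.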
However, ISA in ChEBI is NOT transitive, which leads to problematic ambiguity.
For example, as of May 2008, release 44 of ChEBI states that
Colin classified ambiguous ISA’s into four basic kinds:
- An amount of a compound has a biological role, e.g. tris ISA buffer
- An amount of a compound has an application, e.g. sodium dodecyl sulfate ISA detergent
- A less-abstract type is an example of a more abstract type, e.g. propane ISA alkanes
- Just plain weird, e.g. metals ISA atoms (problems with using plurals to represent classes were also raised by Jane Lomax of the Gene Ontology)
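One way to picture Colin's four kinds is to label each ISA link with what it really means. The sketch below does that for the four examples in the list; the `Kind` enum, its labels and the `relabel` helper are assumptions made up for illustration, not relation names used by ChEBI.

```python
from enum import Enum


class Kind(Enum):
    """Hypothetical labels for the four kinds of ambiguous ISA."""
    BIOLOGICAL_ROLE = "has biological role"
    APPLICATION = "has application"
    SUBTYPE = "is a less-abstract type of"
    UNCLEAR = "has an unclear relation to"


# (subject, object, kind) for the four examples given in the talk.
EXAMPLES = [
    ("tris", "buffer", Kind.BIOLOGICAL_ROLE),
    ("sodium dodecyl sulfate", "detergent", Kind.APPLICATION),
    ("propane", "alkanes", Kind.SUBTYPE),
    ("metals", "atoms", Kind.UNCLEAR),
]


def relabel(subject, obj, kind):
    """Spell out an overloaded ISA link using its more specific label."""
    return f"{subject} {kind.value} {obj}"


for subject, obj, kind in EXAMPLES:
    print(relabel(subject, obj, kind))

# On this reading, only the SUBTYPE links are candidates for a transitive ISA.
subtype_links = [(s, o) for s, o, k in EXAMPLES if k is Kind.SUBTYPE]
```

Separating the labels like this is one possible reading of the talk: only the third kind looks like a genuine type hierarchy, so only those links would be safe to chain together.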
2. Ulrike Wittig: EML Heidelberg, SABIO-RK
Ulrike Wittig discussed ChEBI in the context of SABIO-RK, which associates chemical compound information with reaction kinetics data. She described the data flow, input, curation and user interfaces; for example, SABIO-RK can export SBML via web services. In summary:
- Curated data based on literature information, currently around 2000 papers with more than 25,000 entries
- Platform for kinetic data storage and exchange
- Free and available for academic use
- Uses existing standard data formats, controlled vocabularies and ontologies
- Links to original data sources
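For readers who have not seen SBML, here is a rough sketch of reading species names and kinetic parameters out of an SBML document with Python's standard library. The SBML fragment is a made-up placeholder written for this example (the species, the reaction and the `Km` value are invented); it is not real SABIO-RK output, and no SABIO-RK web service is called.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written SBML Level 2 fragment. All ids, names and values
# below are placeholders for illustration, not data from SABIO-RK.
SBML = """<sbml xmlns="http://www.sbml.org/sbml/level2" level="2" version="1">
  <model id="example">
    <listOfSpecies>
      <species id="s1" name="ATP" compartment="cell"/>
      <species id="s2" name="ADP" compartment="cell"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="r1">
        <kineticLaw>
          <listOfParameters>
            <parameter id="Km" value="0.5"/>
          </listOfParameters>
        </kineticLaw>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""

# SBML elements live in an XML namespace, so searches need a prefix mapping.
NS = {"sbml": "http://www.sbml.org/sbml/level2"}

root = ET.fromstring(SBML)

# Compound names, in document order.
species = [s.get("name") for s in root.findall(".//sbml:species", NS)]

# Kinetic parameters attached to reactions, as {id: numeric value}.
parameters = {
    p.get("id"): float(p.get("value"))
    for p in root.findall(".//sbml:parameter", NS)
}

print(species)
print(parameters)
```

The species names are the natural place to hang ChEBI identifiers, which is presumably where the two resources meet.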
3. Giles Weaver, Unilever
Giles talked about genome-scale metabolic networks and ChEBI, comparing BioCyc (EcoCyc) and BiGG with ChEBI. He touched on the importance and sensitivity of environmental issues at Unilever, following the sustainable palm oil protests by Greenpeace. A copy of the slides is unavailable.
4. Paula de Matos, ChEBI: The story so far
Private data vs. public data characterises much of the history of ChEBI. There are three main principles behind ChEBI. It should:
- Be free: nothing held in the database may be proprietary
- State provenance: every data item in the database should be fully traceable and explicitly referenced to the original source
- Make data available without constraint, for example as database table dumps
ChEBI is available in French, German, Spanish and Latin (Russian also?); see the ChEBI SourceForge tracker page for the latest developments. Links to patent databases are currently manual, but will become automatic.
Down the pub, The George in Babraham
The workshop was briefly joined by petrol-heads Jeremy Clarkson, Richard Hammond and James May. They quickly returned to filming Top Gear after discovering that petrol was not considered to be a Chemical of Biological Interest. (Disclaimer: this is only a slight distortion of the truth.)
The pub is where many of the most important (CHEBI:16236 stimulated) scientific discussions take place, including: if money was no object, how big should ChEBI be? How many biologically interesting molecules are there? Somewhere between 100,000 and 1,000,000 (if you include plant secondary metabolites). Currently ChEBI only has around 14,000 compounds. Also Markush structures and functionally equivalent R groups: no trip to the pub with a group of chemists can avoid a discussion of Markush structures.
Other attendees (incomplete list)
Excluding speakers and organisers:
- Michael Ashburner, Richard Cammack, Peter Corbett (brains behind OSCAR), Helen Parkinson, Bernard De Bono, Jane Lomax, Christopher Southan, Alan Tonge, Dominic Clark…
- ChEBI team: Paula de Matos, Janna Hastings, Christoph Steinbeck, Kirill Degtyarenko, Marcus Ennis
- Industrial participants from AstraZeneca, Unilever, Accelrys and the European Patent Office.
Outcomes from day one
- ISA relations will be fixed, probably within 6 months
- Coverage of ChEBI will continue to increase, but is limited by the human curation power available.
- Possibility of adding chemical compounds implied in the Gene Ontology, e.g. Cysteine Biosynthesis implies cysteine without explicitly naming it as a compound. These might be good candidates for inclusion, on top of the user requests.
- Hangovers all round
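The Gene Ontology outcome above could be approached, very naively, by stripping a process suffix from a term name and treating what is left as a candidate compound. The sketch below assumes a short hand-picked suffix list and a hypothetical `implied_compound` helper; real term names would need far more careful handling, and every candidate would still need a curator.

```python
# Hand-picked process suffixes; an illustrative assumption, not a complete
# or official list of how Gene Ontology process terms are named.
PROCESS_SUFFIXES = (
    " biosynthetic process",
    " biosynthesis",
    " metabolic process",
    " catabolic process",
)


def implied_compound(term_name):
    """Return the compound a process term seems to imply, or None."""
    lowered = term_name.strip().lower()
    for suffix in PROCESS_SUFFIXES:
        if lowered.endswith(suffix):
            # Whatever precedes the suffix is the candidate compound name.
            return lowered[: -len(suffix)]
    return None


print(implied_compound("Cysteine Biosynthesis"))  # cysteine
print(implied_compound("cell cycle"))             # None
```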
References
- Brachman, R.J. (1983). What is-a is and isn’t: An analysis of taxonomic links in semantic networks. IEEE Computer, 16(10):30-36. doi:10.1109/MC.1983.1654194
- Smith, B., Ceusters, W., Klagges, B., Köhler, J., Kumar, A., Lomax, J., Mungall, C., Neuhaus, F., Rector, A.L., and Rosse, C. (2005). Relations in biomedical ontologies. Genome Biology, 6(5):R46. doi:10.1186/gb-2005-6-5-r46