Some rough-and-ready notes from day two of the first ChEBI workshop, 20th May 2008. There were two talks, one from Kirill Degtyarenko (European Patent Office) and the other from Janna Hastings (EBI), followed by a discussion.
Kirill Degtyarenko: Good annotation practice for chemical data, ChEBI experience
Kirill’s talk described how to choose the most appropriate names, especially since “biologists don’t name things properly, if at all” (!). Systematic (IUPAC) names are usually better than common names, except for “the unpronounceables”. For example, an antibiotic called (E)-roxithromycin (ChEBI:48935) has the IUPAC name:
…which just trips off the tongue (and fits beautifully, without line breaks, onto regular computer screens). Fortunately, the curator can draw the chemical using the curator tools (note the wavy bond, which indicates unknown stereochemistry), and the InChI and SMILES strings are then generated from the drawing. Currently they use something called ACD/Name, which can generate PubChem links automatically. As of May 2008, 14,000 ChEBI IDs translate to around 11,000 CIDs in PubChem, which covers structures only.
Janna Hastings: Hands-on ChEBI training
Janna gave a great hands-on training session in four blocks, including:
- Searching and Browsing ChEBI
- Understanding the ChEBI ontology
- Download and programmatic access
Some (but not all) of this material is also available in the ChEBI user manual. The exercises are (and were) a great way to understand how ChEBI works, but I can’t seem to find them online; see the ChEBI tutorial materials.
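To give a flavour of what the ontology and programmatic-access sessions make possible, here is a minimal sketch of reading a downloaded copy of the ChEBI ontology, which ChEBI distributes (among other formats) as an OBO flat file. It is an illustration under simplifying assumptions, not a ChEBI tool: it only reads `[Term]` stanzas and their `id`, `name` and `is_a` tags, ignoring every other tag and relationship type. The function names `parse_obo_terms` and `ancestors`, and the tiny sample with made-up CHEBI identifiers, are hypothetical.

```python
from typing import Optional


def parse_obo_terms(text: str) -> dict:
    """Map each [Term] id to its name and its list of is_a parent ids."""
    terms = {}
    current: Optional[dict] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # New stanza: keep [Term] stanzas, skip [Typedef] and any others.
            current = {"name": None, "is_a": []} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue  # header lines, blank lines, or lines in skipped stanzas
        tag, value = (part.strip() for part in line.split(":", 1))
        if tag == "id":
            terms[value] = current
        elif tag == "name":
            current["name"] = value
        elif tag == "is_a":
            # "is_a: CHEBI:123 ! label": keep the id, drop the trailing comment.
            current["is_a"].append(value.split("!", 1)[0].strip())
    return terms


def ancestors(terms: dict, term_id: str) -> set:
    """Collect every term reachable from term_id by following is_a links upwards."""
    seen = set()
    stack = list(terms[term_id]["is_a"])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(terms.get(parent, {}).get("is_a", []))
    return seen


# Hypothetical fragment with made-up identifiers, in the shape of an OBO file.
SAMPLE_OBO = """format-version: 1.2
ontology: chebi

[Term]
id: CHEBI:1000001
name: example root entity

[Term]
id: CHEBI:1000002
name: example class
is_a: CHEBI:1000001 ! example root entity

[Term]
id: CHEBI:1000003
name: example compound
is_a: CHEBI:1000002 ! example class

[Typedef]
id: has_part
name: has part
"""

terms = parse_obo_terms(SAMPLE_OBO)
print(sorted(ancestors(terms, "CHEBI:1000003")))  # ['CHEBI:1000001', 'CHEBI:1000002']
```

Walking `is_a` links transitively like this is the basic move behind "browse up the ontology" style queries; a real chebi.obo file is far larger and also carries relationship lines (such as has_part) that this sketch deliberately skips.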
Following the tutorial, there was a general discussion including:
How many curator-years would it take to get to (say) half a million compounds in ChEBI? Paul de Matos: it took 5 years to get 15,000 entries, and one human curator does around 1,500 entries per year.
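A quick back-of-the-envelope answer, using only the figures quoted above (15,000 entries in 5 years, about 1,500 entries per curator per year, and the "(say)" target of half a million). It assumes curation throughput stays constant and ignores any automatic annotation, so treat the results as rough orders of magnitude rather than a plan.

```python
# Figures quoted in the discussion; the 500,000 target is the "(say)" from the question.
TARGET_ENTRIES = 500_000
ENTRIES_SO_FAR = 15_000
YEARS_SO_FAR = 5
ENTRIES_PER_CURATOR_YEAR = 1_500


def curator_years(entries: int, per_curator_year: int = ENTRIES_PER_CURATOR_YEAR) -> float:
    """Curator-years needed to add `entries`, assuming a constant per-curator rate."""
    return entries / per_curator_year


remaining = TARGET_ENTRIES - ENTRIES_SO_FAR                   # entries still to add
historic_rate = ENTRIES_SO_FAR / YEARS_SO_FAR                 # entries per calendar year so far
implied_curators = historic_rate / ENTRIES_PER_CURATOR_YEAR   # average curator headcount so far

print(f"curator-years for the remaining entries: {curator_years(remaining):.1f}")
print(f"calendar years at the pace so far: {remaining / historic_rate:.1f}")
print(f"implied curator headcount so far: {implied_curators:.1f}")
```

With these inputs the remaining 485,000 entries come to roughly 323 curator-years; at the pace so far (3,000 entries a year, i.e. about two curators' worth of effort) that would be over 160 calendar years.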
Applying the Swiss-Prot/TrEMBL model to ChEBI: automatic but lower-quality curation with higher coverage (TrEMBL) versus human, higher-quality curation with lower coverage (Swiss-Prot). ChEBI currently follows the Swiss-Prot model. There were also some questions about coconut oil (a natural product) from linoleic acid; coconut oil is not really a compound.
Summary: ChEBI is fairly small but growing, with an interesting future. One of the things that sets ChEBI apart from other chemical compound dictionaries / ontologies / databases etc. is the fact that it is freely available. So, long live ChEBI and all who sail in her!
I thoroughly enjoyed the workshop; thanks to everyone involved. I had a great time and even learnt some Organic Chemistry as a bonus. Because we finished at lunchtime, I had a free afternoon to wander around Cambridge.
Chemistry + Cambridge → Chembridge?
There is so much Chemistry-related research going on in and around Cambridge (Sanger, EBI, Biochemistry, Chemistry, Babraham etc.), maybe they should consider renaming it “Chembridge”? It’s not so much Silicon Fen as Carbon Fen… Somebody has already thought of it: chembridge.com.
Anyway, while in Cambridge, it would be rude not to drop in on Peter Murray-Rust. Peter kindly showed me a little of the Unilever Centre for Molecular Science Informatics. He gave a quick demo of MACIE (a joint project with Janet Thornton), which has some nice visualisation tools (a bit like animaps) based on SVG and CMLReact, and of some Microsoft Word-based tools for authoring chemical data, funded by (guess who?) Microsoft.
I also spoke a little with Jim Downing (who works on CML and JUMBO). You can easily generate CML from a Molfile, but I’m still trying to work out what the benefits of CML might be (tools? APIs? other cool stuff?), and whether they justify the cost. There was also time to catch up with Andrew Walkingshaw (GOLEM and CrystalEye demo). Thanks Andrew and Peter for taking the time to show me around some of “Chembridge”.
[Gratuitous Margaret Thatcher link above: chemist, Prime Minister and patron of the Crowne House Hotel, one of the EBI’s favoured guest houses. They do a really excellent English cooked breakfast, fit for Prime Ministers and paupers alike.]
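To make "generate CML from a Molfile" a bit more concrete, here is a minimal sketch using only Python's standard library. It is not how JUMBO (a Java toolkit) does it; it simply reads the atom and bond blocks of a V2000 Molfile by their fixed column positions and writes a CML `molecule` containing an `atomArray` and a `bondArray`. The function name `molfile_to_cml`, the sample molecule and its coordinates, and the choice to write bond orders as CML's `S`/`D`/`T`/`A` codes are assumptions for illustration; charges, stereo flags, property blocks and V3000 files are ignored.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"
# V2000 bond types 1-4 mapped to CML order codes (query bond types are not handled).
BOND_ORDERS = {1: "S", 2: "D", 3: "T", 4: "A"}


def molfile_to_cml(molfile: str, mol_id: str = "m1") -> str:
    """Convert the atom and bond blocks of a V2000 Molfile into a CML <molecule>."""
    lines = molfile.splitlines()
    counts = lines[3]  # lines 0-2 are the name, program and comment header lines
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])

    ET.register_namespace("", CML_NS)  # serialise CML as the default namespace
    molecule = ET.Element(f"{{{CML_NS}}}molecule", {"id": mol_id})

    atom_array = ET.SubElement(molecule, f"{{{CML_NS}}}atomArray")
    for i, line in enumerate(lines[4:4 + n_atoms], start=1):
        ET.SubElement(atom_array, f"{{{CML_NS}}}atom", {
            "id": f"a{i}",
            "elementType": line[31:34].strip(),  # fixed-width atom symbol field
            "x3": line[0:10].strip(),
            "y3": line[10:20].strip(),
            "z3": line[20:30].strip(),
        })

    bond_array = ET.SubElement(molecule, f"{{{CML_NS}}}bondArray")
    bond_start = 4 + n_atoms
    for line in lines[bond_start:bond_start + n_bonds]:
        first, second, bond_type = int(line[0:3]), int(line[3:6]), int(line[6:9])
        if bond_type not in BOND_ORDERS:
            raise ValueError(f"unsupported Molfile bond type {bond_type}")
        ET.SubElement(bond_array, f"{{{CML_NS}}}bond", {
            "atomRefs2": f"a{first} a{second}",
            "order": BOND_ORDERS[bond_type],
        })

    return ET.tostring(molecule, encoding="unicode")


# Hypothetical example: ethanol, heavy atoms only, with made-up coordinates.
ETHANOL_MOLFILE = """ethanol
  hypothetical example
heavy atoms only
  3  2  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.2990    0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.5981    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  1  0  0  0  0
M  END
"""

if __name__ == "__main__":
    print(molfile_to_cml(ETHANOL_MOLFILE))
```

The conversion itself is mostly bookkeeping; the arguments for CML are about what comes afterwards (validation, XML tooling, mixing chemistry into other XML documents), which is exactly the cost/benefit question above.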
- Holliday, G. L., Murray-Rust, P., and Rzepa, H. S. (2006). Chemical Markup, XML, and the World Wide Web. 6. CMLReact, an XML vocabulary for chemical reactions. J. Chem. Inf. Model., 46(1):145-157. DOI:10.1021/ci0502698
- Holliday, G. L., Almonacid, D. E., Bartlett, G. J., O’Boyle, N. M., Torrance, J. W., Murray-Rust, P., Mitchell, J. B., and Thornton, J. M. (2007). MACIE (Mechanism, Annotation and Classification in Enzymes): novel tools for searching catalytic mechanisms. Nucleic Acids Res., 35(Database issue). PMID:17082206. DOI:10.1093/nar/gkl774