BBC: Building a Better ChEBI

May 15, 2008

Filed under: semweb — Duncan Hull @ 4:09 pm
Tags: ChEBI, compound, dictionary, ebi, inchi, Kirill Degtyarenko, molecule, ontology, smiles, workshop

Chemical Entities of Biological Interest, ChEBI, is a freely available dictionary [1] of molecular entities, especially small chemical compounds. Like all big dictionaries and ontologies, it has its own unique challenges. Fortunately, those nice people at the EBI are holding a workshop to discuss future developments in ChEBI. In preparation for the workshop, here are some brief notes on how ChEBI could be made better. [Disclaimer: I’m fairly new to ChEBI and “thinking out loud” here, so please add comments below if I’ve said anything stupid or wrong.]

ChEBI: Too much, too young?

Some dictionaries try to describe too much. When it comes to writing down knowledge, it isn’t always easy to know where to stop. To define scope, the BI in ChEBI stands for “Biological Interest”. So this raises the question: why does ChEBI describe all sorts of subatomic particles that are of little (or no) biological relevance? While electrons (ChEBI:10545) and protons (ChEBI:24636) play an important role in biology, you have to wonder what the biological interest of neutrinos (ChEBI:36352) and bosons (ChEBI:36341) is. Who decides what is “biologically interesting”, and how?

Then there is the inescapable legacy of IUPAC. ChEBI aligns itself closely with IUPAC, but unfortunately IUPAC is a bit dated and cumbersome (or so I’m told).

ChEBI: I just can’t get enough?

Some people are never happy. Take any dictionary or ontology and they will pick holes in it. “It doesn’t say this, it doesn’t say that, this is wrong” etc. In no particular order:

The master copy of ChEBI is stored in an Oracle database. However, a common way of sharing ChEBI is the OBO flat file format, which is difficult to reason with. This means you can’t easily check ChEBI for contradictions, and many of the relationships in ChEBI (“is-a” etc.) have to be maintained by hand. This is a tedious and error-prone process, and some of those relationships could instead be inferred by a reasoner. A mapping from OBO to OWL is available to make reasoning possible in the future.
ChEBI could be much better aligned with, and related to, other ontologies [2,3], such as the Gene Ontology.
ChEBI could also be aligned with Wikipedia. No, seriously, I’m not joking (and neither is Peter Murray-Rust)!
Many of the structures shown are not what the ID says, e.g. glucose-6-phosphate (ChEBI:17665) is not the acid shown in the picture or the one represented by the InChI.
Things like Mg-ADP and Mg-ATP, which are the forms most present in biology, are not currently included.
Sugar alcohols are poorly represented. Are all the hexoses (ChEBI:18133) present as well?
ChEBI could be more closely integrated with ~~LIPID Metabolites And Pathways Strategy (lipidmaps.org)~~; ChEBI already does this.
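
To make the OBO point from the list above a bit more concrete, here is a minimal Python sketch of the kind of thing a reasoner could do for hand-maintained “is-a” links: read the [Term] stanzas, follow is_a upwards, and report which ancestors are implied rather than stated. Everything in it is an assumption for illustration: the EX: identifiers and names are made up (they are not real ChEBI entries), the parser only handles the id, name and is_a tags, and a transitive closure is far short of what a real OWL reasoner does.

```python
from collections import defaultdict

# Hypothetical OBO fragment: the EX: identifiers are invented for this
# example and do not correspond to real ChEBI entries.
OBO_TEXT = """\
format-version: 1.2

[Term]
id: EX:1
name: molecular entity

[Term]
id: EX:2
name: carbohydrate
is_a: EX:1 ! molecular entity

[Term]
id: EX:3
name: hexose
is_a: EX:2 ! carbohydrate
"""


def parse_obo(text):
    """Collect names and direct is_a parents from [Term] stanzas only."""
    names = {}
    parents = defaultdict(set)
    current = None
    in_term = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are of interest here.
            in_term = line == "[Term]"
            current = None
            continue
        if not in_term or ":" not in line:
            continue
        tag, _, value = line.partition(":")
        value = value.strip()
        if tag == "id":
            current = value
            parents[current]  # make sure root terms get an (empty) entry
        elif current and tag == "name":
            names[current] = value
        elif current and tag == "is_a":
            # Drop the trailing "! comment" that OBO allows after the target id.
            parents[current].add(value.split("!")[0].strip())
    return names, dict(parents)


def ancestors(term, parents):
    """All terms reachable by following is_a links upwards from term."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(parents.get(parent, ()))
    return seen


names, parents = parse_obo(OBO_TEXT)

# Ancestors that follow from the hierarchy but were never written down.
inferred = ancestors("EX:3", parents) - parents["EX:3"]

# A term that is its own ancestor signals a cycle, i.e. a likely curation error.
cyclic = [term for term in parents if term in ancestors(term, parents)]
```

Here `inferred` holds the ancestors of the made-up hexose term that nobody had to type in by hand, and `cyclic` is a crude stand-in for a consistency check. Spotting genuine contradictions would need the richer semantics that the OBO to OWL mapping is meant to provide.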

If I’ve missed anything off the list of things that are “wrong” with ChEBI, please let me know. If you’re going to the workshop, see you there (along with Christoph Steinbeck and maybe Kirill Degtyarenko, I suppose).

References

[1] Kirill Degtyarenko et al. (2008) ChEBI: a database and ontology for chemical entities of biological interest. Nucleic Acids Research, 36(Database issue):D344-D350, doi:10.1093/nar/gkm791, pubmed.gov/17932057
[2] Michael Bada and Lawrence Hunter (2008) Identification of OBO Nonalignments and Its Implications for OBO Enrichment. Bioinformatics, 2008 May 7 [Epub ahead of print], doi:10.1093/bioinformatics/btn194, pubmed.gov/18463117
[3] Michael Bada, Robert Stevens, Carole Goble, Yolanda Gil, Michael Ashburner, Judith A Blake, J Michael Cherry, Midori Harris and Suzanna Lewis (2004) A short study on the success of the Gene Ontology. Journal of Web Semantics, 1(2):235-240, doi:10.1016/j.websem.2003.12.003

Gratuitous musical link: Much too much, much too young… The Specials

[Atomium, crystalline Iron ChEBI:18248, picture by victor abellón]. Thanks also to Paul Dobson and Doug Kell for help putting some of these notes together.

This work is licensed under a
Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.

O'Really?