Chemical Entities of Biological Interest, ChEBI, is a freely available dictionary of molecular entities, especially small chemical compounds. Like all big dictionaries and ontologies, it has its own unique challenges. Fortunately, those nice people at the EBI are holding a workshop to discuss future developments in ChEBI. In preparation for the workshop, here are some brief notes on how ChEBI could be made better. [Disclaimer: I'm fairly new to ChEBI and "thinking out loud" here, so add comments below if I've said anything stupid or wrong.]
ChEBI: Too much, too young?
Some dictionaries try to describe too much. When it comes to writing down knowledge, it isn’t always easy to know where to stop. To define scope, the BI in ChEBI stands for “Biological Interest”. So this begs the question, why does ChEBI describe all sorts of subatomic particles that are of little (or no) biological relevance? While electrons (ChEBI:10545) and protons (ChEBI:24636) play an important role in Biology, you have to wonder what the biological interest of neutrinos (ChEBI:36352) and bosons (ChEBI:36341) is. Who decides what is “biologically interesting” and how?
Then there is the inescapable legacy of IUPAC. ChEBI aligns itself closely with IUPAC, which is unfortunately a bit dated and cumbersome (or so I’m told).
ChEBI: I just can’t get enough?
Some people are never happy. Take any dictionary or ontology and they will pick holes in it. “It doesn’t say this, it doesn’t say that, this is wrong” etc. In no particular order:
- The master copy of ChEBI is stored in an Oracle database. However, a common way of sharing ChEBI is the OBO flat file format, which is difficult to reason with. This means you can’t easily check ChEBI for contradictions, and many of the relationships in ChEBI (“is-a” etc.) have to be maintained by hand. This is a tedious and error-prone process, even though some relationships could be inferred by a reasoner. A mapping from OBO to OWL is available to make reasoning possible in the future.
- ChEBI could be much better aligned with, and related to, other ontologies [2,3], such as the Gene Ontology.
- ChEBI could also be aligned with Wikipedia. No, seriously, I’m not joking (and neither is Peter Murray-Rust)!
- Many of the structures shown are not what the ID says, e.g. glucose-6-phosphate (ChEBI:17665) is not the acid in the picture, nor the one represented by the InChI.
- Things like Mg-ADP and Mg-ATP, which are the forms most present in biology, are not currently present.
- Sugar alcohols are poorly represented. Are all the hexoses (ChEBI:18133) present as well?
- Could be more closely integrated with LIPID Metabolites And Pathways Strategy (lipidmaps.org), although ChEBI already does this.
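To make the point about hand-maintained “is-a” relationships a bit more concrete, here is a minimal Python sketch of the kind of structural check you can run on an OBO flat file without a reasoner. It is not a reasoner and not how ChEBI is actually curated. The sample stanzas, the `EX:` identifiers and the function names are all invented for illustration; the only things borrowed from OBO are the `[Term]`, `id:`, `name:` and `is_a:` tags.

```python
# Hypothetical OBO snippet: the EX: ids and names are made up, not real ChEBI entries.
SAMPLE_OBO = """\
[Term]
id: EX:1
name: molecular entity

[Term]
id: EX:2
name: carbohydrate
is_a: EX:1 ! molecular entity

[Term]
id: EX:3
name: hexose
is_a: EX:2 ! carbohydrate
is_a: EX:9 ! parent that was never defined

[Term]
id: EX:4
name: loop a
is_a: EX:5

[Term]
id: EX:5
name: loop b
is_a: EX:4
"""


def parse_is_a(text):
    """Return {term_id: [parent_id, ...]} read from [Term] stanzas only."""
    parents = {}
    current = None
    in_term = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are of interest here.
            in_term = line == "[Term]"
            current = None
        elif in_term and line.startswith("id:"):
            current = line[len("id:"):].strip()
            parents.setdefault(current, [])
        elif in_term and current and line.startswith("is_a:"):
            # Drop the optional trailing "! comment" after the parent id.
            target = line[len("is_a:"):].split("!")[0].strip()
            parents[current].append(target)
    return parents


def dangling_parents(parents):
    """(child, parent) pairs where the parent has no [Term] stanza of its own."""
    return sorted(
        (child, p) for child, ps in parents.items() for p in ps if p not in parents
    )


def terms_in_cycles(parents):
    """Terms that can reach themselves by following is_a links."""
    result = []
    for start in parents:
        stack = list(parents[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node == start:
                result.append(start)
                break
            if node in seen:
                continue
            seen.add(node)
            stack.extend(parents.get(node, []))
    return sorted(result)


if __name__ == "__main__":
    graph = parse_is_a(SAMPLE_OBO)
    print("dangling:", dangling_parents(graph))
    print("cycles:", terms_in_cycles(graph))
```

Checks like these only catch structural slips (a parent that does not exist, a term that is its own ancestor). Spotting genuine logical contradictions, or inferring missing relationships, is what the OBO to OWL mapping and a proper reasoner would be for.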
If I missed anything off the list of things that are “wrong” with ChEBI, please let me know. If you’re going to the workshop, see you there (along with Christoph Steinbeck and maybe Kirill Degtyarenko, I suppose).
- [1] Kirill Degtyarenko et al. (2008) ChEBI: a database and ontology for chemical entities of biological interest. Nucleic Acids Research, 36(Database issue):D344-D350, doi:10.1093/nar/gkm791, pubmed.gov/17932057
- [2] Michael Bada and Lawrence Hunter (2008) Identification of OBO Nonalignments and Its Implications for OBO Enrichment. Bioinformatics, 2008 May 7 [Epub ahead of print], doi:10.1093/bioinformatics/btn194, pubmed.gov/18463117
- [3] Michael Bada, Robert Stevens, Carole Goble, Yolanda Gil, Michael Ashburner, Judith A Blake, Michael J Cherry, Midori Harris and Suzanna Lewis (2004) A short study on the success of the Gene Ontology. Journal of Web Semantics, 1(2):235-240, doi:10.1016/j.websem.2003.12.003
Gratuitous musical link: Much too much, much too young… The Specials
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.