Michael Ashburner at the University of Cambridge once famously quipped that “Biologists would rather share their toothbrush than share a gene name” [1]. And so we have many different colourful and imaginative names for genes. The same mis-naming rule applies to the reactants and products (inputs and outputs) of metabolism. Here are some example names; would you like to share my toothbrush chemical name? There are so many different toothbrushes names to choose from…
April 10, 2008
March 5, 2008
Cheminformatics 2.0
January 18, 2008
One Thousand Databases High (and rising)
Well it’s that time of year again. The 15th annual stamp collecting edition of the journal Nucleic Acids Research (NAR), also known as the 2008 Database issue [1], was published earlier this week. This year there are 1078 databases listed in the collection, 110 more than the previous one (see Figure 1). As we pass the one thousand databases mark (1kDB) I wonder, what proportion of the data in these databases will never be used?
R.I.P. Biological Data?
It seems highly likely that lots of this data is stored in what Usama Fayyad at Yahoo! Research! Laboratories! calls data tombs [2], because as he puts it:
“Our ability to capture and store data has far outpaced our ability to process and utilise it. This growing challenge has produced a phenomenon we call the data tombs, or data stores that are effectively write-only; data is deposited to merely rest in peace, since in all likelihood it will never be accessed again.”
Like last year, let’s illustrate the growth with an obligatory graph, see Figure 1.
Figure 1: Data growth: the ability to capture and store biological data has far outpaced our ability to understand it. The vertical axis is the number of databases listed in Nucleic Acids Research [1]; the horizontal axis is the year. (Picture drawn with the Google Charts API, which is OK but, as Alf points out, doesn’t do error bars yet.)
Another day, another dollar database
Does it matter that large quantities of this data will probably never be used? How could you find out how much of the data, and which data, was “write-only”? Will biologists ever catch up with the physicists when it comes to Very Large stamp collections Databases? Biological databases are pretty big, but can you imagine handling up to 1,500 megabytes of data per second for ten years, as the physicists will soon be doing? You can already hear the (arrogant?) physicists taunting the biologists, “my database is bigger than yours”. So there.
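For a rough sense of scale, that figure can be turned into a total with some back-of-envelope arithmetic. This is only a sketch under assumptions that are not in the original post: that the 1,500 megabytes per second is sustained non-stop, that a year is 365.25 days, and that decimal units apply (1 petabyte = 10^9 megabytes). The function and constant names are made up for illustration.

```python
# Back-of-envelope estimate. Only the rate and the duration come from the post;
# every other constant is an assumption.
RATE_MB_PER_S = 1_500                      # "up to 1,500 megabytes of data per second"
YEARS = 10                                 # "for ten years"
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60   # assumed: running continuously
MB_PER_PB = 10**9                          # assumed: decimal (SI) units


def total_petabytes(rate_mb_per_s, years):
    """Upper bound on data volume for a constant rate sustained non-stop."""
    total_mb = rate_mb_per_s * SECONDS_PER_YEAR * years
    return total_mb / MB_PER_PB


if __name__ == "__main__":
    print(f"{total_petabytes(RATE_MB_PER_S, YEARS):.0f} PB")
```

Under those assumptions the ceiling comes out at a little over 470 petabytes. No instrument runs flat out for a decade, so the real total would presumably be lower.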
Whichever of these databases you are using, happy data mining in 2008. If you are lucky, the data tombs you are working in will contain hidden treasure that will make you famous and/or rich. Maybe. Any stamp collector will tell you, some stamps can become very valuable. There’s Gold in them there hills databases you know…
References
- Galperin, M. Y. (2007). The molecular biology database collection: 2008 update. Nucleic Acids Research, Vol. 36, Database issue, pages D2-D4. DOI:10.1093/nar/gkm1037
- Fayyad, U. and Uthurusamy, R. (2002). Evolving data into mining solutions for insights. Communications of the ACM, 45(8):28-31. DOI:10.1145/545151.545174
- This post originally published on nodalpoint (with comments)
- Stamp collectors picture, top right, thanks to daxiang stef / stef yau
August 7, 2007
Scifoo: Geek Out! Le Geek, C’est Chic…
As well as big famous superstars at Science Foo Camp (scifoo), there is a chance to meet and “geek out” with younger engineers and scientists like Vince Smith, Aaron Swartz and Vaughan Bell.
Aaron Swartz and the Open Library project
On Sunday at scifoo, Aaron (of archive.org) gave a quick demo of the Open Library. Currently this project is taking books that are out of print and not in other book catalogues like Amazon, and making them available online. They are intending to move into archiving scientific journals, so watch that space. I’ve always wondered how the Internet Archive survived financially, and managed all its interesting projects (like the Open Library). It’s all funded by some bloke called Brewster Kahle. They provide some great services, like hosting digital artifacts for free, see http://www.archive.org/create/.
Vince Smith, Museums and Drupal
Vince Smith is a “cyber-taxonomist” at the Natural History Museum in London. He’s a world expert on parasitic lice, and uses a multi-site installation of Drupal, see vsmith.info (Hmmm, that Drupal skin looks familiar…). Vince uses a Drupal module for bibliographic citations, called biblio, which looks handy. It’d be nice to have it on nodalpoint? Anyway, any time spent looking around Vince’s site is time well spent.
Vaughan Bell, Mind Hacker
Vaughan Bell is a clinical psychologist. We chatted about Wikipedia and science, as demonstrated by Schizophrenia. He’s also a contributor to a book on MindHacks and blogs at mindhacks.com. My suitcase is full of free O’Reilly book-schwag I filled my boots with on Friday, one of which is Vaughan’s book. Looks like it will be a good read on the plane home, because my brain is in need of some serious “optimisation”.
(Two more geeks, pictured right, but regular nodalpoint readers will know all about them already, Deepak Singh and Euan Adie.)
There’s plenty more I could blog about scifoo, but I’m all foo-ked up, geeked out and mashed-up. It’s time to go home. For more scifoo blogging see www.technorati.com/tags/scifoo, www.nature.com/scifoo and network.nature.com/blogs/tag/scifoo.
References
- Aaaaah: Freak Out! Le Freak, C’est Chic…
This work is licensed under a
Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.
August 6, 2007
Scifoo day three: Genome Voyeurism with Lincoln Stein
On day three of Science Foo Camp (scifoo) biologist Lincoln Stein (picture right) gave a presentation on what he calls “genome voyeurism”, using Jim Watson’s genome as an example. This session demonstrated the current and future possibilities of individuals having their own DNA sequenced, what has been called “personal genomics”.
Unlike the session on genomics yesterday on day two, where George Church, Eric Lander, 23andme, Sergey and Larry (and even Sergey’s pet dog) were all present, today they are conspicuously absent.
Lincoln’s presentation starts with a video (see YouTube video below) of Jim Watson receiving his genome on a disk from Baylor College of Medicine, Houston. Lincoln tells how Jim puts his genome (stored on a hard drive) next to his Nobel prize medallion in his office. After all the press publicity, Jim deposits the data in GenBank, and it becomes available worldwide.
May 31, 2007
Google Metabolic Maps
These days, new Google products and code seem to appear on a weekly basis. Take, for example, Google Gears which takes advantage of SQLite, mentioned on nodalpoint recently. They certainly don’t hang about at the Googleplex in Mountain View, California. Wouldn’t it be great if Google applied some of that engineering expertise and agility to science and bioinformatics? Just imagine: we could have Google Metabolic Maps, a virtual globe of the cell for scientists everywhere…
Scientists have been drawing metabolic maps for a very long time, but unfortunately when it comes to charting and understanding metabolic pathways, we’re still at the “here be dragons” stage of bio-cartography. I’m obviously not the first person to dream of this, but imagine if maps of metabolic pathways looked more like Google Earth or Google Maps than the old-fashioned maps many life scientists will be familiar with. Now imagine just a little more: that these maps weren’t just available on conventional screens, but were given the Minority Report treatment, courtesy of Mr Bill Gates and his wizzy surface magic at Microsoft. Wouldn’t that be great? Metabolic maps on an interactive tabletop computer. Just like Tom Cruise in the movies, we’d be able to effortlessly swish around metabolism (or the metabolome / proteome / genome / [insert-your-favourite]ome). Imagine if it was all open-source too, no boundaries, no passports…
Now, you may say that I’m a dreamer, but I’m not the only one [1,2,3].
References
- Zhenjun Hu, Joe Mellor, Jie Wu, Minoru Kanehisa, Joshua M. Stuart and Charles DeLisi (2007) Towards zoomable multidimensional maps of the cell Nature biotechnology 25 (5), 547-54. DOI:10.1038/nbt1304
- Hiroaki Kitano, Akira Funahashi, Yukiko Matuoka and Kanae Oda (2005) Using process diagrams for the graphical representation of biological networks Nature biotechnology 23 (8), 961-6. DOI:10.1038/nbt1111
- John Lennon and Yoko Ono (1971) Imagine
- this post originally published on nodalpoint with comments
December 12, 2006
Buggotea: Redundant Links in Connotea
Dear Santa, all I want for Christmas* is a better version of Connotea, please can you sort out its duplicated redundant links? In my book this particular bug is “buggotea” number one. Here is the problem… [update: buggotea is partially fixed, see comments from Ian Mulvany at the nodalpoint link in the references below]
There is this handy bioinformatics web application called Connotea which I like to use, built by those nice people in the web team at Nature Publishing Group. Most readers of nodalpoint probably already know about it, but because you’re Santa and you’ve been busy lately, let me explain. Connotea can help scientists (not just bioinformaticians) to organise and share their bibliographic references, whilst discovering what other people with similar interests are reading. It’s good, but it has some bugs in it. Since it’s open-source software, anyone with the time, inclination and skills can get hold of the Connotea source code and improve it. There is, however, one particularly nasty redundancy bug in Connotea that is bugging me [1]. I think it should be fixable, and that doing so would make Connotea a significantly better application than it already is. Let’s illustrate this bug with a little story…
November 7, 2006
People 2.0: Pioneers of the next generation Web
UK news-rag The Grauniad has a series of interviews with some of the people behind the next generation web, so-called Web 2.0. After reading these interviews, I can’t help wondering, who are the equivalent pioneers in bioinformatics?
The interviews include…
…and several others too. Most of the interviews are worth reading; I particularly enjoyed Mullenweg’s, which contains a wonderful quote:
Q: What is your big idea?
A: I don’t have big ideas. I sometimes have small ideas, which seem to work out.
So who is currently pioneering the “Web of Science”, Bioinformatics 2.0 if you like? Ensemblian Ewan Birney? Ian Holmes at Berkeley? Or somebody else?
[Image credit: Picture from Steve Jurvetson, this post originally published on nodalpoint with comments]
October 27, 2006
MEDIE: MEDLINE++
MEDIE is an “intelligent” semantic search engine that retrieves biomedical correlations from over 14 million articles in MEDLINE. You can find abstracts and sentences in MEDLINE by specifying the semantics of correlations; for example, What activates tumour suppressor protein p53? So just how useful is MEDIE and is it at the cutting edge?
At the Manchester Interdisciplinary Biocentre (MIB) launch yesterday, Professor Jun’ichi Tsujii gave a presentation on Linking text with knowledge – challenges for Text Mining in Biology. As part of this presentation he gave a demonstration of Medie: an intelligent search engine for Medline. This tool looks quite impressive if you experiment with some sample queries. I wonder what nodalpointers, especially hardened text-miners, natural language processing (NLP) nerds and computational linguists, make of Medie?
[This post was originally published on nodalpoint, with comments]



