March 16, 2010

DNA, Diversity and You at Cambridge Science Festival

As part of the Cambridge Science Festival last weekend, I joined a group of about 40 volunteers from The Sanger and EBI at an event called “DNA, diversity and you”. This was a series of education and outreach events designed to explore how differences in your genetic code make you different from other individuals, and what makes humans different from other living things, with a bit of computational biology thrown in for good measure. Here are some notes on a selection of the activities, in case you ever find yourself trying to explain biology, computer science or bioinformatics to anyone aged 4-18 and beyond. These resources are all tried, tested and fun to work with, for students and teachers alike:

  1. DNA origami: create your own origami DNA molecule, a hands-on way of learning about the double helix structure of DNA.
  2. DNA sequence bracelets (see picture right). Thread coloured beads according to sequence sections from a range of organisms including trout, chimpanzee, butterfly, a flesh-eating microbe and rotting corpse flower.
  3. Yummy gummy DNA (under 5′s) build your own DNA helix out of sweets and cocktail sticks. Then scoff it all afterwards.
  4. What’s my name in DNA? find out what your name is in DNA, and what the corresponding (hypothetical) protein is using software from deCODE.
  5. Function Finders translate DNA into a sequence of amino acids using wooden translator blocks, then find out which organism the amino acid sequence is from.
  6. Genome sizes (with seatbelts) Rank organisms (inc. human, zebrafish, mosquito, sugar cane and yeast) and find out if they are in the right order. Results are often not what you would expect.
  7. Play your genes right. A card-based guessing game which compares the number of genes in the human genome with the number of genes from a range of different organisms including the flu virus, E. coli bacteria, armadillo, rice plant and others.
  8. Genome Jigsaws for illustrating the process of finishing supposedly “finished” genomes, by putting together a square sequence jigsaw following base pairing rules to end up with a complete finished square.
  9. DNA Time Team examines aspects of ancestry and evolution. The activity encourages people to work out the sequence of a common ancestor by filling in the gaps on a simple evolutionary tree.
  10. Spot the difference with proteins. Comparing Heat Shock Protein (HSP) in human and other organisms to illustrate how different regions of the protein vary between different organisms and how this affects function.
  11. Ready, steady sort: a sorting network that demonstrates one technique that computers use to sort through large amounts of information like sequence data. This comes straight from Computer Science Unplugged by Tim Bell, Mike Fellows and Ian Witten. This activity can be done either as a smaller board game, or as a larger floor game. Either way, it’s a lot of fun, especially if you time people for an added competitive element (see video below)
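The translation step in Function Finders (activity 5) can also be sketched in a few lines of code. This is a minimal illustration, assuming the standard genetic code and reading from the first base of the sequence; the function name `translate` and the use of `X` for unrecognised codons are choices made here for illustration, not part of the festival materials.

```python
from itertools import product

# Standard genetic code. Codons are enumerated in the order TTT, TTC, TTA,
# TTG, TCT, ... (first base varies slowest) and '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string into one-letter amino acids, stopping at a stop codon."""
    dna = dna.upper()
    protein = []
    # Step through the sequence three bases at a time, ignoring any
    # incomplete codon at the end.
    for start in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[start:start + 3], "X")  # X = unknown codon
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCTAA"))
```

The wooden translator blocks do the same job as `CODON_TABLE`: look up three bases, read off one amino acid.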

There were a whole bunch of new activities at the festival this year; maybe these will appear on the Your Genome website in the future. Anyway, it was great fun to get involved. There is nothing quite like the challenge of explaining parallel computing to young kids, teenagers and their parents, which is actually much easier than you’d think if you’ve got access to great teaching materials.
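For anyone who would rather see the sorting network idea from “Ready, steady sort” in code than on the floor, here is a rough sketch. It assumes an odd-even transposition network, which is not necessarily the wiring used in the Computer Science Unplugged activity; the function names are made up for this example. Each round is a list of comparators that touch different positions, so in principle every comparison in a round could happen at the same time, which is the parallel computing point.

```python
def transposition_network(n):
    """Build n rounds of comparators (i, i + 1) for an odd-even transposition network."""
    # Even-numbered rounds compare positions (0,1), (2,3), ...;
    # odd-numbered rounds compare positions (1,2), (3,4), ...
    return [[(i, i + 1) for i in range(r % 2, n - 1, 2)] for r in range(n)]


def run_network(values, rounds):
    """Push values through the network; each comparator swaps a pair if out of order."""
    values = list(values)
    for comparators in rounds:
        # Comparators within one round never share a position,
        # so they could all be carried out simultaneously.
        for i, j in comparators:
            if values[i] > values[j]:
                values[i], values[j] = values[j], values[i]
    return values


network = transposition_network(6)
print(run_network([5, 1, 4, 2, 6, 3], network))
```

The comparisons are fixed in advance and do not depend on the data, which is what makes the same network playable as a board game or a floor game.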

Thanks to Francesca Gale and Louisa Wright for all the hard work that went into organising this fun and successful event.

June 19, 2009

Nettab 2009 Day Three: Semantic Integration

A brief report (well, just some scribbled notes, bullet points and links really) on the third and final day of Network Applications and Tools in Biology (NETTAB) 2009 in Catania, Sicily. There was a special section on Methods and Tools for RNA Structure and Functional Analysis. Disclaimer: RNA mania isn’t really my thing, so the RNA presentations and papers are grossly under-represented in this mini-report (sorry).

  • Keynote: Semantically Integrated eCommunities in Biomedicine: Next-Generation Models of Biomedical Communication, Tim Clark Massachusetts General Hospital and Harvard Medical School, Boston. His presentation opened by asking: What do the following have in common?

    1. Alzheimer’s Disease
    2. Huntington’s Disease
    3. Nicotine Addiction
    4. Schizophrenia
    5. Bipolar Disorder
    6. Autism
    7. Parkinson’s Disease
    8. ALS (Amyotrophic lateral sclerosis)
    9. Neuropathic Pain
    10. Major Depressive Disorder
    11. Cancer (multiple forms)

    What they have in common:

    1. Highly complex disorders
    2. Much information, incomplete understanding
    3. Inadequate treatment options
    4. Huge cost in human suffering
    5. Multi-factorial causality
    6. Require multi-disciplinary collaboration for progress to understanding and cure

    Tim discussed the Science Collaboration Framework (SCF), a reusable, semantically aware toolkit for building online communities. It is open source and freely available, and makes heavy use of Open Linked Data, shared ontologies and controlled vocabularies, and Drupal to build websites that tackle the above disorders, for example pdonlineresearch.org (Parkinson’s Disease), StemBook.org (Harvard Stem Cell Institute) and alzforum.org (Alzheimer’s Disease) [1]. The controlled vocabulary and ontology approach works well for well-understood areas (where named entities are known) but not so well at the outer boundaries of our knowledge.

  • Michaela Guendel (Leaf Bioscience) presented DC-THERA Directory: A Knowledge Management System to Support Collaboration on Dendritic Cell and Immunology Research, using the cell type ontology, the dendritic cell ontology, ChEBI and OBI. The project involves Andrea Splendiani, Ciro Scognamiglio and Marco Brandizi.
  • GePh-CARD: an information exchange application for an Hub & Spoke Network for Skeletal Dysplasias was presented by M. Mordenti & L. Sangiorgi
  • Panel Discussion: Collaborative and Social Bioinformatics Research and Development: Why, When, Who and How? Alex Bateman, Tim Clark, Duncan Hull and all participants. This panel discussion concentrated on Who? (experts vs. non-experts, crowds vs. individuals, how to motivate and reward people to contribute to online communities, and the point that community annotation of data is only possible when curators cede control of data) and then Where? (open wikis vs. closed ones, private vs. public data, wikis often not being suitable for highly structured data, centralised vs. distributed systems).
  • Keynote: Bacterial Phylogeny and Taxonomy in the High-Throughput Sequencing World, Gabriel Valiente
  • Magdalena Musielak (has worked with Piotr Byzia) presented RNA tertiary structure prediction with ModeRNA
  • Olivier Perriquet presented Improved heuristic for pairwise RNA secondary structure prediction
  • Giampaolo Bella talked about Analysing microRNA by Theorem Proving: qualitative logical proof comes before quantitative experimental measurement, e.g. “shall we go to a restaurant?” before “how much does it cost?”
  • Mapping miRNA genes on human fragile sites and translocation breakpoints, Alfredo Ferro et al.
  • Keynote: Computational challenges in the study of small RNAs, Doron Betel, Memorial Sloan-Kettering Cancer Center
  • microrna.gr, a suite of web-based tools for elucidating microRNA function, was presented by Giorgo L. Papadopoulous, DIANA bioinformatics lab, Biomedical Science Research Center Alexander Fleming, Vari, Athens, Greece
  • Last but not least there was miRScape: a Cytoscape plugin to annotate biological networks with microRNAs

The Tenth NETTAB Workshop (2010) will be held in Rome at the end of May or beginning of June 2010, where the theme will be Oncology Bioinformatics.


  1. Das, S., Girard, L., Green, T., Weitzman, L., Lewis-Bowen, A., & Clark, T. (2009). Building biomedical web communities using a semantically aware content management system. Briefings in Bioinformatics, 10(2), 129-138. DOI:10.1093/bib/bbn052

May 15, 2009

Y.M.C.A. – Just a little bit of G.T.C.A.

OK, look I know that by posting the latest viral marketing video from Bio-Rad Laboratories, Inc. I’m just a pawn (or vector) in their advertising game. This particular video has been around for a couple of months now but it is probably poor internet hygiene to spread these pandemic viral videos. I should just catch it, kill it and bin it. However, I can’t resist this one any longer because, like the last one, it is pretty kitsch, pretty funny and in a strange way, it might just increase the public awareness of Science. Maybe.

And it’s Friday today too, so to the tune of Y.M.C.A. by the Village People, you are now infected with just a little bit of (altogether now…) G.T.C.A.!

The lyrics go a little something like this: (more…)

February 20, 2009

Mistaken Identity: Google thinks I’m Maurice Wilkins

In a curious case of mistaken identity, Google seems to think I’m Maurice Wilkins. Here is how. If you Google the words DNA and mania (google.com/search?q=dna+mania), one of the first results is a tongue-in-cheek article I wrote two years ago about our obsession with Deoxyribonucleic Acid. Now Google (or more precisely Googlebot) seems to think this article was written by one M Wilkins. That’s M Wilkins as in the physicist Maurice Wilkins, the third man of the double helix (after Watson and Crick) and Nobel prize winner back in ’62. How could such a silly (but amusing) mistake be made? Because the article is about what Wilkins once said, but not actually by Wilkins, and computers can’t tell the difference between these two things. It has been known for some time that Google Scholar has many other mistaken identities for authors like this. Scholar even thinks there is an author called Professor Forgotten Password (a prolific author who has been widely cited in many fields)!

The other curiosity is that the original post on nodalpoint.org is also counted as a citation in Google Scholar. It’s a bit of a mystery how Scholar actually works, what it includes (and excludes) and how big it is, but you’ll find the article counted as a proper citation for a book about genes. Scientific spammers must be licking their lips at the opportunity to influence results and citation counts with humble blog posts, rather than more kosher articles in peer-reviewed scientific journals.

So what does all this curious interweb mischief tell us?

  1. Identifying people on the web is a tricky business, more complex than most people think
  2. Googlebot needs to have its algowithms tweaked by those Google Scholars at the Googleplex. Not really surprising, what else did you expect from Beta software? (P.S. Googlebot, when you read this, I’m not Maurice Wilkins, that’s not my name. I haven’t won a Nobel prize either.  I’m sort of flattered that you’ve mistaken me for such a distinguished scientist, so I’ll enjoy my alternative identity while it lasts.)
  3. Blogs are increasingly part of the scientific conversation and are counted in various bibliometrics. Will Google Scholar (and the rest) start indexing other blogs too? Where will this trend leave more conventional bibliometrics like the impact factor?

(Note: These search results were correct at the time of writing, but may change over time, results preserved for posterity on flickr)


  1. Maurice Wilkins (2003) The Third Man of the Double Helix: The Autobiography of Maurice Wilkins isbn:0198606656
  2. Péter Jacsó (2008) Savvy searching – Google Scholar revisited. Online Information Review 32: 102-11 DOI:10.1108/14684520810866010 (see also Defrosting the Digital Library)
  3. Douglas Kell (2008) What’s in a name? Guest, ghost and indeed quite imaginary authorships BBSRC blogs
  4. Neil R. Smalheiser and Vetle I. Torvik, Author Name Disambiguation. This is a preprint version of a chapter published in Volume 43 (2009) of the Annual Review of Information Science and Technology (ARIST) (B. Cronin, Ed.), which is available from the publisher Information Today, Inc (http://books.infotoday.com/asist/#arist).
  5. Duncan Hull (2007) DNA mania. Nodalpoint.org
  6. Jules De Martino and Katie White (2008) That’s not my name (video)

July 25, 2008

How to spend a £400 million Science budget

A thought experiment with lots of money

The Biotechnology and Biological Sciences Research Council (BBSRC) is the United Kingdom’s funding agency for academic research and training in the non-clinical life sciences. It supports a total of around 1600 scientists and 2000 research students in universities and institutes in the UK. The head of our laboratory, Douglas Kell, has recently been appointed Chief Executive of the BBSRC [1]. Congratulations Doug, we wish you the very best in your new job. Now, according to bbsrc.ac.uk, their annual budget is a cool £400 million (just short of $800 million or €500 million). This has left me wondering, how would you spend a £400 million Science budget for the life sciences? For the purposes of this article, imagine it was you that had been put in charge of said budget, and Prime Minister Gordon Brown (texture like sun) had given you, yes YOU, a big bag of cash to distribute as you see fit. A mouth-watering prospect, I think you’ll agree. Here is my personal opinion of how, in my dreams, I would spend the money. (more…)

March 18, 2008

Genomes to Systems 2008: Day One

Filed under: sysbio — Duncan Hull @ 9:27 am

Genomes to Systems is a biennial conference held in Manchester covering the latest post-genome developments. Here are some brief and incomplete notes on some of the speakers and topics from day one of the 2008 conference. (more…)

January 18, 2008

One Thousand Databases High (and rising)

Well, it’s that time of year again. The 15th annual stamp collecting edition of the journal Nucleic Acids Research (NAR), also known as the 2008 Database issue [1], was published earlier this week. This year there are 1078 databases listed in the collection, 110 more than the previous one (see Figure 1). As we pass the one thousand databases mark (1kDB) I wonder, what proportion of the data in these databases will never be used?

R.I.P. Biological Data?

It seems highly likely that lots of this data is stored in what Usama Fayyad at Yahoo! Research! Laboratories! calls data tombs [2], because as he puts it:

“Our ability to capture and store data has far outpaced our ability to process and utilise it. This growing challenge has produced a phenomenon we call the data tombs, or data stores that are effectively write-only; data is deposited to merely rest in peace, since in all likelihood it will never be accessed again.”

Like last year, let’s illustrate the growth with an obligatory graph, see Figure 1.

Figure 1: Data growth: the ability to capture and store biological data has far outpaced our ability to understand it. Vertical axis is the number of databases listed in Nucleic Acids Research [1], horizontal axis is the year. (Picture drawn with Google Charts API which is OK but as Alf points out, doesn’t do error bars yet).

Another day, another dollar database

Does it matter that large quantities of this data will probably never be used? How could you find out, how much and which data was “write-only”? Will Biologists ever catch up with the physicists when it comes to Very Large stamp collections Databases? Biological databases are pretty big, but can you imagine handling up to 1,500 megabytes of data per second for ten years as the Physicists will soon be doing? You can already hear the (arrogant?) Physicists taunting the Biologists, “my database is bigger than yours”. So there.
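To get a feel for the size of that taunt, the physicists’ figure can be multiplied out. This is a back-of-envelope sketch only, assuming the full 1,500 megabytes per second is sustained around the clock, decimal units and 365-day years; since the quoted rate is “up to”, the result is an upper bound rather than a prediction.

```python
# Assumptions for illustration: constant peak rate, no downtime, 365-day years.
MB_PER_SECOND = 1_500
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
YEARS = 10

total_mb = MB_PER_SECOND * SECONDS_PER_YEAR * YEARS
total_pb = total_mb / 1_000_000_000  # 1 petabyte = 10^9 megabytes (decimal units)

print(f"{total_pb:.0f} petabytes")
```

Under those assumptions the total comes to roughly 473 petabytes over the ten years.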

Whichever of these databases you are using, happy data mining in 2008. If you are lucky, the data tombs you are working in will contain hidden treasure that will make you famous and/or rich. Maybe. Any stamp collector will tell you, some stamps can become very valuable. There’s Gold in them there hills databases you know…

  1. Galperin, M. Y. (2007). The molecular biology database collection: 2008 update. Nucleic Acids Research, Vol. 36, Database issue, pages D2-D4. DOI:10.1093/nar/gkm1037
  2. Fayyad, U. and Uthurusamy, R. (2002). Evolving data into mining solutions for insights. Communications of the ACM, 45(8):28-31. DOI:10.1145/545151.545174
  3. This post originally published on nodalpoint (with comments)
  4. Stamp collectors picture, top right, thanks to daxiang stef / stef yau

January 15, 2008

Who’s the Daddy? PCR…

Filed under: biotech,omics — Duncan Hull @ 1:04 pm

PCR, When you need to know who the Daddy is ♫ …

♫ There was a time when to amplify DNA,

You had to grow tons and tons of tiny cells.

Then along came a guy named Dr. Kary Mullis,

Said you can amplify in vitro just as well.

Just mix your template with a buffer and some primers,

Nucleotides and polymerases, too.

Denaturing, annealing, and extending.

Well it’s amazing what heating and cooling and heating will do.

PCR, when you need to detect mutations.

PCR, when you need to recombine.

PCR, when you need to find out who the daddy is.

PCR, when you need to solve a crime. ♫

(repeat chorus)

When you’ve finished chuckling at that ridiculous viral marketing video, go and Dance Naked in the Mind Field with Kary Mullis. Found via Respectful Insolence: Scientists for better PCR.

August 6, 2007

Scifoo day three: Genome Voyeurism with Lincoln Stein

On day three of Science Foo Camp (scifoo) biologist Lincoln Stein (picture right) gave a presentation on what he calls “genome voyeurism”, using Jim Watson’s genome as an example. This session demonstrated the current and future possibilities of individuals having their own DNA sequenced, what has been called “personal genomics”.

Unlike the session on genomics yesterday on day two, where George Church, Eric Lander, 23andme, Sergey and Larry (and even Sergey’s pet dog) were all present, today they are conspicuously absent.

Lincoln’s presentation starts with a video (see YouTube video below) of Jim Watson receiving his genome on a disk from Baylor College of Medicine, Houston. Lincoln tells how Jim puts his genome (stored on a hard drive) next to his Nobel prize medallion in his office. After all the press publicity, Jim deposits the data in GenBank, and it becomes available worldwide. (more…)

January 22, 2007

DNA mania

Filed under: bio — Duncan Hull @ 10:29 pm

What does DNA do when it’s not being transcribed into RNA? It causes DNA mania…

Quote of the Day

“DNA, you know, is Midas’ gold. Everyone who touches it goes mad.”

Maurice Wilkins

Read the rest in [1,2]

Do you or your colleagues ever suffer from DNA mania [3,4]? A biochemist friend of mine once semi-jokingly remarked that people’s manic obsession with DNA is a bit like buying some food and being more interested in the bar-code on the packaging, than the food inside. In his particular area of research, DNA is about as exciting as bar-codes, because it doesn’t even leave the nucleus of the cell, at least in Eukaryotes. I wonder what readers of nodalpoint think of this analogy? Anyway, as a result of this philosophy, most of his community have developed an unhealthy and manic interest in proteins rather than DNA. You could call this particular obsessive-compulsive disorder “protein mania”.

Depending on the scientific obsession(s) of your particular community, you might need to substitute Protein or RNA for DNA in the above quote, as appropriate. And if that is all too molecular for you, substitute any other of your favourite bioinformatics buzzwords.


  1. Horace Freeland Judson (1996) The Eighth Day of Creation: Makers of the Revolution in Biology
  2. John Sulston (2006) Won for All: How the Drosophila Genome was sequenced: a book by Michael Ashburner
  3. André Pichot (1999) Histoire de la notion de gène (one of the first documented uses of the phrase “DNA mania”)
  4. Denis Noble (2006) The Music of Life: Biology Beyond the Genome (an antidote to DNA mania and the Dawkinian gene-centric view of Life)
  5. DNA Photograph taken by Unapersona in Ciutat de les Arts i les Ciències, Calatrava building, Valencia, Spain.
