January 15, 2008

Who’s the Daddy? PCR…

Filed under: biotech,omics — Duncan Hull @ 1:04 pm

PCR, When you need to know who the Daddy is ♫ …

♫ There was a time when to amplify DNA,

You had to grow tons and tons of tiny cells.

Then along came a guy named Dr. Kary Mullis,

Said you can amplify in vitro just as well.

Just mix your template with a buffer and some primers,

Nucleotides and polymerases, too.

Denaturing, annealing, and extending.

Well it’s amazing what heating and cooling and heating will do.

PCR, when you need to detect mutations.

PCR, when you need to recombine.

PCR, when you need to find out who the daddy is.

PCR, when you need to solve a crime. ♫

(repeat chorus)

When you’ve finished chuckling at that ridiculous viral marketing video, go and Dance Naked in the Mind Field with Kary Mullis. Found via Respectful Insolence: Scientists for better PCR.

March 30, 2007

This month’s molecule is…

Filed under: biotech — Duncan Hull @ 10:10 pm

[Image: Space-filling and backbone model of 1HRY]

There are a number of “Molecule of the Month” style mini-reviews on the web, which highlight one particular molecule (usually a protein) every month, in an accessible style. Two of my personal favourites are protein spotlight: one month, one protein, written by Vivienne Baillie Gerritsen of the Swiss-Prot team, and Molecule of the Month at the Protein Data Bank (PDB), edited by David Goodsell. Both these features are worth a quick read because they can help bio-literate and bio-curious users to increase and reinforce their knowledge relatively quickly.

Part of what makes the PDB one worth reading is the colourful visualisations and short descriptions that go with it. For March 2007, the PDB’s molecule of the month is Zinc Fingers. Meanwhile, over at Swiss-Prot, the molecule is Sex-determining region Y protein (Sry), used to illustrate the tenuous nature of sex.

[This post originally published on nodalpoint with comments]

October 20, 2006

Manchester Biocentre Launch

[Image: MIB: Spot the test tube]

The Manchester Interdisciplinary Biocentre (MIB) is officially opening on 25/26th October 2006. The centre has been about a decade in the making, and aims to be a world-class research centre, with around £37 million (~$70 million) of initial funding from the Wellcome Trust charity, UK Research Councils and others. If you’re looking for a bioinformatics job, PhD, PostDoc etc in the UK, MIB is continuously hiring and looks like a good place to work, if the opening programme (which follows) is anything to go by.

Unfortunately the MIB web pages aren’t quite world class yet: the promotional launch material is only available in PDF format, *sigh*, see references below. So I’m blogging the MIB Symposium launch programme here to put the stuff online. Talks scheduled for the second day of the opening, 26th October 2006, are listed below, and these can be attended by free registration (see references):

Session 1: Bio-molecular machines, 9.00-11.00

Session chaired by Alan North, Dean of the Faculty of Life Sciences

  • John E. Walker (MRC Dunn Human Nutrition Unit, Cambridge, UK): Biomolecular rotary motors.
  • Yoshi Nakamura (Tokyo University, Japan): Aptamer as RNA-made super antibody for basic and therapeutic applications
  • John McCarthy (Manchester Interdisciplinary Biocentre): Molecular mechanisms underlying post-transcriptional gene expression.
  • Refreshment break

Session 2: Biomolecular Structure and Dynamics, 11.00-12.40

Session chaired by Bob Ford, Professor of Structural Biology, Faculty of Life Sciences.

Session 3: Systems and Information, 13.35-15.45

Session chaired by John Perkins, Dean of Faculty of Engineering and Physical Sciences.

Session 4: Biocatalysis, 16.10-17.00

Session chaired by Hans Westerhoff, Manchester Interdisciplinary Biocentre

  • Nigel Scrutton (MIB and Faculty of Life Sciences): ‘Squeezing’ barriers – a dynamical view of enzyme catalysis.
  • Gill Stephens (MIB and School of Chemical Engineering): Redox biocatalysis – the next generation of enzymes for manufacturing pharmaceutical intermediates and specialty chemicals.

Session 5: Bionanoscience and engineering: 17.00-18.00

Session chaired by Peter Fielden, Chemical Engineering

  • Joseph Wang (Arizona State University, USA): Nanomaterials for monitoring and controlling biomolecular interactions.
  • Milan Stojanovich (Columbia University Medical School, New York, USA): Deoxyribozyme-based devices.

Session 6: Postgenomic Analytical Technologies, 18.00-19.10

Session chaired by Roy Goodacre, MIB and School of Chemistry

  • Ruedi Aebersold (ETH Zürich): Quantitative Proteomics and Systems Biology
  • Simon Gaskell (MIB and School of Chemistry): New analytical science in proteomics and metabolomics.
  • Concluding remarks.

October 8, 2006

Bio-Ignorance: Communicating Biology to Computer Scientists

[Image: The Human Genome]

Many computer scientists and software engineers are not familiar with basic biology or bioinformatics. Many biologists and bioinformaticians are not familiar with basic computer science or software engineering. This article points to some resources that can help with the former, and asks, what can be done about the latter?

Progress in both computer science and biology is closely linked and dependent on people understanding each other’s strange language, cross-pollinating ideas and creating technology which hopefully has hybrid vigour. So for example, biologists and bioinformaticians have a healthy appetite for all kinds of better, cheaper, faster and sometimes novel computation. This requires that they understand basic computer science and software engineering. In the other direction, computer scientists often need realistic scenarios to motivate the invention, development and testing of genuinely novel technology. As for the software engineers, more on them later…

It sounds great, but before you can even say the words “inter-disciplinary”, there are considerable barriers to communication. The various camps speak different languages, and have radically different cultures. To illustrate this communication breakdown, here is a story from the lab where I work. A while ago, I was discussing the Gene Ontology with a colleague, who shall remain anonymous. This colleague was educated, doing PhD level research and what I’d consider a fairly typical computer scientist. Soon the conversation turned to chromosomes, and they asked me:

“What is a Chromosome?”

Initially I was shocked. How could somebody not know what a chromosome was? Had they never read a newspaper? Never watched the television? Surely, most people have at least a vague idea what a chromosome is? After recovering from the shock, I told this person that according to the Gene Ontology a chromosome is “a very long molecule of DNA and associated proteins that carries hereditary information.” Perhaps this bio-ignorance is an extreme case, but unfortunately, it is all too common. Many computer scientists and software engineers I know stopped studying biology as soon as they possibly could, opting for the so-called “harder” sciences: physics, chemistry and mathematics. Consequently, many (but not all) computer scientists are bio-ignorant. What can we do about it? We really need to understand each other if we are going to make any progress. How can we improve communication between biologists and computer scientists?

Part of the solution to this problem is well-written literature that explains basic concepts quickly and clearly without getting bogged down in jargon or stuck on esoteric details; see the references below for some examples. One of my personal favourites is a little book called The Human Genome: a beginner’s guide to the chemical code of life, authored by Jeremy Cherfas. This book is lavishly illustrated and beautifully written, but most importantly of all, at 72 pages it is blisteringly concise, so stands a chance of being read by computer geeks and nerds. It is even funny in places: the Nobel laureate and geneticist Thomas Hunt Morgan is amusingly depicted as a red-eyed wild type, just like the fruit flies he studied. Anyway, I lent my copy of said book to my computer science buddy, and they learnt not just what chromosomes are, but also a little bit about why Biology and Genetics are such fascinating subjects.

The literature listed below can help one-way understanding of biology by outsiders, but communication is a two-way street. What about the other direction? Is there any literature that explains computer science and software engineering specifically to biologists and bioinformaticians? I don’t know of any particularly good examples that are concise, well written and illustrated, but perhaps you do. I’ve frequently found that bioinformaticians and biologists misunderstand what computer science is about, and confuse it with software engineering, but that is another story. The moral of this story is, don’t be surprised if people working in different fields to you lack a basic understanding of what you consider fundamental concepts that everybody knows. If they are bio-ignorant computer scientists, you should patiently and tirelessly explain yourself and maybe point to some of the resources below. Maybe we can understand each other just a little better.


  1. Anonymous GO:0005694 Chromosome: A very long molecule of DNA AmiGO! Your friend in the Gene Ontology
  2. Alvis Brazma, Helen Parkinson, Thomas Schlitt and Mohammadreza Shojatalab (2001) All you need to know about biology in twenty pages European Bioinformatics Institute (EMBL-EBI) (A technical introduction, written for EBI employees, but useful elsewhere)
  3. Jeremy Cherfas (2002) The Human Genome: a beginner’s guide to the chemical code of life (isbn:0751337161) Dorling Kindersley (A quick but informative introduction that your granny could understand)
  4. Jeremy Cherfas (2006) International Plant Genetic Resources Institute (IPGRI) public awareness blog IPGRI, Rome, Italy. (Some deserved nodalpoint Google Juice for these news and press releases)
  5. Carole Goble and Chris Wroe (2005) The Montagues and the Capulets: In fair Genomics, where we lay our scene… Comparative and Functional Genomics 5(8):623-632 (A paper describing communication breakdown between two different research “houses”, very possibly the only paper on genomics that will make you laugh. seeAlso Shakespearean Genomics: a plague on both your houses)
  6. John Gribbin Dorling Kindersley’s Essential Science: Human Genome, Global warming, Expanding universe, Food for the future, Digital revolution and How the brain works http://www.dk.com (Some interesting books here)
  7. John W. Kimball Chromosomes Kimball’s Biology Pages (How does John Kimball manage to write so much good introductory material about Biology?)
  8. John Bonham, John Paul Jones and Jimmy Page (1969) Communication Breakdown Led Zeppelin (Communication breakdown, it’s always the same, I’m having a nervous breakdown, drive me insane!)
  9. This post was originally published on nodalpoint with comments.

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.

July 21, 2006

AAAI: Dude, Where’s My Service?

[Image: Goglo]

As the number of bioinformatics services on the web increases, finding a tool or database that performs the task you require can be problematic. At the AAAI poster session on Wednesday, I presented our paper describing a novel solution to this problem. It uses a reasoner to “intelligently” search for web services, by semantically matching service requests with advertisements and has some advantages over comparable solutions…

I won’t go into all the gory details here but our technique extends and complements current approaches for matchmaking services. One of the key features described in the paper is that it allows you to describe the relationship(s) between the input and output of a service. E.g. what is the relationship between the input and output protein sequence of InterProScan? This relationship can help match requests for services with their adverts with higher precision and recall. I don’t mind admitting it’s been hard work getting this research published because a large part of the AI community shamelessly use toy and fictitious scenarios to motivate their work. Then they build incredibly complicated software stacks that are only understood by the small clique of people that designed them. When you show some of these people real-world bioinformatics services, they don’t seem to care too much, preferring to bury their heads in the sand of make-believe. There, that’s got it off my chest!
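
For readers who haven’t met service matchmaking before, here is a rough Python sketch of the general idea of comparing a requested output type with an advertised one. It is loosely modelled on the degrees of match in Paolucci et al. (reference 2 below) and is emphatically not the technique from our paper: it uses a made-up toy hierarchy instead of a reasoner, and it says nothing about the relationship between a service’s input and output. All concept names are hypothetical.

```python
# Toy is-a hierarchy: child concept -> parent concept.
# Every name here is invented for illustration only.
PARENT = {
    "ProteinSequence": "Sequence",
    "DNASequence": "Sequence",
    "Sequence": "Thing",
    "ProteinFamilyReport": "Report",
    "Report": "Thing",
}


def ancestors(concept):
    """Return the concept itself plus everything above it in the hierarchy."""
    chain = [concept]
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def subsumes(general, specific):
    """True if `general` is the same as, or an ancestor of, `specific`."""
    return general in ancestors(specific)


def degree_of_match(requested, advertised):
    """Compare a requested output concept with an advertised output concept."""
    if requested == advertised:
        return "exact"
    if subsumes(advertised, requested):
        return "plug-in"   # the advert is more general than the request
    if subsumes(requested, advertised):
        return "subsumes"  # the advert is more specific than the request
    return "fail"


if __name__ == "__main__":
    print(degree_of_match("ProteinSequence", "ProteinSequence"))
    print(degree_of_match("ProteinSequence", "Sequence"))
    print(degree_of_match("Sequence", "DNASequence"))
    print(degree_of_match("DNASequence", "Report"))
```

A real matchmaker would ask a description logic reasoner about subsumption rather than walking a dictionary, but the ranking idea is the same.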

So it was re-assuring when people came by the poster, listened to my spiel and asked lots of questions. Ora Lassila from Nokia (one of the people responsible for hyping the whole idea up in the first place) dropped by to have a look. He was interested in adapting the technique for locating services in a registry, used by mobile devices. (I wonder if anyone out there needs BLAST on their mobile phone?!) It was good to meet Ora, and talk about semantics.

There is nothing quite like standing in front of a poster for three hours and tirelessly explaining it to complete strangers who work in disparate fields. It certainly helps to get your ideas straight. Where would we be without conferences?


  1. Danny Leiner (2000) Dude, Where’s My Car?
  2. Massimo Paolucci, Takahiro Kawamura, Terry Payne and Katia Sycara (2002) Semantic Matching of Web Service Capabilities
  3. Duncan Hull, Evgeny Zolin, Andrey Bovykin, Ian Horrocks, Ulrike Sattler and Robert Stevens (2006) Deciding Semantic Matching of Stateless Services in the Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI-06)

July 19, 2006

June 2, 2006

Debugging Web Services

Filed under: biotech,informatics — Duncan Hull @ 11:19 pm

There are a growing number of biomedical services out there on the Wild Wild Web for performing various computations on DNA, RNA and proteins, as well as on the associated scientific literature. Currently, using and debugging these services can be hard work. SOAP UI (SOAP User Interface) is a newish and handy free tool to help debug services and get your in silico experiments and analyses done, hopefully more easily.

So why should bioinformaticians care about Web Services? Three of the most important advantages are:

  1. They can reduce the need to install and maintain hundreds of tools and databases locally on desktop(s) or laboratory server(s) as these resources are programmatically accessible over the web.
  2. They can remove the need for tedious and error-prone screen-scraping, or worse, “cut-and-paste” of data between web applications that don’t have fully programmatic interfaces.
  3. It is possible to compose and orchestrate services into workflows or pipelines, which are repeatable and verifiable descriptions of your experiments that you can share. Needless to say, sharing repeatable experiments has always been an important part of science; it shouldn’t be any different on the Web of Science.

All this distributed computing goodness comes at a price though and there are several disadvantages of using web services. We will focus on one here: Debugging services, which can be problematic. In order to do this, bioinformaticians need to understand a little bit about how web services work and how to debug them.

Death by specification

Debugging services sounds straightforward, but many publicly available biomedical services are not the simpler RESTian type, but the more complex SOAP-and-WSDL type of web service. Consequently, debugging usually requires a basic understanding of these protocols and interfaces, the so-called “Simple” Object Access Protocol (SOAP) and Web Services Description Language (WSDL). However, these specifications are both big and complicated, and are being superseded by newer versions, so you might lose the will-to-live while reading them. Also, individual services described in WSDL are easier for machines to read than for humans, and therefore give humble bioinformaticians a big headache. As an example, have a look at the WSDL for BLAST at the DNA Data Bank of Japan (DDBJ).

So, if you’re not intimately familiar with the WSDL 1.1 specification (frankly, life is too short and they keep moving the goal-posts anyway), it is not very clear what is going on here. WSDL describes messages, port types, end points, part-names, bindings, bla-bla-bla, and lots of other seemingly unnecessary abstractions. To add insult to injury WSDL is used in several different styles and is expressed in verbose XML. Down with the unnecessary abstractions! But the problems don’t stop there. From looking at this WSDL, you have to make several leaps of imagination to understand what the corresponding SOAP messages this BLAST service accepts and responds with will look like. So when you are analysing your favourite protein sequence(s) with BLAST or perhaps InterProScan it can be difficult or impossible to work out what went wrong.
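
If you just want a rough feel for what a WSDL declares without reading it all by eye, a few lines of Python can pull out the operation names. The sketch below uses only the standard library and the WSDL 1.1 namespace; the embedded document is a made-up, heavily cut-down stand-in (it is not the real DDBJ file), so the service, message and part names in it are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# The WSDL 1.1 namespace, as used by <definitions>, <portType>, <operation> etc.
WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}

# A hypothetical, minimal WSDL fragment. Real files are much longer and also
# contain <types>, <binding> and <service> sections.
SAMPLE_WSDL = """
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema"
             xmlns:tns="http://example.org/toyblast"
             name="ToyBlast" targetNamespace="http://example.org/toyblast">
  <message name="searchParamRequest">
    <part name="program" type="xsd:string"/>
    <part name="database" type="xsd:string"/>
  </message>
  <message name="searchParamResponse">
    <part name="Result" type="xsd:string"/>
  </message>
  <portType name="ToyBlastPortType">
    <operation name="searchParam">
      <input message="tns:searchParamRequest"/>
      <output message="tns:searchParamResponse"/>
    </operation>
  </portType>
</definitions>
"""


def list_operations(wsdl_text):
    """Return (portType name, operation name) pairs found in a WSDL 1.1 document."""
    root = ET.fromstring(wsdl_text)
    operations = []
    for port_type in root.findall("wsdl:portType", WSDL_NS):
        for operation in port_type.findall("wsdl:operation", WSDL_NS):
            operations.append((port_type.get("name"), operation.get("name")))
    return operations


def list_message_parts(wsdl_text):
    """Map each message name to the list of its part names."""
    root = ET.fromstring(wsdl_text)
    return {
        message.get("name"): [p.get("name") for p in message.findall("wsdl:part", WSDL_NS)]
        for message in root.findall("wsdl:message", WSDL_NS)
    }


if __name__ == "__main__":
    print(list_operations(SAMPLE_WSDL))
    print(list_message_parts(SAMPLE_WSDL))
```

This only scratches the surface (it ignores bindings and styles entirely), but it is often enough to see which operations a service offers and what goes in and out of them.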


This is where SOAPUI can make life easier. You can launch SOAPUI using Java Web Start, load a WSDL in and begin to see what is going on. One of the nice features is that it will show you what the SOAP messages look like, which saves you having to work it out in your head. So, going back to our BLAST example…

  1. Launch the SOAPUI tool and select File then New WSDL Project (Give project a name and save it when prompted).
  2. Right click on the Project folder and select add WSDL from URL
  3. Type in http://xml.nig.ac.jp/wsdl/Blast.wsdl or your own favourite from this list of molecular biology wsdl.
  4. When asked: Create default requests for all operations select Yes
  5. The progress bar will whizz away while it imports the file; once it’s done, you can see a list of operations
  6. If you click on one of them e.g. searchParam then Request1, then select Open Request Editor it spawns two new windows…
  7. The first (left-hand) window shows the SOAP request that is sent to the BLAST service:
    	... boring namespace declarations ... >
    		<blas:searchParam soapenv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
    			<!-- use BLASTp -->
    			<program xsi:type="xsd:string">blastp</program>
    			<!-- Use SWISSPROT data  -->
    			<database xsi:type="xsd:string">SWISS</database>
    			<!-- protein sequence -->
    			<!-- no parameters -->
    			<param xsi:type="xsd:string"></param>
  8. When you click on the green request button, this message is sent to the service. Note: you have to fill in the parameter values as they default to: “?”.
  9. After submitting the request above, the SOAP response appears in the second (right-hand) window:
    ... namespace declarations... >
          <n:searchParamResponse xmlns:n="http://tempuri.org/Blast">
             <Result xsi:type="xsd:string">BLASTP 2.2.12 [Aug-07-2005] ...
    		 Sequences producing significant alignments:                      (bits) Value
    		 sp|Q04671|P_HUMAN P protein (Melanocyte-specific transporter pro...   104   8e-23 ...
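
If you’d rather script this than click through a GUI, the request shown in step 7 can be approximated in Python with the standard library alone. This is only a sketch under several assumptions: the service namespace is guessed from the `xmlns:n` value in the response above (the request’s own namespace declarations are elided in the excerpt), the `query` element name is hypothetical because the sequence element is missing from the excerpt, and the peptide is a made-up placeholder. The code builds the message but deliberately does not send it anywhere.

```python
import xml.etree.ElementTree as ET

SOAPENV = "http://schemas.xmlsoap.org/soap/envelope/"
SOAPENC = "http://schemas.xmlsoap.org/soap/encoding/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
XSD = "http://www.w3.org/2001/XMLSchema"

# ASSUMPTION: taken from the xmlns:n attribute in the response excerpt;
# the request's real namespace is not shown in the post.
SERVICE_NS = "http://tempuri.org/Blast"


def build_search_param(program, database, query, param=""):
    """Build an RPC/encoded-style SOAP 1.1 envelope for a searchParam call.

    The element name 'query' is hypothetical; check the WSDL for the real one.
    """
    ET.register_namespace("soapenv", SOAPENV)
    ET.register_namespace("xsi", XSI)
    ET.register_namespace("blas", SERVICE_NS)

    envelope = ET.Element(f"{{{SOAPENV}}}Envelope")
    # 'xsd:string' appears only inside attribute values, so ElementTree will
    # not declare the xsd prefix by itself; declare it by hand.
    envelope.set("xmlns:xsd", XSD)

    body = ET.SubElement(envelope, f"{{{SOAPENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}searchParam")
    call.set(f"{{{SOAPENV}}}encodingStyle", SOAPENC)

    fields = (
        ("program", program),
        ("database", database),
        ("query", query),   # hypothetical element name
        ("param", param),
    )
    for name, value in fields:
        child = ET.SubElement(call, name)
        child.set(f"{{{XSI}}}type", "xsd:string")
        child.text = value

    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    # Placeholder peptide, not a real protein of interest.
    print(build_search_param("blastp", "SWISS", "MKTAYIAKQR"))
```

Printing the message and comparing it with what SOAPUI generates from the WSDL is a cheap way to spot a wrong element name or namespace before blaming the service.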

Not all users of web services will want the gory details of SOAP, but for serious users, it’s a handy tool for understanding how any given web service works. This can be invaluable in working out what happened if, or more probably when, an individual service behaves unexpectedly. If you know of any other tools that make web services easier to use and debug, I’d be interested to hear about them.

Conclusions: It’s not rocket science

In my experience, small tools (like SOAPUI) can make a BIG difference. I’ve used a deliberately simple (and relatively reliable) BLAST service for demonstration purposes, but the interested reader / hacker might want to use this tool to play with more complex programs like the NCBI Web Services or InterProScan at the EBI. Using such services often requires good testing and debugging support, for example, when you compose (or “mashup”) services into complex workflows, using a client such as the Taverna workbench. This is where SOAP UI might just help you test and debug web services provided by other laboratories and data centres around the world, so you can use them reliably in your in silico experiments.

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.

May 26, 2006

BioGrids: From Tim Bray to Jim Gray (via Seymour Cray)

Filed under: biotech — Duncan Hull @ 11:30 pm

[Image: Recycle or Globus Toolkit?]

Grid Computing already plays an important role in the life sciences, and will probably continue doing so for the foreseeable future. BioGrid (Japan), myGrid (UK) and CoreGrid (Europe) are just three current examples; there are many more Grid and Super Duper Computer projects in the life sciences. So, is there an accessible Hitchhiker’s Guide to the Grid for newbies, especially bioinformaticians?

Unfortunately much of the literature of Grid Computing is esoteric and inaccessible, liberally sprinkled with abstract and woolly concepts like “Virtual Organisations” with a large side-order of acronym soup. This makes it difficult or impossible for the everyday bioinformatician to understand or care about. Thankfully, Tim Bray from Sun Microsystems has written an accessible review of the area, “Grids for dummies”, if you like. It’s worth a read if you’re a bioinformatician with a need for more heavyweight distributed computing than the web currently provides, but you find Grid-speak is usually impenetrable nonsense.

One of the things Tim discusses in his review is Microsoftie Jim Gray, who is partly responsible for the 2020 computing initiative mentioned on nodalpoint earlier. Tim describes Jim’s article Distributed Computing Economics. In this, Jim uses a wide variety of examples to illustrate the current economics of grids, from “Megaservices” like Google, Yahoo! and Hotmail to the bioinformaticians’ favourites, BLAST and FASTA. So how might Grids affect the average bioinformatician? There are many different applications of Grid computing, but two areas spring to mind:

  1. Running your in silico experiments (genome annotation, sequence analysis, protein interactions etc), using someone else’s memory, disk space and processors on the Grid. This could mean you will be able to do your experiments more quickly and reliably than you can using the plain ol’ Web.
  2. Executing high-throughput and long-running experiments, e.g. you’ve got a ton of microarray data and it takes hours or possibly days to analyse computationally.
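
To make the second point a little more concrete, here is a tiny Python sketch of the basic pattern behind high-throughput runs: farming independent jobs out to a pool of workers and collecting the results in order. It is only a local stand-in, not Grid middleware. A thread pool on one machine plays the part of the remote processors, and the `analyse` function (which just computes GC content) is a made-up placeholder for a genuinely slow analysis step.

```python
from concurrent.futures import ThreadPoolExecutor


def analyse(sample):
    """Placeholder for a slow analysis step: fraction of G and C bases."""
    gc_count = sum(base in "GC" for base in sample)
    return gc_count / len(sample)


def run_batch(samples, workers=4):
    """Run `analyse` over every sample using a pool of worker threads.

    pool.map returns results in the same order as the inputs, so each
    result can be matched back to its sample by position.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(analyse, samples))


if __name__ == "__main__":
    samples = ["GGCC", "ATAT", "GATC"]
    for sample, score in zip(samples, run_batch(samples)):
        print(sample, score)
```

On a real Grid the hard parts are everything this leaves out: moving the data, scheduling, and coping with jobs that fail halfway through.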

So if you deal with microarray data daily, you probably know all this stuff already, but Tim’s overview and Jim’s commentary are both accessible pieces to pass on to your colleagues in the lab. If this kind of stuff pushes your button, you might also be interested in the eProtein Scientific Meeting and Workshop Proceedings.

[This post was originally published on nodalpoint with comments.]
