O'Really?

October 27, 2006

MEDIE: MEDLINE++

Filed under: informatics — Duncan Hull @ 10:21 pm
Tags: Jun'ichi Tsujii, MEDIE, Medline, MIB, nactem, NLP, nodalpoint, pubmed, software, text mining

MEDIE is an “intelligent” semantic search engine that retrieves biomedical correlations from over 14 million articles in MEDLINE. You can find abstracts and sentences in MEDLINE by specifying the semantics of correlations; for example, What activates tumour suppressor protein p53? So just how useful is MEDIE and is it at the cutting edge?

At the Manchester Interdisciplinary Biocentre (MIB) launch yesterday, Professor Jun’ichi Tsujii gave a presentation on Linking text with knowledge – challenges for Text Mining in Biology. As part of this presentation he gave a demonstration of Medie: an intelligent search engine for Medline. This tool looks quite impressive if you experiment with some sample queries. I wonder what nodalpointers, especially hardened text-miners, natural language processing (NLP) nerds and computational linguists, make of Medie?

[This post was originally published on nodalpoint, with comments]

Leave a Comment

October 20, 2006

Manchester Biocentre Launch

Filed under: bio,biotech — Duncan Hull @ 10:25 pm
Tags: Alan Fersht, Alan North, Biocentre, Douglas Kell, Gerhard Wagner, Gill Stephens, Greg Stephanopoulos, Jean Beggs, John McCarthy, Jun'ichi Tsujii, Manchester, MIB, Nigel Scrutton, nodalpoint, Roy Goodacre, Ruedi Aebersold, Wellcome Trust

The Manchester Interdisciplinary Biocentre (MIB) is officially opening on 25/26th October 2006. The centre has been about a decade in the making, and aims to be a world-class research centre, with around £37 million (~$70 million) of initial funding from the Wellcome Trust charity, UK Research Councils and others. If you’re looking for a bioinformatics job, PhD, PostDoc etc in the UK, MIB is continuously hiring and looks like a good place to work, if the opening programme (which follows) is anything to go by.

Unfortunately the MIB web pages aren’t quite world class yet, the promotional launch material is only available in pdf format, *sigh*, see references below. So I’m blogging the MIB Symposium launch programme here to put the stuff online. Talks scheduled for the second day of the opening, 26th October 2006, are listed below, and these can be attended by free registration (see references):

Session 1: Bio-molecular machines, 9.00-11.00

Session chaired by Alan North, Dean of the Faculty of Life Sciences

John E. Walker (MRC Dunn Human Nutrition Unit, Cambridge, UK): Biomolecular rotary motors.
Yoshi Nakamura (Tokyo University, Japan): Aptamer as RNA-made super antibody for basic and therapeutic applications
John McCarthy, (Manchester Interdisciplinary Biocentre): Molecular mechanisms underlying post-transcrptional gene expression.
Refreshment break

Session 2: Biomolecular Structure and Dynamics, 11.00-12.40

Session chaired by Bob Ford, Professor of Structural Biology, Faculty of Life Sciences.

Alan Fersht (MRC Centre for Protein Engineering, Cambridge, UK): Structure and Stability of the Tumour Suppressor p53.
Gerhard Wagner (Harvard Medical School, Boston, MA, USA): NMR studies of protein interactions regulating gene expression.
Stefan Weber (Freie Universität Berlin): Aspects and Prospects of Modern Electron Paramagnetic Resonance Spectroscopy.
Lunch

Session 3: Systems and Information, 13.35-15.45

Session chaired by John Perkins, Dean of Faculty of Engineering and Physical Sciences.

Greg Stephanopoulos (MIT, Cambridge MA, USA): Promise and challenges of systems biology in advancing biotechnology and biomedical research.
Douglas Kell, (MIB and School of Chemistry, University of Manchester): Systems Biology – why and how.
Jean Beggs (Wellcome Trust Centre for Cell Biology, The Univeristy of Edinburgh, UK): Messenger RNA processing – a complex system.
Jun’ichi Tsujii (MIB and School of Computer Science, University of Manchester): Linking text with knowledge – challenges for Text Mining in Biology.

Session 4: Biocatalysis, 16.10-17.00

Session chaired by Hans Westerhoff, Manchester Interdisciplinary Biocentre

Nigel Scrutton (MIB and Faculty of Life Sciences): ‘Squeezing’ barriers – a dynamical view of enzyme catalysis.
Gill Stephens, (MIB and School of Chemical Engineering): Redox biocatalysis – the next generation of enzymes for manufacturing pharmaceutical intermediates and specialty chemicals.

Session 5: Bionanoscience and engineering: 17.00-18.00

Session chaired by Peter Fielden, Chemical Engineering

Joseph Wang (Arizona State University, USA): Nanomaterials for monitoring and controlling biomolecular interactions.
Milan Stojanovich (Columbia University Medical School, New York, USA): Deoxyribozyme-based devices.

Session 6: Postgenomic Analytical Technologies, 18.00-19.10

Session chaired by Roy Goodacre, MIB and School of Chemistry

Ruedi Aebersold (ETH Zürich): Quantitative Proteomics and Systems Biology
Simon Gaskell, MIB and School of Chemistry: New analytical science in proteomics and metabolomics.
Concluding remarks.

References

Hannah Hoag (2004) All systems go: Manchester Interdisciplinary Biocentre Nature. 427 (6974), 568-9. DOI:10.1038/nj6974-568a
John McCarthy (2004) Tackling the challenges of interdisciplinary bioscience. Nature Reviews Molecular cell biology. 5 (11), 933-7. DOI:10.1038/nrm1501
More publications about Manchester Interdisciplinary Biocentre (MIB) tagged in Connotea
Manchester Biocentre launch programme 25th October 2006 (pdf)
Manchester Biocentre launch programme 26th October 2006 (pdf)
This post was originally published on nodalpoint with comments

Leave a Comment

October 8, 2006

Bio-Ignorance: Communicating Biology to Computer Scientists

Filed under: bio,biotech,book review,communication — Duncan Hull @ 10:38 pm
Tags: Alvis Brazma, Carole Goble, chromsome, Gene Ontology, human genome, hybrid, interdisciplinary, Jeremy Cherfas, John Gribbin, Led Zeppelin

Many computer scientists and software engineers are not familiar with basic biology or bioinformatics. Many biologists and bioinformaticians are not familiar with basic computer science or software engineering. This article points to some resources that can help with the former, and asks, what can be done about the latter?

Progress in both computer science and biology is closely linked and dependent on people understanding each others strange language, cross-pollinating ideas and creating technology which hopefully has hybrid vigour. So for example, biologists and bioinformaticians have a healthy apetite for all kinds of better, cheaper, faster and sometimes novel computation. This requires they understand basic computer science and software enginnering. In the other direction, computer scientists often need realistic scenarios to motivate the invention, development and testing of genuinely novel technology. As for the software engineers, more on them later…

It sounds great, but before you can even say the words “inter-discplinary”, there are considerable barriers to communication. The various camps speak different languages, and have radically different cultures. To illustrate this communication breakdown, here is a story from the lab where I work. A while ago, I was discussing the Gene Ontology with a colleague, who shall remain anonymous. This colleague was educated, doing PhD level research and what I’d consider a fairly typical computer scientist. Soon the conversation turned to chromosomes, and they asked me:

“What is a Chromosome?”

Initially I was shocked. How could somebody not know what a chromosome was? Had they never read a newspaper? Never watched the television? Surely, most people have at least a vague idea what a chromosome is? After recovering from the shock, I told this person that according to the Gene Ontology a chromosome is “a very long molecule of DNA and associated proteins that carries hereditary information.” Perhaps this bio-ignorance is an extreme case, but unfortunately, it is all too common. Many computer scientists and software engineers I know stopped studying biology as soon as they possibly could, opting for the so-called “harder” sciences: physics, chemistry and mathematics. Consequently, many (but not all) computer scientists are bio-ignorant. What can we do about it? We really need to understand each other if we are going to make any progress. How can we improve communication between biologists and computer scientists?

Part of the solution to this problem is well-written literature that explains basic concepts quickly and clearly without getting bogged down in jargon or stuck on esoteric details, see the references below for some examples. One of my personal favourites is a little book called The Human Genome: a beginner’s guide to the chemical code of life authored by Jeremy Cherfas. This book is lavishly illustrated and beautifully written, but most importantly of all at 72 pages it is blisteringly concise, so stands a chance of being read by computer geeks and nerds. It is even funny in places, the Nobel laureate and geneticist Thomas Hunt Morgan is amusingly depicted as a red-eyed wild type, just like the fruit flies he studied. Anyway, I lent my copy of said book to my computer science buddy, and they learnt not just what chromosomes are, but also a little bit about why Biology and Genetics are such fascinating subjects.

The literature listed below can help one-way understanding of biology by outsiders, but communication is a two-way street. What about the other direction? Is there any literature that explains computer science and software engineering specifically to biologists and bioinformaticians? I don’t know of any particularly good examples, that are concise, well written and illustrated, but perhaps you do. I’ve frequently found bioinformaticians and biologists misunderstand what computer science is about, and confuse it with software engineering, but that is another story. The moral of this story is, don’t be surprised if people working in different fields to you lack a basic understanding of what you consider fundamental concepts that everybody knows. If they are bio-ignorant computer scientists, you should patiently and tirelessly explain yourself and maybe point to some of the resources below. Maybe we can understand each other just a little better.

References

Anonymous GO:0005694 Chromosome: A very long molecule of DNA AmiGO! Your friend in the Gene Ontology
Alvis Brazma, Helen Parkinson, Thomas Schlitt and Mohammadreza Shojatalab (2001) All you need to know about biology in twenty pages European Bioinformatics Institute (EMBL-EBI) (A technical introduction, written for EBI employees, but useful elsewhere)
Jeremy Cherfas (2002) The Human Genome: a beginner’s guide to the chemical code of life (isbn:0751337161) Dorling Kindersley (A quick but informative introduction that your granny could understand)
Jeremy Cherfas (2006) International Plant Genetic Resources Institute (IPGRI) public awareness blog IPGRI, Rome, Italy. (Some deserved nodalpoint Google Juice for these news and press releasess)
Carole Goble and Chris Wroe (2005) The Montagues and the Capulets: In fair Genomics, where we lay our scene… Comparative and Functional Genomics 5(8):623-632 (A paper describing communication breakdown between two different research “houses”, very possibly the only paper on genomics that will make you laugh. seeAlso Shakespearean Genomics: a plague on both your houses)
John Gribbin Dorling Kindersley’s Essential Science: Human Genome, Global warming, Expanding universe, Food for the future, Digital revolution and How the brain works http://www.dk.com (Some interesting books here)
John W. Kimball Chromosomes Kimball’s Biology Pages (How does John Kimball manage to write so much good introductory material sabout Biology?)
John Bonham, John Paul Jones and Jimmy Page (1969) Communication Breakdown Led Zeppelin (Communication breakdown, it’s always the same, I’m having a nervous breakdown, drive me insane!)
This post was originally published on nodalpoint with comments.

This work is licensed under a

Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.

Leave a Comment

July 25, 2006

AAAI’06: Highlights and conclusions

Filed under: semweb — Duncan Hull @ 10:46 pm
Tags: AAAI, AI, Alan Turing, Artificial Intelligence, Bruce Buchanan, conference, DARPA, Pedro Domingos, Sebastian Thrun, semantic web, Stephen Muggleton, Turing test

The AAAI conference finished last Thursday, here are some highlights and papers that might be worth reading if you are interested in building and / or using a more “intelligent” (and possibly semantic) web in bioinformatics.

Here are the papers or talks I enjoyed the most and hope you might also find them useful or inspiring.

Unifying Logical and Statistical AI talk given by Pedro Domingos.
Intelligent agents must be able to handle the complexity and uncertainty of the real world. Logical AI (of which the semantic web is an example) has focused mainly on the former, and statistical AI (e.g. machine learning) on the latter. The two approaches can be united, with significant benefits, some of which are demonstrated by the Alchemy system
Developing an intelligent personal assistant: The CALO (Cognitive Agent at that Learns and Organises) project talk given by Karen Myers.
CALO is a desktop assistant that learns what you do in the lab / office. Sounds spooky, but involves some interesting technology and fascinating research questions.
Bookmark hierarchies and collaborative recommendation by Ben Markines, Lubomira Stoilova and Filippo Menczer.
Describes an open-source, academically-oriented social bookmarking site where you can donate your bookmarks to science at givealink
Social network-based Trust in Prioritised Default Logic by Yarden Katz and Jennifer Golbeck.
Who and how can you trust on the Web?
Google vs Berners-Lee was a memorable debate. According to Jim Hendler, Tim and Peter are reconciling their differences now

Not particularly webby, but…

…entertaining nonetheless.

Stephen Muggletons talk on Computational Biology and Chemical Turing Machines, went down well but unfortunately I was stuck in a parallel track, experiencing “death by ontology”.
Bruce Buchanan gave a talk What Do We Know About Knowledge. A roller-coaster ride through the last 2000+ years of human attempts to understand what knowledge is, how to represent it and why it is powerful
Winning the DARPA Grand Challenge with an AI Robot called Stanley talk given by Sebastian Thrun, amazing presentation on a driving a robotic car through the desert over rough terrain. However, it doesn’t take too much imagination to think of horrific applications of this. Next year they will try to drive it from San Francisco to Los Angeles on a public freeway, and Stanley hasn’t even passed its driving test yet!

Turing’s dream

Appropriately, the conference which was subtitled Celebrating 50 years of AI finished with two talks by Lenhart K. Schubert and Stuart M. Shieber about the Turing test. The first discussed Turing’s dream and the Knowledge Challenge, the second talk asked Does the Turing Test Demonstrate Intelligence or Not? Now I’m back in Manchester, where Turing once worked, I can’t help wondering, what would Alan make of the current state of AI and the semantic web? I think there are several possibilities, he could be thinking:

EITHER: Fifty odd years later, they’re not still wasting time working on that Turing test are they?!
OR: He is smugly satisifed that he devised a test, that no machine has passed, and perhaps never will, but has provided us with a satisfactory operational definition of “intelligence” ;
…AND What the hell is the “Semantic Web”?

We will never know what Alan Turing would make of todays efforts to make a more intelligent web. However, that won’t stop me speculating that he would be impressed by the current uses of computers (intelligent or otherwise) to drive robots through the desert, perform all sort of computations on proteins and to search for information on this massive distributed global knowledge-base we call the “Web”. Not bad for 50 years of work, here’s to the next 50…

References

Alan Turing (1950) Computing Machinery and Intelligence: The Turing TestMind 59(236):433-460
Stephen H. Muggleton (2006) Exceeding human limits: The Chemical Turing MachineNature 440:409-410
Stephen H. Muggleton (2006) Towards Chemical Universal Turing Machines in Proceedings on the 21st National Conference on Artificial Intelligence
Picture credit: Image from Steve Jurvetson
This post was originally published on nodalpoint with comments

Leave a Comment

July 21, 2006

AAAI: Dude, Where’s My Service?

Filed under: biotech — Duncan Hull @ 10:54 pm
Tags: AAAI, AI, Artificial Intelligence, Katia Sycara, Massimo Paolucci, Ora Lassila, semantic web, Terry Payne, web services

As the number of bioinformatics services on the web increases, finding a tool or database that performs the task you require can be problematic. At the AAAI poster session on Wednesday, I presented our paper describing a novel solution to this problem. It uses a reasoner to “intelligently” search for web services, by semantically matching service requests with advertisements and has some advantages over comparable solutions…

I won’t go into all the gory details here but our technique extends and complements current approaches for matchmaking services. Some of the key features described in the paper are that it allows you describe to relationship(s) between the input and output of a service. E.g. What is the relationship between the input and output protein sequence of InterProScan? This relationship can help match requests for services with their adverts with higher precision and recall. I don’t mind admitting its been hard work getting this research published because a large part of the AI community use shamelessly toy and fictitious scenarios to motivate their work. Then they build incredibly complicated software stacks that are only understood by the small clique of people that designed them. When you show some of these people real-world bioinformatics services, they don’t seem to care too much, preferring to bury their heads in the sand of make-believe. There, thats got it off my chest!

So it was re-assuring when people came by the poster, listened to my speel and asked lots of questions. Ora Lassila from Nokia (one of the people responsible for hyping the whole idea up in the first place) dropped by to have a look. He was interested in adapting the technique for locating services in a registry, used by mobile devices. (I wonder if anyone out there needs BLAST on their mobile phone?!) It was good to meet Ora, and talk about semantics.

There is nothing quite like standing in front of a poster for three hours and tirelessly explaining it to complete strangers who work in disparate fields. It certainly helps to get your ideas straight. Where would we be without conferences?

References

Danny Leiner (2000) Dude, Where’s My Car?
Massimo Paolucci, Takahiro Kawamura, Terry Payne and Katia Sycara (2002) Semantic Matching of Web Service Capabilities
Duncan Hull, Evgeny Zolin, Andrey Bovykin, Ian Horrocks, Ulrike Sattler and Robert Stevens (2006) Deciding Semantic Matching of Stateless Services in the Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI-06)

Leave a Comment

July 19, 2006

AAAI: Google and the Semantic, Satanic, Romantic Web

Filed under: biotech,google,informatics — Duncan Hull @ 11:07 pm
Tags: AAAI, AI, Artificial Intelligence, Boston, Cory Doctorow, Danny Ayers, google, identity, metacrap, metadata, owl, PageRank, Peter Norvig, rdf, semantic web, Stuart Russell, Tim Berners-Lee, Tim Finin, URI, w3c, Yolanda Gil

Tim Berners-Lee delivered his one hour keynote at the AAAI’06 conference yesterday on the Semantic Web, after an introduction from Yolanda Gil. Tim gave an impassioned speech covering the last 16 years of the web and discussed the future of sharing data on the web using persistent URI’s and W3C standards like RDF and OWL. At the end of it all, there were some searching questions from Peter Norvig, Director of Research at Google Labs.

Peter opened his questions by saying.

Many people usually ask me, when I stand up and ask questions after keynote speeches at conferences:

“Peter, what do you have against the Semantic Web?”

Here is roughly what Peter said, the semantic web will never work because:

People are stupid: Google has lots of experience of dealing with peoples stoopidity on the web. Many people don’t write well-formed HTML, they don’t run web servers properly and they keep changing what their URIs identify. It sucks, but this is the world, imperfect and messy and we just have to deal with it. These same people can’t be expected to use the Resource Description Framework (RDF) and the Web Ontology Language (OWL), which are much more complicated and considerably less fool-proof. (Perhaps you could call this the dumb-antic web?!)

Tim replied that a large part of the semantic web can be populated by taking existing relational databases and mapping them into RDF/OWL. The structured data is already there, it just needs web-izing in a mashup-friendly format. (What I like to call the romantic web: people will publish their data freely on the web this way, especially in e-science for example. This will allow sharing and re-use in unexpected ways.)
People are competitive: People working for commercial companies and market leaders can’t be expected to just put their raw data on the web as RDF/OWL, they have little interest in standards. This is how they make a living, beating their competitors and locking their customers into proprietary data formats, so they can keep selling them software / hardware to use their data. (Analagous problems in science, scientists can be reluctant to share and publish data, if someone else will make new discoveries with it and claim all the glory)

Tim replied that most bookstores thought putting their stock levels and prices on the web was a bad idea as it would give sensitive information away to their competitors. However, they soon realised that this would allow their customers to search, browse and eventually (kerr-ching) buy their books.
People cheat and lie: People lie about what their content is about, again, Google is on the receiving end of this. Cheats try to fool the PageRank algorithm by saying their web pages are about books or movies, when they are really about Viagra or Pornography. The same fate awaits RDF and OWL, cheaters will use ontologies to tell bare-faced lies about their data. (What I like to call the satanic web: people do evil things).

Tim didn’t have any good answers to this, although later in the day there were some papers touching on the issue of Trust and Policies in the semantic web layer cake.

These lively debates are raging on un-abated, in the corridors, lecture theatres and bars. AAAI is now in full swing and its great to be here!

June 28, 2006

Marginal Power

Filed under: Uncategorized — Duncan Hull @ 11:03 pm
Tags: Garage, LISP, Paul Graham, Procrastination, startup

LISP Hacker and Painter Paul Graham writes entertaining essays about technology. His latest piece, discusses how important and sometimes lucrative ideas usually come from the “garage” outside rather than the inside, what he calls The Power of the Marginal. His essay rambles a bit in places, but has some interesting observations that are relevant to bioinformatics. For example…

“…if you’re an outsider you should actively seek out contrarian projects. Instead of working on things the eminent have made prestigious, work on things that could steal that prestige.”

Paul did a PhD in Computer Science and has fond memories of being a student which will ring true with anyone who has been there:

“That’s what I remember about grad school: apparently endless supplies of time, which I spent worrying about, but not writing, my dissertation.”

PhDs and obscurity go hand-in-hand and according to this essay, obscurity and marginality is good for you. It doesn’t taste as good as junk food but is allegedly “good for you”. Pauls personal choice of marginality is the relatively obscure language called LISP, and the people I’ve met who use this langugage are either crazy or at the top of their game, sometimes both. Does LISP turn people crazy or are crazy people attracted to the obscurity of LISP?

Either way, Paul Grahams occasionally crazy essays are worth a read if and when you have a moment to spare. Even better, read them when you don’t have the time and are procrastinating writing your PhD thesis or next Bioinformatics paper.

June 12, 2006

Bend it like Bezier?

Filed under: book review,sport — Duncan Hull @ 11:12 pm
Tags: Beckham, bend, bezier, curve, football, Germany, Ken Bray, nodalpoint, Pierre Bézier, Ronaldinho, Teamgeist, World Cup

Football informatics, theory and practice: Germany 2006

The frenchman Pierre Bézier knew a thing or two about curves. But as World Cup fever tightens its grip around the globe, it is the footballers in Germany who are showing us just how much they know about the practical science of curving and bending the ball into the goal. Is there any essential curve-theory for World Cup stars like Beckham, Ronaldinho and Thierry Henry to read and brush-up on in their German hotels this summer?

Sports scientist Dr Ken Bray from the University of Bath in the UK hopes that sportsmen and spectators alike will be reading his new book How to Score – Science and the Beautiful Game. This is another popular science book that tries to make fluid dynamics accessible to the layman. In publicising his new book, Ken points out that the new Adidas Teamgeist™ football will unsettle goalkeepers at the World Cup, because the balls move more in the air than traditional ones. This smells of marketing-hype, both for the ball and the book, but it is interesting and topical nonetheless.

Mathematicians and numerical analysts have known for years, the really essential reading for footballers this summer is the famous curves index. These wonderful web pages, free online and completely devoid of hype, describe all the equations for putting the ball in the back of the net in great style. After reading these pages, perhaps World Cup footballers will be able to curve the unpredictable Teamgeist™ ball even more lavishly than before. Just imagine the confusion of a goalkeeper facing a free-kick, when the ball follows a right strophoid curve: y² = x²(a – x)/(a + x)! This would certainly be more entertaining than the all too predictable and common straight line: y = mx + c that soars over the bar and into row Z of the spectators behind the goal…

Whether scientists, footballers or spectators, we can all enjoy the science of curving at the World Cup in Germany this summer. Bis Bald Berlin!

References

Leave a Comment

June 2, 2006

Debugging Web Services

Filed under: biotech,informatics — Duncan Hull @ 11:19 pm
Tags: bloatware, debug, mashup, soap, soapui, taverna, web services, workflow, WSDL, xml

There are a growing number of biomedical services out there on Wild Wild Web for performing various computations on DNA, RNA and proteins as well as the associated scientific literature. Currently, using and debugging these services can be hard work. SOAP UI (SOAP User Interface) is newish and handy free tool to help debug services and get your in silico experiments and analyses done, hopefully more easily.

So why should bioinformaticans care about Web Services? Three of the most important advantages are:

They can reduce the need to install and maintain hundreds of tools and databases locally on desktop(s) or laboratory server(s) as these resources are programmatically accessible over the web.
They can remove the need for tedious and error-prone screen-scraping, or worse, “cut-and-paste” of data between web applications that don’t have fully programmatic interfaces.
It is possible to compose and orchestrate services into workflows or pipelines, which are repeatable and verifiable descriptions of your experiments that you can share. Needless to say, sharing repeatable experiments has always been important part of science, its shouldn’t be any different on the Web of Science.

All this distributed computing goodness comes at a price though and there are several disadvantages of using web services. We will focus on one here: Debugging services, which can be problematic. In order to do this, bioinformaticians need to understand a little bit about how web services work and how to debug them.

Death by specification

Debugging services sounds straightforward, but many publicly available biomedical services, are not the simpler RESTian type, but the more complex SOAP-and-WSDL type of web service. Consequently, debugging usually requires a basic understanding these protocols and interfaces, the so-called “Simple” Object Access Protocol (SOAP) and Web Services Description Language (WSDL). However these specifications are both big, complicated and being superceded by newer versions so you might lose the will-to-live while reading them. Also, individual services described in WSDL are easier for machines to read, than for humans, and therefore give humble bioinformaticians a big headache. As an example, have a look at the WSDL for BLAST at the DNA Databank of Japan (DDBJ).

So, if you’re not intimately familiar with the WSDL 1.1 specification (frankly, life is too short and they keep moving the goal-posts anyway), it is not very clear what is going on here. WSDL describes messages, port types, end points, part-names, bindings, bla-bla-bla, and lots of other seemingly unnecessary abstractions. To add insult to injury WSDL is used in several different styles and is expressed in verbose XML. Down with the unnecessary abstractions! But the problems don’t stop there. From looking at this WSDL, you have to make several leaps of imagination to understand what the corresponding SOAP messages this BLAST service accepts and responds with will look like. So when you are analysing your favourite protein sequence(s) with BLAST or perhaps InterProScan it can be difficult or impossible to work out what went wrong.

Using SOAPUI

This is where SOAPUI, can make life easier. You can launch SOAPUI using the Java Web Start, load a WSDL in and you can begin to see what is going on. One of the nice features, is it will show you what the SOAP messages look like, which saves you having to work it out in your head. So, going back to our BLAST example…

Launch the SOAPUI tool and select File then New WSDL Project (Give project a name and save it when prompted).
Right click on the Project folder and select add WSDL from URL
Type in http://xml.nig.ac.jp/wsdl/Blast.wsdl or your own favourite from this list of molecular biology wsdl.
When asked: Create default requests for all operations select Yes
The progress bar will whizz away while it imports the file, once its done, you can see a list of operations
If you click on one of them e.g. searchParam then Request1, then select Open Request Editor it spawns two new windows…

The first (left-hand) window shows the SOAP request that is sent to the BLAST service:

<soapenv:Envelope
	... boring namespace declarations ... >
	 <soapenv:Body>

		<blas:searchParam soapenv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
			<!-- use BLASTp -->
			<program xsi:type="xsd:string">blastp</program>

			<!-- Use SWISSPROT data  -->
			<database xsi:type="xsd:string">SWISS</database>

			<!-- protein sequence -->
			<query xsi:type="xsd:string">MHLEGRDGRR YPGAPAVELL QTSVPSGLAE LVAGKRRLPR GAGGADPSHS</query>

			<!-- no parameters -->
			<param xsi:type="xsd:string"></param>
		</blas:searchParam>

	</soapenv:Body>
</soapenv:Envelope>

When you click on the green request button, this message is sent to the service. Note: you have to fill in the parameters values as they default to: “?”.

After submitting the request above, the SOAP response appears in the second (right-hand) window:

<soap:Envelope
... namespace declarations... >
   <soap:Body>

      <n:searchParamResponse xmlns:n="http://tempuri.org/Blast">
         <Result xsi:type="xsd:string">BLASTP 2.2.12 [Aug-07-2005] ...
		 Sequences producing significant alignments:                      (bits) Value
		 sp|Q04671|P_HUMAN P protein (Melanocyte-specific transporter pro...   104   8e-23 ...
		 </Result>
      </n:searchParamResponse>
   </soap:Body>
</soap:Envelope>

Not all users of web services will want the gory details of SOAP, but for serious users, its a handy tool for understanding how any given web service works. This can be invaluable in working out what happened if, or more probably when, an individual service behaves unexpectedly. If you know of any other tools that make web services easier to use and debug, I’d be interested to hear about them.

Conclusions: It’s not rocket science

In my experience, small tools (like SOAPUI) can make a BIG difference. I’ve used a deliberately simple (and relatively reliable) BLAST service for demonstration purposes, but the interested reader / hacker might want to use this tool to play with more complex programs like the NCBI Web Services or InterProScan at the EBI. Using such services often requires good testing and debugging support, for example, when you compose (or “mashup”) services into complex workflows, using a client such as the Taverna workbench. This is where SOAP UI might just help you test and debug web services provided by other laboratories and data centres around the world, so you can use them reliably in your in silico experiments.

May 26, 2006

BioGrids: From Tim Bray to Jim Gray (via Seymour Cray)

Filed under: biotech — Duncan Hull @ 11:30 pm
Tags: BLAST, FASTA, Globus, Globus Toolkit, Grid, HPC, Jim Gray, myGrid, nodalpoint, sequence jockey, Seymour Cray, Tim Bray

Grid Computing already plays an important role in the life sciences, and will probably continue doing so for the forseeable future. BioGrid (Japan), ^myGrid (UK) and CoreGrid (Europe) are just three current examples, there are many more Grid and Super Duper Computer projects in the life sciences. So, is there an accessible Hitch Hikers Guide to the Grid for newbies, especially bioinformaticians?

Unfortunately much of the literature of Grid Computing is esoteric and inaccessible, liberally sprinkled with abstract and wooly concepts like “Virtual Organisations” with a large side-order of acronym soup. This makes it difficult or impossible for the everyday bioinformatican to understand or care about. Thankfully, Tim Bray from Sun Microsystems has a written an accessible review of the area, “Grids for dummies”, if you like. Its worth a read if you’re a bioinformatician with a need for more heavyweight distributed computing than the web currently provides, but you find Grid-speak is usually impenetrable nonsense.

One of the things Tims discusses in his review is Microsoftie Jim Gray, who is partly responsible for the 2020 computing initiative mentioned on nodalpoint earlier. Tim describes Jim’s article Distributed Computing Economics. In this, Jim uses wide variety of examples to illustrate the current economics of grids, from “Megaservices” like Google, Yahoo! and Hotmail to the bioinformaticians favourites, BLAST and FASTA. So how might Grids affect the average bioinformatician? There are many different applications of Grid computing, but two areas spring to mind:

Running your in silico experiments (genome annotation, sequence analysis, protein interactions etc), using someone elses memory, disk space, processors on the Grid. This could mean you will be able to do your experiments more quickly and reliably than you can using the plain ol’ Web.
Executing high-throughput and long-running experiments, e.g. you’ve got a ton of microarray data and it takes hours or possibly days to analyse computationally.

So if you deal with microarray data daily, you probably know all this stuff already, but Tims overview and Jims commentary are both accessible pieces to pass on to your colleagues in the lab. If this kind of stuff pushes your button, you might also be interested in the eProtein Scientific Meeting and Workshop Proceedings.

[This post was originally published on nodalpoint with comments.]

Leave a Comment

« Previous Page — Next Page »

October 27, 2006

October 20, 2006

Session 1: Bio-molecular machines, 9.00-11.00

Session 2: Biomolecular Structure and Dynamics, 11.00-12.40

Session 3: Systems and Information, 13.35-15.45

Session 4: Biocatalysis, 16.10-17.00

Session 5: Bionanoscience and engineering: 17.00-18.00

Session 6: Postgenomic Analytical Technologies, 18.00-19.10

References

October 8, 2006

References

July 25, 2006

Not particularly webby, but…

Turing’s dream

References

July 21, 2006

References

July 19, 2006

Further reading

June 28, 2006

Further reading

June 12, 2006

Football informatics, theory and practice: Germany 2006

References

June 2, 2006

Death by specification

Using SOAPUI

Conclusions: It’s not rocket science

Further reading

May 26, 2006

Meta / μετά