myGrid | O'Really?

December 10, 2008

Congratulations Carole Goble, e-Scientist

Filed under: awards — Duncan Hull @ 5:20 pm
Tags: bioinformatics, Carole Goble, e-science, Jim Gray, Joseph Tartakoff, Microsoft, myGrid, Savas Parastatidis, taverna, Tony Hey, workflow

Earlier this week, at the Microsoft e-Science workshop in Indianapolis, Carole Goble was presented with the first Jim Gray e-Science Award (2008). She is pictured here collecting the prize from Tony Hey of Microsoft Research. You can read all about it in the Seattle Tech Report, which says:

“As director of the U.K.’s myGrid project, Goble helped create Taverna, open source software that allows scientists to analyse complex data sets with a standard computer.”

It is very inspiring when colleagues win prizes and awards. Personally, I would not be here doing what I’m doing if it wasn’t for Carole and myGrid, and neither would many other people who work on (or have worked on) myGrid and related projects.

Carole, you are an inspiration to us all, congratulations! To celebrate your success, I’m off to commit some more of the seven deadly sins of bioinformatics [1]…

References

[1] Carole Goble, The Seven Deadly Sins of Bioinformatics
[2] e-Science in Indianapolis: Carole Goble wins the 1st Jim Gray eScience Award
[3] Joseph Tartakoff, British professor given first Jim Gray Award, Seattle Post-Intelligencer, Tech Report
[4] Todd Bishop, UK prof receives Jim Gray award, Tech Flash
[5] Savas Parastatidis, Carole Goble as the first recipient of the “Jim Gray eScience Award”
[6] Microsoft Recognise Manchester e-Science Contribution
[7] Deborah Gage, Microsoft creates award in the name of Jim Gray, San Francisco Chronicle, The Tech Chronicles
[8] Microsoft, New tools for Discovery on Display at e-Science workshop


September 5, 2007

WWW2007: Workflows on the Web

Filed under: web,web of science — Duncan Hull @ 10:01 pm
Tags: Anupriya Ankolekar, bioinformatics, climateprediction.net, Daniel Goodman, douglas adams, functional programming, martlet, myGrid, taverna, web, workflow, www, WWW2007

The Hitch-hiking novelist Douglas Noel Adams (DNA) once remarked that the World Wide Web (WWW) is the only thing whose shortened form – ‘double-you double-you double-you-dot’ – takes three times longer to say than what it’s “short” for [1]. If he were still with us today, there is plenty of stuff at the 16th International World Wide Web conference (WWW2007), currently underway in Banff, that would interest him. Here are some short, abbreviated notes on a couple of interesting papers at this year’s conference. They are relevant to bioinformatics and worth reading, whichever type of DNA you’re most interested in.

One full paper [2] by Daniel Goodman describes a scientific workflow language called Martlet. The motivating example is taken from climateprediction.net, but I suspect some of the points it makes about scientific workflows are relevant to bioinformatics too. Just like the recent post by Boscoh about functional programming, the paper discusses an inspired-by-Haskell functional approach to building and running workflows. Comparisons with other workflow systems like Taverna / SCUFL are drawn. Despite what the paper says, Taverna already uses a functional model (not an imperative one); it just hasn’t been published yet. The paper also draws comparisons between Martlet and other functional systems, like Google’s Map-Reduce. It concludes that the (allegedly) new Martlet programming model “raises the interesting possibility of a whole set of new algorithms just waiting to be discovered once people start to think about programming in this new way”. Which is an exciting possibility.
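To give a flavour of what a functional, Map-Reduce-style workflow step looks like, here is a minimal sketch in Python. It is not Martlet or SCUFL syntax, and the function names (partial_sum, combine, distributed_mean) are made up for illustration. The idea is that each chunk of data is summarised independently (the map step) and the summaries are then merged pairwise (the reduce step), so the answer does not depend on how many chunks, or machines, the data happens to be split across.

```python
from functools import reduce


def partial_sum(chunk):
    """Map step: summarise one chunk of values as (total, count)."""
    return (sum(chunk), len(chunk))


def combine(a, b):
    """Reduce step: merge two (total, count) summaries into one."""
    return (a[0] + b[0], a[1] + b[1])


def distributed_mean(chunks):
    """Mean of all values, computed from per-chunk summaries only."""
    partials = map(partial_sum, chunks)
    total, count = reduce(combine, partials, (0.0, 0))
    return total / count if count else float("nan")


# The same six values, split across three hypothetical data servers.
chunks = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
print(distributed_mean(chunks))  # 3.5
```

Because combine is associative, the partial summaries could in principle be merged in any grouping, which is what makes this style of workflow easy to parallelise.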

A position paper [3] (warning: position paper = arm waving) by Anupriya Ankolekar et al. argues that the Semantic Web and Web-Two-Point-Oh are complementary, rather than competing. Their motivating examples are a bit lame (Blogging a movie? Can’t they think of something more original?) …but they make some interesting (and obvious) points. The authors think that aggregators like Yahoo! Pipes will play an important role in the emerging Semantic Web. Currently, there don’t seem to be too many bioinformaticians using Yahoo! Pipes. Perhaps they just don’t share their pipes / workflows yet?

Running in parallel to all of the above is the Health Care and Life Sciences Data Integration for the Semantic Web workshop, where more detailed discussion on the bio semweb is underway. As it’s a workshop, there are no full or position papers, but take a look at The State of the Nation in Life Science Data integration to get a flavour of what is going on.

Whether functional, semantic, Web-enabled or just buzzword-friendly, there is plenty of action in the scientific workflow field right now. If you’re interested in the webby stuff, next year’s conference, WWW2008, is in Beijing, China. I wonder if they will mark the 10th anniversary of the publication of that Google paper at WWW7 back in 1998? The deadline for papers at WWW2008 will probably be sometime in November 2007, but around 90% of submitted papers will be rejected if previous years are anything to go by. If you’re thinking of doing a paper, DON’T PANIC about those intimidating statistics, because bioinformatics is bursting with interesting and hard problems that challenge the state-of-the-art. The kind of stuff that will go down well at Dubya Dubya Dubya.

(Photo credit: Fire Monkey Fish)

References

[1] Douglas Adams (1999) Beyond the Brochure: Build it and we will come
[2] Daniel Goodman (2007) Introduction and Evaluation of Martlet, a Scientific Workflow Language for Abstracted Parallelisation doi:10.1145/1242572.1242705
[3] Anupriya Ankolekar, Markus Krotzsch, Thanh Tran and Denny Vrandecic (2007) The Two Cultures: Mashing up Web 2.0 and the Semantic Web doi:10.1145/1242572.1242684

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.


December 19, 2006

Taverna 1.5.0

Filed under: Uncategorized — Duncan Hull @ 8:26 pm
Tags: bioinformatics, biomart, feta, myGrid, semantic web, taverna, workflow

Happy Christmas from the myGrid team, who are pleased to announce the release of version 1.5.0 of the Open Source Taverna bioinformatics workflow toolkit [1]. This is now available for download on the SourceForge site and includes some substantial changes since version 1.4.

Taverna 1.5.0 is a small download, but when first run it will download and install the required packages, which can take some time on slow networks. In the near future there will be a mechanism for downloading a bundle of core packages. There are some significant changes in the underlying architecture of Taverna and how it handles core packages and optional plugins, using a system called Raven (see the release notes below).

The documentation is currently being updated and the user documentation should be complete very soon, with the technical documentation following shortly afterwards. The reason for this is to allow the software to be released with some time to spare before the Christmas holidays.

Release notes:

There have been a number of substantial changes in the underlying architecture of Taverna since the previous release. These include:

An overhaul of the User Interface (UI), replacing the unpopular Multiple Document Interface with a cleaner and simpler single document UI which can be customised using Perspectives. There are built-in perspectives to allow the design and enactment of workflows, and plugins can integrate with the UI by providing perspectives of their own. Together with this, users are able to create their own layouts built from individual components.
Taverna now allows for multiple workflows to be open and enacted at the same time.
Support for the new BioMart data management system version 0.5, together with backward compatibility for old workflows that used BioMart 0.4.
Better provenance generation and browsing support, through a plugin now known as LogBook.
Better support for semantic service discovery through the Feta plugin [2].
Modularisation of the Taverna code base.
Development and integration of an underlying architecture known as Raven. This allows for Apache Maven-like declaration of dependencies which are discovered and incorporated into the Taverna system at runtime. Together with the modularisation of the Taverna code base, Raven gives the benefit that updates can be provided dynamically and incrementally, without the need for monolithic releases as in the past. This allows bug fixes and new features to be provided within a very short timescale if necessary. It also provides plugin developers with a greater degree of autonomy and independence from the core Taverna code base.
Improved and more advanced plugin management with the ability to provide immediate updates, and for plugin providers to publish their plugins via XML descriptions.
Numerous bug-fixes including the removal of a number of memory leaks.
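
As a rough illustration of the idea behind Maven-like dependency declaration (this is not Raven's actual API or descriptor format, and the artifact names below are hypothetical), the following Python sketch shows how declared dependencies can be walked at runtime and turned into a safe loading order, so that each package is only loaded after the packages it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical declarations: each artifact lists the artifacts it depends on.
DECLARED = {
    "example-plugin": ["workflow-core", "ui-api"],
    "ui-api": ["workflow-core"],
    "workflow-core": [],
}


def load_order(root, declared):
    """Return root and its transitive dependencies, dependencies first."""
    needed, stack = set(), [root]
    while stack:
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            stack.extend(declared[name])
    # Map each needed artifact to its prerequisites and sort topologically.
    graph = {name: declared[name] for name in needed}
    return list(TopologicalSorter(graph).static_order())


print(load_order("example-plugin", DECLARED))
```

A new plugin only has to declare what it needs; nothing else has to be rebuilt or re-released, which is the kind of incremental update the release notes describe.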

JIRA generated release notes and bug status reports can be found here and here

References


May 26, 2006

BioGrids: From Tim Bray to Jim Gray (via Seymour Cray)

Filed under: biotech — Duncan Hull @ 11:30 pm
Tags: BLAST, FASTA, Globus, Globus Toolkit, Grid, HPC, Jim Gray, myGrid, nodalpoint, sequence jockey, Seymour Cray, Tim Bray

Grid Computing already plays an important role in the life sciences, and will probably continue doing so for the foreseeable future. BioGrid (Japan), myGrid (UK) and CoreGrid (Europe) are just three current examples; there are many more Grid and Super Duper Computer projects in the life sciences. So, is there an accessible Hitch Hiker’s Guide to the Grid for newbies, especially bioinformaticians?

Unfortunately, much of the literature of Grid Computing is esoteric and inaccessible, liberally sprinkled with abstract and woolly concepts like “Virtual Organisations” with a large side-order of acronym soup. This makes it difficult or impossible for the everyday bioinformatician to understand or care about. Thankfully, Tim Bray from Sun Microsystems has written an accessible review of the area, “Grids for dummies”, if you like. It’s worth a read if you’re a bioinformatician with a need for more heavyweight distributed computing than the web currently provides, but you find Grid-speak is usually impenetrable nonsense.

One of the things Tim discusses in his review is Microsoftie Jim Gray, who is partly responsible for the 2020 computing initiative mentioned on nodalpoint earlier. Tim describes Jim’s article Distributed Computing Economics. In this, Jim uses a wide variety of examples to illustrate the current economics of grids, from “Megaservices” like Google, Yahoo! and Hotmail to the bioinformaticians’ favourites, BLAST and FASTA. So how might Grids affect the average bioinformatician? There are many different applications of Grid computing, but two areas spring to mind:

Running your in silico experiments (genome annotation, sequence analysis, protein interactions etc), using someone else’s memory, disk space and processors on the Grid. This could mean you will be able to do your experiments more quickly and reliably than you can using the plain ol’ Web.
Executing high-throughput and long-running experiments, e.g. you’ve got a ton of microarray data and it takes hours or possibly days to analyse computationally.

So if you deal with microarray data daily, you probably know all this stuff already, but Tim’s overview and Jim’s commentary are both accessible pieces to pass on to your colleagues in the lab. If this kind of stuff pushes your buttons, you might also be interested in the eProtein Scientific Meeting and Workshop Proceedings.

[This post was originally published on nodalpoint with comments.]

