O'Really?

June 10, 2009

Kenjiro Taura on Parallel Workflows

Kenjiro Taura is visiting Manchester next week from the Department of Information and Communication Engineering at the University of Tokyo. He will be giving a seminar, the details of which are below:

Title: Large scale text processing made simple by GXP make: A Unixish way to parallel workflow processing

Date-time: Monday, 15 June 2009 at 11:00 AM

Location: Room MLG.001, mib.ac.uk

In the first part of this talk, I will introduce a simple tool called GXP make. GXP is a general purpose parallel shell (a process launcher) for multicore machines, unmanaged clusters accessed via SSH, clusters or supercomputers managed by a batch scheduler, distributed machines, or any mixture thereof. GXP make is a ‘make’ execution engine that executes regular UNIX makefiles in parallel. Make, though typically used for software builds, is in fact a general framework for concisely describing workflows composed of sequential commands. Installation of GXP requires no root privileges and needs to be done only on the user’s home machine. GXP make easily scales to more than 1,000 CPU cores. The net result is that GXP make allows an easy migration of workflows from serial environments to clusters and to distributed environments.

In the second part, I will talk about our experience of running a complex text processing workflow developed by Natural Language Processing (NLP) experts. It is an entire workflow that processes MEDLINE abstracts with deep NLP tools (e.g., Enju parser [1]) to generate search indices of MEDIE, a semantic retrieval engine for MEDLINE. It was originally written as a Makefile with no particular provision for parallel processing, yet GXP make was able to run it on clusters with almost no changes to the original Makefile. The time for processing the abstracts published in a single day was reduced from approximately eight hours (on a single machine) to twenty minutes with a trivial amount of effort. A larger scale experiment of processing all abstracts published so far and the remaining challenges will also be presented.
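To make the idea of a makefile as a parallel workflow a little more concrete, here is a minimal Python sketch. It is not how GXP make is implemented and uses none of its interfaces: the `RULES` table, the `build` stand-in and the `run_parallel` function are all hypothetical names invented for illustration. It only shows the general principle that once targets declare their prerequisites, any target whose prerequisites are finished can be started straight away, so independent steps run side by side.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

# Hypothetical make-style rules: target -> list of prerequisite targets.
RULES = {
    "index": ["a.parsed", "b.parsed", "c.parsed"],
    "a.parsed": ["a.txt"],
    "b.parsed": ["b.txt"],
    "c.parsed": ["c.txt"],
    "a.txt": [],
    "b.txt": [],
    "c.txt": [],
}


def build(target):
    """Stand-in for a rule's command; a real engine would launch a process here."""
    return target


def run_parallel(rules, max_workers=4):
    """Run every target once, starting each as soon as its prerequisites are done.

    Returns the targets in the order in which they completed.
    """
    done, order = set(), []
    pending = {}            # future -> target currently running
    remaining = dict(rules)  # targets not yet started
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while remaining or pending:
            ready = [t for t, deps in remaining.items()
                     if all(d in done for d in deps)]
            if not ready and not pending:
                raise ValueError("cyclic or unsatisfiable dependencies")
            for t in ready:
                del remaining[t]
                pending[pool.submit(build, t)] = t
            finished, _ = wait(list(pending), return_when=FIRST_COMPLETED)
            for fut in finished:
                fut.result()  # re-raise any error from the rule
                t = pending.pop(fut)
                done.add(t)
                order.append(t)
    return order


if __name__ == "__main__":
    print(run_parallel(RULES))
```

In this toy example the three `.txt` targets can run together, then the three `.parsed` targets, and `index` always comes last because it depends on all of them.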

References

  1. Miyao, Y., Sagae, K., Saetre, R., Matsuzaki, T., & Tsujii, J. (2008). Evaluating contributions of natural language parsers to protein-protein interaction extraction. Bioinformatics, 25 (3), 394-400. DOI: 10.1093/bioinformatics/btn631

May 21, 2009

Upcoming Gig: The Italian Job at NETTAB

Network Tools and Applications in Biology (NETTAB) is a series of workshops in Bioinformatics. It focuses on the most promising and innovative ICT tools and their utility in Bioinformatics. These workshops aim to introduce participants to the evolving network standards and technologies that are being applied to the field of biology.

Since 2001, the NETTAB workshops have been doing a Giro d’Italia or Grand Tour of Italy; Genova, Bologna, Naples, Sardinia, Lake Como and Pisa have all played host to the workshop. This year, NETTAB 2009 is in Catania at the Università degli Studi di Catania in Sicily, close to Mount Etna.

There is a special theme for this year’s workshop, held on June 10-13: Technologies, Tools and Applications for Collaborative and Social Bioinformatics Research and Development. So I’m very pleased that Paolo Romano asked me to do a keynote presentation (w00t!) on the work we have been doing in the REFINE project and myExperiment. Grazie Paolo, grazie. And thanks to Carole Goble too for the recommendation.

If you’re going to NETTAB this year, see you there. If you’d like to come, today is the last day for the early bird discount, so sign up at the registration page. The scientific programme looks interesting, and it will be good to meet Alex Bateman, Tim Clark and the rest of this year’s speakers.

Now, if my keynote presentation is going to (as Michael Caine once famously said [1]) “blow the bl**dy doors off” [2], it needs loads more work. So I’d better get back to it. Ciao!

[Update: See reports from day one, day two and day three of NETTAB 2009.]

References

  1. Peter Collinson and Troy Kennedy-Martin (1969) The Italian Job
  2. Michael Caine (1969) “You’re only supposed to blow the bl**dy doors off!”
  3. Cannata, N., Schröder, M., Marangoni, R., & Romano, P. (2008). A Semantic Web for bioinformatics: goals, tools, systems, applications. BMC Bioinformatics, 9 (Suppl 4). DOI: 10.1186/1471-2105-9-S4-S1

May 6, 2009

Michel Dumontier on Representing Biochemistry

Michel Dumontier is visiting Manchester this week and will be giving a seminar on Monday 11th May. Here are some details for anyone who is interested in attending:

Title: Increasingly Accurate Representation of Biochemistry

Speaker: Michel Dumontier, dumontierlab.com

Time: 14.00, Monday 11th May 2009
Venue: Atlas 1, Kilburn Building, University of Manchester, number 39 on the Google Campus Map

Abstract: Biochemical ontologies aim to capture and represent biochemical entities and the relations that exist between them in an accurate manner. A fundamental starting point is biochemical identity, but our current approach for generating identifiers is haphazard and consequently integrating data is error-prone. I will discuss plausible structure-based strategies for biochemical identity, whether it be at the molecular level or some part thereof (e.g. residues, collections of residues, atoms, collections of atoms, functional groups), such that identifiers may be generated in an automatic and curator/database independent manner. With structure-based identifiers in hand, we will be in a position to more accurately capture context-specific biochemical knowledge, such as how a set of residues in a binding site is involved in a chemical reaction, including the fact that a key nitrogen atom must first be de-protonated. Thus, our current representation of biochemical knowledge may improve such that manual and automatic methods of biocuration are substantially more accurate.
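As a rough illustration of what "structure-based" and "curator/database independent" could mean in practice, here is a small Python sketch. It is not the scheme from the talk: the function names, the `struct:` and `part:` prefixes and the choice of SHA-256 are all assumptions made for illustration, and the input string is assumed to be an already canonical description of the structure (producing one is the hard part and is not attempted here). The point is simply that an identifier computed from the structure itself comes out the same no matter who computes it or which database it is stored in.

```python
import hashlib


def structure_id(canonical_structure: str) -> str:
    """Identifier derived only from a structure description.

    Assumes the caller supplies an already canonical string, so that the
    same structure always yields the same text.
    """
    digest = hashlib.sha256(canonical_structure.encode("utf-8")).hexdigest()
    return "struct:" + digest[:16]


def part_id(canonical_structure: str, atom_indices) -> str:
    """Identifier for a part of a structure (e.g. a set of atoms).

    Hypothetical scheme: combine the parent identifier with the sorted,
    de-duplicated atom indices, so the order in which atoms are listed
    does not matter.
    """
    parent = structure_id(canonical_structure)
    part = ",".join(str(i) for i in sorted(set(atom_indices)))
    digest = hashlib.sha256((parent + "|" + part).encode("utf-8")).hexdigest()
    return "part:" + digest[:16]


if __name__ == "__main__":
    # "CCO" stands in for a canonical structure string (assumed, not computed).
    print(structure_id("CCO"))
    print(part_id("CCO", [2, 1]))
```

Two groups running this on the same canonical string get the same identifier without consulting each other, which is the property the abstract is after.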

Update: Slides are now available via SlideShare.

[Creative Commons licensed picture of Michel in action at ISWC 2008 from Tom Heath]


March 12, 2009

Defrosting the Digital Seminar

Casey Bergman suggested it, Jean-Marc Schwartz organised it, so now I’m going to do it: a seminar on our Defrosting the Digital Library paper as part of the Bioinformatics and Functional Genomics seminar series. Here is the abstract of the talk:

After centuries with little change, scientific libraries have recently experienced massive upheaval. From being almost entirely paper-based, most libraries are now almost completely digital. This information revolution has all happened in less than 20 years and has created many novel opportunities and threats for scientists, publishers and libraries.

Today, we are struggling with an embarrassing wealth of digital knowledge on the Web. Most scientists access this knowledge through some kind of digital library; however, these can be cold, impersonal, isolated, and inaccessible places. Many libraries are still clinging to obsolete models of identity, attribution, contribution, citation and publication.

Based on a review published in PLoS Computational Biology (http://pubmed.gov/18974831), this talk will discuss the current chilly state of digital libraries for biologists, chemists and informaticians, including PubMed and Google Scholar. We highlight problems and solutions to the coupling and decoupling of publication data and metadata, with a tool called CiteULike (http://www.citeulike.org). This software tool exploits the Web to make digital libraries “warmer”: more personal, sociable, integrated, and accessible places.

Finally, issues that will help or hinder the continued warming of libraries in the future, particularly the accurate identity of authors and their publications, are briefly introduced. These are discussed in the context of the BBSRC funded REFINE project at the National Centre for Text Mining (NaCTeM.ac.uk), which is linking biochemical pathway data with evidence for pathways from the PubMed database.

Date: Monday 16th March 2009, Time: 12.00 midday, Location: Michael Smith Building, Main lecture theatre, Faculty of Life Sciences, University of Manchester (number 71 on the Google map of the Manchester campus). Please come along if you are interested…

[CC licensed picture above, “The Lecture” at Speakers Corner by James M Thorne]

December 10, 2008

Congratulations Carole Goble, e-Scientist

At the Microsoft e-Science workshop in Indianapolis earlier this week, Carole Goble was awarded the first Jim Gray e-Science award for 2008. She is pictured here collecting the prize from Tony Hey of Microsoft Research. You can read all about it in the Seattle Tech Report, which says:

“As director of the U.K.’s myGrid project, Goble helped create Taverna, open source software that allows scientists to analyse complex data sets with a standard computer.”

It is very inspiring when colleagues win prizes and awards. Personally, I would not be here doing what I’m doing if it wasn’t for Carole and myGrid, and neither would many other people who work on (or have worked on) myGrid and related projects.

Carole, you are an inspiration to us all, congratulations! To celebrate your success, I’m off to commit some more of the seven deadly sins of bioinformatics [1]…

References

  1. Carole Goble. The Seven Deadly Sins of Bioinformatics
  2. e-Science in Indianapolis: Carole Goble wins the 1st Jim Gray eScience Award
  3. Joseph Tartakoff. British professor given first Jim Gray Award. Seattle Post-Intelligencer, Tech Report
  4. Todd Bishop. UK prof receives Jim Gray award. Tech Flash
  5. Savas Parastatidis. Carole Goble as the first recipient of the “Jim Gray eScience Award”
  6. Microsoft Recognise Manchester e-Science Contribution
  7. Deborah Gage. Microsoft creates award in the name of Jim Gray. San Francisco Chronicle, The Tech Chronicles
  8. Microsoft. New tools for Discovery on Display at e-Science workshop

July 25, 2008

How to spend a £400 million Science budget

A thought experiment with lots of money

The Biotechnology and Biological Sciences Research Council (BBSRC) is the United Kingdom’s funding agency for academic research and training in the non-clinical life sciences. It supports a total of around 1600 scientists and 2000 research students in universities and institutes in the UK. The head of our laboratory, Douglas Kell, has recently been appointed Chief Executive of the BBSRC [1]. Congratulations Doug, we wish you the very best in your new job. Now, according to bbsrc.ac.uk, their annual budget is a cool £400 million (just short of $800 million or €500 million). This has left me wondering: how would you spend a £400 million science budget for the life sciences? For the purposes of this article, imagine that you had been put in charge of said budget, and that Prime Minister Gordon Brown (texture like sun) had given you, yes YOU, a big bag of cash to distribute as you see fit. A mouth-watering prospect, I think you’ll agree. Here is my personal opinion of how, in my dreams, I would spend the money. (more…)

April 25, 2008

WWW2008: The Great Firewall of China

The seventeenth international World Wide Web conference (WWW2008.org) is currently finishing in Beijing, China. There are some interesting papers this year. Thankfully, the Great Firewall of China doesn’t prevent these papers reaching the rest of the world. It’s One World, One Web (allegedly). Here are some brief highlights from the conference. (more…)

April 14, 2008

Ensemblog: The Ensembl Weblog

Filed under: biotech — Duncan Hull @ 11:51 am

The Ensembl Weblog provides news, views and announcements about the Ensembl Genome Browser. The blog has been going for a few years now, but I have only just become aware of it thanks to a recent Ensembl Genome Browser Tutorial by Bert Overduin. Catching up on this year’s posts from Ensemblians, I see that Ewan Birney wrote a piece about The Gene Love-in last week and Paul Flicek briefly described the 1000 Genomes project back in January. The Ensembl Weblog is fairly low traffic, so if you don’t already read it, it’s worth considering subscribing to the feed.

And it’s good to see more scientists using blogs to communicate. Long may this trend continue!

(more…)

March 7, 2008

BioBlogs 19: Bioengineering

Bio::Blogs is a monthly bioinformatics-related blog journal. This issue, number 19, is hosted here at O’Really? and focuses on the fascinating relationship between Biology and Engineering. Below, for your reading pleasure, is a brief roundup of blog posts during February-ish 2008, and a few other related Bioengineering resources.

(more…)
