O'Really?

June 10, 2009

Kenjiro Taura on Parallel Workflows

Kenjiro TauraKenjiro Taura is visting Manchester next week from the Department of Information and Communication Engineering at the University of Tokyo. He will be doing a seminar, the details of which are below:

Title: Large scale text processing made simple by GXP make: A Unixish way to parallel workflow processing

Date-time: Monday, 15 June 2009 at 11:00 AM

Location: Room MLG.001, mib.ac.uk

In the first part of this talk, I will introduce a simple tool called GXP make. GXP is a general purpose parallel shell (a process launcher) for multicore machines, unmanaged clusters accessed via SSH, clusters or supercomputers managed by batch scheduler, distributed machines, or any mixture thereof. GXP make is a ‘make‘ execution engine that executes regular UNIX makefiles in parallel. Make, though typically used for software builds, is in fact a general framework to concisely describe workflows constituting sequential commands. Installation of GXP requires no root privileges and needs to be done only on the user’s home machine. GXP make easily scales to more than 1,000 CPU cores. The net result is that GXP make allows an easy migration of workflows from serial environments to clusters and to distributed environments. In the second part, I will talk about our experiences on running a complex text processing workflow developed by Natural Language Processing (NLP) experts. It is an entire workflow that processes MEDLINE abstracts with deep NLP tools (e.g., Enju parser [1]) to generate search indices of MEDIE, a semantic retrieval engine for MEDLINE. It was originally described in Makefile without a particular provision to parallel processing, yet GXP make was able to run it on clusters with almost no changes to the original Makefile. Time for processing abstracts published in a single day was reduced from approximately eight hours (with a single machine) to twenty minutes with a trivial amount of efforts. A larger scale experiment of processing all abstracts published so far and remaining challenges will also be presented.

References

  1. Miyao, Y., Sagae, K., Saetre, R., Matsuzaki, T., & Tsujii, J. (2008). Evaluating contributions of natural language parsers to protein-protein interaction extraction Bioinformatics, 25 (3), 394-400 DOI: 10.1093/bioinformatics/btn631

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: