October 17, 2007

The Luxuriant Flowing Hair Club for Scientists (LFHCfS)

Filed under: Uncategorized — Duncan Hull @ 9:18 pm

Falk Schuch, Andreas Linsner and Kai Jung
Calling all Scientists: is your hair luxuriant and flowing? Perhaps you’re a bouffant bioinformatician, a hairy hacker or share a lab with somebody who is? If this is you, it’s high time you joined the Luxuriant Flowing Hair Club for Scientists.

To propose somebody for membership, send email to Marc Abrahams at Harvard University marca /ate/ chem2.harvard.edu. Your email needs to include evidence of your luxuriant, flowing hair (a photo) and your credentials as a scientist. Some current members have impressive hair; see Simon Gregory, Carlisle Landel and Sterling Paramore for examples. Honorary and historical members include Dr. Brian May (Queen guitarist / astrophysicist), Dmitri Mendeleev and Albert Einstein, “Physicist. Bon vivant. A bold experimentalist with hair”.

So, if you are a scientist with a copious coiffure, ask yourself, will you ever get another chance to be in such distinguished company?

September 5, 2007

Semantic Biomedical Mashups with Connotea

Mashup or Shutup

The Journal of Biomedical Informatics (JBI) will soon be publishing its special issue on Semantic Biomedical Mashups (can you fit any more buzzwords into a Call For Papers?!). Ben Good and friends have submitted a paper on their Entity Describer, which extends Connotea using some Semantic Web goodness. They’d appreciate your comments on their submitted manuscript over at i9606. As Ben says, their pre-publication turns out to be an interesting experiment “figuring out how blogging might fit into the academic publishing landscape”. If this interests you, get commenting now!

Update: Just spotted this interesting graphic of the Elsevier / Evilsevier logo (snigger), who are the publishers of JBI…

April 13, 2007

Collaboration, collaboration, collaboration!

What should your three main priorities be as a Scientist? Collaboration, collaboration, collaboration. Quentin Vicens and Phil Bourne have just published Ten Simple Rules for a Successful Collaboration [1] to help you do just that, as part of a continuing series [2,3,4,5].

Tony Bliar once said “Ask me my three main priorities for government, and I tell you: education, education, education.” In Science, it’s not so much about education as collaboration, collaboration, collaboration. The advice in Ten Simple Rules is all useful stuff, but what caught my eye is the fact that collaboration is on the rise, at least according to the number of co-authors on papers published in PNAS. The average number of co-authors has risen from 3.9 in 1981 to 8.4 in 2001. So before you publish or perish, it seems likely that you’ll also need to collaborate or commiserate… less laboratory, more collaboratory!

Photo credit Garret Keogh


  1. Quentin Vicens and Philip Bourne (2007) Ten Simple Rules for a Successful Collaboration PLOS Computational Biology
  2. Philip Bourne (2006) Ten Simple Rules for Getting Published PLOS Computational Biology
  3. Philip Bourne and Iddo Friedberg (2006) Ten Simple Rules for Selecting a Postdoctoral Position PLOS Computational Biology
  4. Philip Bourne and Leo Chalupa (2006) Ten Simple Rules for Getting Grants PLOS Computational Biology
  5. Philip Bourne and Alon Korngreen (2006) Ten Simple Rules for Reviewers PLOS Computational Biology
  6. This post originally published on nodalpoint with comments

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.

February 22, 2007

NSPNAS: Nature, Science or PNAS?

Filed under: publishing,Uncategorized — Duncan Hull @ 10:19 pm

A crude score for benchmarking scientists

Have you ever wanted to compare different scientists by their publication record? It’s not always an easy task, but here is a crude and handy way to benchmark people by their journal publications in Nature, Science or PNAS using PubMed. Let’s call it the NSPNAS score. It’s not the h-index and it’s far from perfect, but it can be useful.

Imagine these scenarios:

  1. You’re a young scientist contemplating who to do an undergraduate project, Masters degree or PhD with.
  2. You’ve finished your PhD and are wondering which lab could be your Stairway to PostDoc Heaven [1].
  3. You’re lucky enough to have landed a faculty position and you want to check the credibility of your new colleagues.
  4. You want to do some industrial espionage on your competitors in different labs around the world.
  5. You’re a Scientist dammit, and naturally you’re a curious person who just likes to measure things.

In any of these situations, you’ll probably want to look up the people concerned using Google Scholar which will give you a good idea of their research history. But you’re not interested in publications in the Journal of Few Subscribers or the Proceedings of the Boring Incomprehensible Nonsense Society (BINS), even if Google Scholar lists hundreds of their citations. Instead, you care about counting the Big Bang impact publications they have in the über-journals: Nature, Science and PNAS. You can find these publications in PubMed with this simple query:

Surname+Initials[au]+(nature[journal] or science[journal] or Proc Natl Acad Sci U S A[journal])

…and you can obviously modify this query to include popular journals from your own field as appropriate.
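As a rough illustration, here is a minimal Python sketch of how that query could be assembled for any author and journal list. The function names (`nspnas_query`, `nspnas_count_url`, `fetch_nspnas`) are hypothetical and not part of any library. The sketch assumes the NCBI E-utilities `esearch` endpoint for counting hits, writes the boolean operators in upper case, and quotes the journal names, all of which differ slightly from the hand-typed query above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Journals from the query in the post; extend this tuple for your own field.
DEFAULT_JOURNALS = ("nature", "science", "Proc Natl Acad Sci U S A")

# NCBI E-utilities search endpoint (an assumption of this sketch; the post
# itself only describes typing the query into PubMed by hand).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def nspnas_query(surname, initials, journals=DEFAULT_JOURNALS):
    """Build the PubMed search term for one author (hypothetical helper)."""
    journal_clause = " OR ".join(f'"{name}"[journal]' for name in journals)
    return f"{surname} {initials}[au] AND ({journal_clause})"


def nspnas_count_url(surname, initials, journals=DEFAULT_JOURNALS):
    """Build an esearch URL that asks only for the number of hits."""
    params = {
        "db": "pubmed",
        "term": nspnas_query(surname, initials, journals),
        "rettype": "count",
        "retmode": "json",
    }
    return f"{ESEARCH}?{urlencode(params)}"


def fetch_nspnas(surname, initials, journals=DEFAULT_JOURNALS):
    """Fetch the hit count over the network; the score changes over time."""
    with urlopen(nspnas_count_url(surname, initials, journals)) as response:
        payload = json.load(response)
    # Assumes the JSON reply keeps the count under esearchresult -> count.
    return int(payload["esearchresult"]["count"])


if __name__ == "__main__":
    print(nspnas_query("Kell", "D"))
    print(nspnas_count_url("Kell", "D"))
```

Passing a different `journals` tuple is the programmatic equivalent of editing the query by hand for your own field. Common surnames will still produce false positives, since nothing here disambiguates authors.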

Where NSPNAS works

Note, NSPNAS scores were correct at the time of writing in 2007, but will change over time.

When you substitute an author’s name and initials into the beginning of that query, you get their NSPNAS score. So Systems Biologist Douglas Kell for example, surname and initials “Kell+D[au]”, has an NSPNAS score of 6.

If the person in question has a unique or unusual surname and initials, it’s fairly easy to find their score: nodalpointer Chris Mungall has an NSPNAS score of two while nodalpointer Jason Stajich has an NSPNAS score of three. These results suggest a positive correlation between Californian sunshine and NSPNAS. Meanwhile, back in rainy old Britain, Ensemblian Ewan Birney scores a formidable sixteen, which is just scary for a bloke in his thirties.

Where NSPNAS doesn’t work

Unfortunately, authors with common names like John Smith (who has more than 340 hits) can’t be easily benchmarked with this type of query, without trawling through hundreds of false positives. More importantly, some influential scientists score very low or zero, despite the fact that their work has been important in the world of biomedical science and beyond. This is especially true for Computer Scientists, Mathematicians and Informaticians, for example:

Many important members of the Dead Scientists Society also have low NSPNAS scores…


All these statistics remind us that many important ideas, techniques and results are not published in Nature, Science or PNAS and others are excluded from the PubMed index completely. They also confirm what we already know about peer-reviewed Journal publications not being the be-all and end-all of Engineering, Science or Medicine [3]. But NSPNAS still has its uses, provided the people you’re benchmarking have a rare name and didn’t snuff it before the PubMed index starts.

What is your NSPNAS score? If, like me, you score a spectacular “nul points”, console yourself with the fact that you’re in good company with that score and, given time, maybe you can change it.


  1. Jimmy Page and Robert Plant (1971) Stairway to Heaven
  2. Most of the Clay Mathematics Institute Millennium Prizes are still up for grabs if you get disillusioned with bioinformatics and fancy some fame and a million dollar fortune!
  3. Michael Seringhaus and Mark Gerstein (2007) Publishing perishing? Towards tomorrow’s information architecture BMC Bioinformatics 2007, 8:17 DOI:10.1186/1471-2105-8-17
  4. This post originally on nodalpoint, with comments

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.

January 5, 2007

NAR Database Issue 2007: Not Waving But Drowning?

The 14th annual Nucleic Acids Research (NAR) database issue 2007 has just been published, open-access. This year’s issue is the largest yet (again), with 968 molecular biology databases listed, 110 more than the previous one (see figure below). In the world of biological databases, are we waving or drowning?

NAR Database Growth 2007

Nine hundred and sixty eight is a lot of databases, and even that mind-boggling number is not an exhaustive or comprehensive tally. But is counting all these databases waving or drowning [1]? Will we ever stop stamp-collecting the databases and tools we have in molecular biology? What prompted this question is that an employee of The Boeing Company once told me they had given up counting their databases because there were just too many. Just think of all the databases of design and technical documentation that accompany the myriad of different aircraft that Boeing manufacture, like the iconic 747 jumbo jet. Now, combine that with all the supply chain, customer and employee information and you can begin to imagine the data deluge that a large multi-national corporation has to handle.

Like Boeing, in Biology we’ve clearly got more data than we know what to do with [2,3]. It won’t be news to bioinformaticians and it’s been said many times before, but it’s worth repeating here:

  • We know how many databases we have but we don’t know what a lot of the data in these databases means, think of all those mystery proteins of unknown function. It will obviously take time until we understand it all…
  • Most of the data only begins to make sense when it is integrated or mashed-up with other data. However, we still don’t know how to integrate all these databases, or as Lincoln Stein puts it, “so far their integration has proved problematic” [4], a bit of an understatement. Many grandiose schemes for the “integration” of biological databases have been proposed over the years, but unfortunately none have been practical to the point of implementation [5].

Despite this, it is still useful to know how many molecular biology databases there are. At least we know how many databases we are drowning in. Thankfully, unlike Boeing, most biological data, algorithms and tools are open-source and more literature is becoming open access which will hopefully make progress more rapid. But biology is more complicated than a Boeing 747, so we’ve got a long-haul flight ahead of us. OK, I’ve managed to completely overstretch that aerospace analogy now so I’ll stop there.

Whatever databases you’ll be using in 2007, have a Happy New Year mining, exploring and understanding the data they contain, not drowning in it.


  1. Stevie Smith (1957) Not waving but drowning
  2. Michael Galperin (2007) The Molecular Biology Database Collection: 2007 update Nucleic Acids Research, Vol. 35, Database issue. DOI:10.1093/nar/gkl1008
  3. Alex Bateman (2007) Editorial: What makes a good database? Nucleic Acids Research, Vol. 35, Database issue. DOI:10.1093/nar/gkl1051
  4. Lincoln Stein (2003) Biological Database Integration Nature Reviews Genetics. 4 (5), 337-45. DOI:10.1038/nrg1065
  5. Michael Ashburner (2006) Keynote at the Pacific Symposium on Biocomputing (PSB2006) in Hawaii, see also Aloha: Biocomputing in Hawaii
  6. This post originally published on nodalpoint with comments

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.

December 19, 2006

Taverna 1.5.0

Filed under: Uncategorized — Duncan Hull @ 8:26 pm

Happy Christmas from the myGrid team, who are pleased to announce the release of version 1.5.0 of the Open Source Taverna bioinformatics workflow toolkit [1]. This is now available for download on the SourceForge site and includes some substantial changes from version 1.4.

Taverna 1.5.0 is a small download, but when first run it will download and install the required packages, which can take some time on slow networks. In the near future there will be a mechanism for downloading a bundle of core packages. There are some significant changes in the underlying architecture of Taverna and how it handles core packages and optional plugins, using a system called Raven; see the release notes below.

The documentation is currently being updated and the user documentation should be complete very soon, with the technical documentation following shortly afterwards. The reason for this is to allow the software to be released with some time to spare before the Christmas holidays.

Release notes:

There have been a number of substantial changes in the underlying architecture of Taverna since the previous release. These include:

  • An overhaul of the User Interface (UI), replacing the unpopular Multiple Document Interface with a cleaner and simpler single document UI which can be customised using Perspectives. There are built in perspectives to allow the design and enactment of workflows, and plugins can integrate with the UI by providing perspectives of their own. Together with this, users are able to create their own layouts built from individual components.
  • Taverna now allows for multiple workflows to be open and enacted at the same time.
  • Support for the new BioMart data management system version 0.5, together with backward compatibility for old workflows that used BioMart 0.4.
  • Better provenance generation and browsing support, through a plugin now known as LogBook.
  • Better support for semantic service discovery through the Feta plugin [2].
  • Modularisation of the Taverna code base.
  • Development and integration of an underlying architecture known as Raven. This allows for Apache Maven-like declaration of dependencies which are discovered and incorporated into the Taverna system at runtime. Together with the modularisation of the Taverna code base, Raven gives the benefit that updates can be provided dynamically and incrementally, without the need for monolithic releases as in the past. This allows the provision of bug fixes and new features within a very short timescale if necessary. It also provides plugin developers with a greater degree of autonomy and independence from the core Taverna code base.
  • Improved and more advanced plugin management with the ability to provide immediate updates, and for plugin providers to publish their plugins via xml descriptions.
  • Numerous bug-fixes including the removal of a number of memory leaks.

JIRA generated release notes and bug status reports can be found here and here


  1. Peer-reviewed publications about the Taverna workbench in PubMed
  2. Feta: A Light-Weight Architecture for User Oriented Semantic Service Discovery
  3. BioMoby extensions to the Taverna workflow management and enactment software

December 1, 2006

NAR Web Server Issue: Walking in a Webby Wonderland

Filed under: Uncategorized — Duncan Hull @ 3:18 pm

Have you recently built a bioinformatics web application useful to the wider community that you’d like to tell the world about? Are you also looking to score brownie points for a rigorously peer-reviewed publication that stands a reasonable chance of being well cited? If that’s you, then you have one month from today (December 1st) to sort your code out, and get your abstract in, for the fifth annual Nucleic Acids Research (NAR) Web Server issue published by Oxford University Press (OUP) in 2007. All articles in this issue are published under an open access model.

As regular visitors to nodalpoint will already know, every year NAR publishes two special issues: one on databases (annually in January since 1993) and the other on web servers (annually in July since 2003). Authors interested in pre-submitting abstracts for the 2007 Web Server Issue should read the Instructions to Authors for Web Server papers in NAR and send an abstract to Gary Benson at Boston University before December 31st 2006. The deadline for final submission of full articles is January 31st 2007. Gary Benson has taken over this year from the previous web server issue editor, Nobel laureate and Ig Nobel participant, Richard Roberts [1].

One advantage of publishing your application paper in NAR, instead of alternative open access journals like Source Code for Biology and Medicine (SCFBM), is a listing in the bioinformatics links directory [2] and a bigger impact factor [3] of 7.6, if you care about these things. There are, of course, disadvantages of publishing with OUP in NAR, like the expensive open access publishing fees of $1185 to $2370 per article, which are of debatable value for money. If you’re living in a ‘List A’ developing country these charges are waived, which makes it tempting to set up a laboratory in Malawi to evade payment…

Anyway, does anyone out there know how OUP prices compare with the complicated BioMed Central membership fees which are presumably required for publication in SCFBM? Another leading open access publisher, the Public Library of Science (PLOS), currently charges from $2000 to $2500 for open access publication. Maybe I’m missing something, but aren’t these charges a lot of money to pay an administrator to shuffle a few bits of paper around and run a web server? Don’t let that put you off submitting your paper though, because in Science and academia you will either publish or perish. This is where the web is your friend because free online web availability substantially increases a paper’s impact.

On a lighter note, and now that the festive season is upon us, I’ll hand over to the Christmas crooner Perry Como to sign off:

♫ Sleigh bells ring, are you listening? In the lane, snow is glistening. A beautiful sight, We’re happy tonight, Walking in a webby wonderland. ♫

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.

June 28, 2006

Marginal Power

Filed under: Uncategorized — Duncan Hull @ 11:03 pm

LISP Hacker and Painter Paul Graham writes entertaining essays about technology. His latest piece discusses how important and sometimes lucrative ideas usually come from the “garage” outside rather than the inside, what he calls The Power of the Marginal. His essay rambles a bit in places, but has some interesting observations that are relevant to bioinformatics. For example…

“…if you’re an outsider you should actively seek out contrarian projects. Instead of working on things the eminent have made prestigious, work on things that could steal that prestige.”

Paul did a PhD in Computer Science and has fond memories of being a student which will ring true with anyone who has been there:

“That’s what I remember about grad school: apparently endless supplies of time, which I spent worrying about, but not writing, my dissertation.”

PhDs and obscurity go hand-in-hand and according to this essay, obscurity and marginality are good for you. They don’t taste as good as junk food but are allegedly “good for you”. Paul’s personal choice of marginality is the relatively obscure language called LISP, and the people I’ve met who use this language are either crazy or at the top of their game, sometimes both. Does LISP turn people crazy or are crazy people attracted to the obscurity of LISP?

Either way, Paul Graham’s occasionally crazy essays are worth a read if and when you have a moment to spare. Even better, read them when you don’t have the time and are procrastinating writing your PhD thesis or next Bioinformatics paper.

Further reading

  1. Structure and Interpretation of LISP programs
  2. Most grad students are stuck on problems they don’t like
  3. Startups and garages in bioinformatics: The effect of software patents
  4. Garage Genomics and bio-hackers
  5. Lisp as an Alternative to Java by Peter Norvig, Director of Research at Google