O'Really?

February 22, 2007

NSPNAS: Nature, Science or PNAS?

Filed under: publishing,Uncategorized — Duncan Hull @ 10:19 pm

A crude score for benchmarking scientists

Have you ever wanted to compare different scientists by their publication record? It’s not always an easy task, but here is a crude and handy way to benchmark people by their journal publications in Nature, Science or PNAS using PubMed. Let’s call it the NSPNAS score. It’s not the h-index and it’s far from perfect, but it can be useful.

Imagine these scenarios:

  1. You’re a young scientist contemplating who to do an undergraduate project, Masters degree or PhD with.
  2. You’ve finished your PhD and are wondering which lab could be your Stairway to PostDoc Heaven [1].
  3. You’re lucky enough to have landed a faculty position and you want to check the credibility of your new colleagues.
  4. You want to do some industrial espionage on your competitors in different labs around the world.
  5. You’re a Scientist dammit, and naturally you’re a curious person who just likes to measure things.

In any of these situations, you’ll probably want to look up the people concerned using Google Scholar which will give you a good idea of their research history. But you’re not interested in publications in the Journal of Few Subscribers or the Proceedings of the Boring Incomprehensible Nonsense Society (BINS), even if Google Scholar lists hundreds of their citations. Instead, you care about counting the Big Bang impact publications they have in the über-journals: Nature, Science and PNAS. You can find these publications in PubMed with this simple query:

Surname+Initials[au]+(nature[journal] or science[journal] or Proc Natl Acad Sci U S A[journal])

…and you can obviously modify this query to include popular journals from your own field as appropriate.
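If you want to build the query for lots of people, a small script saves some typing. The following is only a minimal sketch: the function names are made up for illustration, the explicit AND is my own choice of spelling for the query, and the URL assumes the NCBI E-utilities esearch endpoint with its db, term and rettype=count parameters. It builds the query and URL but does not contact PubMed.

```python
from urllib.parse import urlencode

# Journals counted by the score; edit this tuple to suit your own field.
JOURNALS = ("nature", "science", "Proc Natl Acad Sci U S A")

# Assumed endpoint: NCBI E-utilities esearch.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def nspnas_query(surname, initials, journals=JOURNALS):
    """Build the PubMed query string for one author (hypothetical helper)."""
    journal_clause = " OR ".join(f"{j}[journal]" for j in journals)
    return f"{surname} {initials}[au] AND ({journal_clause})"


def nspnas_count_url(surname, initials, journals=JOURNALS):
    """Build an esearch URL that asks only for the number of hits."""
    params = {
        "db": "pubmed",
        "term": nspnas_query(surname, initials, journals),
        "rettype": "count",
    }
    # urlencode turns spaces into "+", like the "+" signs in the query above.
    return f"{ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(nspnas_query("Kell", "D"))
    print(nspnas_count_url("Kell", "D"))
```

Passing a different tuple of journal names as the third argument swaps in the popular journals from your own field.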

Where NSPNAS works

Note, NSPNAS scores were correct at the time of writing in 2007, but will change over time.

When you substitute an author’s name and initials into the beginning of that query, you get their NSPNAS score. So Systems Biologist Douglas Kell for example, surname and initials “Kell+D[au]”, has an NSPNAS score of 6.

If the person in question has a unique or unusual surname and initials, it’s fairly easy to find their score: Nodalpointer Chris Mungall has an NSPNAS score of two while nodalpointer Jason Stajich has an NSPNAS score of three. These results suggest a positive correlation between Californian sunshine and NSPNAS. Meanwhile, back in rainy old Britain, Ensemblian Ewan Birney scores a formidable sixteen, which is just scary for a bloke in his thirties.

Where NSPNAS doesn’t work

Unfortunately, authors with common names like John Smith (who has more than 340 hits) can’t be easily benchmarked with this type of query without trawling through hundreds of false positives. More importantly, some influential scientists score very low or zero, despite the fact that their work has been important in the world of biomedical science and beyond. This is especially true for Computer Scientists, Mathematicians and Informaticians, for example:

Many important members of the Dead Scientists Society also have low NSPNAS scores…

Conclusions

All these statistics remind us that many important ideas, techniques and results are not published in Nature, Science or PNAS and others are excluded from the PubMed index completely. They also confirm what we already know about peer-reviewed Journal publications not being the be-all and end-all of Engineering, Science or Medicine [3]. But NSPNAS still has its uses, provided the people you’re benchmarking have a rare name and didn’t snuff it before the PubMed index starts.

What is your NSPNAS score? If, like me, you score a spectacular “nul points”, console yourself with the fact that you’re in good company with that score and, given time, maybe you can change it.

References

  1. Jimmy Page and Robert Plant (1971) Stairway to Heaven
  2. Most of the Clay Mathematics Institute Millennium Prizes are still up for grabs if you get disillusioned with bioinformatics, fancy some fame and winning a million dollar fortune!
  3. Michael Seringhaus and Mark Gerstein (2007) Publishing perishing? Towards tomorrow’s information architecture BMC Bioinformatics 2007, 8:17 DOI:10.1186/1471-2105-8-17
  4. This post originally on nodalpoint, with comments

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.


January 22, 2007

DNA mania

Filed under: bio — Duncan Hull @ 10:29 pm

What does DNA do when it’s not being transcribed into RNA? It causes DNA mania…

Quote of the Day

“DNA, you know, is Midas’ gold. Everyone who touches it goes mad.”

Maurice Wilkins

Read the rest in [1,2]

Do you or your colleagues ever suffer from DNA mania [3,4]? A biochemist friend of mine once semi-jokingly remarked that people’s manic obsession with DNA is a bit like buying some food and being more interested in the bar-code on the packaging, than the food inside. In his particular area of research, DNA is about as exciting as bar-codes, because it doesn’t even leave the nucleus of the cell, at least in Eukaryotes. I wonder what readers of nodalpoint think of this analogy? Anyway, as a result of this philosophy, most of his community have developed an unhealthy and manic interest in proteins rather than DNA. You could call this particular obsessive-compulsive disorder “protein mania”.

Depending on the scientific obsession(s) of your particular community, you might need to substitute Protein or RNA for DNA in the above quote, as appropriate. And if that is all too molecular for you, substitute any other of your favourite bioinformatics buzzwords.

References

  1. Horace Freeland Judson (1996) The Eighth Day of Creation: Makers of the Revolution in Biology
  2. John Sulston (2006) Won for All: How the Drosophila Genome was sequenced: a book by Michael Ashburner
  3. André Pichot (1999) Histoire de la notion de gène (one of the first documented uses of the phrase “DNA mania”)
  4. Denis Noble (2006) The Music of Life: Biology Beyond the Genome (an antidote to DNA mania and the Dawkinian gene-centric view of Life)
  5. DNA Photograph taken by Unapersona in Ciutat de les Arts i les Ciències, Calatrava building, Valencia, Spain.

January 5, 2007

NAR Database Issue 2007: Not Waving But Drowning?

The 14th annual Nucleic Acids Research (NAR) database issue 2007 has just been published, open-access. This year is the largest yet (again) with 968 molecular biology databases listed, 110 more than the previous one (see figure below). In the world of biological databases, are we waving or drowning?

[Figure: NAR Database Growth 2007]

Nine hundred and sixty eight is a lot of databases, and even that mind-boggling number is not an exhaustive or comprehensive tally. But is counting all these databases waving or drowning [1]? Will we ever stop stamp-collecting the databases and tools we have in molecular biology? What prompted this question is that an employee of The Boeing Company once told me they have given up counting their databases because there were just too many. Just think of all the databases of design and technical documentation that accompany the myriad different aircraft that Boeing manufactures, like the iconic 747 jumbo jet. Now, combine that with all the supply chain, customer and employee information and you can begin to imagine the data deluge that a large multi-national corporation has to handle.

Like Boeing, in Biology we’ve clearly got more data than we know what to do with [2,3]. It won’t be news to bioinformaticians and it’s been said many times before, but it’s worth repeating here:

  • We know how many databases we have but we don’t know what a lot of the data in these databases means: think of all those mystery proteins of unknown function. It will obviously take time until we understand it all…
  • Most of the data only begins to make sense when it is integrated or mashed-up with other data. However, we still don’t know how to integrate all these databases, or as Lincoln Stein puts it “so far their integration has proved problematic” [4], a bit of an understatement. Many grandiose schemes for the “integration” of biological databases have been proposed over the years, but unfortunately none have been practical to the point of implementation [5].


Despite this, it is still useful to know how many molecular biology databases there are. At least we know how many databases we are drowning in. Thankfully, unlike Boeing, most biological data, algorithms and tools are open-source and more literature is becoming open access which will hopefully make progress more rapid. But biology is more complicated than a Boeing 747, so we’ve got a long-haul flight ahead of us. OK, I’ve managed to completely overstretch that aerospace analogy now so I’ll stop there.

Whatever databases you’ll be using in 2007, have a Happy New Year mining, exploring and understanding the data they contain, not drowning in it.

References

  1. Stevie Smith (1957) Not waving but drowning
  2. Michael Galperin (2007) The Molecular Biology Database Collection: 2007 update Nucleic Acids Research, Vol. 35, Database issue. DOI:10.1093/nar/gkl1008
  3. Alex Bateman (2007) Editorial: What makes a good database? Nucleic Acids Research, Vol. 35, Database issue. DOI:10.1093/nar/gkl1051
  4. Lincoln Stein (2003) Biological Database Integration Nature Reviews Genetics. 4 (5), 337-45. DOI:10.1038/nrg1065
  5. Michael Ashburner (2006) Keynote at the Pacific Symposium on Biocomputing (PSB2006) in Hawaii seeAlso Aloha: Biocomputing in Hawaii
  6. This post originally published on nodalpoint with comments

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.


December 19, 2006

Taverna 1.5.0

Filed under: Uncategorized — Duncan Hull @ 8:26 pm

Happy Christmas from the myGrid team, who are pleased to announce the release of version 1.5.0 of the Open Source Taverna bioinformatics workflow toolkit [1]. This is now available for download on the SourceForge site and includes some substantial changes since version 1.4.

Taverna 1.5.0 is a small download, but when first run it will download and install the required packages, which can take some time on slow networks. In the near future there will be a mechanism for downloading a bundle of core packages. There are some significant changes in the underlying architecture of Taverna and how it handles core packages and optional plugins, using a system called Raven; see the release notes below.

The documentation is currently being updated and the user documentation should be complete very soon, with the technical documentation following shortly afterwards. The software is being released ahead of the documentation so that it is available with some time to spare before the Christmas holidays.

Release notes:

There have been a number of substantial changes in the underlying architecture of Taverna since the previous release. These include:

  • An overhaul of the User Interface (UI), replacing the unpopular Multiple Document Interface with a cleaner and simpler single document UI which can be customised using Perspectives. There are built in perspectives to allow the design and enactment of workflows, and plugins can integrate with the UI by providing perspectives of their own. Together with this, users are able to create their own layouts built from individual components.
  • Taverna now allows for multiple workflows to be open and enacted at the same time.
  • Support for the new BioMart data management system version 0.5, together with backward compatibility for old workflows that used BioMart 0.4.
  • Better provenance generation and browsing support, through a plugin now known as LogBook.
  • Better support for semantic service discovery through the Feta plugin [2].
  • Modularisation of the Taverna code base.
  • Development and integration of an underlying architecture known as Raven. This allows for Apache Maven-like declaration of dependencies which are discovered and incorporated into the Taverna system at runtime. Together with the modularisation of the Taverna code base, Raven gives the benefit that updates can be provided dynamically and incrementally, without the need for monolithic releases as in the past. This allows bug fixes and new features to be provided within a very short timescale if necessary. It also provides plugin developers with a greater degree of autonomy and independence from the core Taverna code base.
  • Improved and more advanced plugin management with the ability to provide immediate updates, and for plugin providers to publish their plugins via xml descriptions.
  • Numerous bug-fixes including the removal of a number of memory leaks.
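To give a flavour of what declaring dependencies and resolving them at runtime means, here is a toy sketch. It is not Raven's code or API: the artifact names, the dictionary format and the resolve function are all hypothetical, and a real system would also handle versions and remote repositories. Each artifact declares what it depends on, and the resolver works out a load order in which every dependency comes before the things that need it.

```python
def resolve(declared, root):
    """Return a load order for `root` with dependencies before dependents.

    `declared` maps an artifact name to the list of names it depends on.
    This is a plain depth-first walk, used here purely for illustration.
    """
    order = []           # artifacts in the order they should be loaded
    loaded = set()       # artifacts already placed in `order`
    in_progress = set()  # artifacts on the current path, for cycle detection

    def visit(name):
        if name in loaded:
            return
        if name in in_progress:
            raise ValueError(f"dependency cycle at {name}")
        if name not in declared:
            raise KeyError(f"undeclared artifact: {name}")
        in_progress.add(name)
        for dependency in declared[name]:
            visit(dependency)
        in_progress.discard(name)
        loaded.add(name)
        order.append(name)

    visit(root)
    return order


if __name__ == "__main__":
    # Hypothetical declarations, invented for this example.
    declared = {
        "workbench": ["core", "logbook-plugin"],
        "logbook-plugin": ["core"],
        "core": [],
    }
    print(resolve(declared, "workbench"))
```

Because each plugin only declares its own dependencies, a new or updated plugin can be added to the declarations without touching the others, which is the kind of incremental update described in the release notes.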

JIRA generated release notes and bug status reports can be found here and here

References

  1. Peer-reviewed publications about the Taverna workbench in PubMed
  2. Feta: A Light-Weight Architecture for User Oriented Semantic Service Discovery
  3. BioMoby extensions to the Taverna workflow management and enactment software

December 12, 2006

Semantic Web for Life Sciences Book

Filed under: semweb — Duncan Hull @ 4:57 pm

All I want for Christmas is a book about the semantic web, written by people who are actually building and using it, rather than “visionaries” who don’t have to. Maybe this year I’ll be lucky…

A group of semantic webheads (aka HCLSIG the Health Care and Life Sciences Interest Group) led by Christopher J. Baker and Kei-Hoi Cheung and gathered together on public-semweb-lifesci@w3.org have written a book about the semantic web for life sciences.

I haven’t seen the final printed version of this book yet, but if you want to add it to your Christmas Amazon wishlist, it’s called Semantic Web: Revolutionizing Knowledge Discovery in the Life Sciences (ISBN:0387484361). The table of contents for the book (DOI:10.1007/978-0-387-48438-9) has more details if you are interested.

So what about other readers, what bioinformatics presents (not just books) would they like to find under the Christmas tree this year? If you don’t celebrate Christmas, what Solstice wishes do you have?

(see original post at nodalpoint for comments)

Buggotea: Redundant Links in Connotea

Dear Santa, all I want for Christmas* is a better version of Connotea, please can you sort out its duplicated redundant links? In my book this particular bug is “buggotea” number one. Here is the problem… [update: buggotea is partially fixed, see comments from Ian Mulvany at the nodalpoint link in the references below]

There is this handy bioinformatics web application called Connotea which I like to use, built by those nice people in the web team at Nature Publishing Group. Most readers of nodalpoint probably already know about it, but because you’re Santa and you’ve been busy lately, let me explain. Connotea can help scientists (not just bioinformaticians) to organise and share their bibliographic references, whilst discovering what other people with similar interests are reading. It’s good, but it has some bugs in it. Since it’s open-source software, anyone with the time, inclination and skills can get hold of the Connotea source code and improve it. There is, however, one particularly nasty redundancy bug in Connotea that is bugging me [1]. I think it should be fixable, and that doing so would make Connotea a significantly better application than it already is. Let’s illustrate this bug with a little story…


December 1, 2006

NAR Web Server Issue: Walking in a Webby Wonderland

Filed under: Uncategorized — Duncan Hull @ 3:18 pm

Have you recently built a bioinformatics web application useful to the wider community that you’d like to tell the world about? Are you also looking to score brownie points for a rigorously peer-reviewed publication that stands a reasonable chance of being well cited? If that’s you, then you have one month from today (December 1st) to sort your code out, and get your abstract in, for the fifth annual Nucleic Acids Research (NAR) Web Server issue published by Oxford University Press (OUP) in 2007. All articles in this issue are published under an open access model.

As regular visitors to nodalpoint will already know, every year NAR publishes two special issues: one on databases (annually in January since 1993) and the other on web servers (annually in July since 2003). Authors interested in pre-submitting abstracts for the 2007 Web Server Issue should read the Instructions to Authors for Web Server papers in NAR and send an abstract to Gary Benson at Boston University before December 31st 2006. The deadline for final submission of full articles is January 31st 2007. Gary Benson has taken over this year from previous web server issue editor, Nobel laureate and Ig Nobel participant, Richard Roberts [1].

One advantage of publishing your application paper in NAR, instead of alternative open access journals like Source Code for Biology and Medicine (SCFBM), is a listing in the bioinformatics links directory [2] and a bigger impact factor [3] of 7.6, if you care about these things. There are, of course, disadvantages of publishing with OUP in NAR, like the expensive open access publishing fees of $1185 to $2370 per article, which are of debatable value-for-money. If you’re living in a ‘List A’ developing country these charges are waived, which makes it tempting to set up a laboratory in Malawi to evade payment…

Anyway, does anyone out there know how OUP prices compare with the complicated BioMed Central membership fees which are presumably required for publication in SCFBM? Another leading open access publisher, the Public Library of Science (PLOS), currently charges from $2000 to $2500 for open access publication. Maybe I’m missing something, but aren’t these charges a lot of money to pay an administrator to shuffle a few bits of paper around and run a web server? Don’t let that put you off submitting your paper though, because in Science and academia you will either publish or perish. This is where the web is your friend because free online web availability substantially increases a paper’s impact.

On a lighter note, and now that the festive season is upon us, I’ll hand over to the Christmas crooner Perry Como to sign off:

♫ Sleigh bells ring, are you listening? In the lane, snow is glistening. A beautiful sight, We’re happy tonight, Walking in a webby wonderland. ♫


This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.


November 28, 2006

Postdoc Hell: Should I Stay Or Should I Go?

Sometimes, being a PostDoctoral researcher is a tough life. Thankfully, help is at hand in Philip Bourne and Iddo Friedberg’s guide Ten Simple Rules for Selecting a Post-Doctoral Position published in PLOS Computational Biology. This article is part of a series of editorials [1,2,3] which discuss various aspects of the weird and wonderful world of scientific research. They are worth reading if you’re at an early stage of your career, although you may not always agree with all the advice given. For example, the article advises PostDocs to:

Think very carefully before extending your graduate work into a postdoc in the same laboratory where you are now – to some professionals this raises a red flag when they look at your resume. Almost never does it maximise your gain of knowledge and experience, but that can be offset by rapid and important publications.

Do any experienced postdocs (or post-postdocs) out there have any opinions on the importance of moving labs after a PhD? What if you’re already in a great lab and like where you work? To what extent is it important to move, just to get new experience and skills? Or as The Clash once put it [4]:

♫ If I go there will be trouble, if I stay it will be double.
So come on and let me know, should I cool it or should I blow? ♫

References

  1. Philip Bourne (2006) Ten Simple Rules for Getting Published PLOS Computational Biology
  2. Philip Bourne and Leo Chalupa (2006) Ten Simple Rules for Getting Grants PLOS Computational Biology
  3. Philip Bourne and Alon Korngreen (2006) Ten Simple Rules for Reviewers PLOS Computational Biology
  4. Joe Strummer and Mick Jones (1981) Should I stay or should I go?
  5. Jawahar Swaminathan (2006) A ten step plan for PostDoc training nodalpoint.org
  6. this post originally on nodalpoint with comments
  7. Postdoc Hell, a collection of articles describing the plight of the postdoctoral researcher on citeulike


This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.

update: Mysteriously, Nature jobs used the Clash as a theme to their careers supplement, two weeks after this post was published. See How to ask yourself questions about major career decisions and Should I Stay Or Should I Go?. Coincidence? I wonder if they read nodalpoint?

New, Improved SEMANTIC Web: Now with added meaning

Filed under: funny — Duncan Hull @ 5:59 pm

This amusing picture-parody of the semantic web, worth a thousand words, was conceived by Mark Butler for a presentation [1] and drawn by Rachel Murphy of Rude Girl Designs.


References

  1. Mark Butler (2003) Is the semantic web hype? Hewlett Packard laboratories presentation at MMU, 2003-03-12
  2. Tim Berners-Lee (2006) Welcome to the Semantic Web The Economist: The World in 2007
  3. Eric Schmidt (2006) Why the web will win by Eric Schmidt, CEO of Google The Economist: The World in 2007
  4. The Romantic Web: Peter Norvig of Google vs Tim Berners-Lee of the Dubya-3-C
  5. Burn semantic web, burn!

November 7, 2006

People 2.0: Pioneers of the next generation Web

UK news-rag The Grauniad has a series of interviews with some of the people behind the next generation web, so-called Web 2.0. After reading these interviews, I can’t help wondering, who are the equivalent pioneers in bioinformatics?

The interviews include…

  1. Wikipedian Jimmy Wales
  2. WordPresser Matt Mullenweg
  3. Technorati’s Dave Sifry

…and several others too. Most of the interviews are worth reading. I particularly enjoyed Mullenweg’s, which contains a wonderful quote:

Q: What is your big idea?

A: I don’t have big ideas. I sometimes have small ideas, which seem to work out.

So who is currently pioneering the “Web of Science”, Bioinformatics 2.0 if you like? Ensemblian Ewan Birney? Ian Holmes at Berkeley? Or somebody else?

[Image credit: Picture from Steve Jurvetson, this post originally published on nodalpoint with comments]
