December 3, 2009

It’s Snowing (JavaScript)!

You know it’s December when it starts snowing in your web browser. Let it snow, let it snow, let it snow!

Or programmatically:

snowStorm = new SnowStorm();

There was a time, not so very long ago, when JavaScript snow would have been “best viewed in browser x”. Thankfully, JavaScript is now much more reliable; the JBrowse [1] Genome Browser provides a nice example of this in bioinformatics. JBrowse is one of many proofs that JavaScript can be used to take some of the computing load off the server and do it in the client (web browser) instead, while providing more sophisticated applications for users – not just gimmicks like snow.
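Under the hood, an effect like SnowStorm boils down to a simple per-frame update loop over absolutely-positioned page elements. Here is a minimal sketch of that loop – the field names and drift parameters are my own illustrative assumptions, not SnowStorm’s actual API:

```javascript
// Create a flake at a random horizontal position at the top of the window.
function makeFlake(screenWidth) {
  return {
    x: Math.random() * screenWidth, // horizontal position in pixels
    y: 0,                           // start at the top
    vX: Math.random() * 2 - 1,      // sideways drift ("wind")
    vY: 1 + Math.random() * 2       // fall speed per frame
  };
}

// Advance every flake by one animation frame, recycling flakes that
// fall off the bottom edge back to the top.
function stepFlakes(flakes, screenWidth, screenHeight) {
  for (const flake of flakes) {
    flake.x += flake.vX;
    flake.y += flake.vY;
    if (flake.y > screenHeight) {   // "melted": respawn at the top
      flake.y = 0;
      flake.x = Math.random() * screenWidth;
    }
  }
  return flakes;
}
```

In a browser you would call `stepFlakes` from a timer or animation callback and copy each flake’s `x`/`y` into an element’s style; the library handles the rest (mouse-following wind, flake images, and so on).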


  1. Skinner, M., Uzilov, A., Stein, L., Mungall, C., & Holmes, I. (2009). JBrowse: a next-generation genome browser. Genome Research, 19(9), 1630–1638. DOI: 10.1101/gr.094607.109

[Creative Commons licensed snowstorm picture by Atli Harðarson, JavaScript SnowStorm code by Scott Schiller, move your mouse around to guide the snowstorm.]

June 15, 2009

Andrea Wiggins on little e-Science

Andrea Wiggins [1,2] from Syracuse University, New York is visiting Manchester this week and will be doing a seminar on “Little e-Science”, the details of which are below.

Date, time: 12 – 2pm on Thursday 18th June

Location: Atlas 1&2, Kilburn building

Title: Little eScience

Abstract: An interdisciplinary community of researchers has started to coalesce around the study of free/libre open source software (FLOSS) development. The research community is in many ways a reflection of the phenomenon of FLOSS practices in both social and technological respects, as many share the open source community’s values that support transparency and democratic participation. As community ties develop, new collaborations have spurred the creation of shared research resources: several repositories provide access to curated research-ready data, working paper repositories provide a means for disseminating early results, and a variety of analysis scripts and workflows connecting the data sets and literature are freely available. Despite these apparently favorable conditions for research collaboration, adoption of the tools and practices associated with eResearch has been slow as yet.

The key issues observed to date seem to stem from the challenges of pre-paradigmatic little science research. Researchers from software engineering, information systems, and even anthropology may examine the same construct, such as FLOSS project success, but will likely proceed from different epistemologies, utilize different data sources, identify different independent variables with varying operationalizations, and employ different research methodologies. In the decentralized and phenomenologically-driven FLOSS research community, creating and maintaining cyberinfrastructure [3] is a substantial effort for a small number of participants. In the little sciences, achieving critical mass of participation may be the most significant factor in creating a viable community of practice around eScience methods.

Update: Slides are embedded below.


  1. Andrea Wiggins (2009). Social Life of Information: We Are Who We Link. Andrea’s blog.
  2. Andrea Wiggins, James Howison, & Kevin Crowston (2008). Social dynamics of FLOSS team communication across channels. Open Source Development, Communities and Quality.
  3. Lincoln Stein (2008). Towards a cyberinfrastructure for the biological sciences: progress, visions and challenges. Nature Reviews Genetics, 9(9), 678–688. DOI: 10.1038/nrg2414

June 2, 2009

Michael Ley on Digital Bibliographies

Michael Ley

Michael Ley is visiting Manchester this week and will be doing a seminar on Wednesday 3rd June. Here are some details for anyone who is interested in attending:

Date: 3rd Jun 2009

Title: DBLP: How the data get in

Speaker: Dr Michael Ley. University of Trier, Germany

Time & Location: 14:15, Lecture Theatre 1.4, Kilburn Building

Abstract: The DBLP (Digital Bibliography & Library Project) Computer Science Bibliography now includes more than 1.2 million bibliographic records. For Computer Science researchers the DBLP web site is a popular tool to trace the work of colleagues and to retrieve bibliographic details when composing the lists of references for new papers. Ranking and profiling of persons, institutions, journals, or conferences is another use of DBLP. Many scientists are aware of this and want their publications to be listed as completely as possible.

The talk focuses on the data acquisition workflow for DBLP. Getting ‘clean’ basic bibliographic information for scientific publications remains a chaotic puzzle.

Large publishers are either not interested in cooperating with open services like DBLP, or their policy is very inconsistent. In most cases they are not able or not willing to deliver the basic data required for DBLP directly, but they encourage us to crawl their Web sites. This indirection has two main problems:

  1. The organisation and appearance of Web sites change from time to time, which forces a reimplementation of the information extraction scripts. [1]
  2. In many cases manual steps are necessary to get ‘complete’ bibliographic information.

For many small information sources it is not worthwhile to develop information extraction scripts, so data acquisition is done manually. There is an amazing variety of small but interesting journals, conferences and workshops in Computer Science which are not under the umbrella of ACM, IEEE, Springer, Elsevier etc. How their data gets in is often decided very pragmatically.
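To see why the crawling approach is so fragile, consider a toy extractor welded to one specific page layout. The HTML structure and class names below are invented for illustration (not any publisher’s actual markup): the moment the publisher renames a class or reorders elements, the scraper silently fails and the record has to be fixed by hand.

```javascript
// Extract a bibliographic record from one known page layout.
// Returns null when the markup no longer matches, i.e. the
// "organisation and appearance" of the site has changed.
function extractRecord(html) {
  const title = html.match(/<span class="title">([^<]+)<\/span>/);
  const authors = html.match(/<span class="authors">([^<]+)<\/span>/);
  if (!title || !authors) return null; // layout changed: scraper broken
  return {
    title: title[1].trim(),
    authors: authors[1].split(",").map(a => a.trim())
  };
}
```

One brittle function like this per publisher, multiplied by hundreds of sources that each redesign on their own schedule, is exactly the maintenance burden the talk describes.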

The goal of the talk and my visit to Manchester is to start a discussion process: the EasyChair conference management system developed by Andrei Voronkov and DBLP are both parts of the scientific publication workflow. Should they be connected for mutual benefit?


  1. Lincoln Stein (2002). Creating a bioinformatics nation: screen scraping is torture. Nature, 417(6885), 119–120. DOI: 10.1038/417119a

August 6, 2007

Scifoo day three: Genome Voyeurism with Lincoln Stein

On day three of Science Foo Camp (scifoo), biologist Lincoln Stein (picture right) gave a presentation on what he calls “genome voyeurism”, using Jim Watson’s genome as an example. This session demonstrated the current and future possibilities of individuals having their own DNA sequenced, what has been called “personal genomics”.

Unlike yesterday’s session on genomics (day two), where George Church, Eric Lander, 23andme, Sergey and Larry (and even Sergey’s pet dog) were all present, today they are conspicuously absent.

Lincoln’s presentation starts with a video (see YouTube video below) of Jim Watson receiving his genome on a disk from Baylor College of Medicine, Houston. Lincoln tells how Jim keeps his genome (stored on a hard drive) next to his Nobel prize medallion in his office. After all the press publicity, Jim deposits the data in GenBank, and it becomes available worldwide. (more…)

January 5, 2007

NAR Database Issue 2007: Not Waving But Drowning?

The 14th annual Nucleic Acids Research (NAR) database issue 2007 has just been published, open access. This year’s is the largest yet (again), with 968 molecular biology databases listed, 110 more than the previous one (see figure below). In the world of biological databases, are we waving or drowning?

NAR Database Growth 2007

Nine hundred and sixty-eight is a lot of databases, and even that mind-boggling number is not an exhaustive or comprehensive tally. But is counting all these databases waving or drowning [1]? Will we ever stop stamp-collecting the databases and tools we have in molecular biology? What prompted this was an employee of The Boeing Company, who once told me they have given up counting their databases because there were just too many. Just think of all the databases of design and technical documentation that accompany the myriad of different aircraft that Boeing manufactures, like the iconic 747 jumbo jet. Now combine that with all the supply chain, customer and employee information, and you can begin to imagine the data deluge that a large multi-national corporation has to handle.

Like Boeing, in Biology we’ve clearly got more data than we know what to do with [2,3]. It won’t be news to bioinformaticians, and it’s been said many times before, but it’s worth repeating again here:

  • We know how many databases we have, but we don’t know what a lot of the data in these databases means – think of all those mystery proteins of unknown function. It will obviously take time until we understand it all…
  • Most of the data only begins to make sense when it is integrated or mashed-up with other data. However, we still don’t know how to integrate all these databases, or as Lincoln Stein puts it, “so far their integration has proved problematic” [4] – a bit of an understatement. Many grandiose schemes for the “integration” of biological databases have been proposed over the years, but unfortunately none have been practical to the point of implementation [5].

Despite this, it is still useful to know how many molecular biology databases there are: at least we know how many databases we are drowning in. Thankfully, unlike Boeing’s, most biological data, algorithms and tools are open source, and more of the literature is becoming open access, which will hopefully make progress more rapid. But biology is more complicated than a Boeing 747, so we’ve got a long-haul flight ahead of us. OK, I’ve managed to completely overstretch that aerospace analogy now, so I’ll stop there.

Whatever databases you’ll be using in 2007, have a Happy New Year mining, exploring and understanding the data they contain, not drowning in it.


  1. Stevie Smith (1957). Not Waving but Drowning.
  2. Michael Galperin (2007). The Molecular Biology Database Collection: 2007 update. Nucleic Acids Research, 35 (Database issue). DOI: 10.1093/nar/gkl1008
  3. Alex Bateman (2007). Editorial: What makes a good database? Nucleic Acids Research, 35 (Database issue). DOI: 10.1093/nar/gkl1051
  4. Lincoln Stein (2003). Biological database integration. Nature Reviews Genetics, 4(5), 337–345. DOI: 10.1038/nrg1065
  5. Michael Ashburner (2006). Keynote at the Pacific Symposium on Biocomputing (PSB2006) in Hawaii. See also: Aloha: Biocomputing in Hawaii.
  6. This post was originally published on nodalpoint with comments.

Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.

November 1, 2006

Bioinformatics Impact Factors

B of the Bang (in Big Bangchester)

There are all sorts of flaws with using impact factors for judging the quality of biomedical research. Love them or hate them, just getting hold of impact factors for journals in bioinformatics and related fields is much harder than it should be, so I thought I’d reproduce some statistics I gathered here. The rankings, which you should use with caution [1,2], are correct as of June 2006 (and apply to citations in 2005), courtesy of Journal Citation Reports®, part of Thomson ISI Web of Knowledge. JCR has a pretty horrible, clunky web interface when compared to some of its rivals [3,4]; maybe one day they’ll make it better. Anyway, this is not a comprehensive list, just a fairly random selection of bioinformatics and computer science journals that publish articles I’ve been reading over the last few years.

Journal ISI impact factor
Science 30.927
Nature Reviews Molecular Cell Biology 29.852
Cell 29.431
Nature 29.273
Nature Genetics 25.797
Nature Biotechnology 22.378
Nature Reviews Drug Discovery 18.775
PLOS Biology 14.672
PNAS 10.231
Genome Research 10.139
Genome Biology 9.712
Drug Discovery Today 7.755
Nucleic Acids Research 7.552
Bioessays 6.787
Plant Physiology 6.114
Bioinformatics (OUP) 6.019
BMC Bioinformatics 4.958
Proteins: structure, function and bioinformatics 4.684
BMC Genomics 4.092
IEEE Intelligent Systems 2.560
Journal of Computational Biology 2.446
Journal of Biomedical Informatics 2.388
IEEE Internet Computing 2.304
Artificial Intelligence in Medicine 1.882
Comparative and Functional Genomics 0.992
Concurrency and Computation: Practice and experience 0.535
Briefings in Bioinformatics (OUP) not listed
PLOS Computational Biology not listed
Journal of Web Semantics not listed

One point of interest: cheeky young upstart BioMed Central Bioinformatics (going since 2000) seems to be catching up on traditional old-school favourite OUP Bioinformatics (going since 1985), which, as mentioned on nodalpoint, has been publishing some dodgy parser papers lately.
