April 17, 2009

The Unreasonable Effectiveness of Google

Via the Official Google Research Blog at the University of Google, Alon Halevy, Peter Norvig and Fernando Pereira have published an interesting expert opinion piece in the March/April 2009 edition of IEEE Intelligent Systems: computer.org/intelligent. The paper talks about embracing complexity and making use of “the unreasonable effectiveness of data” [1], drawing analogies with the “unreasonable effectiveness of mathematics” [2]. There is plenty to agree and disagree with in this provocative article, which makes it an entertaining read. So what can we learn from those expert Googlers in the Googleplex?

March 16, 2009

March 12, 2009

Defrosting the Digital Seminar

Casey Bergman suggested it, Jean-Marc Schwartz organised it, so now I’m going to do it: a seminar on our Defrosting the Digital Library paper as part of the Bioinformatics and Functional Genomics seminar series. Here is the abstract of the talk:

After centuries with little change, scientific libraries have recently experienced massive upheaval. From being almost entirely paper-based, most libraries are now almost completely digital. This information revolution has all happened in less than 20 years and has created many novel opportunities and threats for scientists, publishers and libraries.

Today, we are struggling with an embarrassing wealth of digital knowledge on the Web. Most scientists access this knowledge through some kind of digital library; however, these can be cold, impersonal, isolated, and inaccessible places. Many libraries are still clinging to obsolete models of identity, attribution, contribution, citation and publication.

Based on a review published in PLoS Computational Biology (http://pubmed.gov/18974831), this talk will discuss the current chilly state of digital libraries for biologists, chemists and informaticians, including PubMed and Google Scholar. We highlight problems and solutions to the coupling and decoupling of publication data and metadata, with a tool called CiteULike (http://www.citeulike.org). This software tool exploits the Web to make digital libraries “warmer”: more personal, sociable, integrated, and accessible places.

Finally, issues that will help or hinder the continued warming of libraries in the future, particularly the accurate identity of authors and their publications, are briefly introduced. These are discussed in the context of the BBSRC-funded REFINE project at the National Centre for Text Mining (NaCTeM.ac.uk), which is linking biochemical pathway data with evidence for pathways from the PubMed database.

Date: Monday 16th March 2009, Time: 12.00 midday, Location: Michael Smith Building, Main lecture theatre, Faculty of Life Sciences, University of Manchester (number 71 on the Google map of the Manchester campus). Please come along if you are interested…

[CC licensed picture above, “The Lecture” at Speakers Corner by James M Thorne]

October 31, 2008

Defrosting the Digital Library

Bibliographic Tools for the Next Generation Web

We started writing this paper [1] over a year ago, so it’s great to see it finally published today. Here is the abstract:

“Many scientists now manage the bulk of their bibliographic information electronically, thereby organizing their publications and citation material from digital libraries. However, a library has been described as “thought in cold storage,” and unfortunately many digital libraries can be cold, impersonal, isolated, and inaccessible places. In this Review, we discuss the current chilly state of digital libraries for the computational biologist, including PubMed, IEEE Xplore, the ACM digital library, ISI Web of Knowledge, Scopus, Citeseer, arXiv, DBLP, and Google Scholar. We illustrate the current process of using these libraries with a typical workflow, and highlight problems with managing data and metadata using URIs. We then examine a range of new applications such as Zotero, Mendeley, Mekentosj Papers, MyNCBI, CiteULike, Connotea, and HubMed that exploit the Web to make these digital libraries more personal, sociable, integrated, and accessible places. We conclude with how these applications may begin to help achieve a digital defrost, and discuss some of the issues that will help or hinder this in terms of making libraries on the Web warmer places in the future, becoming resources that are considerably more useful to both humans and machines.”

Thanks to Kevin Emamy, Richard Cameron, Martin Flack, and Ian Mulvany for answering questions on the CiteULike and Connotea mailing lists, and to Greg Tyrelle for ongoing discussion about metadata and the semantic Web at nodalpoint.org. Thanks also to Timo Hannay and Tim O’Reilly for an invitation to scifoo, where some of the issues described in this publication were discussed. Last but not least, thanks to Douglas Kell and Steve Pettifer for helping me write it and the BBSRC for funding it (grant code BB/E004431/1 REFINE project). We hope it is a useful review, and that you enjoy reading it as much as we enjoyed writing it.


  1. Duncan Hull, Steve Pettifer and Douglas B. Kell (2008). Defrosting the digital library: Bibliographic tools for the next generation web. PLoS Computational Biology, 4(10):e1000204+. DOI:10.1371/journal.pcbi.1000204, pmid:18974831, pmcid:2568856, citeulike:3467077
  2. Also mentioned (in no particular order) by NCESS, Wowter, Twine, Stephen Abram, Rod Page, Digital Koans, Twitter, Bora Zivkovic, Digg, reddit, Library Intelligencer, OpenHelix, Delicious, friendfeed, Dr. Shock, GribbleLab, Nature Blogs, Ben Good, Rafael Sidi, Scholarship 2.0, Subio, up2date, SecondBrain, Hubmed, BusinessExchange, CiteGeist, Connotea and Google

[Sunrise Ice Sculptures picture from Mark K.]

June 20, 2008

A Brief Review of RefWorks

There is no shortage of bibliographic management tools out there, all of which ultimately aim to save you time managing the papers and books in your personal library. I’ve just been to a demo and sales pitch for one of them, a tool called RefWorks. RefWorks claims to be “an online research management, writing and collaboration tool — designed to help researchers easily gather, manage, store and share all types of information, as well as generate citations and bibliographies”. It looks like a pretty good tool, similar to the likes of EndNote but with more of the web-based features found in CiteULike and Connotea. Here are some ultra-brief notes: RefWorks in five minutes, the good, the bad and the ugly.

The Good…

RefWorks’ finer features

  • RefWorks is web-based, so you can use it from any computer with an internet connection without having to install any software. It is platform independent: Mac, Windows, Linux, Blackberry, iPhone, Woteva. This feature is becoming increasingly common; see Martin Fenner’s “Online reference managers, not quite there yet” article at Nature Network.
  • Share selected references and bibliographies on the Web via RefShare
  • It imports and exports all the things you would expect: EndNote (definitely), XML, Feeds (RSS), flat files, BibTeX (check?), RIS (check?) and several others via the screenscraping tool RefGrab-It
  • Interfaces closely with PubMed and Scopus (and many other databases), e.g. you can search these directly from your RefWorks library. You can also export from Scopus to RefWorks…
  • Not part of the Reed-Elsevier global empire (yet), currently part of ProQuest, based in California.
  • Free 30-day trial is available
  • Just like EndNote, it can be closely integrated with Microsoft Word, to cite-while-you-write


October 27, 2006


MEDIE is an “intelligent” semantic search engine that retrieves biomedical correlations from over 14 million articles in MEDLINE. You can find abstracts and sentences in MEDLINE by specifying the semantics of correlations; for example, What activates tumour suppressor protein p53? So just how useful is MEDIE and is it at the cutting edge?

At the Manchester Interdisciplinary Biocentre (MIB) launch yesterday, Professor Jun’ichi Tsujii gave a presentation on Linking text with knowledge – challenges for Text Mining in Biology. As part of this presentation he gave a demonstration of Medie: an intelligent search engine for Medline. This tool looks quite impressive if you experiment with some sample queries. I wonder what nodalpointers, especially hardened text-miners, natural language processing (NLP) nerds and computational linguists, make of Medie?

[This post was originally published on nodalpoint, with comments]

