October 31, 2008

Defrosting the Digital Library

Bibliographic Tools for the Next Generation Web

We started writing this paper [1] over a year ago, so it’s great to see it finally published today. Here is the abstract:

“Many scientists now manage the bulk of their bibliographic information electronically, thereby organizing their publications and citation material from digital libraries. However, a library has been described as “thought in cold storage,” and unfortunately many digital libraries can be cold, impersonal, isolated, and inaccessible places. In this Review, we discuss the current chilly state of digital libraries for the computational biologist, including PubMed, IEEE Xplore, the ACM digital library, ISI Web of Knowledge, Scopus, Citeseer, arXiv, DBLP, and Google Scholar. We illustrate the current process of using these libraries with a typical workflow, and highlight problems with managing data and metadata using URIs. We then examine a range of new applications such as Zotero, Mendeley, Mekentosj Papers, MyNCBI, CiteULike, Connotea, and HubMed that exploit the Web to make these digital libraries more personal, sociable, integrated, and accessible places. We conclude with how these applications may begin to help achieve a digital defrost, and discuss some of the issues that will help or hinder this in terms of making libraries on the Web warmer places in the future, becoming resources that are considerably more useful to both humans and machines.”

Thanks to Kevin Emamy, Richard Cameron, Martin Flack, and Ian Mulvany for answering questions on the CiteULike and Connotea mailing lists, and to Greg Tyrelle for ongoing discussion about metadata and the semantic Web on nodalpoint.org. Also thanks to Timo Hannay and Tim O’Reilly for an invitation to scifoo, where some of the issues described in this publication were discussed. Last but not least, thanks to Douglas Kell and Steve Pettifer for helping me write it and the BBSRC for funding it (grant code BB/E004431/1 REFINE project). We hope it is a useful review, and that you enjoy reading it as much as we enjoyed writing it.


  1. Duncan Hull, Steve Pettifer and Douglas B. Kell (2008). Defrosting the digital library: Bibliographic tools for the next generation web. PLoS Computational Biology, 4(10):e1000204+. DOI:10.1371/journal.pcbi.1000204, pmid:18974831, pmcid:2568856, citeulike:3467077
  2. Also mentioned (in no particular order) by NCESS, Wowter, Twine, Stephen Abram, Rod Page, Digital Koans, Twitter, Bora Zivkovic, Digg, reddit, Library Intelligencer, OpenHelix, Delicious, friendfeed, Dr. Shock, GribbleLab, Nature Blogs, Ben Good, Rafael Sidi, Scholarship 2.0, Subio, up2date, SecondBrain, Hubmed, BusinessExchange, CiteGeist, Connotea and Google

[Sunrise Ice Sculptures picture from Mark K.]


  1. Thank you, Duncan – great review. I’m happy that you included Mendeley in it! May I just point to a minor inaccuracy in your description: By writing that Mendeley can only extract metadata from PDFs where it “is available in an amenable format” and by citing Howison & Goodrum, you seem to imply that Mendeley reads the PDF files’ embedded metadata fields for its automatic document recognition.

    This is not the case: Mendeley doesn’t rely on the embedded metadata fields, since (as Howison & Goodrum point out) they are usually empty. Instead, Mendeley extracts the full text of the document and, using regular expression and Hidden Markov Model algorithms, tries to “guess” the correct metadata based on the layout, formatting, and text.

    It’s true, though, that the recognition quality is much better for journal articles formatted in a certain way (e.g. Elsevier, Kluwer, or Wiley journals) than for others. Improving this is one of our main development priorities in November/December.


    Comment by Victor — October 31, 2008 @ 11:44 am

  2. Very nice paper, congratulations!

    I especially agree with the bit on publishing models instead of papers. I really think that publishing some kind of expressive model (OWL ontologies???) that already links to other models at publication time is the key. The core publication would be the model, and the related paper supplementary material. So journals would be more like giant knowledge bases, linked to each other. Nice.

    For example, if my paper says that A phosphorylates B, I would publish a little ontology in OntoMedCentral stating that A phosphorylates B, of course importing A and B from UniProt and phosphorylates from the IntAct interactions ontology. In the same ontology, the paper would be codified in the annotations. A second paper says that as a result of A phosphorylating B, I get cancer. The author of the second paper need only link to the model I have just published, and so on.


    Comment by Mikel Egaña — October 31, 2008 @ 3:16 pm

  3. Congrats on the paper Duncan!

    Very nice, and timely. I will start a series of Bioinformatics and Informatics seminars in the department and the first one is about reference management. I will definitely use your paper and mention it.


    Comment by Paulo Nuin — November 1, 2008 @ 11:29 am

  4. […] Duncan Hull, Steve R. Pettifer and Douglas B. Kell (2008) wrote an interesting review on the current state of personal digital libraries. It is perhaps important to stress the fact that in the end the review focused on personal digital libraries, where a lot can also be written on digital libraries at higher aggregation levels. But including those digital libraries at higher aggregation levels would take another review. Anyway, many of the observations for building personal digital libraries they describe are right and come straight from the workbench of the practicing systems biologist. But still some additional observations could have been addressed in this review as well. […]

    Pingback by Defrosting the digital library at WoW! Wouter on the Web — December 27, 2008 @ 1:23 pm

  5. This is a great review. I feel like we’ve come a long way since 2008 though. Many of the tools reviewed here have advanced and now offer an array of new features.

    As a researcher, I’ve personally gone digital for my collection of research articles. I have an iMac in my research laboratory and use Papers to classify my documents, and I’ve been using the PDF Stacks program to manage my research papers and references on my PC at home.

    There are many good resources available though, and I’m glad things are moving towards digital. Save the environment 🙂

    Comment by Danielm Mcgravey — November 27, 2010 @ 9:49 pm
