A quick search on pubmed.gov today reveals that the freely available American database of biomedical literature has just passed the 20 million citations mark*. Should we celebrate or commiserate passing this landmark figure? Is it a triumph or a tragedy that PubMed® is the size it is? (more…)
July 27, 2010
February 12, 2010
February 5, 2010
January 15, 2010
Bio2RDF: Large Scale, Distributed Biological Knowledge Discovery
Michel Dumontier was visiting the EBI this week. Here are the details of his seminar, Bio2RDF and Beyond! Large Scale, Distributed Biological Knowledge Discovery (slides embedded below), for anyone interested who missed it:
Abstract: The Bio2RDF.org [1] project aims to transform silos of bioinformatics data into a distributed platform for biological knowledge discovery. Initial work focused on building a public database of open-linked data with web-resolvable identifiers that provides information about named entities. This involved a syntactic normalization to convert open data represented in a variety of formats (flatfile, tab, xml, web services) to RDF-based linked data with normalized names (HTTP URIs) and basic typing from source databases. Bio2RDF entities also make reference to other open linked data networks (e.g. dbPedia) thus facilitating traversal across information spaces. However, a significant problem arises when attempting to undertake more sophisticated knowledge discovery approaches such as question answering or symbolic data mining. This is because knowledge is represented in a fundamentally different manner, requiring one to know the underlying data model and reconcile the artefactual differences when they arise. In this talk, we describe our data integration strategy that makes use of both syntactic and semantic normalization to consistently marshal knowledge to a common data model while leveraging explicit logic-based mappings with community ontologies to further enhance the biological knowledgescope. Coupled with the web-service based Semantic Automated Discovery and Integration (SADI) framework, Bio2RDF is well placed to serve up biological data for prediction and analysis.
Some quick notes: Bio2RDF currently indexes around 5 billion triples and is built on the open source Virtuoso database. There are some scalability issues in making the system cope with the total of 15+ billion triples currently required. There is nothing in Bio2RDF yet that deals with the redundancy problem, e.g. “buggotea” and its friends.
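To make the “syntactic normalization” step from the abstract a bit more concrete, here is a minimal sketch of turning tab-delimited records into RDF (N-Triples) with normalized HTTP URIs. This is not the Bio2RDF code itself. The `http://bio2rdf.org/namespace:identifier` URI pattern is my assumption about the naming scheme, and the column names, the `xref` predicate, the sample record and the `ExampleDB` and `OtherDB` namespaces are all hypothetical, invented for illustration.

```python
import csv
import io

# Assumption: entity names follow a "http://bio2rdf.org/namespace:identifier" pattern.
BASE = "http://bio2rdf.org/"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"
# Hypothetical predicate for cross-references, invented for this sketch.
XREF = "<http://example.org/vocab#xref>"


def to_uri(namespace, identifier):
    """Build a normalized HTTP URI: lower-case namespace, trimmed identifier."""
    return f"<{BASE}{namespace.strip().lower()}:{identifier.strip()}>"


def to_literal(text):
    """Quote a plain string as an N-Triples literal (backslash and quote escaping only)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def rows_to_ntriples(tsv_text, namespace):
    """Convert tab-delimited rows (columns: id, type, label, xref) to N-Triples lines."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    triples = []
    for row in reader:
        subject = to_uri(namespace, row["id"])
        # Basic typing taken from the source database.
        type_uri = to_uri(namespace + "_vocabulary", row["type"])
        triples.append(f"{subject} {RDF_TYPE} {type_uri} .")
        triples.append(f"{subject} {RDFS_LABEL} {to_literal(row['label'])} .")
        # Cross-references are written as "namespace:identifier" in this made-up format.
        if row["xref"]:
            xref_ns, xref_id = row["xref"].split(":", 1)
            triples.append(f"{subject} {XREF} {to_uri(xref_ns, xref_id)} .")
    return triples


# Hypothetical record from a hypothetical source database.
SAMPLE = "id\ttype\tlabel\txref\nX0001\tProtein\tExample protein\tOtherDB:42\n"

if __name__ == "__main__":
    for line in rows_to_ntriples(SAMPLE, "ExampleDB"):
        print(line)
```

The point of the sketch is only that every entity, type and cross-reference ends up with a predictable, resolvable name regardless of how the source file spelled it. The harder semantic normalization discussed in the talk (mapping to a common data model and to community ontologies) is not attempted here.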
References
- Belleau, F., Nolin, M., Tourigny, N., Rigault, P., & Morissette, J. (2008). Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Journal of Biomedical Informatics, 41 (5), 706-716 DOI: 10.1016/j.jbi.2008.03.004
December 11, 2009
The Semantic Biochemical Journal experiment
There is an interesting review [1] (and special issue) in the Biochemical Journal today, published by Portland Press Ltd. It provides (quote) “a whirlwind tour of recent projects to transform scholarly publishing paradigms, culminating in Utopia and the Semantic Biochemical Journal experiment”. Here is a quick outline of the publishing projects the review describes and discusses:
- Blogs for biomedical science
- Biomedical Ontologies – OBO etc
- Project Prospect and the Royal Society of Chemistry
- The Chemspider Journal of Chemistry
- The FEBS Letters experiment
- PubMedCentral and BioLit [2]
- Public Library of Science (PLoS) Neglected Tropical Diseases (NTD) [3]
- The Elsevier Grand Challenge [4]
- Liquid Publications
- The PDF debate: Is PDF a hamburger? Or can we build more useful applications on top of it?
- The Semantic Biochemical Journal project with Utopia Documents [5]
The review asks what advances these projects have made and what obstacles to progress still exist. It’s an entertaining tour, dotted with enlightening observations on what is broken in scientific publishing and some of the solutions involving various kinds of semantics.
One conclusion is that many of the experiments described above are expensive and difficult, but that the cost of not improving scientific publishing with various kinds of semantic markup is high, or as the authors put it:
“If the cost of semantic publishing seems high, then we also need to ask, what is the price of not doing it? From the results of the experiments we have seen to date, there is clearly a need to move forward and still a great deal of scope to innovate. If we fail to move forward in a collaborative way, if we fail to engage the key players, the price will be high. We will continue to bury scientific knowledge, as we routinely do now, in static, unconnected journal articles; to sequester fragments of that knowledge in disparate databases that are largely inaccessible from journal pages; to further waste countless hours of scientists’ time either repeating experiments they didn’t know had been performed before, or worse, trying to verify facts they didn’t know had been shown to be false. In short, we will continue to fail to get the most from our literature, we will continue to fail to know what we know, and will continue to do science a considerable disservice.”
It’s well worth reading the review, and downloading the Utopia software to experience all of the interactive features demonstrated in this special issue, especially the animated molecular viewers and sequence alignments.
Enjoy… The Utopia team would be interested to know what people think. See the commentary on friendfeed, the digital curation blog and the youtube video below for more information.
References
- Attwood, T., Kell, D., McDermott, P., Marsh, J., Pettifer, S., & Thorne, D. (2009). Calling International Rescue: knowledge lost in literature and data landslide! Biochemical Journal, 424 (3), 317-333 DOI: 10.1042/BJ20091474
- Fink, J., Kushch, S., Williams, P., & Bourne, P. (2008). BioLit: integrating biological literature with databases Nucleic Acids Research, 36 (Web Server) DOI: 10.1093/nar/gkn317
- Shotton, D., Portwin, K., Klyne, G., & Miles, A. (2009). Adventures in Semantic Publishing: Exemplar Semantic Enhancements of a Research Article PLoS Computational Biology, 5 (4) DOI: 10.1371/journal.pcbi.1000361
- Pafilis, E., O’Donoghue, S., Jensen, L., Horn, H., Kuhn, M., Brown, N., & Schneider, R. (2009). Reflect: augmented browsing for the life scientist Nature Biotechnology, 27 (6), 508-510 DOI: 10.1038/nbt0609-508
- Pettifer, S., Thorne, D., McDermott, P., Marsh, J., Villéger, A., Kell, D., & Attwood, T. (2009). Visualising biological data: a semantic approach to tool and database integration BMC Bioinformatics, 10 (Suppl 6) DOI: 10.1186/1471-2105-10-S6-S19
November 24, 2009
September 4, 2009
XML training in Oxford
The XML Summer School returns this year at St. Edmund Hall, Oxford from 20th-25th September 2009. As always, it’s packed with high quality technical training for every level of expertise, from the Hands-on Introduction for beginners through to special classes devoted to XQuery and XSLT, Semantic Technologies, Open Source Applications, Web 2.0, Web Services and Identity. The Summer School is also a rare opportunity to experience what life is like as a student in one of the world’s oldest university cities while enjoying a range of social events that are a part of the unique summer school experience.
This year, classes and sessions are taught and chaired by:
- Tony Coates, Londata Ltd., blogs at kontrawize
- John Chelsom, City University and Eleven Informatics LLP.
- Neil Cowles, Tolven Inc.
- Leigh Dodds, Talis Information Ltd., blogs at Lost Boy.
- Paul Downey, Osmosoft (Open Source applications from British Telecom), blogs at whatfettle
- Bob DuCharme, TopQuadrant Inc., blogs at snee.com
- Peter Flynn, blogs at silmaril.ie
- Marc Hadley, Sun Microsystems, blogs at java.net
- Duncan Hull, yours truly, blogs here.
- Michael Kay, Saxonica Ltd., home of the Saxon XSLT and XQuery Processor, blogs at blogharbor
- Debbie Lapeyre, Mulberry Technologies
- Eve Maler, PayPal Inc., blogs at Pushing String.
- Simon Phipps, Sun Microsystems Inc., blogs at sun.com
- Adam Retter, blogs at adamretter.org.uk
- Rich Salz, IBM, blogs at developerWorks
- Andy Seaborne, Hewlett-Packard Laboratories, blogs at ARQtick
- Michael Sperberg-McQueen, Black Mesa Technologies LLC., blogs at Messages in a Bottle
- Ron Summers, Loughborough University
- Jeni Tennison, Jeni Tennison Consulting Ltd., blogs at jenitennison.com
- Norm Walsh, Mark Logic, blogs at norman.walsh.name
- Priscilla Walmsley, Datypic consulting
- Lauren Wood, blogs at laurenwood.org
The Extensible Markup Language (XML) has been around for just over ten years, quickly and quietly finding its niche in many different areas of science and technology. It has been used in everything from modelling biochemical networks in systems biology [1], to electronic health records [2], scientific publishing, the provision of the PubMed service (which talks XML) [3] and many other areas. As a crude measure of its importance in biomedical science, PubMed currently has no fewer than 800 peer-reviewed publications on XML. It’s hard to imagine life without it. So whether you’re a complete novice looking to learn more about XML or a seasoned veteran wanting to improve your knowledge, register your place and find out more by visiting xmlsummerschool.com. I hope to see you there…
References
- Hucka, M. (2003). The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models Bioinformatics, 19 (4), 524-531 DOI: 10.1093/bioinformatics/btg015
- Bunduchi R, Williams R, Graham I, & Smart A (2006). XML-based clinical data standardisation in the National Health Service Scotland. Informatics in primary care, 14 (4) PMID: 17504574
- Sayers, E., Barrett, T., Benson, D., Bryant, S., Canese, K., Chetvernin, V., Church, D., DiCuccio, M., Edgar, R., Federhen, S., Feolo, M., Geer, L., Helmberg, W., Kapustin, Y., Landsman, D., Lipman, D., Madden, T., Maglott, D., Miller, V., Mizrachi, I., Ostell, J., Pruitt, K., Schuler, G., Sequeira, E., Sherry, S., Shumway, M., Sirotkin, K., Souvorov, A., Starchenko, G., Tatusova, T., Wagner, L., Yaschenko, E., & Ye, J. (2009). Database resources of the National Center for Biotechnology Information Nucleic Acids Research, 37 (Database) DOI: 10.1093/nar/gkn741
June 4, 2009
June 1, 2009
Scott Marshall on Interoperability
Scott Marshall is visiting Manchester this week and will be giving a seminar on Friday 5th June. Here are some details for anyone who is interested in attending:
Speaker: Dr. M. Scott Marshall, The University of Amsterdam
Date/Time: 5th June 2009, 11:00
Location: Room MLG.001 (Lecture Theatre), MIB building, (number 16 on campus map)
Title: Standards Enabled Interoperability: W3C Semantic Web for Health Care and Life Sciences Interest Group
Abstract: The W3C Semantic Web for Health Care and Life Sciences Interest Group (HCLS) has the mission of developing, advocating for, and supporting the use of Semantic Web technologies for biological science, translational medicine and health care. HCLS covers hot topics including data integration and federation, bridging commonly used domain standards such as CDISC and HL7, and the applications of medical terminologies. This talk will introduce the HCLS, as well as provide an overview of the activities that are currently ongoing within the task forces, as well as new developments and the recent Face2Face meeting. The role of information extraction and the current interest in Shared Identifiers will also be discussed.
References
- Ruttenberg, A., Rees, J., Samwald, M., & Marshall, M. (2009). Life sciences on the Semantic Web: the Neurocommons and beyond Briefings in Bioinformatics, 10 (2), 193-204 DOI: 10.1093/bib/bbp004
May 13, 2009
XML Summer School, Oxford
After a brief absence, it is good to see the XML Summer School is back again this September (20th-25th) at St. Edmund Hall, Oxford. This is “a unique event for everyone using, designing or implementing solutions using XML and related technologies.” I’ve been both a delegate and a speaker here over the years; back in 2005, Nick Drummond and I presented the Protégé and OWL tutorial, which was good fun. So here is what I.M.H.O. makes the XML summer school worth a look: (more…)



