O'Really?

January 15, 2010

Bio2RDF: Large Scale, Distributed Biological Knowledge Discovery

Filed under: ChEBI — Duncan Hull @ 2:11 pm

Michel Dumontier was visiting the EBI this week. Here are the details of his seminar, Bio2RDF and Beyond! Large Scale, Distributed Biological Knowledge Discovery (slides embedded below), for anyone interested who missed it:

Abstract: The Bio2RDF.org [1] project aims to transform silos of bioinformatics data into a distributed platform for biological knowledge discovery. Initial work focused on building a public database of open-linked data with web-resolvable identifiers that provides information about named entities. This involved a syntactic normalization to convert open data represented in a variety of formats (flatfile, tab, xml, web services) to RDF-based linked data with normalized names (HTTP URIs) and basic typing from source databases. Bio2RDF entities also make reference to other open linked data networks (e.g. dbPedia) thus facilitating traversal across information spaces. However, a significant problem arises when attempting to undertake more sophisticated knowledge discovery approaches such as question answering or symbolic data mining. This is because knowledge is represented in a fundamentally different manner, requiring one to know the underlying data model and reconcile the artefactual differences when they arise. In this talk, we describe our data integration strategy that makes use of both syntactic and semantic normalization to consistently marshal knowledge to a common data model while leveraging explicit logic-based mappings with community ontologies to further enhance the biological knowledgescope. Coupled with the web-service based Semantic Automated Discovery and Integration (SADI) framework, Bio2RDF is well placed to serve up biological data for prediction and analysis.
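To make the "syntactic normalization" step in the abstract a bit more concrete, here is a minimal sketch of turning a tab-delimited record into RDF triples (N-Triples syntax) with normalized HTTP URIs. This is an illustration only, not Bio2RDF's actual code: the `http://bio2rdf.org/namespace:identifier` URI pattern, the column names (`id`, `label`, `xref`), the choice of `rdfs:label` and `rdfs:seeAlso` as predicates, and the one-row sample are all assumptions made for the example.

```python
import csv
import io

# Assumed base URI and "namespace:identifier" pattern, for illustration only.
BASE = "http://bio2rdf.org/"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def normalize_uri(namespace: str, identifier: str) -> str:
    """Build one consistent HTTP URI from a source namespace and a local id."""
    return f"{BASE}{namespace.strip().lower()}:{identifier.strip()}"


def escape_literal(value: str) -> str:
    """Escape backslashes, double quotes and newlines for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def rows_to_ntriples(tsv_text: str, namespace: str) -> list[str]:
    """Convert tab-delimited rows (hypothetical columns: id, label, xref)
    into N-Triples lines with normalized subject and cross-reference URIs."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    triples = []
    for row in reader:
        subject = normalize_uri(namespace, row["id"])
        label = escape_literal(row["label"])
        triples.append(f'<{subject}> <{RDFS}label> "{label}" .')
        if row.get("xref"):
            # Cross-references arrive as "NAMESPACE:local" and get the same treatment.
            xref_ns, _, xref_id = row["xref"].partition(":")
            target = normalize_uri(xref_ns, xref_id)
            triples.append(f"<{subject}> <{RDFS}seeAlso> <{target}> .")
    return triples


# Illustrative sample row, not taken from any real data dump.
SAMPLE = "id\tlabel\txref\n15377\twater\tKEGG:C00001\n"

for line in rows_to_ntriples(SAMPLE, "CHEBI"):
    print(line)
```

The point of the sketch is that every source, whatever its original format, ends up naming entities with the same URI scheme, so a cross-reference in one dataset resolves to the subject of another. The semantic normalization the talk goes on to describe (mapping to a common data model and to community ontologies) is a separate step that this example does not attempt.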

Some quick notes: Bio2RDF currently indexes around 5 billion triples and is built on the open source Virtuoso database. There are some scalability issues in getting the system to cope with the 15+ billion triples currently required. Nothing in Bio2RDF yet deals with the redundancy problem, e.g. “buggotea” and its friends.

References

  1. Belleau, F., Nolin, M., Tourigny, N., Rigault, P., & Morissette, J. (2008). Bio2RDF: Towards a mashup to build bioinformatics knowledge systems. Journal of Biomedical Informatics, 41(5), 706-716. DOI: 10.1016/j.jbi.2008.03.004

June 1, 2009

Scott Marshall on Interoperability

Scott Marshall is visiting Manchester this week and will be giving a seminar on Friday 5th June. Here are some details for anyone who is interested in attending:

Speaker: Dr. M. Scott Marshall, The University of Amsterdam

Date/Time: 5th June 2009, 11:00

Location: Room MLG.001 (Lecture Theatre), MIB building, (number 16 on campus map)

Title: Standards Enabled Interoperability: W3C Semantic Web for Health Care and Life Sciences Interest Group

Abstract: The W3C Semantic Web for Health Care and Life Sciences Interest Group (HCLS) has the mission of developing, advocating for, and supporting the use of Semantic Web technologies for biological science, translational medicine and health care. HCLS covers hot topics including data integration and federation, bridging commonly used domain standards such as CDISC and HL7, and the applications of medical terminologies. This talk will introduce the HCLS and provide an overview of the activities currently ongoing within the task forces, as well as new developments and the recent Face2Face meeting. The role of information extraction and the current interest in Shared Identifiers will also be discussed.

References

  1. Ruttenberg, A., Rees, J., Samwald, M., & Marshall, M. (2009). Life sciences on the Semantic Web: the Neurocommons and beyond. Briefings in Bioinformatics, 10(2), 193-204. DOI: 10.1093/bib/bbp004
