nactem | O'Really?

July 6, 2009

Fabio Rinaldi on OntoGene

Filed under: Uncategorized — Duncan Hull @ 8:26 am
Tags: biocreative, Fabio Rinaldi, nactem, OntoGene

Fabio Rinaldi is currently visiting Manchester from the University of Zurich. He will be giving a seminar on Monday 6th July; the details are below.

Title: OntoGene in the BioNLP shared task and in BioCreative II.5

Speaker: Dr Fabio Rinaldi, University of Zurich

Date: Monday 6th July 2009

Time: 14:00

Location: Lecture Theatre – MLG.001, MIB building

Abstract: In this talk I will describe our participation in the BioNLP shared task and the BioCreative II.5 competitions [1]. Our approach is based on a common core: a pipeline of NLP tools and a dependency parser. The adaptation for the BioNLP shared task consisted of suitable input filters and a transformation-based approach which maps syntactic dependencies to event structures. Despite the very simple approach, results were satisfactory (34.78 F-score). The adaptation for BioCreative requires the detection and disambiguation of domain entities, while candidate interactions are proposed on the basis of a simple learning approach.

If time allows, I will then describe our approach to finding the ‘focus organisms’, i.e. the organisms in which the experiments have been conducted or which are the source of the interacting proteins. This information is of crucial importance for the correct disambiguation of other entities mentioned in the article.

References

Rinaldi, F., Kappeler, T., Kaljurand, K., Schneider, G., Klenner, M., Clematide, S., Hess, M., von Allmen, J., Parisot, P., Romacker, M., & Vachon, T. (2008). OntoGene in BioCreative II. Genome Biology, 9 (Suppl 2). DOI: 10.1186/gb-2008-9-s2-s13


June 10, 2009

Kenjiro Taura on Parallel Workflows

Filed under: informatics,seminars — Duncan Hull @ 7:24 am
Tags: bioinformatics, dmake, dsh, EC2, enju, falkon, Globus, gluepy, GXP, gxp make, Kenjiro Taura, make, makefile, MEDIE, Medline, nactem, NLP, pdsh, pubmed, qmake, ssh, taktuk, University of Tokyo, unixish, workflow

Kenjiro Taura is visiting Manchester next week from the Department of Information and Communication Engineering at the University of Tokyo. He will be giving a seminar, the details of which are below:

Title: Large scale text processing made simple by GXP make: A Unixish way to parallel workflow processing

Date-time: Monday, 15 June 2009 at 11:00 AM

Location: Room MLG.001, mib.ac.uk

In the first part of this talk, I will introduce a simple tool called GXP make. GXP is a general-purpose parallel shell (a process launcher) for multicore machines, unmanaged clusters accessed via SSH, clusters or supercomputers managed by a batch scheduler, distributed machines, or any mixture thereof. GXP make is a ‘make’ execution engine that executes regular UNIX makefiles in parallel. Make, though typically used for software builds, is in fact a general framework for concisely describing workflows made up of sequential commands. Installing GXP requires no root privileges and needs to be done only on the user’s home machine. GXP make easily scales to more than 1,000 CPU cores. The net result is that GXP make allows an easy migration of workflows from serial environments to clusters and to distributed environments.

In the second part, I will talk about our experience of running a complex text processing workflow developed by Natural Language Processing (NLP) experts. It is an entire workflow that processes MEDLINE abstracts with deep NLP tools (e.g., Enju parser [1]) to generate search indices for MEDIE, a semantic retrieval engine for MEDLINE. It was originally written as a Makefile with no particular provision for parallel processing, yet GXP make was able to run it on clusters with almost no changes to the original Makefile. The time taken to process the abstracts published in a single day was reduced from approximately eight hours (on a single machine) to twenty minutes with a trivial amount of effort. A larger-scale experiment processing all abstracts published so far, and the remaining challenges, will also be presented.
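The point in the abstract that a makefile is really a workflow description can be sketched in a few lines of Python. This is not GXP make and does not use its interface; it is only a toy illustration, assuming Python 3.9 or later for the standard `graphlib` module. The function `run_workflow` and the target names are hypothetical. Each target lists its prerequisites, as a makefile rule would, and any targets whose prerequisites are finished run at the same time.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_workflow(rules, actions, max_workers=4):
    """Run each target once all of its prerequisites have finished.

    rules:   dict mapping a target name to the set of its prerequisites
    actions: dict mapping a target name to a zero-argument callable
    Returns the target names in the order they completed.
    """
    sorter = TopologicalSorter(rules)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    finished = []
    running = {}  # future -> target name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Launch everything whose prerequisites are already done.
            for target in sorter.get_ready():
                running[pool.submit(actions[target])] = target
            # Block until at least one running target completes.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                future.result()  # re-raise any error from the action
                target = running.pop(future)
                finished.append(target)
                sorter.done(target)  # may unlock dependent targets
    return finished


# Hypothetical workflow: split a day's abstracts, parse two chunks
# independently, then build one index from both parses.
rules = {
    "split": set(),
    "parse_a": {"split"},
    "parse_b": {"split"},
    "index": {"parse_a", "parse_b"},
}
actions = {name: (lambda: None) for name in rules}
order = run_workflow(rules, actions)
```

Here `parse_a` and `parse_b` can run concurrently because neither depends on the other, which is the kind of parallelism a dependency graph exposes without the workflow author having to ask for it.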

References

Miyao, Y., Sagae, K., Saetre, R., Matsuzaki, T., & Tsujii, J. (2008). Evaluating contributions of natural language parsers to protein-protein interaction extraction. Bioinformatics, 25 (3), 394-400. DOI: 10.1093/bioinformatics/btn631


June 4, 2009

Improving the OBO Foundry Principles

Filed under: biocuration,data mining,informatics,semweb — Duncan Hull @ 1:48 pm
Tags: Alan Ruttenberg, Allyson Lister, Barry Smith, bbsrc, Bioportal, ChEBI, Chris Mungall, ebi, Frank Gibson, frolleague, Gene Ontology, Mark Musen, Melanie Courtot, Michael Ashburner, Michel Dumontier, nactem, OBO, OBO Foundry, OBO Smithy, OBO Workshop, obology, old smithy, ontology, ontolojoke, owl, principles, pubmed, REFINE, Richard Scheuermann, sbml, Suzi Lewis, ten commandments, workshop

The Open Biomedical Ontologies (OBO) are a set of reference ontologies for describing all kinds of biomedical data; see [1-5] for examples. Every year, users and developers of these ontologies gather from around the globe for a workshop at the EBI near Cambridge, UK. Following on from the first workshop last year, the 2nd OBO workshop 2009 is fast approaching.

In preparation, I’ve been revisiting the OBO Foundry documentation, part of which establishes a set of principles for ontology development. I’m wondering how they could be improved because these principles are fundamental to the whole effort. We’ve been using one of the OBO ontologies (called Chemical Entities of Biological Interest (ChEBI)) in the REFINE project to mine data from the PubMed database. OBO Ontologies like ChEBI and the Gene Ontology are really crucial to making sense of the massive data which are now common in biology and medicine – so this is stuff that matters.

The OBO Foundry Principles, a sort of Ten Commandments of Ontology (or Obology if you prefer) currently look something like this (copied directly from obofoundry.org/crit.shtml):

The ontology must be open and available to be used by all without any constraint other than (a) its origin must be acknowledged and (b) it is not to be altered and subsequently redistributed under the original name or with the same identifiers. The OBO ontologies are for sharing and are resources for the entire community. For this reason, they must be available to all without any constraint or license on their use or redistribution. However, it is proper that their original source is always credited and that after any external alterations, they must never be redistributed under the same name or with the same identifiers.
The ontology is in, or can be expressed in, a common shared syntax. This may be either the OBO syntax, extensions of this syntax, or OWL. The reason for this is that the same tools can then be usefully applied. This facilitates shared software implementations. This criterion is not met in all of the ontologies currently listed, but we are working with the ontology developers to have them available in a common OBO syntax.
The ontology possesses a unique identifier space within the OBO Foundry. The source of a term (i.e. class) from any ontology can be immediately identified by the prefix of the identifier of each term. It is, therefore, important that this prefix be unique.
The ontology provider has procedures for identifying distinct successive versions.
The ontology has a clearly specified and clearly delineated content. The ontology must be orthogonal to other ontologies already lodged within OBO. The major reason for this principle is to allow two different ontologies, for example anatomy and process, to be combined through additional relationships. These relationships could then be used to constrain when terms could be jointly applied to describe complementary (but distinguishable) perspectives on the same biological or medical entity. As a corollary to this, we would strive for community acceptance of a single ontology for one domain, rather than encouraging rivalry between ontologies.
The ontologies include textual definitions for all terms. Many biological and medical terms may be ambiguous, so terms should be defined so that their precise meaning within the context of a particular ontology is clear to a human reader.
The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the OBO Relation Ontology.
The ontology is well documented.
The ontology has a plurality of independent users.
The ontology will be developed collaboratively with other OBO Foundry members.
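The identifier-space principle above lends itself to a quick check. The following is a minimal sketch, assuming identifiers take the common `PREFIX:local` form (for example `GO:0008150` or `CHEBI:15377`); the principle itself only says that the prefix identifies the source ontology. The function names `id_prefix` and `prefix_clashes` are made up for illustration and are not part of any OBO tool.

```python
def id_prefix(term_id):
    """Return the prefix of an identifier written as PREFIX:local."""
    prefix, sep, local_part = term_id.partition(":")
    if not (prefix and sep and local_part):
        raise ValueError(f"not a prefixed identifier: {term_id!r}")
    return prefix


def prefix_clashes(ontologies):
    """Find prefixes that are used by more than one ontology.

    ontologies: dict mapping an ontology name to an iterable of term ids
    Returns a dict mapping each shared prefix to the sorted ontology names.
    """
    owners = {}
    for name, term_ids in ontologies.items():
        for term_id in term_ids:
            owners.setdefault(id_prefix(term_id), set()).add(name)
    return {p: sorted(names) for p, names in owners.items() if len(names) > 1}


# Two ontologies with distinct prefixes do not clash.
clean = prefix_clashes({"go": ["GO:0008150"], "chebi": ["CHEBI:15377"]})
# Two hypothetical ontologies that both mint GO-prefixed ids do.
clash = prefix_clashes({"a": ["GO:1"], "b": ["GO:2"]})
```

An empty result means every prefix belongs to exactly one ontology, which is what the principle asks for.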

I’ve been asking all my frolleagues what they think of these principles and have got some lively responses, including some here from Allyson Lister, Mélanie Courtot, Michel Dumontier and Frank Gibson. So what do you think? How could these guidelines be improved? Do you have any specific (and preferably constructive) criticisms of these ambitious (and worthy) goals? Be bold, be brave and be polite. If you have anything controversial or “off the record”, you can email it to me… I’m all ears.

CC-licensed picture above of the Old Smithy (pub) by Loop Oh. Inspired by Michael Ashburner’s standing OBO joke (Ontolojoke) which goes something like this: Because Barry Smith is one of the leaders of OBO, should the project be called the OBO Smithy or the OBO Foundry? 🙂

References

Noy, N., Shah, N., Whetzel, P., Dai, B., Dorf, M., Griffith, N., Jonquet, C., Rubin, D., Storey, M., Chute, C., & Musen, M. (2009). BioPortal: ontologies and integrated data resources at the click of a mouse. Nucleic Acids Research. DOI: 10.1093/nar/gkp440
Côté, R., Jones, P., Apweiler, R., & Hermjakob, H. (2006). The Ontology Lookup Service, a lightweight cross-platform tool for controlled vocabulary queries. BMC Bioinformatics, 7 (1). DOI: 10.1186/1471-2105-7-97
Smith, B., Ashburner, M., Rosse, C., Bard, J., Bug, W., Ceusters, W., Goldberg, L., Eilbeck, K., Ireland, A., Mungall, C., Leontis, N., Rocca-Serra, P., Ruttenberg, A., Sansone, S., Scheuermann, R., Shah, N., Whetzel, P., & Lewis, S. (2007). The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nature Biotechnology, 25 (11), 1251-1255. DOI: 10.1038/nbt1346
Smith, B., Ceusters, W., Klagges, B., Köhler, J., Kumar, A., Lomax, J., Mungall, C., Neuhaus, F., Rector, A., & Rosse, C. (2005). Relations in biomedical ontologies. Genome Biology, 6 (5). DOI: 10.1186/gb-2005-6-5-r46
Bada, M., & Hunter, L. (2008). Identification of OBO nonalignments and its implications for OBO enrichment. Bioinformatics, 24 (12), 1448-1455. DOI: 10.1093/bioinformatics/btn194


June 1, 2009

Scott Marshall on Interoperability

Filed under: biocuration,seminars,semweb — Duncan Hull @ 9:36 am
Tags: Bio2RDF, caBIG, CDISC, Concept Web Alliance, HCLS, hclsig, HL7, myexperiment, nactem, NCBO, neurocommons, ontology, Ontotext, PRISM, Scott Marshall, w3c, world wide web consortium

Scott Marshall is visiting Manchester this week. He will be giving a seminar on Friday 5th June; here are some details for anyone who is interested in attending:

Speaker: Dr. M. Scott Marshall, The University of Amsterdam

Date/Time: 5th June 2009, 11:00

Location: Room MLG.001 (Lecture Theatre), MIB building, (number 16 on campus map)

Title: Standards Enabled Interoperability: W3C Semantic Web for Health Care and Life Sciences Interest Group

Abstract: The W3C Semantic Web for Health Care and Life Sciences Interest Group (HCLS) has the mission of developing, advocating for, and supporting the use of Semantic Web technologies for biological science, translational medicine and health care. HCLS covers hot topics including data integration and federation, bridging commonly used domain standards such as CDISC and HL7, and the applications of medical terminologies. This talk will introduce the HCLS and give an overview of the activities currently ongoing within the task forces, along with new developments and the recent Face2Face meeting. The role of information extraction and the current interest in Shared Identifiers will also be discussed.

References

Ruttenberg, A., Rees, J., Samwald, M., & Marshall, M. (2009). Life sciences on the Semantic Web: the Neurocommons and beyond. Briefings in Bioinformatics, 10 (2), 193-204. DOI: 10.1093/bib/bbp004


May 19, 2009

Defrosting the John Rylands University Library

Filed under: seminars — Duncan Hull @ 4:14 pm
Tags: bbsrc, citeulike, connotea, digital library, dystopia, John Rylands, JRUL, JRULM, nactem, pubmed, REFINE, scopus, utopia

For anyone who missed the original bioinformatics seminar, I’ll be doing a repeat of the “Defrosting the Digital Library” talk, this time for the staff in the John Rylands University Library (JRUL). This is the main academic library in Manchester. In its own words, with “more than 4 million printed books and manuscripts, over 41,000 electronic journals and 500,000 electronic books, as well as several hundred databases, the John Rylands University Library is one of the best-resourced academic libraries in the country.” The journal subscription budget of the library is currently around £4 million per year, and that’s before they’ve even bought any books! Here is the abstract for the talk:

After centuries with little change, scientific libraries have recently experienced massive upheaval. From being almost entirely paper-based, most libraries are now almost completely digital. This information revolution has all happened in less than 20 years and has created many novel opportunities and threats for scientists, publishers and libraries.

Today, we are struggling with an embarrassing wealth of digital knowledge on the Web. Most scientists access this knowledge through some kind of digital library; however, these can be cold, impersonal, isolated, and inaccessible places. Many libraries are still clinging to obsolete models of identity, attribution, contribution, citation and publication.

Based on a review published in PLoS Computational Biology, pubmed.gov/18974831, this talk will discuss the current chilly state of digital libraries for biologists, chemists and informaticians, including PubMed and Google Scholar. We highlight problems and solutions to the coupling and decoupling of publication data and metadata, with a tool called citeulike.org. This software tool (and many other tools just like it) exploits the Web to make digital libraries “warmer”: more personal, sociable, integrated, and accessible places.

Finally, issues that will help or hinder the continued warming of libraries in the future, particularly the accurate identity of authors and their publications, are briefly introduced. These are discussed in the context of the BBSRC-funded REFINE project, at the National Centre for Text Mining (NaCTeM.ac.uk), which is linking biochemical pathway data with evidence for pathways from the PubMed database.

Date: Thursday 21st May 2009, Time: 13.00, Location: John Rylands University (Main) Library Oxford Road, Parkinson Room (inside main entrance, first on right) University of Manchester (number 55 on google map of the Manchester campus). Please come along if you are interested…

References

Hull, D., Pettifer, S., & Kell, D. (2008). Defrosting the Digital Library: Bibliographic Tools for the Next Generation Web. PLoS Computational Biology, 4 (10). DOI: 10.1371/journal.pcbi.1000204

[CC licensed picture above, the John Rylands Library on Deansgate by dpicker: David Picker]


April 9, 2009

Upcoming Gig: The Scholarly Communication Landscape

Filed under: informatics — Duncan Hull @ 12:35 pm
Tags: Ben Stebbing, Bill Hubbard, BioMed Central, Carole Goble, CIBER, citeulike, david booton, eScholar, flickr, friendfeed, Institutional Repository, jan wilkinson, John Rylands, JRULM, library, Manchester, Michael Daw, Michael Jubb, Mike Daw, MIMAS, myexperiment, nactem, Research Information Network, Robin Hunt, SHERPA, simon gaskell, slideshare, Sophia Ananiadou, stell butler, Terri Attwood, upcoming gig, wordpress

Details of an upcoming gig, The Scholarly Communication Landscape in Manchester on the 23rd of April 2009. If you are interested in coming, you need to register by Monday the 13th April at the official symposium pages.

Why? To help University staff and researchers understand some of the more complex issues embedded in the developments in digital scholarly communication, and to launch Manchester eScholar, the University of Manchester’s new Institutional Repository.

How? Information will be presented by invited speakers, and views and experience exchanged via plenary sessions.

Who For? University researchers (staff and students), research support staff, librarians, research managers, and anyone with an active interest in the field will find this symposium helpful to their developing use and provision of research digital formats. The programme for the symposium currently looks like this:

Welcome and Introduction by Jan Wilkinson, University Librarian and Director of The John Rylands Library.

Session I Chaired by Jan Wilkinson

Is the Knowledge Society a ‘social’ Network? Robin Hunt, CIBER, University College London
National Perspectives, Costs and Benefits Michael Jubb, Director, Research Information Network
The Economics of Scholarly Communication – how open access is changing the landscape Deborah Kahn, Acting Editorial Director Biology, BioMed Central

Session II Chaired by Dr Stella Butler

Information wants to be free. So … ? Dr David Booton, School of Law, University of Manchester
Putting Repositories in Their Place – the changing landscape of scholarly communication Bill Hubbard, SHERPA, University of Nottingham
The Year of Blogging Dangerously – lessons from the blogosphere, by Dr Duncan Hull (errr, that’s me!), mib.ac.uk. This talk will describe how to build an institutional repository using free (or cheap) web-based and blogging tools including flickr.com, slideshare.net, citeulike.org, wordpress.com, myexperiment.org and friendfeed.com. We will discuss some strengths and limitations of these tools and what Institutional Repositories can learn from them.

Session III Chaired by Professor Simon Gaskell

The University Press and Digital Publishing Ben Stebbing, Manchester University Press
MIMAS’ role in Supporting the Repository Landscape Vic Lyte, MIMAS
Defrosting the Digital Library (hmmmm, nice title) Professor Terri Attwood, Faculty of Life Sciences
Research Computing at Manchester, Dr Mike Daw, Head of Research Computing, IT Services Division
Enhancing User Experience of Scholarly Communication through Text Mining, Dr Sophia Ananiadou, Director, National Centre for Text Mining (NaCTeM.ac.uk)
Manchester eScholar – what, why and when Professor Carole Goble, School of Computer Science

Summary and close by Professor Simon Gaskell, Vice-President for Research


March 12, 2009

Defrosting the Digital Seminar

Filed under: bio,biotech — Duncan Hull @ 8:37 am
Tags: bbsrc, bioinformatics, Casey Bergman, citeulike, google scholar, Jean-Marc Schwartz, Lecture, life sciences, nactem, pubmed, REFINE, seminar, text mining, University of Manchester

Casey Bergman suggested it, Jean-Marc Schwartz organised it, so now I’m going to do it: a seminar on our Defrosting the Digital Library paper as part of the Bioinformatics and Functional Genomics seminar series. Here is the abstract of the talk:

After centuries with little change, scientific libraries have recently experienced massive upheaval. From being almost entirely paper-based, most libraries are now almost completely digital. This information revolution has all happened in less than 20 years and has created many novel opportunities and threats for scientists, publishers and libraries.

Today, we are struggling with an embarrassing wealth of digital knowledge on the Web. Most scientists access this knowledge through some kind of digital library; however, these can be cold, impersonal, isolated, and inaccessible places. Many libraries are still clinging to obsolete models of identity, attribution, contribution, citation and publication.

Based on a review published in PLoS Computational Biology, http://pubmed.gov/18974831, this talk will discuss the current chilly state of digital libraries for biologists, chemists and informaticians, including PubMed and Google Scholar. We highlight problems and solutions to the coupling and decoupling of publication data and metadata, with a tool called http://www.citeulike.org. This software tool exploits the Web to make digital libraries “warmer”: more personal, sociable, integrated, and accessible places.

Finally, issues that will help or hinder the continued warming of libraries in the future, particularly the accurate identity of authors and their publications, are briefly introduced. These are discussed in the context of the BBSRC-funded REFINE project, at the National Centre for Text Mining (NaCTeM.ac.uk), which is linking biochemical pathway data with evidence for pathways from the PubMed database.

Date: Monday 16th March 2009, Time: 12.00 midday, Location: Michael Smith Building, Main lecture theatre, Faculty of Life Sciences, University of Manchester (number 71 on google map of the Manchester campus). Please come along if you are interested…

[CC licensed picture above, “The Lecture” at Speakers Corner by James M Thorne]


March 20, 2008

Genomes to Systems 2008: Summary

Filed under: sysbio — Duncan Hull @ 4:23 pm
Tags: aaas, AstraZeneca, bbsrc, college hill, epsrc, G2S, genomics, illumina, nactem, nerc, Roche, systems biology

Genomes to Systems is a biennial conference held in Manchester covering the latest post-genome developments. The final programme for Genomes to Systems 2008 is available here. To supplement this with a little more information, the following briefly overviews sessions during the three days of the 2008 conference.


March 19, 2008

Genomes to Systems 2008: Day Two

Filed under: sysbio — Duncan Hull @ 10:15 am
Tags: Alfonso Valencia, ArrayExpress, biocreative, copasi, elixir, enfin, ensembl, G2S, genomics, Hiroaki Kitano, Michael Hucka, nactem, nasa, Nicolas le Novère, pathtext, payao, sbgn, sbml, sbo, Sophia Ananiadou, systems biology, text mining

Genomes to Systems is a biennial conference held in Manchester covering the latest post-genome developments. Here are some brief and incomplete notes on some of the speakers and topics from day two of the 2008 conference.


March 5, 2008

Cheminformatics 2.0

Filed under: informatics — Duncan Hull @ 2:35 pm
Tags: Andrew Hopkins, atomium, Douglas Kell, Henry Rzepa, informatics, mcisb, nactem, Peter Willett, Sophia Ananiadou, workshop

Some brief notes on and links to presentations from the MCISB / NaCTeM workshop on Chemical Informatics and Data-driven Science, held at the MIB in Manchester, 4th March 2008.

