O'Really?

February 5, 2010

Classic paper: Montagues and Capulets in Science

In preparation for a joint seminar I’ll be doing with Midori Harris here at the EBI, here’s a classic paper [1,2] on the social problems of building biomedical ontologies. This paper is worth reading (or re-reading) because it makes lots of relevant points about the use and abuse of research, and about how people misunderstand each other [3]. It’s funny (and available Open Access too). Besides, how many papers do you read with an abstract written in the style of Big Bard Bill Shakespeare?

ABSTRACT: Two households, both alike in dignity, In fair Genomics, where we lay our scene, (One, comforted by its logic’s rigour, Claims ontology for the realm of pure, The other, with blessed scientist’s vigour, Acts hastily on models that endure), From ancient grudge break to new mutiny, When ‘being’ drives a fly-man to blaspheme. From forth the fatal loins of these two foes, Researchers to unlock the book of life; Whole misadventured piteous overthrows, Can with their work bury their clans’ strife. The fruitful passage of their GO-mark’d love, And the continuance of their studies sage, Which, united, yield ontologies undreamed-of, Is now the hour’s traffic of our stage; The which if you with patient ears attend, What here shall miss, our toil shall strive to mend.

So if you read the paper, you have to ask yourself, are you a Montague or a Capulet?

References

  1. Carole Goble and Chris Wroe (2004). The Montagues and the Capulets. Comparative and Functional Genomics, 5(8), 623-632. DOI: 10.1002/cfg.442
  2. Carole Goble (2004). The Capulets and Montagues: A plague on both your houses? SOFG: Standards and Ontologies for Functional Genomics
  3. William Shakespeare (1596). Romeo and Juliet

[Romeo and Juliet picture via Happy Hippo Snacks]

December 21, 2009

Happy Christmas Lectures 2009

If you weren’t able to attend this year’s Christmas lectures in person, they are being televised tonight in the UK on More4 from 7pm. This year they are given by Professor Sue Hartley [1] (pictured right) from the University of Sussex. Here is some blurb from the Royal Institution on the series, called “The 300 million year war”.

Plants might seem passive, defenceless and almost helpless. But they are most definitely not! Thanks to a war with animals that’s lasted over 300 million years, they’ve developed many terrifying and devious ways to defend themselves and attack their enemies. Vicious poisons, lethal materials and even cunning forms of communicating with unlikely allies are just some of the weapons in their armoury. Using these and other tactics, plants have seen off everything from dinosaurs to caterpillars.

In the 2009 Royal Institution Christmas Lectures, Professor Sue Hartley will show you plants as you’ve never seen them before. They are complicated, cunning, beautiful and with plenty of tricks up their sleeve. And what’s more, we humans are dependent on them in ways you’d never imagine. As well as much of our food, our drugs, medicines and materials are all by-products of this epic 300 million year war.

So if you’re festively feasting this holiday, those Brussels sprouts, carrots and potatoes won’t look so innocent now. The lectures are aimed at children, but can be enjoyed by kids of all ages (including grown-ups). You can follow some of the action on Twitter: hashtag #xmaslectures and @rigb_science. Speaking of Brussels sprouts, the related Royal Institution video How Much Methane Does A Cow Produce In An Hour? might also be of interest.

Since it’s the end of the year, happy holidays to you all (and thanks for visiting O’Really?) — hope to see you again in 2010.

References

  1. Hartley, S., & Gange, A. (2009). Impacts of Plant Symbiotic Fungi on Insect Herbivores: Mutualism in a Multitrophic Context. Annual Review of Entomology, 54(1), 323-342. DOI: 10.1146/annurev.ento.54.110807.090614

June 15, 2009

Andrea Wiggins on little e-Science

Andrea Wiggins [1,2] from Syracuse University, New York, is visiting Manchester this week and will be giving a seminar on “Little e-Science”, the details of which are below.

Date, time: 12 – 2pm on Thursday 18th June

Location: Atlas 1&2, Kilburn building

Title: Little eScience

Abstract: An interdisciplinary community of researchers has started to coalesce around the study of free/libre open source software (FLOSS) development. The research community is in many ways a reflection of the phenomenon of FLOSS practices in both social and technological respects, as many share the open source community’s values of transparency and democratic participation. As community ties develop, new collaborations have spurred the creation of shared research resources: several repositories provide access to curated research-ready data, working paper repositories provide a means for disseminating early results, and a variety of analysis scripts and workflows connecting the data sets and literature are freely available. Despite these apparently favourable conditions for research collaboration, adoption of the tools and practices associated with eResearch has so far been slow.

The key issues observed to date seem to stem from the challenges of pre-paradigmatic little science research. Researchers from software engineering, information systems, and even anthropology may examine the same construct, such as FLOSS project success, but will likely proceed from different epistemologies, utilize different data sources, identify different independent variables with varying operationalizations, and employ different research methodologies. In the decentralized and phenomenologically-driven FLOSS research community, creating and maintaining cyberinfrastructure [3] is a substantial effort for a small number of participants. In the little sciences, achieving critical mass of participation may be the most significant factor in creating a viable community of practice around eScience methods.

Update Slides are embedded below:

References

  1. Andrea Wiggins (2009). Social Life of Information: We Are Who We Link. Andrea’s blog
  2. Andrea Wiggins, James Howison, & Kevin Crowston (2008). Social dynamics of FLOSS team communication across channels. Open Source Development, Communities and Quality
  3. Lincoln Stein (2008). Towards a cyberinfrastructure for the biological sciences: progress, visions and challenges. Nature Reviews Genetics, 9(9), 678-688. DOI: 10.1038/nrg2414

June 10, 2009

Kenjiro Taura on Parallel Workflows

Kenjiro Taura is visiting Manchester next week from the Department of Information and Communication Engineering at the University of Tokyo. He will be giving a seminar, the details of which are below:

Title: Large scale text processing made simple by GXP make: A Unixish way to parallel workflow processing

Date-time: Monday, 15 June 2009 at 11:00 AM

Location: Room MLG.001, mib.ac.uk

In the first part of this talk, I will introduce a simple tool called GXP make. GXP is a general-purpose parallel shell (a process launcher) for multicore machines, unmanaged clusters accessed via SSH, clusters or supercomputers managed by a batch scheduler, distributed machines, or any mixture thereof. GXP make is a ‘make’ execution engine that executes regular UNIX makefiles in parallel. Make, though typically used for software builds, is in fact a general framework for concisely describing workflows composed of sequential commands. Installation of GXP requires no root privileges and needs to be done only on the user’s home machine. GXP make easily scales to more than 1,000 CPU cores. The net result is that GXP make allows an easy migration of workflows from serial environments to clusters and to distributed environments.

In the second part, I will talk about our experiences of running a complex text processing workflow developed by Natural Language Processing (NLP) experts. It is an entire workflow that processes MEDLINE abstracts with deep NLP tools (e.g., the Enju parser [1]) to generate search indices for MEDIE, a semantic retrieval engine for MEDLINE. It was originally described in a Makefile without any particular provision for parallel processing, yet GXP make was able to run it on clusters with almost no changes to the original Makefile. The time to process the abstracts published in a single day was reduced from approximately eight hours (on a single machine) to twenty minutes, with a trivial amount of effort. A larger-scale experiment processing all abstracts published so far, and the remaining challenges, will also be presented.
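The idea that an ordinary makefile doubles as a workflow description is easy to sketch. In the fragment below (file names and tool names are hypothetical, not taken from the MEDIE pipeline), each abstract is parsed independently, so a make engine such as GXP make can run all the pattern-rule targets concurrently before building the final index:

```makefile
# Hypothetical text-processing workflow expressed as a makefile.
# Every parsed/%.xml target depends only on its own input file,
# so the dependency graph is embarrassingly parallel: make -j,
# or GXP make across a cluster, can run the parses concurrently.
ABSTRACTS := $(wildcard abstracts/*.txt)
PARSED    := $(ABSTRACTS:abstracts/%.txt=parsed/%.xml)

index.db: $(PARSED)
	build-index -o $@ $(PARSED)

parsed/%.xml: abstracts/%.txt
	parse-abstract -o $@ $<
```

The point of the talk is that nothing in this file mentions parallelism or clusters; the engine that interprets it supplies both.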

References

  1. Miyao, Y., Sagae, K., Saetre, R., Matsuzaki, T., & Tsujii, J. (2008). Evaluating contributions of natural language parsers to protein-protein interaction extraction. Bioinformatics, 25(3), 394-400. DOI: 10.1093/bioinformatics/btn631

June 2, 2009

Michael Ley on Digital Bibliographies

Michael Ley is visiting Manchester this week and will be giving a seminar on Wednesday 3rd June. Here are some details for anyone who is interested in attending:

Date: 3rd Jun 2009

Title: DBLP: How the data get in

Speaker: Dr Michael Ley. University of Trier, Germany

Time & Location: 14:15, Lecture Theatre 1.4, Kilburn Building

Abstract: The DBLP (Digital Bibliography & Library Project) Computer Science Bibliography now includes more than 1.2 million bibliographic records. For Computer Science researchers, the DBLP web site is now a popular tool for tracing the work of colleagues and for retrieving bibliographic details when composing the reference lists of new papers. Ranking and profiling of persons, institutions, journals, or conferences is another use of DBLP. Many scientists are aware of this and want their publications to be listed as completely as possible.

The talk focuses on the data acquisition workflow for DBLP. Getting ‘clean’ basic bibliographic information for scientific publications remains a chaotic puzzle.

Large publishers are either not interested in cooperating with open services like DBLP, or their policies are very inconsistent. In most cases they are not able, or not willing, to deliver the basic data required for DBLP directly, but they encourage us to crawl their Web sites. This indirection has two main problems:

  1. The organisation and appearance of Web sites changes from time to time, which forces a reimplementation of the information extraction scripts. [1]
  2. In many cases manual steps are necessary to get ‘complete’ bibliographic information.

For many small information sources it is not worthwhile to develop information extraction scripts, so data acquisition is done manually. There is an amazing variety of small but interesting journals, conferences and workshops in Computer Science which are not under the umbrella of ACM, IEEE, Springer, Elsevier, etc. How these get in is often decided very pragmatically.
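Ley’s first problem — brittle extraction scripts — is easy to illustrate with a minimal scraper. The sketch below (using only Python’s standard library; the markup and class name are invented, not any real publisher’s) hard-codes the layout of one hypothetical table-of-contents page, which is exactly why a site redesign forces a rewrite:

```python
from html.parser import HTMLParser

class TocParser(HTMLParser):
    """Collect the text of <span class="title"> elements from a
    hypothetical publisher's table-of-contents page. The class
    name 'title' is an assumption baked into the scraper: if the
    publisher renames it, extraction silently returns nothing."""
    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "span" and ("class", "title") in attrs:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.titles.append(data.strip())

# A made-up fragment of a publisher TOC page:
page = '<li><span class="title">Screen Scraping Is Torture</span></li>'
parser = TocParser()
parser.feed(page)
print(parser.titles)  # → ['Screen Scraping Is Torture']
```

This is the “torture” of reference [1] in miniature: the script works only for the markup it was written against, and every site redesign breaks it.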

The goal of the talk, and of my visit to Manchester, is to start a discussion: the EasyChair conference management system developed by Andrei Voronkov and DBLP are both parts of the scientific publication workflow. Should they be connected for mutual benefit?

References

  1. Lincoln Stein (2002). Creating a bioinformatics nation: screen scraping is torture. Nature, 417(6885), 119-120. DOI: 10.1038/417119a

June 1, 2009

Scott Marshall on Interoperability

Scott Marshall is visiting Manchester this week and will be giving a seminar on Friday 5th June. Here are some details for anyone who is interested in attending:

Speaker: Dr. M. Scott Marshall, The University of Amsterdam

Date/Time: 5th June 2009, 11:00

Location: Room MLG.001 (Lecture Theatre), MIB building, (number 16 on campus map)

Title: Standards Enabled Interoperability: W3C Semantic Web for Health Care and Life Sciences Interest Group

Abstract: The W3C Semantic Web for Health Care and Life Sciences Interest Group (HCLS) has the mission of developing, advocating for, and supporting the use of Semantic Web technologies for biological science, translational medicine and health care. HCLS covers hot topics including data integration and federation, bridging commonly used domain standards such as CDISC and HL7, and the application of medical terminologies. This talk will introduce the HCLS and provide an overview of the activities currently ongoing within its task forces, as well as new developments and the recent Face2Face meeting. The role of information extraction and the current interest in Shared Identifiers will also be discussed.

References

  1. Ruttenberg, A., Rees, J., Samwald, M., & Marshall, M. (2009). Life sciences on the Semantic Web: the Neurocommons and beyond. Briefings in Bioinformatics, 10(2), 193-204. DOI: 10.1093/bib/bbp004

May 19, 2009

Defrosting the John Rylands University Library

Filed under: seminars — Duncan Hull @ 4:14 pm

For anyone who missed the original bioinformatics seminar, I’ll be doing a repeat of the “Defrosting the Digital Library” talk, this time for the staff of the John Rylands University Library (JRUL). This is the main academic library in Manchester which, in its own words, with “more than 4 million printed books and manuscripts, over 41,000 electronic journals and 500,000 electronic books, as well as several hundred databases … is one of the best-resourced academic libraries in the country.” The library’s journal subscription budget is currently around £4 million per year, and that’s before they’ve even bought any books! Here is the abstract for the talk:

After centuries with little change, scientific libraries have recently experienced massive upheaval. From being almost entirely paper-based, most libraries are now almost completely digital. This information revolution has all happened in less than 20 years and has created many novel opportunities and threats for scientists, publishers and libraries.

Today, we are struggling with an embarrassing wealth of digital knowledge on the Web. Most scientists access this knowledge through some kind of digital library; however, these can be cold, impersonal, isolated and inaccessible places. Many libraries are still clinging to obsolete models of identity, attribution, contribution, citation and publication.

Based on a review published in PLoS Computational Biology (pubmed.gov/18974831), this talk will discuss the current chilly state of digital libraries for biologists, chemists and informaticians, including PubMed and Google Scholar. We highlight problems and solutions in the coupling and decoupling of publication data and metadata, with a tool called citeulike.org. This tool (and many others just like it) exploits the Web to make digital libraries “warmer”: more personal, sociable, integrated and accessible places.

Finally, issues that will help or hinder the continued warming of libraries in the future, particularly the accurate identification of authors and their publications, are briefly introduced. These are discussed in the context of the BBSRC-funded REFINE project at the National Centre for Text Mining (NaCTeM.ac.uk), which is linking biochemical pathway data with evidence for pathways from the PubMed database.

Date: Thursday 21st May 2009. Time: 13.00. Location: Parkinson Room (inside the main entrance, first on the right), John Rylands University (Main) Library, Oxford Road, University of Manchester (number 55 on the Google map of the Manchester campus). Please come along if you are interested…

References

  1. Hull, D., Pettifer, S., & Kell, D. (2008). Defrosting the Digital Library: Bibliographic Tools for the Next Generation Web. PLoS Computational Biology, 4(10). DOI: 10.1371/journal.pcbi.1000204

[CC licensed picture above, the John Rylands Library on Deansgate by dpicker: David Picker]

May 6, 2009

Michel Dumontier on Representing Biochemistry

Michel Dumontier is visiting Manchester this week and will be giving a seminar on Monday 11th May. Here are some details for anyone who is interested in attending:

Title: Increasingly Accurate Representation of Biochemistry

Speaker: Michel Dumontier, dumontierlab.com

Time: 14.00, Monday 11th May 2009
Venue: Atlas 1, Kilburn Building, University of Manchester, number 39 on the Google Campus Map

Abstract: Biochemical ontologies aim to capture and represent biochemical entities, and the relations that exist between them, in an accurate manner. A fundamental starting point is biochemical identity, but our current approach to generating identifiers is haphazard, and consequently integrating data is error-prone. I will discuss plausible structure-based strategies for biochemical identity, whether at the molecular level or some part thereof (e.g. residues, collections of residues, atoms, collections of atoms, functional groups), such that identifiers may be generated in an automatic, curator- and database-independent manner. With structure-based identifiers in hand, we will be in a position to more accurately capture context-specific biochemical knowledge, such as how a set of residues in a binding site is involved in a chemical reaction, including the fact that a key nitrogen atom must first be de-protonated. Thus, our current representation of biochemical knowledge may improve such that manual and automatic methods of biocuration become substantially more accurate.
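The core idea — deriving an identifier from the structure itself rather than from a registry — can be sketched in a few lines. The toy scheme below (my illustration, in the spirit of structure-based identifiers like InChI, not Dumontier’s actual proposal; real chemistry needs proper graph canonicalisation, e.g. a Morgan-style algorithm) canonicalises an atom and bond listing and hashes it, so any curator or database computing the identifier for the same structure gets the same result:

```python
import hashlib

def structure_id(atoms, bonds):
    """Derive a database-independent identifier from a chemical
    structure. Toy canonicalisation: sort the atom labels and the
    (sorted) bond pairs into a single string, then hash it, so the
    identifier does not depend on input order or on any registry."""
    canonical = "|".join(sorted(atoms)) + "//" + "|".join(
        sorted("-".join(sorted(bond)) for bond in bonds))
    return "STRUCT:" + hashlib.sha256(canonical.encode()).hexdigest()[:12]

# The same molecule described in two different atom/bond orders
# yields the same identifier, with no curator involved:
water_a = structure_id(["O1", "H1", "H2"], [("O1", "H1"), ("O1", "H2")])
water_b = structure_id(["H2", "O1", "H1"], [("H2", "O1"), ("O1", "H1")])
assert water_a == water_b
```

The same trick applies at finer granularity: hashing the canonical form of a binding site’s residues would give that collection of residues its own stable, automatically generated identifier.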

Update: Slides are now available via SlideShare.

[Creative Commons licensed picture of Michel in action at ISWC 2008 from Tom Heath]

References

  1. Michel Dumontier and Natalia Villanueva-Rosales (2009). Towards pharmacogenomics knowledge discovery with the semantic web. Briefings in Bioinformatics. DOI: 10.1093/bib/bbn056
  2. Doug Howe et al. (2008). Big data: The future of biocuration. Nature, 455, 47-50. DOI: 10.1038/455047a
