O'Really?

June 23, 2009

Impact Factor Boxing 2009

Filed under: data mining,informatics,publishing — Duncan Hull @ 12:59 pm
Tags: Abhishek Tiwari, Alan Fersht, bibliometrics, box, carl bergstrom, eigenfactor, featherweight, fight, flyweight, heavyweight, impact factor, impact factor boxing, Jason Hoyt, Journal Citation Reports, Matthew Cockerill, Mendeley, NSPNAS, Peter Binfield, Peter Murray-Rust, punch, Rupert Murdoch, Sergey Brin, Zoe Corbyn

[This post is part of an ongoing series about impact factors]

The latest results from the annual impact factor boxing world championship contest are out. This is a combat sport where scientific journals are scored according to their supposed influence and impact in Science. This year's competition rankings include the first-ever update to the newly introduced Five Year Impact Factor and Eigenfactor™ Metrics [1,2] in Journal Citation Reports (JCR) on the Web (see www.isiknowledge.com/JCR; warning: clunky website, requires subscription*), presumably in response to widespread criticism of impact factors. The Eigenfactor™ seems to correlate quite closely with the impact factor scores; both work at the level of the journal, although they use different methods for measuring a given journal's impact. However, what many authors are often more interested in is the impact of an individual article, not the journal where it was published. So it would be interesting to see how the figures below tally with Google Scholar (see also comments by Abhishek Tiwari). I've included a table below of bioinformatics impact factors, updated for June 2009. Of course, when I say 2009 (today), I mean 2008 (these are the latest figures available, based on data from 2007), so this shiny new information published this week is already out of date [3] and flawed [4,5], but here is a selection of the data anyway: [update: see figures published in June 2010.]

Journal Title	2008 data from isiknowledge.com/JCR						Eigenfactor™ Metrics
Journal Title	Total Cites	Impact Factor	5-Year Impact Factor	Immediacy Index	Articles	Cited Half-life	Eigenfactor™ Score	Article Influence™ Score
BMC Bioinformatics	8141	3.781	4.246	0.664	607	2.8	0.06649	1.730
OUP Bioinformatics	30344	4.328	6.481	0.566	643	4.8	0.18204	2.593
Briefings in Bioinformatics	2908	4.627	n/a	1.273	44	4.5	0.02188	n/a
PLoS Computational Biology	2730	5.895	6.144	0.826	253	2.1	0.03063	3.370
Genome Biology	9875	6.153	7.812	0.961	229	4.4	0.07930	3.858
Nucleic Acids Research	86787	6.878	6.968	1.635	1070	6.5	0.37108	2.963
PNAS	416018	9.380	10.228	1.635	3508	7.4	1.69893	4.847
Science	409290	28.103	30.268	6.261	862	8.4	1.58344	16.283
Nature	443967	31.434	31.210	8.194	899	8.5	1.76407	17.278
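To put a rough number on how closely the two journal-level metrics track each other, here is a minimal Python sketch that computes Spearman's rank correlation between the Impact Factor and Eigenfactor™ Score columns above. The figures are copied from the table; the simple formula used assumes no tied ranks, which holds for these nine journals.

```python
# Spearman rank correlation between the Impact Factor and Eigenfactor Score
# columns of the table above (values copied from the table).

# (journal, 2008 Impact Factor, Eigenfactor Score)
JOURNALS = [
    ("BMC Bioinformatics", 3.781, 0.06649),
    ("OUP Bioinformatics", 4.328, 0.18204),
    ("Briefings in Bioinformatics", 4.627, 0.02188),
    ("PLoS Computational Biology", 5.895, 0.03063),
    ("Genome Biology", 6.153, 0.07930),
    ("Nucleic Acids Research", 6.878, 0.37108),
    ("PNAS", 9.380, 1.69893),
    ("Science", 28.103, 1.58344),
    ("Nature", 31.434, 1.76407),
]


def ranks(values):
    """Return 1-based ranks, smallest value first; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(xs, ys):
    """Spearman's rho as 1 - 6 * sum(d^2) / (n * (n^2 - 1)); valid only without ties."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


impact = [row[1] for row in JOURNALS]
eigen = [row[2] for row in JOURNALS]
rho = spearman_rho(impact, eigen)
print(f"Spearman rho = {rho:.2f}")  # prints "Spearman rho = 0.80"
```

On these nine journals the ranks broadly agree (rho = 0.8), though a hand-picked selection this small says little about journals in general. The biggest rank shifts are OUP Bioinformatics and BMC Bioinformatics, which sit higher on Eigenfactor™ than on Impact Factor, and Briefings in Bioinformatics and PLoS Computational Biology, which sit lower.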

The internet is radically changing the way we communicate, and this includes scientific publishing. As media mogul Rupert Murdoch once pointed out, "big will not beat small any more – it will be the fast beating the slow". An interesting question for publishers and scientists is: how can the Web help the faster flyweight and featherweight boxers (smaller journals) compete and punch above their weight against the reigning world champion heavyweights (Nature, Science and PNAS)? Will the heavyweight publishers always have the killer knockout punches? If you've got access to the internet, then you already have a ringside seat from which to watch all the action. This fight should be entertaining viewing, and there is an awful lot of money riding on the outcome [6-11].

Seconds away, round two…

References

Fersht, A. (2009). The most influential journals: Impact Factor and Eigenfactor Proceedings of the National Academy of Sciences, 106 (17), 6883-6884 DOI: 10.1073/pnas.0903307106
Bergstrom, C., & West, J. (2008). Assessing citations with the Eigenfactor Metrics Neurology, 71 (23), 1850-1851 DOI: 10.1212/01.wnl.0000338904.37585.66
Cockerill, M. (2004). Delayed impact: ISI’s citation tracking choices are keeping scientists in the dark. BMC Bioinformatics, 5 (1) DOI: 10.1186/1471-2105-5-93
Allen, L., Jones, C., Dolby, K., Lynn, D., & Walport, M. (2009). Looking for Landmarks: The Role of Expert Review and Bibliometric Analysis in Evaluating Scientific Publication Outputs PLoS ONE, 4 (6) DOI: 10.1371/journal.pone.0005910
Grant, R.P. (2009) On article-level metrics and other animals Nature Network
Corbyn, Z. (2009) Do academic journals pose a threat to the advancement of Science? Times Higher Education
Fenner, M. (2009) PLoS ONE: Interview with Peter Binfield Gobbledygook blog at Nature Network
Hoyt, J. (2009) Who is killing science on the Web? Publishers or Scientists? Mendeley Blog
Hull, D. (2009) Escape from the Impact Factor: The Great Escape? O’Really? blog
Murray-Rust, P. (2009) THE article: Do academic journals pose a threat to the advancement of science? Peter Murray-Rust’s blog: A Scientist and the Web
Wu, S. (2009) The evolution of Scientific Impact shirleywho.wordpress.com

* This important data should be freely available (i.e. without a subscription), since crucial decisions about the allocation of public money depend on it, but that's another story.

[More commentary on this post over at friendfeed. CC-licensed Fight Night Punch Test by djclear904]


June 19, 2009

Nettab 2009 Day Three: Semantic Integration

Filed under: conferences — Duncan Hull @ 11:26 am
Tags: alzforum, Alzheimer's Disease, bioinformatics, controlled vocabulary, Cytoscape, DNA mania, Doron Betel, drupal, Gabriel Valiente, Leaf Bioscience, linked data, moderna, NETTAB, Parkinson's Disease, rna, rna mania, SCF, Science Collaboration Framework, stembook, Tim Clark

A brief report (well, just some scribbled notes, bullet points and links really) on the third and final day of Network Applications and Tools in Biology (NETTAB) 2009 in Catania, Sicily. There was a special section on Methods and Tools for RNA Structure and Functional Analysis. Disclaimer: RNA mania isn't really my thing, so the RNA presentations and papers are grossly under-represented in this mini-report (sorry).

Keynote: Semantically Integrated eCommunities in Biomedicine: Next-Generation Models of Biomedical Communication, Tim Clark, Massachusetts General Hospital and Harvard Medical School, Boston. His presentation opened by asking: what do the following have in common?
1. Alzheimer’s Disease
2. Huntington’s Disease
3. Nicotine Addiction
4. Schizophrenia
5. Bipolar Disorder
6. Autism
7. Parkinson’s Disease
8. ALS (Amyotrophic lateral sclerosis)
9. Neuropathic Pain
10. Major Depressive Disorder
11. Cancer (multiple forms)
Answer:
1. Highly complex disorders
2. Much information, incomplete understanding
3. Inadequate treatment options
4. Huge cost in human suffering
5. Multi-factorial causality
6. Require multi-disciplinary collaboration to make progress towards understanding and cure
Tim discussed the Science Collaboration Framework (SCF), a reusable, semantically-aware toolkit for building online communities. These communities make heavy use of Open Linked Data, controlled vocabularies and Drupal to build websites that tackle the disorders above, for example pdonlineresearch.org (Parkinson's Disease), StemBook.org (Harvard Stem Cell Institute) and alzforum.org (Alzheimer's Disease) [1]. The controlled vocabulary and ontology approach works well for well-understood areas (where the named entities are known) but less well at the outer boundaries of our knowledge. In summary, SCF is a reusable, open source and freely available framework for building web communities that uses shared ontologies and vocabularies.
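The point about controlled vocabularies working well for known entities, but not at the boundaries of knowledge, can be illustrated with a toy dictionary-based tagger in Python. This is not SCF code: the vocabulary, its EX: identifiers and the XYZ-9 protein name are all hypothetical, chosen only for illustration.

```python
import re

# Hypothetical controlled vocabulary: lower-case preferred label -> identifier
VOCABULARY = {
    "alzheimer's disease": "EX:0001",
    "parkinson's disease": "EX:0002",
    "amyloid beta": "EX:0003",
}


def tag(text, vocabulary=VOCABULARY):
    """Return (label, identifier) pairs for each vocabulary term found in text."""
    lowered = text.lower()
    found = []
    for label, identifier in vocabulary.items():
        # Whole-word match, so a label is not found inside a longer word
        if re.search(r"\b" + re.escape(label) + r"\b", lowered):
            found.append((label, identifier))
    return found


# XYZ-9 is a made-up name standing in for a newly described entity
sentence = "Amyloid beta plaques in Alzheimer's disease bind XYZ-9, a newly described protein."
matches = tag(sentence)
print(matches)
```

Only the two terms already in the vocabulary are tagged; the newly described protein goes unrecognised until someone adds it, which is the boundary problem in miniature.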
Michaela Guendel (Leaf Bioscience) presented DC-THERA Directory: A Knowledge Management System to Support Collaboration on Dendritic Cell and Immunology Research, which uses the Cell Type Ontology, a dendritic cell ontology, ChEBI and OBI. The project involves Andrea Splendiani, Ciro Scognamiglio and Marco Brandizi.
GePh-CARD: an information exchange application for an Hub & Spoke Network for Skeletal Dysplasias was presented by M. Mordenti & L. Sangiorgi
Panel Discussion: Collaborative and Social Bioinformatics Research and Development: Why, When, Who and How? Alex Bateman, Tim Clark, Duncan Hull and all participants. This panel discussion concentrated on Who? (experts vs. non-experts, crowds vs. individuals, how to motivate and reward people for contributing to online communities; community annotation of data is only possible when curators cede control of data) and then Where? (open wikis vs. closed ones, private vs. public data, wikis often not suitable for highly structured data, centralised vs. distributed systems).
Keynote: Bacterial Phylogeny and Taxonomy in the High-Throughput Sequencing World, Gabriel Valiente
Magdalena Musielak (who has worked with Piotr Byzia) presented RNA tertiary structure prediction with ModeRNA.
Olivier Perriquet presented Improved heuristic for pairwise RNA secondary structure prediction.
Giampaolo Bella talked about Analysing microRNA by Theorem Proving: qualitative logical proof before quantitative experimental measurement, e.g. asking "shall we go to a restaurant?" before "how much does it cost?"
Mapping miRNA genes on human fragile sites and translocation breakpoints, by Alfredo Ferro et al.
Keynote: Computational challenges in the study of small RNAs, Doron Betel, Memorial Sloan-Kettering Cancer Center.
microrna.gr: a suite of web-based tools for elucidating microRNA function was presented by Giorgo L. Papadopoulous, DIANA Bioinformatics Lab, Biomedical Sciences Research Center Alexander Fleming, Vari, Athens, Greece.
Last but not least, there was miRScape: a Cytoscape plugin to annotate biological networks with microRNAs.

The tenth NETTAB workshop (2010) will be held in Rome at the end of May or beginning of June 2010, and its theme will be Oncology Bioinformatics.

References

Das, S., Girard, L., Green, T., Weitzman, L., Lewis-Bowen, A., & Clark, T. (2009). Building biomedical web communities using a semantically aware content management system Briefings in Bioinformatics, 10 (2), 129-138 DOI: 10.1093/bib/bbn052


June 18, 2009

Ooh aah Cantona! Welcome back Eric…

Filed under: football — Duncan Hull @ 5:12 pm
Tags: alex ferguson, Cantona, Chorlton-cum-Hardy, Eric Cantona, Ken Loach, Keppel Road, Manchester United Football Club, Mark Kennedy, MUFC, old trafford, Ooh Aah Cantona, Professor Alex Ferguson, sardine, seagull, The Manchester Guardian, trawler, University of Old Trafford

It is great to see the eminent French football philosopher and scientist Eric Cantona back in his adopted hometown of Manchester. As well as visiting in person during production of the latest Ken Loach film (on the famous Keppel Road, Chorlton) and appearing at the premiere, Eric is currently gracing silver screens in cinemas all over Manchester (and across the world), thanks to his role in Looking for Eric, where he stars as lui-même (himself) [1].

It is a little-known fact that Eric actually has a PhD, with a thesis titled (roughly translated from French):

Making it count with nonchalant gallic passing and scoring.

This prize-winning thesis was awarded on graduation from The University of Old Trafford back in the summer of 1997, by the Faculty of Football Science under the supervision of Professor Alex Ferguson. The thesis hasn't been published in a peer-reviewed scientific journal yet, but a lot of the raw data is available on YouTube. Eric knows a thing or two about the art and science of timing in football [2].

As for the film, it is not really about football (thank God, footy flicks have an atrocious track record in cinema) or Manchester United Football Club (too divisive), but a touching story about the power of the human imagination in overcoming adversity. It's worth watching and very enjoyable IMHO; you can read all about it in the local newsrag, The Manchester Guardian [3].

So whether you’re red, blue, white, black, seagull, sardine or a trawler – there is something for everyone in this film.

C'est bon (it's good), or is it c'est bien? Je ne sais pas (I don't know) [gallic shrug]. Bienvenue à la maison, Eric (welcome home, Eric)!

References

Ken Loach et al. (2009). Looking For Eric. Eric Cantona mosaic above by Mark Kennedy (markkennedy.co.uk)
Michael Hopkin (2006). Goal fever at the World Cup: Why the first strike counts. Nature, 441 (7095), 793-793 DOI: 10.1038/441793a
Simon Hattenstone (2009). The awkward squad: Ken Loach and Eric Cantona The Guardian


June 17, 2009

Nettab 2009 Day Two: Wikis ‘n’ Workflows

Filed under: conferences,informatics — Duncan Hull @ 11:29 pm
Tags: Alex Bateman, citation bot, Dietlind Gerloff, eela-2, Muggle, myexperiment, Netab, Open Knowledge, openid, openk, pfam, rfam, rfambot, SBMM, smackbot, taverna, Tim Bray, wikipedia, wikirank, workflow, xfam

This is a brief report and some links from the second day of Network Applications and Tools in Biology (NETTAB 2009) in Catania, Sicily. There were two keynotes, on the RNA WikiProject [1] by Alex Bateman and on myExperiment [2] (by me), as well as presentations by (I think, but I wasn't concentrating enough) Dietlind Gerloff, Guiliano Armano, Frédéric Cadier and Leandro Ciuffo.

Alex Bateman (Wikipedia user:Alexbateman) gave an entertaining talk on the RNA WikiProject: Community annotation of RNA families, in which they have taken data from the Rfam database [3] and put it all into regular Wikipedia. This project got quite a lot of media attention back in February. In this case, the primary advantages of "letting go of data" by giving it to Wikipedia are that it is read by everyone who uses Google (where Wikipedia pages are frequently the top search result) and that Wikipedia gets far more traffic than biological databases like rfam.sanger.ac.uk. Thanks to wikirank, which tells you what is popular on Wikipedia, it is also possible to quickly compare the popularity of pages; see RNA vs. Ribosomal RNA vs. Micro RNA vs. SnoRNA for an example. The Rfam project has some interesting stats on who makes the most edits to the Rfam pages: it isn't always the scientists who make important contributions, and anonymous users and machines (e.g. Rfambot, Smackbot and Citation bot) are often doing most of the hard work. There is a very long tail of contributors who make small contributions, which supports the rule that 90% of users in online communities are lurkers who never contribute, and is reminiscent of Citizen Science and Muggles. I wanted to put the slides from this talk on slideshare, but they contain some unpublished data. You can, however, subscribe to the feed of the Rfam and Pfam blog at xfam.wordpress.com if you'd like to keep up to date on developments in this area.
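Stats like these can be gathered from a page's revision history. Below is a minimal Python sketch that tallies edits by contributor type. It assumes revision records shaped like those returned by the MediaWiki API with prop=revisions and rvprop=user (a user name, plus an anon key for IP editors); treating a name ending in "bot" as a bot is a crude heuristic assumed here, not how Wikipedia actually flags bot accounts, and the sample history is made up.

```python
from collections import Counter


def contributor_type(revision):
    """Return 'anonymous', 'bot' or 'registered' for one revision record."""
    if "anon" in revision:  # assumed: IP edits carry an "anon" key
        return "anonymous"
    # Crude heuristic: "Citation bot" -> "citationbot", which ends in "bot"
    name = revision.get("user", "").replace(" ", "").lower()
    if name.endswith("bot"):
        return "bot"
    return "registered"


def summarise(revisions):
    """Count edits per contributor type and per individual contributor."""
    by_type = Counter(contributor_type(r) for r in revisions)
    by_user = Counter(r.get("user", "") for r in revisions)
    return by_type, by_user


# Hypothetical revision history for one page (apart from the bot names
# mentioned above, the users are invented for illustration)
revisions = [
    {"user": "Rfambot"},
    {"user": "Rfambot"},
    {"user": "Citation bot"},
    {"user": "SmackBot"},
    {"user": "192.0.2.1", "anon": ""},
    {"user": "ExampleCurator"},
]
by_type, by_user = summarise(revisions)
print(by_type.most_common())
```

Run over a real page history, the by_user counter is where the long tail of one-off contributors would show up.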

After the keynote there were presentations by Dietlind Gerloff on Open Knowledge (a new agent-based infrastructure for bioinformatics experimentation, with a nice pictorial intro using Lego) and Guiliano Armano? on ProDaMa-C, a collaborative web application to generate specialised protein structure datasets.

The next keynote was on myexperiment.org, "Where Experimental Work Flows"; my slides were titled Who are you? Managing collaborative digital identities in bioinformatics with myExperiment.

I followed this presentation with a live 30-minute demonstration and discussion of myExperiment. The most interesting question people asked was: why use OpenID instead of a full-blown Public Key Infrastructure? (Answer: OpenID is currently a lot easier and provides good-enough security.) The rest of the day is a bit of a blur. I'm with Tim Bray in enjoying the monster adrenaline high of public speaking, but with all that ChEBI:28918 coursing through my veins it can be difficult to think straight immediately before, during or after a talk… So you'll have to look at the proceedings for the full details of the afternoon, which included Make Histri (great name!), SBMM: Systems Biology Metabolic Modeling Assistant [4] by Ismael Navas-Delgado, and Biomedical Applications of the EELA-2 project.

In the evening there was some Opera dei Pupi (traditional Sicilian puppet theatre), a trip to Acireale and a delicious Italian feast in a ristorante (the name of which I can't remember) to round off an enjoyable day.

References

Daub, J., Gardner, P., Tate, J., Ramskold, D., Manske, M., Scott, W., Weinberg, Z., Griffiths-Jones, S., & Bateman, A. (2008). The RNA WikiProject: Community annotation of RNA families RNA, 14 (12), 2462-2464 DOI: 10.1261/rna.1200508
De Roure, D., & Goble, C. (2009). Software Design for Empowering Scientists IEEE Software, 26 (1), 88-95 DOI: 10.1109/MS.2009.22
Gardner, P., Daub, J., Tate, J., Nawrocki, E., Kolbe, D., Lindgreen, S., Wilkinson, A., Finn, R., Griffiths-Jones, S., Eddy, S., & Bateman, A. (2009). Rfam: updates to the RNA families database Nucleic Acids Research, 37 (Database) DOI: 10.1093/nar/gkn766
Reyes-Palomares, A., Montanez, R., Real-Chicharro, A., Chniber, O., Kerzazi, A., Navas-Delgado, I., Medina, M., Aldana-Montes, J., & Sanchez-Jimenez, F. (2009). Systems biology metabolic modeling assistant: an ontology-based tool for the integration of metabolic data in kinetic modeling Bioinformatics, 25 (6), 834-835 DOI: 10.1093/bioinformatics/btp061


June 16, 2009

OBO Foundry workshop outcomes 2009

Filed under: conferences — Duncan Hull @ 4:28 pm
Tags: Cotto, ebi, OBO, OBO Foundry, Susanna-Assunta Sansone

Well, I was going to blog about last week's Open Biomedical Ontologies workshop, but Susanna-Assunta Sansone at the EBI has already done it with some very detailed minutes; see her notes.

Thanks to the organisers of this workshop for hosting another well-run event. I'm only sorry I had to miss the delicious-looking dinner at Cotto in Cambridge (and entertaining company) on the last day… Hope to see you again next year.


[CC-licensed Picture of Haystack OWL by dullhunk].


June 15, 2009

Nettab 2009 Day One: Bio-wikis (and football)

Filed under: conferences,football — Duncan Hull @ 4:19 pm
Tags: Alessandro Lagana, Alfredo Ferro, Alfredo Pulvirenti, Andrea Splendiani, Catenaccio, Elda Rossi, Giampaolo Bella, Leandro Ciuffo, Luca Bortolussi, NETTAB, Paolo Maldini, Paolo Romano, Rosalba Giugno, Stefania Parodi

A brief wiki-report and some wiki-links from the first, short, introductory day of Network Applications and Tools in Biology (NETTAB 2009) in Sicily, where there was a tutorial on Technologies of wiki resources and bio-wikis delivered by Paolo Romano and Elda Rossi. This covered Gene Wiki, Wikiproteins, Wikigenes and Wikipathways [1-4].

There is already a bewildering array of different wikitechnology; thankfully wikimatrix ("compare them all") gives wikicomparisons of some of the wikisolutions already out there (open vs. closed – more on this later).

The theme of the workshop this year has been Technologies, Tools and Applications for Collaborative and Social Bioinformatics Research and Development, so wikis seem like an obvious place to start.

Since user-driven social software is becoming increasingly important, here is a list of a few of the people involved in this year's workshop:

I don't know about you, but those names sound deliciously exotic to my non-Italian-speaking Inglese ears. When I read the list of names above, it sounds like an elite squad of the Azzurri (the Italian national football team). You would have Romano as capitano in the middle of the park, joined by Ferro, Ciuffo and Rossi. Then at the back you've got the famous Italian Catenaccio (locking defence, Paolo Maldini style), the kind that wins World Cups (remember 2006?); there's nothing getting past Parodi, Giugno, Pulvirenti and Bortolussi in defence. Last but not least, I'd put Splendiani and Bella up front; they sound like strikers to me, mostly because of their surnames.

What all this footballing nonsense has to do with NETTAB and wikis, I don't know. There's probably some obvious-but-clichéd link between football and science (by virtue of them both being collaborative and competitive team sports). But really, I just couldn't resist a little Italian-inspired post about football. I hope to post some more notes on days two and three of the NETTAB workshop later, where most of the action took place.

References

Mons, B., Ashburner, M., Chichester, C., van Mulligen, E., Weeber, M., den Dunnen, J., van Ommen, G., Musen, M., Cockerill, M., Hermjakob, H., Mons, A., Packer, A., Pacheco, R., Lewis, S., Berkeley, A., Melton, W., Barris, N., Wales, J., Meijssen, G., Moeller, E., Roes, P., Borner, K., & Bairoch, A. (2008). Calling on a million minds for community annotation in WikiProteins Genome Biology, 9 (5) DOI: 10.1186/gb-2008-9-5-r89
Hoffmann, R. (2008). A wiki for the life sciences where authorship matters Nature Genetics, 40 (9), 1047-1051 DOI: 10.1038/ng.f.217
Huss, J., Orozco, C., Goodale, J., Wu, C., Batalov, S., Vickers, T., Valafar, F., & Su, A. (2008). A Gene Wiki for Community Annotation of Gene Function PLoS Biology, 6 (7) DOI: 10.1371/journal.pbio.0060175
Pico, A., Kelder, T., van Iersel, M., Hanspers, K., Conklin, B., & Evelo, C. (2008). WikiPathways: Pathway Editing for the People PLoS Biology, 6 (7) DOI: 10.1371/journal.pbio.0060184


Andrea Wiggins on little e-Science

Filed under: seminars — Duncan Hull @ 12:41 pm
Tags: Andrea Wiggins, cyberinfrastructure, e-science, FLOSS, Lincoln Stein, myexperiment, Syracuse University

Andrea Wiggins [1,2] from Syracuse University, New York, is visiting Manchester this week and will be giving a seminar on "Little e-Science", the details of which are below.

Date, time: 12 – 2pm on Thursday 18th June

Location: Atlas 1&2, Kilburn building

Title: Little eScience

Abstract: An interdisciplinary community of researchers has started to coalesce around the study of free/libre open source software (FLOSS) development. The research community is in many ways a reflection of the phenomenon of FLOSS practices in both social and technological respects, as many share the open source community's values that support transparency and democratic participation. As community ties develop, new collaborations have spurred the creation of shared research resources: several repositories provide access to curated research-ready data, working paper repositories provide a means for disseminating early results, and a variety of analysis scripts and workflows connecting the data sets and literature are freely available. Despite these apparently favorable conditions for research collaboration, adoption of the tools and practices associated with eResearch has so far been slow.

The key issues observed to date seem to stem from the challenges of pre-paradigmatic little science research. Researchers from software engineering, information systems, and even anthropology may examine the same construct, such as FLOSS project success, but will likely proceed from different epistemologies, utilize different data sources, identify different independent variables with varying operationalizations, and employ different research methodologies. In the decentralized and phenomenologically-driven FLOSS research community, creating and maintaining cyberinfrastructure [3] is a substantial effort for a small number of participants. In the little sciences, achieving critical mass of participation may be the most significant factor in creating a viable community of practice around eScience methods.


References

Andrea Wiggins (2009) Social Life of Information: We Are Who We Link Andrea’s blog
Andrea Wiggins, James Howison, & Kevin Crowston (2008). Social dynamics of FLOSS team communication across channels Open Source Development, Communities and Quality
Lincoln Stein (2008). Towards a cyberinfrastructure for the biological sciences: progress, visions and challenges Nature Reviews Genetics, 9 (9), 678-688 DOI: 10.1038/nrg2414


June 10, 2009

Kenjiro Taura on Parallel Workflows

Filed under: informatics,seminars — Duncan Hull @ 7:24 am
Tags: bioinformatics, dmake, dsh, EC2, enju, falkon, Globus, gluepy, GXP, gxp make, Kenjiro Taura, make, makefile, MEDIE, Medline, nactem, NLP, pdsh, pubmed, qmake, ssh, taktuk, University of Tokyo, unixish, workflow

Kenjiro Taura, from the Department of Information and Communication Engineering at the University of Tokyo, is visiting Manchester next week. He will be giving a seminar, the details of which are below:

Title: Large scale text processing made simple by GXP make: A Unixish way to parallel workflow processing

Date-time: Monday, 15 June 2009 at 11:00 AM

Location: Room MLG.001, mib.ac.uk

In the first part of this talk, I will introduce a simple tool called GXP make. GXP is a general purpose parallel shell (a process launcher) for multicore machines, unmanaged clusters accessed via SSH, clusters or supercomputers managed by batch schedulers, distributed machines, or any mixture thereof. GXP make is a 'make' execution engine that executes regular UNIX makefiles in parallel. Make, though typically used for software builds, is in fact a general framework for concisely describing workflows made up of sequential commands. Installing GXP requires no root privileges and needs to be done only on the user's home machine. GXP make easily scales to more than 1,000 CPU cores. The net result is that GXP make allows easy migration of workflows from serial environments to clusters and to distributed environments.

In the second part, I will talk about our experiences of running a complex text processing workflow developed by Natural Language Processing (NLP) experts. It is an entire workflow that processes MEDLINE abstracts with deep NLP tools (e.g., the Enju parser [1]) to generate search indices for MEDIE, a semantic retrieval engine for MEDLINE. It was originally written as a Makefile without any particular provision for parallel processing, yet GXP make was able to run it on clusters with almost no changes to the original Makefile. The time for processing the abstracts published in a single day was reduced from approximately eight hours (on a single machine) to twenty minutes, with a trivial amount of effort. A larger-scale experiment processing all the abstracts published so far, and the remaining challenges, will also be presented.
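The idea of make as a workflow engine can be sketched in a few lines of Python: treat each rule as a node whose prerequisites must finish first, and run concurrently every rule whose prerequisites are done. This toy version uses the standard library's graphlib (Python 3.9+) and a thread pool; it is not how GXP make works internally, the rule names are hypothetical, and unlike real make it ignores file timestamps and always runs every rule.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter

# Hypothetical workflow: target -> set of prerequisite targets
RULES = {
    "day1.txt": set(),
    "day2.txt": set(),
    "day1.parsed": {"day1.txt"},
    "day2.parsed": {"day2.txt"},
    "index": {"day1.parsed", "day2.parsed"},
}


def build(target, log):
    """Stand-in for a rule's recipe (e.g. running a parser on one file)."""
    log.append(target)
    return target


def run_parallel(rules, max_workers=4):
    """Execute rules in dependency order, running independent ones concurrently."""
    sorter = TopologicalSorter(rules)
    sorter.prepare()  # raises CycleError if the rules contain a cycle
    log = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pending = set()
        while sorter.is_active():
            # Submit every rule whose prerequisites have all finished
            for target in sorter.get_ready():
                pending.add(pool.submit(build, target, log))
            # Block until at least one running rule completes
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                sorter.done(future.result())
    return log


order = run_parallel(RULES)
print(order)
```

The two per-day parsing rules can run at the same time while the final index rule waits for both; dispatching those commands to many machines instead of local threads is, roughly, the part a tool like GXP make takes care of.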

References

Miyao, Y., Sagae, K., Saetre, R., Matsuzaki, T., & Tsujii, J. (2008). Evaluating contributions of natural language parsers to protein-protein interaction extraction Bioinformatics, 25 (3), 394-400 DOI: 10.1093/bioinformatics/btn631


June 4, 2009

Improving the OBO Foundry Principles

Filed under: biocuration,data mining,informatics,semweb — Duncan Hull @ 1:48 pm
Tags: Alan Ruttenberg, Allyson Lister, Barry Smith, bbsrc, Bioportal, ChEBI, Chris Mungall, ebi, Frank Gibson, frolleague, Gene Ontology, Mark Musen, Melanie Courtot, Michael Ashburner, Michel Dumontier, nactem, OBO, OBO Foundry, OBO Smithy, OBO Workshop, obology, old smithy, ontology, ontolojoke, owl, principles, pubmed, REFINE, Richard Scheuermann, sbml, Suzi Lewis, ten commandments, workshop

The Open Biomedical Ontologies (OBO) are a set of reference ontologies for describing all kinds of biomedical data, see [1-5] for examples. Every year, users and developers of these ontologies gather from around the globe for a workshop at the EBI near Cambridge, UK. Following on from the first workshop last year, the 2nd OBO workshop 2009 is fast approaching.

In preparation, I've been revisiting the OBO Foundry documentation, part of which establishes a set of principles for ontology development. I'm wondering how they could be improved, because these principles are fundamental to the whole effort. We've been using one of the OBO ontologies, Chemical Entities of Biological Interest (ChEBI), in the REFINE project to mine data from the PubMed database. OBO ontologies like ChEBI and the Gene Ontology are crucial to making sense of the massive datasets that are now common in biology and medicine, so this is stuff that matters.

The OBO Foundry Principles, a sort of Ten Commandments of Ontology (or Obology if you prefer) currently look something like this (copied directly from obofoundry.org/crit.shtml):

The ontology must be open and available to be used by all without any constraint other than (a) its origin must be acknowledged and (b) it is not to be altered and subsequently redistributed under the original name or with the same identifiers. The OBO ontologies are for sharing and are resources for the entire community. For this reason, they must be available to all without any constraint or license on their use or redistribution. However, it is proper that their original source is always credited and that after any external alterations, they must never be redistributed under the same name or with the same identifiers.
The ontology is in, or can be expressed in, a common shared syntax. This may be either the OBO syntax, extensions of this syntax, or OWL. The reason for this is that the same tools can then be usefully applied. This facilitates shared software implementations. This criterion is not met in all of the ontologies currently listed, but we are working with the ontology developers to have them available in a common OBO syntax.
The ontology possesses a unique identifier space within the OBO Foundry. The source of a term (i.e. class) from any ontology can be immediately identified by the prefix of the identifier of each term. It is, therefore, important that this prefix be unique.
The ontology provider has procedures for identifying distinct successive versions.
The ontology has a clearly specified and clearly delineated content. The ontology must be orthogonal to other ontologies already lodged within OBO. The major reason for this principle is to allow two different ontologies, for example anatomy and process, to be combined through additional relationships. These relationships could then be used to constrain when terms could be jointly applied to describe complementary (but distinguishable) perspectives on the same biological or medical entity. As a corollary to this, we would strive for community acceptance of a single ontology for one domain, rather than encouraging rivalry between ontologies.
The ontology includes textual definitions for all terms. Many biological and medical terms may be ambiguous, so terms should be defined so that their precise meaning within the context of a particular ontology is clear to a human reader.
The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the OBO Relation Ontology.
The ontology is well documented.
The ontology has a plurality of independent users.
The ontology will be developed collaboratively with other OBO Foundry members.
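Two of these principles, textual definitions for all terms and an identifier prefix that reveals each term's source, are easy to check mechanically. Here is a minimal Python sketch over a small snippet in OBO flat-file syntax; it handles only [Term] stanzas with simple tag: value lines, so it is nowhere near a full OBO parser, and the EX and OTHER prefixes and terms are hypothetical.

```python
# Hypothetical ontology snippet in OBO flat-file syntax
OBO_TEXT = """\
format-version: 1.2

[Term]
id: EX:0000001
name: example process
def: "A hypothetical process used for illustration." []

[Term]
id: EX:0000002
name: undefined example

[Term]
id: OTHER:0000003
name: borrowed term
def: "A term whose prefix belongs to another ontology." []
"""


def parse_terms(text):
    """Return one dict per [Term] stanza, mapping each tag to its value."""
    terms, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("["):
            # Start of a new stanza; only [Term] stanzas are collected
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif current is not None and ":" in line:
            tag, value = line.split(":", 1)
            current[tag.strip()] = value.strip()
    return terms


def missing_definitions(terms):
    """Return the IDs of terms with no 'def' tag (cf. the textual definitions principle)."""
    return [term["id"] for term in terms if "def" not in term]


def group_by_prefix(terms):
    """Map each ID prefix (the part before ':') to the term IDs that use it."""
    groups = {}
    for term in terms:
        prefix = term["id"].split(":", 1)[0]
        groups.setdefault(prefix, []).append(term["id"])
    return groups


terms = parse_terms(OBO_TEXT)
undefined = missing_definitions(terms)
by_prefix = group_by_prefix(terms)
print(undefined, by_prefix)
```

Here EX:0000002 is flagged as lacking a definition, and grouping by prefix shows at a glance that OTHER:0000003 comes from a different ontology.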

I've been asking all my frolleagues what they think of these principles and have had some lively responses, including some from Allyson Lister, Mélanie Courtot, Michel Dumontier and Frank Gibson. So what do you think? How could these guidelines be improved? Do you have any specific (and preferably constructive) criticisms of these ambitious (and worthy) goals? Be bold, be brave and be polite. If you have anything controversial or "off the record", you can email it to me… I'm all ears.

CC-licensed picture above of the Old Smithy (pub) by Loop Oh. Inspired by Michael Ashburner's standing OBO joke (Ontolojoke), which goes something like this: because Barry Smith is one of the leaders of OBO, should the project be called the OBO Smithy or the OBO Foundry? 🙂

References

Noy, N., Shah, N., Whetzel, P., Dai, B., Dorf, M., Griffith, N., Jonquet, C., Rubin, D., Storey, M., Chute, C., & Musen, M. (2009). BioPortal: ontologies and integrated data resources at the click of a mouse Nucleic Acids Research DOI: 10.1093/nar/gkp440
Côté, R., Jones, P., Apweiler, R., & Hermjakob, H. (2006). The Ontology Lookup Service, a lightweight cross-platform tool for controlled vocabulary queries BMC Bioinformatics, 7 (1) DOI: 10.1186/1471-2105-7-97
Smith, B., Ashburner, M., Rosse, C., Bard, J., Bug, W., Ceusters, W., Goldberg, L., Eilbeck, K., Ireland, A., Mungall, C., Leontis, N., Rocca-Serra, P., Ruttenberg, A., Sansone, S., Scheuermann, R., Shah, N., Whetzel, P., & Lewis, S. (2007). The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration Nature Biotechnology, 25 (11), 1251-1255 DOI: 10.1038/nbt1346
Smith, B., Ceusters, W., Klagges, B., Köhler, J., Kumar, A., Lomax, J., Mungall, C., Neuhaus, F., Rector, A., & Rosse, C. (2005). Relations in biomedical ontologies Genome Biology, 6 (5) DOI: 10.1186/gb-2005-6-5-r46
Bada, M., & Hunter, L. (2008). Identification of OBO nonalignments and its implications for OBO enrichment Bioinformatics, 24 (12), 1448-1455 DOI: 10.1093/bioinformatics/btn194


June 2, 2009

Who Are You? Digital Identity in Science

Filed under: conferences,data mining,lyrical — Duncan Hull @ 7:50 am
Tags: author-id, Bruno Harbulot, contributor-id, crossref, digital identity, FOAF, google scholar, Henry Story, LinkedIn, openid, pubmed, researcherid, sciblog, scopus, Simon Willison, soloconf, soloconf_09, SSL, SWOT, The Who, TLS

The organisers of the Science Online London 2009 conference are asking people to propose their own session ideas (see some examples here), so here is a proposal:

Title: Who Are You? Digital Identity in Science

Many important decisions in Science are based on identifying scientists and their contributions. From selecting reviewers for grants and publications, to attributing published data and deciding who is funded, hired or promoted, digital identity is at the heart of Science on the Web.

Despite the importance of digital identity, identifying scientists online is an unsolved problem [1]. Consequently, a significant amount of scientific and scholarly work is not easily cited or credited, especially digital contributions: from blogs and wikis, to source code, databases and traditional peer-reviewed publications on the Web. This (proposed) session will look at current mechanisms for identifying scientists digitally, including contributor-id (CrossRef), researcher-id (Thomson), Scopus Author ID (Elsevier), OpenID, Google Scholar [2], Single Sign On, PubMed, FOAF+SSL, LinkedIn, Shared Identifiers (URIs) and the rest. We will introduce and discuss each via a SWOT analysis (Strengths, Weaknesses, Opportunities and Threats). Is digital identity even possible and ethical? Besides the obvious benefits of persistent, reliable and unique identifiers, what are the privacy and security issues with personal digital identity?

If this is a successful proposal, I’ll need some help. Any offers? If you are interested in joining in the fun, more details are at scienceonlinelondon.org

References

Bourne, P., & Fink, J. (2008). I Am Not a Scientist, I Am a Number PLoS Computational Biology, 4 (12) DOI: 10.1371/journal.pcbi.1000247
Various Publications about unique author identifiers bookmarked in citeulike
Yours Truly (2009) Google thinks I’m Maurice Wilkins
The Who (1978) Who Are You? Who, who, who, who? (Thanks to Jan Aerts for the reference!)

