O'Really?

June 19, 2009

Nettab 2009 Day Three: Semantic Integration

A brief report (well, just some scribbled notes, bullet points and links really) on the third and final day of Network Applications and Tools in Biology (NETTAB) 2009 in Catania, Sicily. There was a special section on Methods and Tools for RNA Structure and Functional Analysis. Disclaimer: RNA mania isn’t really my thing – so the RNA presentations and papers are grossly under-represented in this mini-report (sorry).

  • Keynote: Semantically Integrated eCommunities in Biomedicine: Next-Generation Models of Biomedical Communication, Tim Clark, Massachusetts General Hospital and Harvard Medical School, Boston. His presentation opened by asking: What do the following have in common?

    1. Alzheimer’s Disease
    2. Huntington’s Disease
    3. Nicotine Addiction
    4. Schizophrenia
    5. Bipolar Disorder
    6. Autism
    7. Parkinson’s Disease
    8. ALS (Amyotrophic lateral sclerosis)
    9. Neuropathic Pain
    10. Major Depressive Disorder
    11. Cancer (multiple forms)

    Answer:

    1. Highly complex disorders
    2. Much information, incomplete understanding
    3. Inadequate treatment options
    4. Huge cost in human suffering
    5. Multi-factorial causality
    6. Require multi-disciplinary collaboration for progress to understanding and cure

    Tim discussed using the Science Collaboration Framework (SCF), a reusable, semantically-aware toolkit for building on-line communities. These communities make heavy use of Open Linked Data, controlled vocabularies and Drupal to build websites that tackle the above disorders, for example pdonlineresearch.org (Parkinson’s Disease), StemBook.org (Harvard Stem Cell Institute) and alzforum.org (Alzheimer’s) [1]. The controlled vocabulary and ontology approach works well for well-understood areas (where named entities are known) but less well at the outer boundaries of our knowledge. In summary, SCF is a reusable framework for building web communities that uses shared ontologies and vocabularies, and it is open source and freely available.

  • Michaela Guendel (Leaf Bioscience) presented DC-THERA Directory: A Knowledge Management System to Support Collaboration on Dendritic Cell and Immunology Research, which uses the cell type ontology, the dendritic cell ontology, ChEBI and OBI. The project involves Andrea Splendiani, Ciro Scognamiglio and Marco Brandizi.
  • GePh-CARD: an information exchange application for an Hub & Spoke Network for Skeletal Dysplasias was presented by M. Mordenti & L. Sangiorgi
  • Panel Discussion: Collaborative and Social Bioinformatics Research and Development: Why, When, Who and How? Alex Bateman, Tim Clark, Duncan Hull and all participants. This panel discussion concentrated on Who? (experts vs. non-experts, crowds vs. individuals, how to motivate and reward people to contribute to online communities, and the point that community annotation of data is only possible when curators cede control of it) and then Where? (open wikis vs. closed ones, private vs. public data, wikis often not being suitable for highly structured data, centralised vs. distributed systems).
  • Keynote: Bacterial Phylogeny and Taxonomy in the High-Throughput Sequencing World, Gabriel Valiente
  • Magdalena Musielak (has worked with Piotr Byzia) presented RNA tertiary structure prediction with ModeRNA.
  • Olivier Perriquet presented Improved heuristic for pairwise RNA secondary structure prediction.
  • Giampaolo Bella talked about Analysing microRNA by Theorem Proving: qualitative logical proof comes before quantitative experimental measurement, e.g. “shall we go to a restaurant?” before “how much does it cost?”
  • Mapping miRNA genes on human fragile sites and translocation breakpoints Alfredo Ferro et al.
  • Keynote: Computational challenges in the study of small RNAs, Doron Betel, Memorial Sloan-Kettering Cancer Center
  • microrna.gr: a suite of web-based tools for elucidating microRNA function was presented by Giorgo L. Papadopoulous, DIANA bioinformatics lab, Biomedical Science Research Center Alexander Fleming, Vari, Athens, Greece
  • Last but not least there was miRScape: a cytoscape plugin to annotate biological networks with microRNAs

The Tenth NETTAB Workshop (2010) will be held in Rome at the end of May or beginning of June 2010, and its theme will be Oncology Bioinformatics.

References

  1. Das, S., Girard, L., Green, T., Weitzman, L., Lewis-Bowen, A., & Clark, T. (2009). Building biomedical web communities using a semantically aware content management system. Briefings in Bioinformatics, 10 (2), 129-138. DOI: 10.1093/bib/bbn052

June 18, 2009

Ooh aah Cantona! Welcome back Eric…

It is great to see the eminent French football philosopher and scientist Eric Cantona back in his adopted hometown of Manchester. As well as visiting in person during production of the latest Ken Loach film (on the famous Keppel Road, Chorlton) and appearing at the premiere, Eric is currently gracing silver screens in cinemas all over Manchester (and across the world), thanks to his role in Looking for Eric where he stars as lui-même [1].

It is a little known fact that Eric actually has a PhD, with a thesis titled (roughly translated from French):

Making it count with nonchalant gallic passing and scoring.

This prize-winning thesis was awarded on graduation from The University of Old Trafford back in the summer of 1997, by the Faculty of Football Science under the supervision of Professor Alex Ferguson. The thesis hasn’t been published in a peer-reviewed scientific journal yet but a lot of the raw data is available on YouTube. Eric knows a thing or two about the art and science of timing in football [2].

As for the film, it is not really about football (thank God, footy flicks have an atrocious track record in cinema) or Manchester United Football Club (too divisive) but a touching story about the power of the human imagination in overcoming adversity. Worth watching and very enjoyable, IMHO; you can read all about it in the local newsrag, The Manchester Guardian [3].

So whether you’re red, blue, white, black, seagull, sardine or a trawler – there is something for everyone in this film.

C’est bon or is it c’est bien? Je ne sais pas [gallic shrug]. Bienvenue à la maison Eric!

References

  1. Ken Loach et al. (2009). Looking For Eric. Eric Cantona mosaic above by Mark Kennedy (markkennedy.co.uk)
  2. Michael Hopkin (2006). Goal fever at the World Cup: Why the first strike counts. Nature, 441 (7095), 793-793. DOI: 10.1038/441793a
  3. Simon Hattenstone (2009). The awkward squad: Ken Loach and Eric Cantona. The Guardian

June 17, 2009

Nettab 2009 Day Two: Wikis ‘n’ Workflows

This is a brief report and some links from the second day of Network Applications and Tools in Biology (NETTAB 2009) in Catania, Sicily. There were two keynotes, on the RNA WikiProject [1] by Alex Bateman and on myExperiment [2] (by me), as well as presentations by (I think, but I wasn’t concentrating enough) Dietlind Gerloff, Guiliano Armano, Frédéric Cadier and Leandro Ciuffo.

Alex Bateman (Wikipedia user:Alexbateman) gave an entertaining talk on the RNA WikiProject: Community annotation of RNA families, in which data from the Rfam database [3] have been put into regular Wikipedia. This project got quite a lot of media attention back in February. In this case, the primary advantages of “letting go of data” by giving it to Wikipedia are that it is read by everyone who uses Google (where Wikipedia pages are frequently the top search result) and that Wikipedia gets a lot more traffic than biological databases like rfam.sanger.ac.uk do. Thanks to wikirank, which tells you what is popular on Wikipedia, it is also possible to quickly compare the popularity of pages; see RNA vs. Ribosomal RNA vs. Micro RNA vs. SnoRNA for an example. The Rfam project has some interesting stats on who makes the most edits to the Rfam pages: it isn’t always the scientists who make important contributions, but anonymous users and machines (e.g. Rfambot, Smackbot and Citation bot) who are often doing most of the hard work. There is a very long tail of contributors who make small contributions, which supports the rule that 90% of users in on-line communities are lurkers who never contribute, and is reminiscent of Citizen Science and Muggles. I wanted to put the slides from this talk on slideshare, but they contain some unpublished data. You can, however, subscribe to the feed of the Rfam and Pfam blog at xfam.wordpress.com if you’d like to keep up to date on developments in this area.

After the keynote there were presentations by Dietlind Gerloff on Open Knowledge (a new agent-based infrastructure for bioinformatics experimentation – nice pictorial intro using Lego here) and Guiliano Armano? on ProDaMa-C, a collaborative web application to generate specialised protein structure datasets.

The next keynote was on myexperiment.org, “Where Experimental Work Flows” – my slides on Who are you, Managing collaborative digital identities in bioinformatics with myexperiment are embedded below.

I followed this presentation with a live 30-minute demonstration and discussion of myexperiment. The most interesting question people asked was: Why use OpenID instead of full-blown Public Key Infrastructure? (Answer: OpenID is currently a lot easier and provides good-enough security.) The rest of the day is a bit of a blur. I’m with Tim Bray in enjoying the monster adrenaline high of public speaking, but with all that ChEBI:28918 coursing through my veins it can be difficult to think straight (immediately before, during or after a talk)… so you’ll have to take a look at the proceedings for the full details of what happened in the afternoon, which included Make Histri (great name!), SBMM: Systems Biology Metabolic Modeling Assistant [4] by Ismael Navas-Delgado and Biomedical Applications of the EELA-2 project.

By the evening, there was some Opera dei Pupi (traditional Sicilian puppet theatre), a trip to Acireale and a delicious Italian feast in a ristorante (the name of which I can’t remember) to round off an enjoyable day.

References

  1. Daub, J., Gardner, P., Tate, J., Ramskold, D., Manske, M., Scott, W., Weinberg, Z., Griffiths-Jones, S., & Bateman, A. (2008). The RNA WikiProject: Community annotation of RNA families. RNA, 14 (12), 2462-2464. DOI: 10.1261/rna.1200508
  2. De Roure, D., & Goble, C. (2009). Software Design for Empowering Scientists. IEEE Software, 26 (1), 88-95. DOI: 10.1109/MS.2009.22
  3. Gardner, P., Daub, J., Tate, J., Nawrocki, E., Kolbe, D., Lindgreen, S., Wilkinson, A., Finn, R., Griffiths-Jones, S., Eddy, S., & Bateman, A. (2009). Rfam: updates to the RNA families database. Nucleic Acids Research, 37 (Database). DOI: 10.1093/nar/gkn766
  4. Reyes-Palomares, A., Montanez, R., Real-Chicharro, A., Chniber, O., Kerzazi, A., Navas-Delgado, I., Medina, M., Aldana-Montes, J., & Sanchez-Jimenez, F. (2009). Systems biology metabolic modeling assistant: an ontology-based tool for the integration of metabolic data in kinetic modeling. Bioinformatics, 25 (6), 834-835. DOI: 10.1093/bioinformatics/btp061

June 16, 2009

OBO Foundry workshop outcomes 2009

Filed under: conferences — Duncan Hull @ 4:28 pm

Well, I was going to blog about last week’s Open Biomedical Ontologies workshop, but Susanna-Assunta Sansone at the EBI has already done it via some very detailed minutes. See her notes for the:

  1. Overview
  2. Outcomes from day one
  3. Outcomes from day two

Thanks to the organisers of this workshop for hosting another well-run event. I’m only sorry I had to miss the delicious looking dinner at Cotto in Cambridge (and entertaining company) on the last day… Hope to see you again next year.

References

  1. Schober, D., Smith, B., Lewis, S., Kusnierczyk, W., Lomax, J., Mungall, C., Taylor, C., Rocca-Serra, P., & Sansone, S. (2009). Survey-based naming conventions for use in OBO Foundry ontology development. BMC Bioinformatics, 10 (1). DOI: 10.1186/1471-2105-10-125

[CC-licensed Picture of Haystack OWL by dullhunk].

June 15, 2009

Nettab 2009 Day One: Bio-wikis (and football)

A brief wiki-report and some wiki-links from the first short and introductory day of Network Applications and Tools in Biology (NETTAB 2009) in Sicily, where there was a tutorial on Technologies of wiki resources and bio-wikis delivered by Paolo Romano and Elda Rossi. This covered Gene Wiki, Wikiproteins, Wikigenes and Wikipathways [1-4].

There is already a bewildering array of different wikitechnology; thankfully wikimatrix (“compare them all”) gives wikicomparisons of some of the wikisolutions that are already out there (open vs. closed – more on this later).

The theme of the workshop this year has been Technologies, Tools and Applications for Collaborative and Social Bioinformatics Research and Development. So wikis seem like an obvious place to start.

Since user-driven social software is becoming increasingly important, here is a list of a few of the people involved in this year’s workshop:

  1. Giampaolo Bella
  2. Luca Bortolussi
  3. Leandro Ciuffo
  4. Alfredo Ferro
  5. Rosalba Giugno
  6. Alessandro Lagana
  7. Stefania Parodi
  8. Alfredo Pulvirenti
  9. Paolo Romano
  10. Elda Rossi
  11. Andrea Splendiani

I don’t know about you, but those names sound deliciously exotic to my non-Italian-speaking Inglese ears. When I read the list of names above, it sounds like an elite squad of the Azzurri (football team). You would have Romano as capitano in the middle of the park, joined by Ferro, Ciuffo and Rossi. Then at the back you’ve got the famous Italian Catenaccio (locking defence: Paolo Maldini style), the kind that wins world cups (remember 2006?) – there’s nothing getting past Parodi, Giugno, Pulvirenti and Bortolussi in defence. Last but not least, I’d put Splendiani and Bella up front; they sound like strikers to me, mostly because of their surnames.

What all this footballing nonsense has to do with NETTAB and wikis I don’t know. There’s probably some obvious-but-cliched link between Football and Science (by virtue of them both being collaborative and competitive team sports). But really, I just couldn’t resist a little Italian-inspired post about football. I hope to post some more notes on days two and three of the NETTAB workshop later… where most of the action took place.

References

  1. Mons, B., Ashburner, M., Chichester, C., van Mulligen, E., Weeber, M., den Dunnen, J., van Ommen, G., Musen, M., Cockerill, M., Hermjakob, H., Mons, A., Packer, A., Pacheco, R., Lewis, S., Berkeley, A., Melton, W., Barris, N., Wales, J., Meijssen, G., Moeller, E., Roes, P., Borner, K., & Bairoch, A. (2008). Calling on a million minds for community annotation in WikiProteins. Genome Biology, 9 (5). DOI: 10.1186/gb-2008-9-5-r89
  2. Hoffmann, R. (2008). A wiki for the life sciences where authorship matters. Nature Genetics, 40 (9), 1047-1051. DOI: 10.1038/ng.f.217
  3. Huss, J., Orozco, C., Goodale, J., Wu, C., Batalov, S., Vickers, T., Valafar, F., & Su, A. (2008). A Gene Wiki for Community Annotation of Gene Function. PLoS Biology, 6 (7). DOI: 10.1371/journal.pbio.0060175
  4. Pico, A., Kelder, T., van Iersel, M., Hanspers, K., Conklin, B., & Evelo, C. (2008). WikiPathways: Pathway Editing for the People. PLoS Biology, 6 (7). DOI: 10.1371/journal.pbio.0060184

Andrea Wiggins on little e-Science

Andrea Wiggins [1,2] from Syracuse University, New York is visiting Manchester this week and will be giving a seminar on “Little e-Science”, the details of which are below.

Date, time: 12 – 2pm on Thursday 18th June

Location: Atlas 1&2, Kilburn building

Title: Little eScience

Abstract: An interdisciplinary community of researchers has started to coalesce around the study of free/libre open source software (FLOSS) development. The research community is in many ways a reflection of the phenomenon of FLOSS practices in both social and technological respects, as many share the open source community’s values that support transparency and democratic participation. As community ties develop, new collaborations have spurred the creation of shared research resources: several repositories provide access to curated research-ready data, working paper repositories provide a means for disseminating early results, and a variety of analysis scripts and workflows connecting the data sets and literature are freely available. Despite these apparently favorable conditions for research collaboration, adoption of the tools and practices associated with eResearch has been slow as yet.

The key issues observed to date seem to stem from the challenges of pre-paradigmatic little science research. Researchers from software engineering, information systems, and even anthropology may examine the same construct, such as FLOSS project success, but will likely proceed from different epistemologies, utilize different data sources, identify different independent variables with varying operationalizations, and employ different research methodologies. In the decentralized and phenomenologically-driven FLOSS research community, creating and maintaining cyberinfrastructure [3] is a substantial effort for a small number of participants. In the little sciences, achieving critical mass of participation may be the most significant factor in creating a viable community of practice around eScience methods.


References

  1. Andrea Wiggins (2009). Social Life of Information: We Are Who We Link. Andrea’s blog
  2. Andrea Wiggins, James Howison, & Kevin Crowston (2008). Social dynamics of FLOSS team communication across channels. Open Source Development, Communities and Quality
  3. Lincoln Stein (2008). Towards a cyberinfrastructure for the biological sciences: progress, visions and challenges. Nature Reviews Genetics, 9 (9), 678-688. DOI: 10.1038/nrg2414

June 10, 2009

Kenjiro Taura on Parallel Workflows

Kenjiro Taura is visiting Manchester next week from the Department of Information and Communication Engineering at the University of Tokyo. He will be giving a seminar, the details of which are below:

Title: Large scale text processing made simple by GXP make: A Unixish way to parallel workflow processing

Date-time: Monday, 15 June 2009 at 11:00 AM

Location: Room MLG.001, mib.ac.uk

In the first part of this talk, I will introduce a simple tool called GXP make. GXP is a general purpose parallel shell (a process launcher) for multicore machines, unmanaged clusters accessed via SSH, clusters or supercomputers managed by a batch scheduler, distributed machines, or any mixture thereof. GXP make is a ‘make’ execution engine that executes regular UNIX makefiles in parallel. Make, though typically used for software builds, is in fact a general framework to concisely describe workflows consisting of sequential commands. Installation of GXP requires no root privileges and needs to be done only on the user’s home machine. GXP make easily scales to more than 1,000 CPU cores. The net result is that GXP make allows an easy migration of workflows from serial environments to clusters and to distributed environments. In the second part, I will talk about our experience of running a complex text processing workflow developed by Natural Language Processing (NLP) experts. It is an entire workflow that processes MEDLINE abstracts with deep NLP tools (e.g., Enju parser [1]) to generate search indices of MEDIE, a semantic retrieval engine for MEDLINE. It was originally described in a Makefile without any particular provision for parallel processing, yet GXP make was able to run it on clusters with almost no changes to the original Makefile. The time for processing abstracts published in a single day was reduced from approximately eight hours (with a single machine) to twenty minutes with a trivial amount of effort. A larger scale experiment of processing all abstracts published so far and the remaining challenges will also be presented.
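The idea that a Makefile is really a dependency graph whose independent targets can run at the same time can be sketched in a few lines of Python. This is only a minimal illustration and not GXP make itself: the function name `run_parallel`, the rule names and the use of a local thread pool are all assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait


def run_parallel(rules, actions, max_workers=4):
    """Run make-like rules in dependency order, independent targets in parallel.

    rules:   dict mapping each target to a list of prerequisite targets
    actions: dict mapping each target to a zero-argument callable
    Returns the targets in the order in which they finished.
    (Hypothetical helper for illustration; not part of GXP.)
    """
    remaining = {target: set(deps) for target, deps in rules.items()}
    done, order = set(), []
    running = {}  # future -> target currently being built
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while remaining or running:
            # A target is ready once all of its prerequisites are built.
            ready = [t for t, deps in remaining.items() if deps <= done]
            for target in ready:
                del remaining[target]
                running[pool.submit(actions[target])] = target
            if not running:
                raise ValueError("cyclic or unsatisfiable dependencies")
            finished, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in finished:
                future.result()  # re-raise any error from the action
                target = running.pop(future)
                done.add(target)
                order.append(target)
    return order
```

With rules such as `{"index": ["parse_a", "parse_b"], "parse_a": [], "parse_b": []}`, the two parse steps can run concurrently and the index step only starts once both have finished. For scale, the reduction quoted in the abstract, from about eight hours to twenty minutes, is roughly a 24-fold speed-up (480 / 20).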

References

  1. Miyao, Y., Sagae, K., Saetre, R., Matsuzaki, T., & Tsujii, J. (2008). Evaluating contributions of natural language parsers to protein-protein interaction extraction. Bioinformatics, 25 (3), 394-400. DOI: 10.1093/bioinformatics/btn631

June 4, 2009

Improving the OBO Foundry Principles

The Open Biomedical Ontologies (OBO) are a set of reference ontologies for describing all kinds of biomedical data, see [1-5] for examples. Every year, users and developers of these ontologies gather from around the globe for a workshop at the EBI near Cambridge, UK. Following on from the first workshop last year, the 2nd OBO workshop 2009 is fast approaching.

In preparation, I’ve been revisiting the OBO Foundry documentation, part of which establishes a set of principles for ontology development. I’m wondering how they could be improved because these principles are fundamental to the whole effort. We’ve been using one of the OBO ontologies (called Chemical Entities of Biological Interest (ChEBI)) in the REFINE project to mine data from the PubMed database. OBO Ontologies like ChEBI and the Gene Ontology are really crucial to making sense of the massive data which are now common in biology and medicine – so this is stuff that matters.

The OBO Foundry Principles, a sort of Ten Commandments of Ontology (or Obology if you prefer) currently look something like this (copied directly from obofoundry.org/crit.shtml):

  1. The ontology must be open and available to be used by all without any constraint other than (a) its origin must be acknowledged and (b) it is not to be altered and subsequently redistributed under the original name or with the same identifiers. The OBO ontologies are for sharing and are resources for the entire community. For this reason, they must be available to all without any constraint or license on their use or redistribution. However, it is proper that their original source is always credited and that after any external alterations, they must never be redistributed under the same name or with the same identifiers.
  2. The ontology is in, or can be expressed in, a common shared syntax. This may be either the OBO syntax, extensions of this syntax, or OWL. The reason for this is that the same tools can then be usefully applied. This facilitates shared software implementations. This criterion is not met in all of the ontologies currently listed, but we are working with the ontology developers to have them available in a common OBO syntax.
  3. The ontology possesses a unique identifier space within the OBO Foundry. The source of a term (i.e. class) from any ontology can be immediately identified by the prefix of the identifier of each term. It is, therefore, important that this prefix be unique.
  4. The ontology provider has procedures for identifying distinct successive versions.
  5. The ontology has a clearly specified and clearly delineated content. The ontology must be orthogonal to other ontologies already lodged within OBO. The major reason for this principle is to allow two different ontologies, for example anatomy and process, to be combined through additional relationships. These relationships could then be used to constrain when terms could be jointly applied to describe complementary (but distinguishable) perspectives on the same biological or medical entity. As a corollary to this, we would strive for community acceptance of a single ontology for one domain, rather than encouraging rivalry between ontologies.
  6. The ontologies include textual definitions for all terms. Many biological and medical terms may be ambiguous, so terms should be defined so that their precise meaning within the context of a particular ontology is clear to a human reader.
  7. The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the OBO Relation Ontology.
  8. The ontology is well documented.
  9. The ontology has a plurality of independent users.
  10. The ontology will be developed collaboratively with other OBO Foundry members.
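Principle 3 is the easiest of these to show in code: if every ontology has its own identifier prefix, the source of a term can be read straight off its identifier. The following is a minimal sketch, assuming identifiers of the form PREFIX:LOCAL_ID (such as CHEBI:28918 or GO:0008150); the function names and the tiny registry are hypothetical and made up for this example.

```python
import re

# Assumed identifier shape: PREFIX:LOCAL_ID, e.g. "CHEBI:28918" or "GO:0008150".
OBO_ID = re.compile(r"^([A-Za-z][A-Za-z0-9_]*):(\w+)$")

# Hypothetical registry for illustration; the lookup only works because
# each prefix belongs to exactly one ontology.
REGISTRY = {
    "CHEBI": "Chemical Entities of Biological Interest",
    "GO": "Gene Ontology",
}


def split_obo_id(term_id):
    """Return (prefix, local_id) for an identifier such as 'GO:0008150'."""
    match = OBO_ID.match(term_id)
    if match is None:
        raise ValueError(f"not a prefixed identifier: {term_id!r}")
    prefix, local_id = match.groups()
    return prefix.upper(), local_id


def source_ontology(term_id, registry=REGISTRY):
    """Look up which ontology a term comes from, using only its prefix."""
    prefix, _ = split_obo_id(term_id)
    return registry.get(prefix, "unknown")
```

If two ontologies shared a prefix, `source_ontology` could not give a single answer, which is why the principle insists on uniqueness.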

I’ve been asking all my frolleagues what they think of these principles and have got some lively responses, including some here from Allyson Lister, Mélanie Courtot, Michel Dumontier and Frank Gibson. So what do you think? How could these guidelines be improved? Do you have any specific (and preferably constructive) criticisms of these ambitious (and worthy) goals? Be bold, be brave and be polite. If you have anything controversial or “off the record” to say, you can email it to me… I’m all ears.

CC-licensed picture above of the Old Smithy (pub) by Loop Oh. Inspired by Michael Ashburner‘s standing OBO joke (Ontolojoke) which goes something like this: Because Barry Smith is one of the leaders of OBO, should the project be called the OBO Smithy or the OBO Foundry? 🙂

References

  1. Noy, N., Shah, N., Whetzel, P., Dai, B., Dorf, M., Griffith, N., Jonquet, C., Rubin, D., Storey, M., Chute, C., & Musen, M. (2009). BioPortal: ontologies and integrated data resources at the click of a mouse. Nucleic Acids Research. DOI: 10.1093/nar/gkp440
  2. Côté, R., Jones, P., Apweiler, R., & Hermjakob, H. (2006). The Ontology Lookup Service, a lightweight cross-platform tool for controlled vocabulary queries. BMC Bioinformatics, 7 (1). DOI: 10.1186/1471-2105-7-97
  3. Smith, B., Ashburner, M., Rosse, C., Bard, J., Bug, W., Ceusters, W., Goldberg, L., Eilbeck, K., Ireland, A., Mungall, C., Leontis, N., Rocca-Serra, P., Ruttenberg, A., Sansone, S., Scheuermann, R., Shah, N., Whetzel, P., & Lewis, S. (2007). The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nature Biotechnology, 25 (11), 1251-1255. DOI: 10.1038/nbt1346
  4. Smith, B., Ceusters, W., Klagges, B., Köhler, J., Kumar, A., Lomax, J., Mungall, C., Neuhaus, F., Rector, A., & Rosse, C. (2005). Relations in biomedical ontologies. Genome Biology, 6 (5). DOI: 10.1186/gb-2005-6-5-r46
  5. Bada, M., & Hunter, L. (2008). Identification of OBO nonalignments and its implications for OBO enrichment. Bioinformatics, 24 (12), 1448-1455. DOI: 10.1093/bioinformatics/btn194

June 2, 2009

Who Are You? Digital Identity in Science

The organisers of the Science Online London 2009 conference are asking people to propose their own session ideas (see some examples here), so here is a proposal:

Title: Who Are You? Digital Identity in Science

Many important decisions in Science are based on identifying scientists and their contributions. From selecting reviewers for grants and publications, to attributing published data and deciding who is funded, hired or promoted, digital identity is at the heart of Science on the Web.

Despite the importance of digital identity, identifying scientists online is an unsolved problem [1]. Consequently, a significant amount of scientific and scholarly work is not easily cited or credited, especially digital contributions: from blogs and wikis, to source code, databases and traditional peer-reviewed publications on the Web. This (proposed) session will look at current mechanisms for identifying scientists digitally including contributor-id (CrossRef), researcher-id (Thomson), Scopus Author ID (Elsevier), OpenID, Google Scholar [2], Single Sign On, PubMed, FOAF+SSL, LinkedIn, Shared Identifiers (URIs) and the rest. We will introduce and discuss each via a SWOT analysis (Strengths, Weaknesses, Opportunities and Threats). Is digital identity even possible and ethical? Besides the obvious benefits of persistent, reliable and unique identifiers, what are the privacy and security issues with personal digital identity?

If this is a successful proposal, I’ll need some help. Any offers? If you are interested in joining in the fun, more details are at scienceonlinelondon.org

References

  1. Bourne, P., & Fink, J. (2008). I Am Not a Scientist, I Am a Number. PLoS Computational Biology, 4 (12). DOI: 10.1371/journal.pcbi.1000247
  2. Various Publications about unique author identifiers bookmarked in citeulike
  3. Yours Truly (2009) Google thinks I’m Maurice Wilkins
  4. The Who (1978) Who Are You? Who, who, who, who? (Thanks to Jan Aerts for the reference!)

Michael Ley on Digital Bibliographies

Michael Ley is visiting Manchester this week and will be giving a seminar on Wednesday 3rd June. Here are some details for anyone who is interested in attending:

Date: 3rd Jun 2009

Title: DBLP: How the data get in

Speaker: Dr Michael Ley. University of Trier, Germany

Time & Location: 14:15, Lecture Theatre 1.4, Kilburn Building

Abstract: The DBLP (Digital Bibliography & Library Project) Computer Science Bibliography now includes more than 1.2 million bibliographic records. For Computer Science researchers the DBLP web site is now a popular tool to trace the work of colleagues and to retrieve bibliographic details when composing the lists of references for new papers. Ranking and profiling of persons, institutions, journals, or conferences is another usage of DBLP. Many scientists are aware of this and want their publications to be listed as completely as possible.

The talk focuses on the data acquisition workflow for DBLP. To get ‘clean’ basic bibliographic information for scientific publications remains a chaotic puzzle.

Large publishers are either not interested in cooperating with open services like DBLP, or their policy is very inconsistent. In most cases they are not able or not willing to deliver the basic data required for DBLP in a direct way, but they encourage us to crawl their Web sites. This indirection has two main problems:

  1. The organisation and appearance of Web sites changes from time to time, which forces a reimplementation of information extraction scripts. [1]
  2. In many cases manual steps are necessary to get ‘complete’ bibliographic information.
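The first problem is easy to reproduce with a toy extraction script. The sketch below is not DBLP’s code: the page markup, the class names and the `extract_titles` helper are all hypothetical, and it uses only Python’s standard `html.parser` module. It pulls titles out of one page layout and silently finds nothing once the publisher changes that layout.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collect the text of every <span class="title"> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "span" and dict(attrs).get("class") == "title":
            self._in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.titles[-1] += data


def extract_titles(html):
    """Return the titles found in a page that uses the assumed markup."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return [title.strip() for title in parser.titles]


# Hypothetical publisher pages: same content, before and after a redesign.
OLD_PAGE = '<li><span class="title">How the data get in</span></li>'
NEW_PAGE = '<li><div class="paper-title">How the data get in</div></li>'
```

Calling `extract_titles(OLD_PAGE)` finds the title, while `extract_titles(NEW_PAGE)` returns an empty list without raising any error, so the script has to be rewritten for the new layout.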

For many small information sources it is not worthwhile to develop information extraction scripts, so data acquisition is done manually. There is an amazing variety of small but interesting journals, conferences and workshops in Computer Science which are not under the umbrella of ACM, IEEE, Springer, Elsevier etc. How their data get in is often decided very pragmatically.

The goal of the talk and my visit to Manchester is to start a discussion process: the EasyChair conference management system developed by Andrei Voronkov and DBLP are both parts of the scientific publication workflow. Should they be connected for mutual benefit?

References

  1. Lincoln Stein (2002). Creating a bioinformatics nation: screen scraping is torture. Nature, 417 (6885), 119-120. DOI: 10.1038/417119a