June 18, 2013

Peter Suber’s Open Access book is now freely available under an open-access license

Open Access by Peter Suber is now open access

If you never got around to buying Peter Suber’s book about Open Access (OA) publishing [1] “for busy people”, you might be pleased to learn that it’s now freely available under an open-access license.

One year after it was published in dead-tree format, the whole digital book is now available for free. There’s not much point in writing yet another review of it [1]; see Peter’s extensive collection of reviews at cyber.law.harvard.edu. The book succinctly covers:

  1. What Is Open Access? (and what it is not)
  2. Motivation: OA as solving problems and seizing opportunities
  3. Varieties: Green and Gold, Gratis versus Libre
  4. Policies: Funding mandates (NIH, Wellcome Trust etc.)
  5. Scope: Pre-prints and post-prints
  6. Copyright: … or Copyfight?
  7. Economics: Who pays the bills? Publication fees, toll-access paywalls and “author pays”
  8. Casualties: “OA doesn’t threaten publishing; it only threatens existing publishers who do not adapt”
  9. Future: Where next?
  10. Self-Help: DIY publishing

Open Access for MACHINES!

A lot of the (often heated) debate about Open Access misses an important point: open access is for machines as well as humans. As Suber puts it in Chapter 5 on Scope:

We also want access for machines. I don’t mean the futuristic altruism in which kindly humans want to help curious machines answer their own questions. I mean something more selfish. We’re well into the era in which serious research is mediated by sophisticated software. If our machines don’t have access, then we don’t have access. Moreover, if we can’t get access for our machines, then we lose a momentous opportunity to enhance access with processing.

Think about the size of the body of literature to which you have access, online and off. Now think realistically about the subset to which you’d have practical access if you couldn’t use search engines, or if search engines couldn’t index the literature you needed.

Information overload didn’t start with the internet. The internet does vastly increase the volume of work to which we have access, but at the same time it vastly increases our ability to find what we need. We zero in on the pieces that deserve our limited time with the aid of powerful software, or more precisely, powerful software with access. Software helps us learn what exists, what’s new, what’s relevant, what others find relevant, and what others are saying about it. Without these tools, we couldn’t cope with information overload. Or we’d have to redefine “coping” as artificially reducing the range of work we are allowed to consider, investigate, read, or retrieve.

It’s refreshing to see someone making these points that are often ignored, forgotten or missed out of the public debate about Open Access. The book is available in various digital flavours including:


January 18, 2013

How to export, delete and move your Mendeley account and library #mendelete


Delete. Creative Commons licensed picture by Vitor Sá – Virgu via Flickr.com

News that Reed Elsevier is in talks to buy Mendeley.com will have many scientists reaching for their “delete account” button. Mendeley has built an impressive user-base of scientists and other academics since they started, but the possibility of an Elsevier takeover has worried some of its users. Elsevier has a strained relationship with some groups in the scientific community [1,2], so it will be interesting to see how this plays out.

If you’ve built a personal library of scientific papers in Mendeley, you won’t want to just delete all the data. You’ll need to export your library first, then delete your account, and then import the library into a different tool.

Disclaimer: I’m not advocating that you delete your Mendeley account (aka #mendelete), just that if you do decide to, here’s how to do it, and some alternatives to consider. Update, April 2013: it wasn’t just a rumour.

Exporting your Mendeley library

Open up Mendeley Desktop and select Export on the File menu. You have a choice of three export formats:

  1. BibTeX (*.bib)
  2. RIS – Research Information Systems (*.ris)
  3. EndNote XML (*.xml)

It is probably best to create a backup in all three formats, just in case, as this will give you more options for importing into whatever you replace Mendeley with. Another possibility is to use the Mendeley API to export your data, which will give you more control over how and what you export, or to trawl through the Mendeley forums for alternatives. [update: see also comments below from William Gunn on exporting via your local SQLite cache]
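Before deleting anything, it may be worth checking that the backups actually contain the same number of references. The following is only a rough sketch in Python, not part of Mendeley: it assumes that each BibTeX entry starts with `@type{` on its own line and that each RIS record opens with a `TY  - ` line. The function names and the sample entries are made up for illustration.

```python
import re

def count_bibtex_entries(text: str) -> int:
    """Count BibTeX entries, ignoring @comment, @preamble and @string blocks."""
    kinds = re.findall(r"^\s*@(\w+)\s*[{(]", text, flags=re.MULTILINE)
    skipped = {"comment", "preamble", "string"}
    return sum(1 for kind in kinds if kind.lower() not in skipped)

def count_ris_records(text: str) -> int:
    """Count RIS records; each record opens with a 'TY  - ' tag line."""
    return len(re.findall(r"^TY  - ", text, flags=re.MULTILINE))

# Tiny invented samples standing in for the exported files.
bib_sample = """@article{example_one,
  title = {First example},
  year = {2012}
}
@book{example_two,
  title = {Second example}
}
"""
ris_sample = (
    "TY  - JOUR\nTI  - First example\nER  - \n"
    "TY  - BOOK\nTI  - Second example\nER  - \n"
)

bib_count = count_bibtex_entries(bib_sample)
ris_count = count_ris_records(ris_sample)
print(f"BibTeX entries: {bib_count}, RIS records: {ris_count}")
if bib_count != ris_count:
    print("Counts differ: check the exports before deleting anything.")
```

To run this on real exports, replace the samples with the contents of your own files, for example `pathlib.Path("library.bib").read_text(encoding="utf-8")` (the file name is hypothetical). Matching counts are only a sanity check, not proof that every field survived the export.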

Deleting your Mendeley account #mendelete

Log in to Mendeley.com, click on the My Account button (top right), select Account details from the drop-down menu, scroll down to the bottom of the page and click on the link delete your account. You’ll see the message We’re sorry you want to go, but if you must… which you can either cancel or confirm by selecting Delete my account and all my data. [update] To completely delete your account you’ll need to send an email to privacy at mendeley dot com. (Thanks P.Chris for pointing this out in the comments below)

Alternatives to Mendeley

Once you have exported your data, you’ll need an alternative to import it into. Fortunately, there are quite a few to choose from [3], some of which are shown in the list below. This is not a comprehensive list, so please add suggestions below in the comments if I missed any obvious ones. Wikipedia has an extensive article comparing reference management software, which is quite handy (if slightly bewildering). Otherwise you might consider trying the following software:

One last alternative: if you are fed up with trying to manage all those clunky PDF files, you could just switch to Google Scholar, which is getting better all the time. If you decide that Mendeley isn’t your cup of tea, now might be a good time to investigate some alternatives; there are plenty of good candidates to choose from. But beware, you may run from the arms of one large publisher (Elsevier) into the arms of another (Springer or Macmillan, which own Papers and ReadCube respectively).


August 20, 2012

Digital Research 2012: September 10th-12th at St. Catherine’s College, Oxford, UK

The Radcliffe Camera, Oxford by chensiyuan via Wikipedia

The UK’s premier Digital Research community event is being held in Oxford 10-12 September 2012. Come along to showcase and share the latest in digital research practice – and set the agenda for tomorrow at Digital Research 2012. The conference features an exciting 3-day programme with a great set of invited speakers together with showcases of the work and vision of the Digital Research community. Here are some highlights of the programme – please see the website digital-research.oerc.ox.ac.uk for the full programme and registration information.

New Science of New Data Symposium and Innovation Showcase on Monday 10th: Keynotes from Noshir Contractor [1] (Northwestern University) on Web Science, Nigel Shadbolt (Government Information Adviser) on Open Data and a closing address by Kieron O’Hara (computer scientist) – with Twitter analytics, geolocated social media and web observatories in between. Also the launch of the Software Sustainability Institute’s Fellows programme and community workshops.

Future of Digital Research on Tuesday 11th: Keynotes from Stevan Harnad on “Digital Research: How and Why the Research Councils UK Open Access Policy Needs to Be Revised” [2], Jim Hendler (Rensselaer Polytechnic Institute) on “Broad Data” (not just big!), and Lizbeth Goodman (University College Dublin) on “SMART spaces by and for SMART people”. Sessions are themed on Open Science with a talk by Peter Murray-Rust, Smart Spaces as a Utility and future glimpses from the community, all culminating in a Roundtable discussion on the Future of Digital Research.

e-Infrastructure Forum and Innovation Showcase on Wednesday 12th opens with a dual-track community innovation showcase, then launches the UK e-Infrastructure Academic Community Forum, where Peter Coveney (UK e-Infrastructure Leadership Council and University College London) will present the “state of the nation”, followed by a Provider’s Panel, Software, Training and User’s Panel – an important and timely opportunity for the community to review current progress and determine what’s needed in the future.

There’s a lot more happening throughout the event, including an exciting “DevChallenge” hackathon run by DevCSI, software surgery by the Software Sustainability Institute (SSI) and multiple community workshops – plus the Digital Research 2012 dinner in College and a reception in the spectacular Museum of Natural History in Oxford. Digital Research 2012 is very grateful to everyone who has come together to make this event possible, including e-Research South, Open Knowledge Foundation, Web Science, the Digital Social Research programme, our Digital Economy colleagues and the All Hands Foundation.

We look forward to seeing you at Digital Research 2012 in Oxford in September.


May 11, 2012

Journal Fire: Bonfire of the Vanity Journals?

Fire by John Curley, available via Creative Commons license.

When I first heard about Journal Fire, I thought, Great! Someone is going to take all the closed-access scientific journals and make a big bonfire of them! At the top of this bonfire would be the burning effigy of a wicker man, representing the very worst of the vanity journals [1,2].

Unfortunately Journal Fire aren’t burning anything just yet, but what they are doing is just as interesting. Their web-based application allows you to manage and share your journal club online. I thought I’d give it a whirl because a friend of mine asked me what I thought about a paper on ontologies in biodiversity [3]. Rather than post a brief review here, I’ve posted it over at Journal Fire. Here are some initial thoughts from a quick test drive of their application:


On the up side Journal Fire:

  • Is a neutral-ish third party space where anyone can discuss scientific papers.
  • Understands common identifiers (DOI and PMID) to tackle the identity crisis.
  • Allows you to post simple anchor links in reviews, but not much else (see below).
  • Does not require the cumbersome syntax used in ResearchBlogging [4], ScienceSeeker and elsewhere.
  • Is integrated with CiteULike, for those that use it.
  • Can potentially provide many different reviews of a given paper in one place.
  • Is web-based, so you don’t have to download and install any software, unlike the alternative desktop systems Mendeley and Utopia Documents.


On the down side Journal Fire:

  • Is yet another piece of social software for scientists. Do we really need more, when we’ve had far too many already?
  • Requires you to sign up for an account without re-using your existing digital identity with Google, Facebook, Twitter etc.
  • Does not seem to have many people on it (yet), despite the fact it has been going since at least 2007.
  • Looks a bit stale: the last blog post was published in 2010. Although the software still works fine, it is not clear if it is being actively maintained and developed.
  • Does not allow much formatting in reviews besides simple links; something like Markdown would be good.
  • Does not understand or import arXiv identifiers, at the moment.
  • As far as I can see, Journal Fire is a small startup based in Pasadena, California. Like all startups, they might go bust. If this happens, they’ll take your journal club, and all its reviews down with them.
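To make the point about identifiers a little more concrete, here is a rough sketch in Python of how a tool might tell DOIs, PMIDs and arXiv identifiers apart. It has nothing to do with how Journal Fire actually works: the patterns are deliberately simplified assumptions, the function name is made up, and the example identifiers are invented.

```python
import re

# Simplified patterns (assumptions); real identifiers have more edge cases.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
PMID_RE = re.compile(r"^\d{1,8}$")
ARXIV_NEW_RE = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")                # e.g. 0801.1234v2
ARXIV_OLD_RE = re.compile(r"^[a-z-]+(\.[A-Z]{2})?/\d{7}(v\d+)?$")    # e.g. hep-th/9901001

def classify_identifier(raw: str) -> str:
    """Return 'doi', 'pmid', 'arxiv' or 'unknown' for a pasted identifier."""
    text = raw.strip()
    # Drop a leading label such as "doi:", "PMID:" or "arXiv:" if present.
    prefix, sep, rest = text.partition(":")
    if sep and prefix.lower() in {"doi", "pmid", "arxiv"}:
        text = rest.strip()
    if DOI_RE.match(text):
        return "doi"
    if PMID_RE.match(text):
        return "pmid"
    if ARXIV_NEW_RE.match(text) or ARXIV_OLD_RE.match(text):
        return "arxiv"
    return "unknown"

# Invented identifiers, used only to exercise each branch.
for example in ["10.1234/example.5678", "PMID: 12345678",
                "arXiv:0801.1234v2", "hep-th/9901001", "not an id"]:
    print(example, "->", classify_identifier(example))
```

The label is stripped before matching, so the sketch does not check that the label agrees with the pattern that eventually matches; a real importer would want to be stricter.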

I think the pros mostly outweigh the cons, so if you like the idea of a third-party hosting your journal club, Journal Fire is worth a trial run.


February 15, 2012

The Open Access Irony Awards: Naming and shaming them

Open Access (OA) publishing aims to make the results of scientific research available to the widest possible audience. Scientific papers that are published in Open Access journals are freely available for crucial data mining and for anyone or anything to read, wherever they may be.

In the last ten years, the Open Access movement has made huge progress in allowing:

“any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers.”

But there is still a long way to go, as much of the world’s scientific knowledge remains locked up behind publishers’ paywalls, unavailable for re-use by text-mining software and inaccessible to the public, who often funded the research through taxation.

Openly ironic?

Ironically, some of the papers that are inaccessible discuss or even champion the very Open Access movement itself. Sometimes the lack of access is deliberate, other times accidental – but the consequences are serious. Whether deliberate or accidental, restricted access to public scientific knowledge is slowing scientific progress [1]. Sometimes the best way to make a serious point is to have a laugh and joke about it. This is what the Open Access Irony Awards do: by gathering all the offenders in one place, we can laugh and make a serious point at the same time by naming and shaming the papers in question.

To get the ball rolling, here are some examples:

  • The Lancet, owned by Evilsevier (sorry, I mean Elsevier), recently published a paper on “the case for open data” [2] (please login to access article). Login?! Not very open…
  • Serial offender and über-journal Science has an article by Elias Zerhouni on the NIH public access policy [3] (Subscribe/Join AAAS to View Full Text), another on “making data maximally available” [4] (Subscribe/Join AAAS to View Full Text) and another on a high-profile advocate of open science [5] (Buy Access to This Article to View Full Text). Irony of ironies.
  • From Nature Publishing Group comes a fascinating paper about harnessing the wisdom of the crowds to predict protein structures [6]. Not only have members of the tax-paying public funded this work, they actually did some of the work too! But unfortunately they have to pay to see the paper describing their results. Ironic? Also, another paper published in Nature Medicine proclaims that the “delay in sharing research data is costing lives” [1] (instant access only $32!).
  • From the British Medical Journal (BMJ) comes the worrying news of dodgy American laws that will lock up valuable scientific data behind paywalls [7] (please subscribe or pay below). Ironic? *
  • The “green” road to Open Access publishing involves authors uploading their manuscript to self-archive the data in some kind of public repository. But there are many social, political and technical barriers to this, and they have been well documented [8]. You could find out about them in this paper [8], but it appears that the author hasn’t self-archived the paper or taken the “gold” road and published in an Open Access journal. Ironic?
  • Last, but not least, it would be interesting to know what commercial publishers make of all this text-mining magic in Science [9], but we would have to pay $24 to find out. Ironic?

These are just a small selection from amongst many. If you would like to nominate a paper for an Open Access Irony Award, simply post it to the group on CiteULike or the group on Mendeley. Please feel free to start your own group elsewhere if you’re not on CiteULike or Mendeley. The name of this award probably originated from an idea by Jonathan Eisen, picked up by Joe Dunckley and Matthew Cockerill at BioMed Central (see tweet below). So thanks to them for the inspiration.

For added ironic amusement, take a screenshot of the offending article and post it to the Flickr group. Sometimes the shame is too much, and articles are retrospectively made open access so a screenshot will preserve the irony.

Join us in poking fun at the crazy business of academic publishing, while making a serious point about the lack of Open Access to scientific data.


[CC licensed picture "ask me about open access" by mollyali.]

* Please note, some research articles in BMJ are available by Open Access, but news articles like [7] are not. Thanks to Trish Groves at BMJ for bringing this to my attention after this blog post was published. Also, some “articles” here are in a grey area for open access, particularly “journalistic” stuff like news, editorials and correspondence, as pointed out by Becky Furlong. See tweets below…

January 21, 2010

Blogging a Book about Bio-Ontologies

If you wanted to write a guide to Biomedical and Biological Ontologies [1], especially the what, why, when, how, where and who, there are at least three choices for publishing your work:

  1. Journal publishing in your favourite scientific journal.
  2. Book publishing with your favourite academic or technical publisher.
  3. Self publishing on a web blog with your favourite blogging software.

Each of these has its own unique problems:

  • The trouble with journals is that they typically don’t publish “how to” guides, although you might be able to publish some kind of review.
  • The trouble with books, and academic books in particular, is that people (and machines) often don’t read them. Also, academic books can be prohibitively expensive to buy and this can make the data inside them less visible and accessible to the widest audience. Unfortunately all that lovely knowledge gets locked up behind publishers’ paywalls. To add insult to injury, most academic books take a very long time to publish, often several years. By the time of printing, the content of many academic books is very dated.
  • The trouble with blogs is that they aren’t peer-reviewed in the traditional way and they tend to be written by a single person from a not very neutral point of view. Or, as Dave once put it, “vanity publishing for arrogant people with an inflated ego”. Ouch.

So the people behind the Ontogenesis network (Robert Stevens and Phillip Lord with funding from the EPSRC grant ref: EP/E021352/1) had an idea. Why not blog a book about Ontology? As a publishing experiment, it might just work by combining the merits of books and blogs to overcome their shortcomings. This will involve getting a small group of about twenty people (mostly bio-ontologists) together, and writing about what an ontology is, why you would want a biomedical ontology, how to build one and so on. We will be doing some of the peer-review online too.

As part of an ongoing experiment, we are posting all this information on a blog at http://ontogenesis.knowledgeblog.org. If you’d like to follow along, subscribe to the feed and read the manifesto.


[Ultrawide panoramic picture of Waterloo station by Tim Nugent]

September 18, 2009

Popular, personal and public data: Article-level metrics at PLoS

The Public Library of Science (PLoS) is a non-profit organisation committed to making the world’s scientific and medical literature freely accessible to everyone via open access publishing. As recently announced, they have just published the first article-level metrics (e.g. web server logs and related information) for all articles in their library. This is novel, interesting and potentially useful data, not currently made publicly available by other publishers. Here is a selection of some of the data, taken from the full dataset here (large file), which includes the “top ten” papers by viewing statistics.

Article level metrics for some papers published in PLoS (August 2009)

Rank* | Article | Journal | Views | Citations**
1 | Why Most Published Research Findings Are False (including this one?) [1] | PLoS Medicine | 232847 | 52
2 | Initial Severity and Antidepressant Benefits: A Meta-Analysis of Data Submitted to the Food and Drug Administration [2] | PLoS Medicine | 182305 | 15
3 | Serotonin and Depression: A Disconnect between the Advertisements and the Scientific Literature [3] | PLoS Medicine | 105498 | 16
4 | The Diploid Genome Sequence of an Individual Human [4] | PLoS Biology | 88271 | 54
5 | Ultrasonic Songs of Male Mice [5] | PLoS Biology | 81331 | 8
6 | Complete Primate Skeleton from the Middle Eocene of Messel in Germany: Morphology and Paleobiology [6] | PLoS ONE | 62449 | 0
7 | The Impact Factor Game: It is time to find a better way to assess the scientific literature [7] | PLoS Medicine | 61353 | 13
8 | A Map of Recent Positive Selection in the Human Genome [8] | PLoS Biology | 59512 | 94
9 | Mapping the Structural Core of Human Cerebral Cortex [9] | PLoS Biology | 58151 | 8
10 | Ten Simple Rules for Getting Published [10] | PLoS Computational Biology | 57312 | 1
11 | Men, Women, and Ghosts in Science [11] | PLoS Biology | 56982 | 0
120 | Defrosting the Digital Library: Bibliographic Tools for the Next Generation Web [12] (w00t!) | PLoS Computational Biology | 16295 | 3
1500 | Specificity and evolvability in eukaryotic protein interaction networks [13] | PLoS Computational Biology | 4270 | 7
1632 | Comparative genomics and disorder prediction identify biologically relevant SH3 protein interactions [14] | PLoS Computational Biology | 4063 | 10
1755 | Folding Very Short Peptides Using Molecular Dynamics [15] | PLoS Computational Biology | 3876 | 2
2535 | Microblogging the ISMB: A New Approach to Conference Reporting [16] | PLoS Computational Biology | 3055 | 1
7521 | Probing the Flexibility of Large Conformational Changes in Protein Structures through Local Perturbations [17] | PLoS Computational Biology | 1024 | 0
12549 | Deciphering Proteomic Signatures of Early Diapause in Nasonia [18] | PLoS ONE | 0 | 0

*The rank is based on the 12,549 papers for which viewing data (combined usage of HTML + PDF + XML) are available.

**Citation counts are via PubMedCentral (data from CrossRef and Scopus is also provided, see Bora’s comments and commentary at Blue Lab Coats.)
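One way to see how views and citations diverge is to work out citations per 10,000 views for a few rows of the table. This is only a back-of-the-envelope sketch in Python: the ratio is my own choice of illustration rather than a metric PLoS publishes, the function name is made up, and the numbers are copied from the table above.

```python
# (title, views, citations) copied from five rows of the table above.
rows = [
    ("Why Most Published Research Findings Are False", 232847, 52),
    ("The Diploid Genome Sequence of an Individual Human", 88271, 54),
    ("Ultrasonic Songs of Male Mice", 81331, 8),
    ("A Map of Recent Positive Selection in the Human Genome", 59512, 94),
    ("Ten Simple Rules for Getting Published", 57312, 1),
]

def citations_per_10k_views(views: int, citations: int) -> float:
    """Citations for every 10,000 views; zero-view papers are treated as 0.0."""
    if views == 0:
        return 0.0
    return 10_000 * citations / views

# Re-rank the papers by the ratio instead of by raw views.
ranked = sorted(rows, key=lambda row: citations_per_10k_views(row[1], row[2]),
                reverse=True)

for title, views, citations in ranked:
    print(f"{citations_per_10k_views(views, citations):6.2f}  {title}")
```

On these five rows the most-viewed paper is no longer top once citations are taken into account, which is the gap between popularity and citation that the raw view counts hide. The ratio inherits every limitation of the underlying usage and citation data.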

Science is not a popularity contest but…

Analysing this data is not straightforward. Some highly-viewed articles are never cited (reviews, editorials, essays, opinion, etc.). Likewise, popularity and importance are not the same thing. Some articles get lots of citations but few views, which suggests that people are not actually reading the papers before citing them. As described on the PLoS website article-level-metrics.plos.org:

“When looking at Article-Level Metrics for the first time bear the following points in mind:

  • Online usage is dependent on the article type, the age of the article, and the subject area(s) it is in. Therefore you should be aware of these effects when considering the performance of any given article.
  • Older articles normally have higher usage than younger ones simply because the usage has had longer to accumulate. Articles typically have a peak in their usage in the first 3 months and usage then levels off after that.
  • Spikes of usage can be caused by media coverage, usage by large numbers of people, out of control download scripts or any number of other reasons. Without a detailed look at the raw usage logs it is often impossible to tell what the reason is and so we encourage you to regard usage data as indicative of trends, rather than as an absolute measure for any given article.
  • We currently have missing usage data for some of our articles, but we are working to fill the gaps. Primarily this affects those articles published before June 17th, 2005.
  • Newly published articles do not accumulate usage data instantaneously but require a day or two before data are shown.
  • Article citations as recorded by the Scopus database are sometimes undercounted because there are two records in the database for the same article. We’re working with Scopus to correct this issue.
  • All metrics will accrue over time (and some, such as citations, will take several years to accrue). Therefore, recent articles may not show many metrics (other than online usage, which accrues from day one). ”

So all the usual caveats apply when using this bibliometric data. Despite the limitations, it is more revealing than the useful (but simplistic) “highly accessed” label at BioMed Central, which doesn’t always give full information on what “highly” actually means for each published article. It will be interesting to see if other publishers now follow the lead of PLoS and BioMed Central and also publish their usage data combined with other bibliometric indicators such as blog coverage. For authors publishing with PLoS, this data has an added personal dimension too: it is handy to see how many views your paper has.

As paying customers of the services that commercial publishers provide, should scientists and their funders be demanding more of this kind of information in the future? I reckon they should. You have to wonder why these kinds of innovations have taken so long to happen, but they are a welcome addition.

[More commentary on this post over at friendfeed.]


July 24, 2009

Escape from the impact factor: The Great Escape?

Quite by chance, I stumbled on this interesting paper [1] yesterday by Philip Campbell, who is the Editor-in-Chief of the scientific über-journal Nature [2]. Here is the abstract:

As Editor-in-Chief of the journal Nature, I am concerned by the tendency within academic administrations to focus on a journal’s impact factor when judging the worth of scientific contributions by researchers, affecting promotions, recruitment and, in some countries, financial bonuses for each paper. Our own internal research demonstrates how a high journal impact factor can be the skewed result of many citations of a few papers rather than the average level of the majority, reducing its value as an objective measure of an individual paper. Proposed alternative indices have their own drawbacks. Many researchers say that their important work has been published in low-impact journals. Focusing on the citations of individual papers is a more reliable indicator of an individual’s impact. A positive development is the increasing ability to track the contributions of individuals by means of author-contribution statements and perhaps, in the future, citability of components of papers rather than the whole. There are attempts to escape the hierarchy of high-impact-factor journals by means of undifferentiated databases of peer-reviewed papers such as PLoS One. It remains to be seen whether that model will help outstanding work to rise to due recognition regardless of editorial selectivity. Although the current system may be effective at measuring merit on national and institutional scales, the most effective and fair analysis of a person’s contribution derives from a direct assessment of individual papers, regardless of where they were published.

It’s well worth reading the views of the editor of an important closed-access journal like Nature, a world champion heavyweight of Impact Factor Boxing. So their view on article-level bibliometrics and novel models of scientific publishing on the Web like PLoS ONE is enlightening. There are some interesting papers in the same issue, which has a special theme on the use and misuse of bibliometric indices in evaluating scholarly performance. Oh, and the article is published in an Open Access Journal too. Is it just me, or is there a strong smell of irony in here?
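The abstract’s point about skew is easy to see with a little arithmetic. The Python sketch below uses ten citation counts that are entirely invented for illustration (they are not from Nature or any real journal) and compares the mean, which is roughly what an impact-factor-style average reflects, with the median.

```python
from statistics import mean, median

# Hypothetical citation counts for ten papers in one journal (invented numbers).
citations = [0, 1, 1, 2, 2, 3, 3, 4, 5, 179]

average = mean(citations)     # pulled upwards by the single highly cited paper
typical = median(citations)   # the middle of the distribution

print(f"mean citations per paper:   {average}")
print(f"median citations per paper: {typical}")
print(f"papers below the mean:      {sum(c < average for c in citations)} of {len(citations)}")
```

In this made-up list, one paper accounts for most of the citations, so the average says very little about what a typical paper in the journal receives.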


February 11, 2009

Janet Street-Porter on the Internet Revolution

I’m not much of a fan of Janet Street-Porter, nor am I a regular viewer of the BBC’s Money Programme, but right now they are screening an interesting series of three half-hour programmes on the impact of the internet on newspapers, books and television. It’s a familiar tale of the power-and-money struggle between old media and new media that, if the first programme is anything to go by, is worth watching. Here is the blurb from the first episode in the series, billed as Media Revolution: Stop Press?

Former national newspaper editor Janet Street-Porter investigates how papers are coping with falling circulation, advertising revenues and the growth of the internet, and asks if newspapers can survive in their current form. In her quest to discover what the future holds for her beloved newspapers, Janet visits newsrooms, printing plants and even spends a morning as a papergirl. With contributions from national editors, advertising gurus and a rare interview with media mogul Rupert Murdoch, Janet examines if papers can survive as new multimedia information giants.

There are some interesting parallels between the changes described in this programme, and scientific media, especially the scientific journal publishing racket.

Scientific Media Revolution?

The story of the current revolution in scientific and technical publishing is perhaps just as interesting as (and more important than) the one being told on the Money Programme. Just think of it: why scientists publish, the emergence of peer review, how Robert Maxwell made his fortune from the Pergamon Press, the impact factor game, the birth of the Web (in a scientific laboratory), the growth of Google, the copyright wars, open-access publishing, social software, the rise and fall of publishing empires (and technology companies), the vanity journals, scientific blogs and wikis, software showdowns, and how all this change affects producers and consumers of science and technology, both now and in the future. A juicy subject, worthy of broadcasting on any media (old or new). You would need a lot more than three half-hour programmes to cover this particular ongoing epic, so who is going to tell that story?

Anyway, the series is worth a look (if you haven’t already seen it), at least according to me (others disagree; see also no paper is the future). Each episode is also available on iPlayer for up to a week after first broadcast (Thursday 5th, 12th and 19th February 2009), in the UK only unless you go through some kind of proxy.

October 14, 2008

Open Access Day: Why It Matters

Today, Tuesday the 14th of October 2008, is Open Access Day. Like many others, this blog post is joining in by describing why Open Access matters – from a personal point of view. According to the Wikipedia article, Open Access (OA) is “free, immediate, permanent, full-text, online access, for any user, web-wide, to digital scientific and scholarly material, primarily research articles published in peer-reviewed journals. OA means that any individual user, anywhere, who has access to the Internet, may link, read, download, store, print-off, use, and data-mine the digital content of that article. An OA article usually has limited copyright and licensing restrictions.” What does all this mean and why does it matter? Well, in four question-and-answer points, here goes…

