O'Really?

July 15, 2010

How many journal articles have been published (ever)?

Fifty Million and Fifty Billion by ZeroOne

According to some estimates, there are fifty million articles in existence as of 2010. Picture of a fifty million dollar note by ZeroOne on Flickr.

Earlier this year, the scientific journal PLoS ONE published its 10,000th article. Ten thousand articles is a lot of papers, especially when you consider that PLoS ONE only started publishing four short years ago, in 2006. But scientists have been publishing in journals for at least 350 years [1], so it might make you wonder: how many articles have been published in scientific and learned journals since time began?

PubMed Central, a full-text archive of journals freely available to all, currently holds over 1.7 million articles. But these articles are only a tiny fraction of the total literature, since a lot of the rest is locked up behind publishers’ paywalls and is inaccessible to many people.

PubMed, a freely available index of biomedical abstracts published by the National Center for Biotechnology Information, has a collection of more than 19 million citations. Nineteen million is hard to comprehend, but around one paper per minute* is added to this database (on average), which is an easier number to understand. But even this enormous database excludes large swathes of published articles in Physics, Mathematics, Chemistry, Engineering and Computer Science not deemed “worthy” of indexing by the United States National Library of Medicine. Neither does it include all the humanities publications – PubMed is not the world ®.

Next up is Scopus, a subscription-only database of journals that covers a wider range of literature than PubMed and currently claims to have indexed over 40 million records. A rival of Scopus, ISI Web of Knowledge (WOK), claims to be a similar size, with 40 million items. But again, like PubMed, Scopus and WOK are not the world. Google Scholar, which is currently trying to take over the world, indexes all this data too, but they don’t say how big their index is [2].

Finally, Arif Jinha at the University of Ottawa has recently estimated that the number of journal articles published since time began is about 50 million [3]. This estimate, recently published in Learned Publishing, is based on what has been published since 1665, when the journal Philosophical Transactions of the Royal Society first started. It’s debatable how accurate the estimate is, since it’s actually quite tricky to work out, but 50 million is still a big number to get your head around. Perhaps it’s easier to think of 50 million papers like this:

So how many journal articles have been published (ever)? It depends on what you mean by “journal”, “article”, “published” and “ever” – and these terms are taking on new meanings since the invention of the Web. But for the definitions used in [3] an estimate of 50 million seems reasonable, plus or minus a few million.

[update: see commentary on this article via bit.ly. Also available in German at 50 Millionen Artikel. Please note these stats were correct at the time of writing but will obviously change over time.]

References

  1. Henry Oldenburg (1665). Epistle Dedicatory. Philosophical Transactions of the Royal Society of London, 1 (1-22). DOI: 10.1098/rstl.1665.0001
  2. Peter Jacsó (2010). Metadata mega mess in Google Scholar. Online Information Review, 34 (1), 175-191. DOI: 10.1108/14684521011024191
  3. Arif Jinha (2010). Article 50 million: an estimate of the number of scholarly articles in existence. Learned Publishing, 23 (3), 258-263. DOI: 10.1087/20100308 (free pre-print available from the author)

[Creative Commons licensed picture of the interior of the Bibliotheca Alexandrina by Mindy McAdams on Flickr]

* One paper per minute is based on 679,858 papers per year in 2009 / 365 days / 24 hours / 60 minutes = 1.29 papers per minute. The PubMed database isn’t updated that frequently, but if it were, 1.29 papers would be added per minute according to MEDLINE statistics.
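
For anyone who wants to check the footnote's arithmetic, here is a minimal sketch in Python. The figure of 679,858 papers comes from the footnote above; the constant and function names are made up for illustration, and the calculation assumes a 365-day year, as the footnote does.

```python
# Figure taken from the footnote above; all names here are illustrative.
PAPERS_ADDED_2009 = 679_858         # papers added to PubMed during 2009
MINUTES_PER_YEAR = 365 * 24 * 60    # minutes in a 365-day year


def papers_per_minute(papers_per_year: int, minutes: int = MINUTES_PER_YEAR) -> float:
    """Average number of papers added per minute over one year."""
    return papers_per_year / minutes


rate = papers_per_minute(PAPERS_ADDED_2009)
print(f"{rate:.2f} papers per minute")  # prints "1.29 papers per minute"
```

Swapping in a different year's total gives the corresponding rate, which is handy since these statistics change over time.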

23 Comments »

  1. The question is indeed what is relevant, and even more important: where is all that published information accessible? That’s why it is much more important to organize published knowledge in structured databases. Imagine published DNA sequences without searchable databases!

    Another example, biodiversity: there are on the order of 5-10 million species on the planet, but only 1.3 million have been indexed in non-redundant (!) databases. I just saw a list of about 100 specialized herpetological journals that deal ONLY with reptiles and/or amphibians (about 15,000 out of the 60,000 vertebrate species). How much of the information published in these journals is available through databases? Hardly any, and much natural history information is just buried somewhere in obscure journals, so this data is simply dead as long as nobody organizes it.

    P.

    Comment by Peter Uetz — July 15, 2010 @ 6:23 pm | Reply

  2. Duncan, interesting collection of stats. Don’t forget CrossRef: 41,913,980 DOIs registered to date, the majority journal articles but some book items. So 50m isn’t a bad guesstimate. Would make a good infographic. (Still only one paper per 10 people on Facebook…)
    Richard

    Comment by Richard O'Beirne — July 15, 2010 @ 7:02 pm | Reply

    • Hi Richard, thanks for the stats on DOIs. I didn’t realise there were quite so many; DOIs have only been around for about ten years? The trouble with DOIs is that they can identify things that aren’t articles, e.g. they are used for figure IDs in PLoS, plain old data and slightly dodgy pre-prints on Nature Precedings.

      And yes, Facebook dwarfs twitter :-)

      Comment by Duncan — July 16, 2010 @ 10:55 am | Reply

  3. The staggering fact among the figures here is not how many journal articles have been published during the last 350 years, but rather, how many tweets there are in a single day! My god… I had no idea.

    Comment by Rebecca Lew — July 15, 2010 @ 11:05 pm | Reply

  4. I think that there’s a curve in Derek J. de Solla Price’s Little Science, Big Science – I’m not sure if that’s an estimate or based on data. The number of new articles was doubling at some rate at that time – I’ll look it up. Since more recent information is easier to find (say from 1900 to current), this would provide an estimate of the older literature. The other problem is the smaller, regional, and non-English journals. Count them?

    Comment by Christina Pikas — July 16, 2010 @ 3:16 am | Reply

    • Christina. The lit review and methods section of my paper (Article 50 million) addresses some of these questions. De Solla Price’s work of 1965 was conceptually awesome and groundbreaking, but the curve leads to a figure for the year 2000 (for journals) which turned out to be way too high. Like mine, it was based on the best available data and assumptions. Probably we will have more data and more sensitive tools in the future. With the current understanding of the growth curve, the grand total has a doubling time of 24 years, so by 2034 we should have 100 million articles, but it will take just one article in the meantime to correct my estimate!

      Comment by Arif Jinha — February 26, 2011 @ 2:40 pm | Reply

  5. […] How many journal articles have been published (ever)?: http://duncan.hull.name/2010/07/15/fifty-million/ […]

    Pingback by Things I’m Reading Lately « Emerging Technologies Librarian — July 20, 2010 @ 3:58 am | Reply

  6. See also, Google’s estimates of books: http://booksearch.blogspot.com/2010/08/books-of-world-stand-up-and-be-counted.html

    Comment by quiddity — August 6, 2010 @ 4:10 am | Reply

  7. There are two problems with this vast amount of literature.

    The first – and most obvious – is the sheer size. A megabyte of documents can be read. A gigabyte can be skimmed (if you want to have a life).

    A year ago, I read an article that talked about the petabyte age. This spring I read the first article on the exabyte age. Last week, I read that in 2012, the world will have 2,500 exabytes … welcome to the zettabyte age.

    The point is that all the “ages” have no meaning for human beings. We are in the n-byte age, now and forever.

    We will need tools to work this data over. That is a pipes and plumbing problem.

    The second point is more subtle. All the information fragments in a document are arranged according to the viewpoint of the author.

    That is very powerful because it has the ability to prejudice information. Fragments are not shown as independent entities but arranged in a sequence to illustrate a viewpoint.

    When we read such documents, the human action is to extract fragments from their context and recombine them to form new associations and chains of thought.

    The second set of tools will be correlation tools.

    The combination of the n-byte age and correlation tools will lead to new ways of traversing knowledge.

    Correlation as causation will become accepted. The way of the future will beckon humans to become info adventurers and info explorers, not merely memorizers of tomes written long ago.

    Comment by Carl Wimmer — September 29, 2010 @ 7:38 am | Reply

  8. […] Twenty million citations: That’s a lot of data and it’s growing at a rate of about one paper per minute (on average). […]

    Pingback by PubMed: a success or a tragedy? « Science Intelligence and InfoPros — December 17, 2010 @ 10:37 pm | Reply

  9. […] first appear to be. Firstly, there’s just the law of large numbers, with an estimated that 50 million such papers having been published since anybody started tracking these things, around 1665. Then […]

    Pingback by The Science Of Manipulation: New Study Comparing Underage Drinking Riddled With Problems | Brookston Beer Bulletin — April 30, 2011 @ 1:23 am | Reply

  10. I just wrote a post on Health, science, knowledge, access and elitism: Lawrence Lessig and science as remix culture – sparked by Larry Lessig’s talk at CERN on “The Architecture of Access to Scientific Knowledge: Just How Badly We Have Messed This Up”, in which he notes, among other things:

    The thing to recognize is that we built this world, we built this architecture for access. This flows from the deployment of copyright, but here, copyright to benefit publishers, not to enable authors. Not one of these authors gets money from copyright, not one of them wants the distribution of their articles limited, not one of them has a business model that turns upon restricting access to their work, not one of them should support this system.

    As a knowledge policy, for the creators of this knowledge, this is crazy.

    I’d be interested to know what proportion of journal articles are accessible online, and what proportion of those are freely accessible online. I realize your data goes back far before the Internet was created, but I’m wondering whether the growth in production is being accompanied by a growth in public / free access, or whether this is simply creating more silos.

    Comment by Joe McCarthy — April 30, 2011 @ 11:50 pm | Reply

    • Hi Joe, just looking at UK PubMed Central (a small fraction of that 50 million) it is encouraging to see that an increasing amount of content in UKPMC is fully open access. We’ve obviously a long way to go though…

      Comment by Duncan — May 1, 2011 @ 8:59 am | Reply

    • Hi Joe, Nice to see you over here! I’m one of your readers on TypePad (Curious Ellie). I have a similar concern, plus one other. Regarding public access, it is indeed encouraging about the extent of open access on the UK’s PubMed Central, as Duncan mentions. In the U.S.A., I’m not certain whether our PubMed does as well or not.

      In addition to the siloing effect you describe, I’ve been noticing the proliferation of academic publishers who are now offering their own citation systems, as well as efforts at standardized systems like Mendeley. There doesn’t seem to be a need for so many, and it will just cause confusion and dilute any potential benefit of collaboration.

      Comment by Ellie K — May 28, 2011 @ 2:43 pm | Reply

  11. […] Pubmed alone has 21 million articles, adding an average of one every minute, and as Duncan Hull points out, it concentrates on biomedical literature so a huge number of physics, mathematics, chemistry, […]

    Pingback by OECD Science, Technology and Industry Scoreboard: Innovation and Growth in Knowledge Economies « OECD Insights Blog — September 22, 2011 @ 1:00 pm | Reply

  12. […] – http://duncan.hull.name/2010/07/15/fifty-million/ – Article 50 million: an estimate of the number of scholarly articles in existence. Jinha, Arif E. […]

    Pingback by How many science journals? « Science Intelligence and InfoPros — January 17, 2012 @ 11:02 pm | Reply

  13. […] [1] ‘How many journal articles have been published (ever)?’, O’Really?, 15 July 2010,  http://duncan.hull.name/2010/07/15/fifty-million/ [accessed 02 October […]

    Pingback by Transition Complete? – Daryl Yang | RLUK Redefining the Research Library Model — October 26, 2012 @ 2:16 pm | Reply

  14. […] Who does the research?  Often it’s simply those of us who disregard the warning that curiosity killed the cat.  Not only are they curious, but they take the time to be trained, find funding, and have the will power to dedicate themselves wholeheartedly to their little sliver of the scientific community.  These scientists must focus all of that energy on one specialized field when their intellect and interests might try to tug them in many directions.  For those of us who are not professional scientists, but who still have the inquisitive nature of one, there is a bit of a reprieve.  We have the opportunity to learn as much as we want about many areas of science (albiet not in as much depth as the experts).  Unfortunately, many of us are unaware of how to access papers written about the newest discoveries, much less how to wade through the jargon-filled prose that constitutes a scientific publication.  Once we do find those articles, we need to wade through the ocean of over 50 million publications.  (http://duncan.hull.name/2010/07/15/fifty-million/) […]

    Pingback by Welcome! | Synaptic Speculations — October 31, 2012 @ 7:16 pm | Reply

  15. […] the number of papers published in biomedicine, estimated two year ago to be more than one a minute (http://duncan.hull.name/2010/07/15/fifty-million/) […]

    Pingback by day 8 the research-driven part of "multi-disciplinary research-driven care" | Entering a World of Pink — February 23, 2013 @ 8:34 pm | Reply

  16. […] indexes over 22 million citations, up from 20 million in mid 2010 (see two interesting posts [1, 2] by Duncan Hull on pros/cons of this, and how many journal articles […]

    Pingback by Michael Bell » Citation reuse in UniProtKB — March 19, 2013 @ 11:50 am | Reply

  17. […] better meta-analysis, which is an utterly vital part of research, especially in medicine. There are tens of millions of articles in the research literature, yet only a small percentage report enough information to enable meta-analysis. I’ve read […]

    Pingback by Shaunagm.net » Improving trust in science — June 5, 2013 @ 7:52 pm | Reply

  18. […] 1: There are over 50 million scientific papers and over 80 million patent […]

    Pingback by The Rise of the Innovation Economy – Part 1 | Deltasight — October 9, 2013 @ 6:18 pm | Reply

  19. […] lot of what is produced, but it’s hard to know — or even guess — how many of the more than 50 million scientific articles that have been published are included […]

    Pingback by How the internet can make knowledge disappear and 2 ways to stop it — October 13, 2013 @ 11:16 pm | Reply

