How many unique papers are there in Mendeley?
September 1, 2010
Mendeley is a handy piece of desktop and web software for managing and sharing research papers [1]. This popular tool has been getting a lot of attention lately, and with some impressive statistics it’s not difficult to see why. At the time of writing, Mendeley claims to have over 36 million papers, added by just under half a million users working at more than 10,000 research institutions around the world. That’s impressive considering the startup company behind it has only been going for a few years. The major established commercial players in the field of bibliographic databases (Web of Knowledge and Scopus) currently have around 40 million documents, so if Mendeley continues to grow at this rate, they’ll be more popular than Jesus (and Elsevier and Thomson) before you can say “bibliography”. But to get a real handle on how big Mendeley is, we need to know how many of those 36 million documents are unique, because widespread duplication would inflate the overall head count.
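To see why duplicates matter for the head count, here is a minimal sketch of one plausible de-duplication strategy (not Mendeley’s actual algorithm, which isn’t described here): collapse records that share a DOI, falling back to a normalised title when no DOI is present. The field names and sample records below are invented for illustration.

```python
def estimate_unique(records):
    """Estimate the number of unique papers by collapsing records that
    share a DOI (compared case-insensitively), falling back to a
    whitespace-normalised, lower-cased title when no DOI is present."""
    seen = set()
    for rec in records:
        doi = (rec.get("doi") or "").strip().lower()
        if doi:
            key = ("doi", doi)
        else:
            key = ("title", " ".join(rec.get("title", "").lower().split()))
        seen.add(key)
    return len(seen)

# Three records, but only two distinct papers: the first two share a DOI.
library = [
    {"doi": "10.1371/journal.pcbi.1000204", "title": "Defrosting the Digital Library"},
    {"doi": "10.1371/JOURNAL.PCBI.1000204", "title": "Defrosting the digital library"},
    {"title": "Creating a bioinformatics nation"},  # no DOI: matched on title
]
print(estimate_unique(library))  # 2
```

Even this toy version shows where the difficulty lies: records without a DOI can only be matched on noisy metadata like titles, so any head count is an estimate.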
April 30, 2010
Daniel Cohen on The Social Life of Digital Libraries
Daniel Cohen is giving a talk in Cambridge today on The Social Life of Digital Libraries, abstract below:
The digitization of libraries had a clear initial goal: to permit anyone to read the contents of collections anywhere and anytime. But universal access is only the beginning of what may happen to libraries and researchers in the digital age. Because machines as well as humans have access to the same online collections, a complex web of interactions is emerging. Digital libraries are now engaging in online relationships with other libraries, with scholars, and with software, often without the knowledge of those who maintain the libraries, and in unexpected ways. These digital relationships open new avenues for discovery, analysis, and collaboration.
Daniel J. Cohen is an Associate Professor at George Mason University and has been involved in the development of the Zotero extension for the Firefox browser that enables users to manage bibliographic data while doing online research. Zotero [1] is one of many new tools [2] that are attempting to add a social dimension to scholarly information on the Web, so this should be an interesting talk.
If you’d like to come, the talk starts at 6pm in Clare College, Cambridge, and you need to RSVP by email via the talks.cam.ac.uk page.
References
- Cohen, D.J. (2008). Creating scholarly tools and resources for the digital ecosystem: Building connections in the Zotero project. First Monday, 13(8)
- Hull, D., Pettifer, S., & Kell, D. (2008). Defrosting the Digital Library: Bibliographic Tools for the Next Generation Web. PLoS Computational Biology, 4(10). DOI: 10.1371/journal.pcbi.1000204
June 2, 2009
Michael Ley on Digital Bibliographies
Michael Ley is visiting Manchester this week and will be giving a seminar on Wednesday 3rd June. Here are the details for anyone interested in attending:
Date: 3rd Jun 2009
Title: DBLP: How the data get in
Speaker: Dr Michael Ley. University of Trier, Germany
Time & Location: 14:15, Lecture Theatre 1.4, Kilburn Building
Abstract: The DBLP (Digital Bibliography & Library Project) Computer Science Bibliography now includes more than 1.2 million bibliographic records. For Computer Science researchers, the DBLP web site is now a popular tool for tracing the work of colleagues and retrieving bibliographic details when composing the lists of references for new papers. Ranking and profiling of persons, institutions, journals, or conferences is another use of DBLP. Many scientists are aware of this and want their publications to be listed as completely as possible.
The talk focuses on the data acquisition workflow for DBLP. Getting ‘clean’ basic bibliographic information for scientific publications remains a chaotic puzzle.
Large publishers are either not interested in cooperating with open services like DBLP, or their policies are very inconsistent. In most cases they are unable or unwilling to deliver the basic data DBLP requires directly, but they encourage us to crawl their web sites. This indirection has two main problems:
- The organisation and appearance of web sites changes from time to time, which forces a reimplementation of the information extraction scripts. [1]
- In many cases manual steps are necessary to get ‘complete’ bibliographic information.
For many small information sources it is not worthwhile to develop information extraction scripts, so data acquisition is done manually. There is an amazing variety of small but interesting journals, conferences, and workshops in Computer Science which are not under the umbrella of ACM, IEEE, Springer, Elsevier, etc. How their data get in is often decided very pragmatically.
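To illustrate what such an information extraction script might look like, here is a minimal sketch (not DBLP’s actual code) that collects Highwire-style citation_* meta tags from a publisher page, using only Python’s standard library. The sample page is invented; real scripts must cope with far messier and less stable markup, which is exactly why site redesigns force rewrites.

```python
from html.parser import HTMLParser

class CitationParser(HTMLParser):
    """Collect Highwire-style citation_* <meta> tags from a publisher page."""

    def __init__(self):
        super().__init__()
        self.record = {}  # e.g. {"title": [...], "author": [...]}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        name, content = a.get("name"), a.get("content")
        if name and content and name.startswith("citation_"):
            # "citation_author" -> "author", collecting repeated tags in a list
            self.record.setdefault(name[len("citation_"):], []).append(content)

# An invented publisher page, for illustration only
page = """<html><head>
<meta name="citation_title" content="Defrosting the Digital Library">
<meta name="citation_author" content="Hull, Duncan">
<meta name="citation_author" content="Pettifer, Steve">
<meta name="citation_doi" content="10.1371/journal.pcbi.1000204">
</head></html>"""

parser = CitationParser()
parser.feed(page)
print(parser.record["doi"])  # ['10.1371/journal.pcbi.1000204']
```

Pages that expose structured metadata like this are the easy case; the ‘chaotic puzzle’ Ley describes is everything else.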
The goal of the talk and my visit to Manchester is to start a discussion: the EasyChair conference management system developed by Andrei Voronkov and DBLP are both parts of the scientific publication workflow. Should they be connected for mutual benefit?
References
- Stein, L. (2002). Creating a bioinformatics nation: screen scraping is torture. Nature, 417(6885), 119-120. DOI: 10.1038/417119a
May 19, 2009
Defrosting the John Rylands University Library
For anyone who missed the original bioinformatics seminar, I’ll be doing a repeat of the “Defrosting the Digital Library” talk, this time for the staff of the John Rylands University Library (JRUL). This is the main academic library in Manchester; in its own words, with “more than 4 million printed books and manuscripts, over 41,000 electronic journals and 500,000 electronic books, as well as several hundred databases, the John Rylands University Library is one of the best-resourced academic libraries in the country.” The library’s journal subscription budget is currently around £4 million per year, and that’s before they’ve even bought any books! Here is the abstract for the talk:
After centuries with little change, scientific libraries have recently experienced massive upheaval. From being almost entirely paper-based, most libraries are now almost completely digital. This information revolution has all happened in less than 20 years and has created many novel opportunities and threats for scientists, publishers and libraries.
Today, we are struggling with an embarrassing wealth of digital knowledge on the Web. Most scientists access this knowledge through some kind of digital library; however, these can be cold, impersonal, isolated, and inaccessible places. Many libraries are still clinging to obsolete models of identity, attribution, contribution, citation, and publication.
Based on a review published in PLoS Computational Biology (pubmed.gov/18974831), this talk will discuss the current chilly state of digital libraries for biologists, chemists, and informaticians, including PubMed and Google Scholar. We highlight problems with, and solutions to, the coupling and decoupling of publication data and metadata, using a tool called citeulike.org. This software tool (and many other tools just like it) exploits the Web to make digital libraries “warmer”: more personal, sociable, integrated, and accessible places.
Finally, issues that will help or hinder the continued warming of libraries in the future, particularly the accurate identification of authors and their publications, are briefly introduced. These are discussed in the context of the BBSRC-funded REFINE project at the National Centre for Text Mining (NaCTeM.ac.uk), which is linking biochemical pathway data with evidence for pathways from the PubMed database.
Date: Thursday 21st May 2009. Time: 13.00. Location: Parkinson Room (inside the main entrance, first on the right), John Rylands University (Main) Library, Oxford Road, University of Manchester (number 55 on the Google map of the Manchester campus). Please come along if you are interested…
References
- Hull, D., Pettifer, S., & Kell, D. (2008). Defrosting the Digital Library: Bibliographic Tools for the Next Generation Web PLoS Computational Biology, 4 (10) DOI: 10.1371/journal.pcbi.1000204
[CC licensed picture above, the John Rylands Library on Deansgate by dpicker: David Picker]
October 31, 2008
Defrosting the Digital Library
Bibliographic Tools for the Next Generation Web
We started writing this paper [1] over a year ago, so it’s great to see it finally published today. Here is the abstract:
“Many scientists now manage the bulk of their bibliographic information electronically, thereby organizing their publications and citation material from digital libraries. However, a library has been described as “thought in cold storage,” and unfortunately many digital libraries can be cold, impersonal, isolated, and inaccessible places. In this Review, we discuss the current chilly state of digital libraries for the computational biologist, including PubMed, IEEE Xplore, the ACM digital library, ISI Web of Knowledge, Scopus, Citeseer, arXiv, DBLP, and Google Scholar. We illustrate the current process of using these libraries with a typical workflow, and highlight problems with managing data and metadata using URIs. We then examine a range of new applications such as Zotero, Mendeley, Mekentosj Papers, MyNCBI, CiteULike, Connotea, and HubMed that exploit the Web to make these digital libraries more personal, sociable, integrated, and accessible places. We conclude with how these applications may begin to help achieve a digital defrost, and discuss some of the issues that will help or hinder this in terms of making libraries on the Web warmer places in the future, becoming resources that are considerably more useful to both humans and machines.”
Thanks to Kevin Emamy, Richard Cameron, Martin Flack, and Ian Mulvany for answering questions on the CiteULike and Connotea mailing lists; and to Greg Tyrelle for ongoing discussion about metadata and the semantic Web at nodalpoint.org. Also thanks to Timo Hannay and Tim O’Reilly for an invitation to scifoo, where some of the issues described in this publication were discussed. Last but not least, thanks to Douglas Kell and Steve Pettifer for helping me write it, and to the BBSRC for funding it (grant code BB/E004431/1, the REFINE project). We hope it is a useful review, and that you enjoy reading it as much as we enjoyed writing it.
References
- Duncan Hull, Steve Pettifer and Douglas B. Kell (2008). Defrosting the digital library: Bibliographic tools for the next generation web. PLoS Computational Biology, 4(10):e1000204+. DOI:10.1371/journal.pcbi.1000204, pmid:18974831, pmcid:2568856, citeulike:3467077
- Also mentioned (in no particular order) by NCESS, Wowter, Twine, Stephen Abram, Rod Page, Digital Koans, Twitter, Bora Zivkovic, Digg, reddit, Library Intelligencer, OpenHelix, Delicious, friendfeed, Dr. Shock, GribbleLab, Nature Blogs, Ben Good, Rafael Sidi, Scholarship 2.0, Subio, up2date, SecondBrain, Hubmed, BusinessExchange, CiteGeist, Connotea and Google
[Sunrise Ice Sculptures picture from Mark K.]
June 20, 2008
A Brief Review of RefWorks
There is no shortage of bibliographic management tools out there, all ultimately aiming to save you time managing the papers and books in your personal library. I’ve just been to a demo and sales pitch for one of them, a tool called RefWorks. RefWorks claims to be “an online research management, writing and collaboration tool — designed to help researchers easily gather, manage, store and share all types of information, as well as generate citations and bibliographies”. It looks like a pretty good tool, similar to the likes of EndNote but with more of the web-based features common to CiteULike and Connotea. Here are some ultra-brief notes: RefWorks in five minutes; the good, the bad and the ugly.
The Good…
RefWorks’ finer features
- RefWorks is web-based, so you can use it from any computer with an internet connection, without having to install any software. It is platform independent: Mac, Windows, Linux, Blackberry, iPhone, Woteva. This feature is becoming increasingly common; see Martin Fenner’s Online reference managers, not quite there yet article at Nature Network.
- Share selected references and bibliographies on the Web via RefShare
- It imports and exports all the things you would expect: EndNote (definitely), XML, feeds (RSS), flat files, BibTeX (check?), RIS (check?), and several others via the screen-scraping tool RefGrab-It
- Interfaces closely with PubMed and Scopus (and many other databases), e.g. you can search these directly from your RefWorks library. You can also export from Scopus to RefWorks…
- Not part of the Reed-Elsevier global empire (yet); currently part of ProQuest, based in California
- A free 30-day trial is available
- Just like EndNote, it can be closely integrated with Microsoft Word, to cite-while-you-write
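As a rough illustration of what importing one of those formats involves, here is a minimal sketch of a parser for a single RIS record (a hypothetical example, not RefWorks’ actual importer). RIS lines follow a “two-letter tag, two spaces, hyphen, space, value” layout, with ER marking the end of a record.

```python
def parse_ris(text):
    """Parse a single RIS record into a dict mapping two-letter tags
    (AU = author, TI = title, PY = year, ...) to lists of values."""
    record = {}
    for line in text.splitlines():
        # RIS lines look like "AU  - Hull, Duncan": tag, two spaces, "- ", value
        if len(line) >= 6 and line[2:6] == "  - ":
            tag, value = line[:2], line[6:].strip()
            if tag == "ER":  # end of record
                break
            record.setdefault(tag, []).append(value)
    return record

# A hypothetical RIS record for illustration
sample = """TY  - JOUR
AU  - Hull, Duncan
AU  - Pettifer, Steve
AU  - Kell, Douglas B.
TI  - Defrosting the digital library
JO  - PLoS Computational Biology
PY  - 2008
ER  -
"""

print(parse_ris(sample)["TI"])  # ['Defrosting the digital library']
```

Repeated tags (like AU) accumulate into lists, which is why even a “flat” format like RIS needs a little care to round-trip correctly between tools.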
September 5, 2007
Semantic Biomedical Mashups with Connotea
The Journal of Biomedical Informatics (JBI) will soon be publishing its special issue on Semantic Biomedical Mashups (can you fit any more buzzwords into a Call For Papers?!). Ben Good and friends have submitted a paper on their Entity Describer, which extends Connotea using some Semantic Web goodness. They’d appreciate your comments on their submitted manuscript over at i9606. As Ben says, their pre-publication turns out to be an interesting experiment in “figuring out how blogging might fit into the academic publishing landscape”. If this interests you, get commenting now!
Update: Just spotted this interesting graphic of the Elsevier / Evilsevier logo (snigger), who are the publishers of JBI…