O'Really?

May 18, 2012

Web analytics: Numbers speak louder than words

Two hundred light painting by B. Rosen, via Flickr, available under a Creative Commons license

According to the software which runs this site, this is the 200th post here at O’Really. To mark the occasion, here are some stats via WordPress with thoughts and general navel-gazing analysis paralysis [1] on web analytics. It all started just over six years ago at nodalpoint with help from Greg Tyrelle; the last four years have been WordPressed with help from Matt Mullenweg. WordPress stats are unfortunately very primitive compared to the likes of Google Analytics and don’t give you access to the server log files either. WordPress probably flatters to deceive by exaggerating page views and encouraging users to post more content, but it doesn’t count self-visits to the blog. Despite all the usual limitations of the murky underworld of web analytics and SEO, here are the stats, warts and all.

As of May 2012, this blog is just shy of 200,000 page views in total with 500+ (genuine) comments and 100,000+ spam comments nuked by the Akismet filter. The busiest day so far was the 15th February 2012, with 931 views in a single day of a post which got linked to by the Wall Street Journal. The regular traffic is pretty steady around the 1,000 views per week (~4,000 views per month) mark. Most readers come from the United States, United Kingdom and Germany (jawohl! in that order).

Top posts: What people read when they get here

The most popular pages here are as follows:

Page Views
Home page / Archives 33,977
Impact Factor Boxing 2010 17,267
Impact Factor Boxing 2009 10,652
How many journal articles have been published? 7,181
Impact Factor Boxing 2011 6,635

Are we obsessed with dodgy performance metrics like journal impact factors? I’m not, honest guv’, but lots of people on t’interwebs clearly are.

Top search terms: How people get here

The search engines send traffic here through the following search terms:

Search terms Views
plos biology impact factor 2010 3,175
impact factor 2010 1,631
impact factor 1,589
plos biology impact factor 1,566
impact factor 2009 1,333

Is there a correlation between Obsessive Compulsive Disorder (OCD) and Impact Factor (IF)? Probably. Will it ever stop? Probably not.

Referrals: Spread the link love

It’s not just search engines that send you traffic…

Referrer Views
Search Engines 16,339
cs.man.ac.uk 4,654
Twitter 2,334
friendfeed.com 2,262
flickr.com 2,077
researchblogging.org 1,904
en.wordpress.com 1,037

… social media (twitter, friendfeed, flickr, researchblogging and wordpress etc) refers nearly as much traffic as the search engines do. I fit the demographic of bloggers previously described [1]: male, educated and a life scientist.
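For anyone who wants to check the arithmetic, here is a minimal sketch that tallies the referrer table, taking the en.wordpress.com figure as 1,037. Which referrers count as "social media" is an assumption based on the list in the sentence above, and the function names are made up for illustration.

```python
# Referrer view counts from the table above.
referrers = {
    "Search Engines": 16339,
    "cs.man.ac.uk": 4654,
    "Twitter": 2334,
    "friendfeed.com": 2262,
    "flickr.com": 2077,
    "researchblogging.org": 1904,
    "en.wordpress.com": 1037,
}

# Assumption: these five referrers make up the "social media" group.
SOCIAL = (
    "Twitter",
    "friendfeed.com",
    "flickr.com",
    "researchblogging.org",
    "en.wordpress.com",
)


def social_total(counts):
    """Sum the views referred by the social media group."""
    return sum(counts[name] for name in SOCIAL)


def share_of_search(counts):
    """Social media views as a fraction of search engine views."""
    return social_total(counts) / counts["Search Engines"]


print(social_total(referrers))               # 9614
print(round(share_of_search(referrers), 2))  # 0.59
```

On these numbers the five social referrers add up to 9,614 views, a little under 60% of the search engine total. Adding cs.man.ac.uk brings the non-search total to 14,268.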

Top five clicks: How people leave

This is what people are clicking on:

URL Clicks
isiknowledge.com/JCR 914
feeds2.feedburner.com/oreally 407
en.wikipedia.org/wiki/Dead_on_arrival 396
aps.org/publications/apsnews/200811/zero-gravity.cfm 363
plosbiology.org 305

Dear Thomson Reuters, you should have an associates scheme like Amazon. I’m advertising your commercial product (Journal Citation Reports) for free! I’m far too kind, please send me a generous cheque immediately for my troubles or I will remove all links to your product.

Lots of people looking for the lyrics of the Friends sitcom jingle don’t know what “Your love life’s D.O.A.” means. Glad to be of service.

Conclusions

Traffic here is fairly modest compared to some blogs, but is still significant and to my mind justifies the time spent blogging. It is great fun to blog, and like most things in life, it can be very time consuming to do well. There is a long way to go before reaching the 10,000 hours milestone, maybe one day.

What people are actually interested in reading, and what you think they will be interested in reading, are often two completely different things. Solo blogging has disadvantages and it’s been very tempting to try and join one of the many excellent blogging collectives like PLoS Blogs, Occam’s Typewriter or the Guardian science blogs. For the time being though, going it alone on a personal domain name has its advantages too.

So, if you’ve read, commented or linked to this site, thank you very much. I hope you enjoy reading these posts as much as I enjoy writing them. Like smartphones and wifi, it’s hard to imagine life without blogs and bloggers.

References

  1. Shema, H., Bar-Ilan, J., & Thelwall, M. (2012). Research Blogs and the Discussion of Scholarly Information PLoS ONE, 7 (5) DOI: 10.1371/journal.pone.0035869

February 20, 2009

Mistaken Identity: Google thinks I’m Maurice Wilkins

In a curious case of mistaken identity, Google seems to think I’m Maurice Wilkins. Here is how. If you Google the words DNA and mania (google.com/search?q=dna+mania), one of the first results is a tongue-in-cheek article I wrote two years ago about our obsession with Deoxyribonucleic Acid. Now Google (or more precisely Googlebot) seems to think this article was written by one M Wilkins. That’s M Wilkins as in the physicist Maurice Wilkins, the third man of the double helix (after Watson and Crick) and Nobel prize winner back in ’62. How could such a silly (but amusing) mistake be made? Because the article is about what Wilkins once said, but not actually by Wilkins. Computers can’t tell the difference between these two things. It has been known for some time that Google Scholar contains many other mistaken author identities like this. Scholar even thinks there is an author called Professor Forgotten Password (a prolific author who has been widely cited in many fields)!
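If you want to repeat the search from a script, the query string can be built with Python's standard library. This is only a sketch: the `https://www.` prefix and the helper name `google_search_url` are assumptions for illustration, and it builds the URL without fetching anything.

```python
from urllib.parse import urlencode


def google_search_url(*words):
    """Build a search URL like the one quoted above (hypothetical helper)."""
    # urlencode uses quote_plus by default, so spaces become '+'.
    query = urlencode({"q": " ".join(words)})
    return "https://www.google.com/search?" + query


print(google_search_url("dna", "mania"))
# https://www.google.com/search?q=dna+mania
```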

The other curiosity is this: the original post on nodalpoint.org is also counted as a citation in Google Scholar. It’s a bit of a mystery how Scholar actually works, what it includes (and excludes) and how big it is, but you’ll find the article counted as a proper citation for a book about genes. Scientific spammers must be licking their lips at the opportunity to influence results and citation counts with humble blog posts, rather than more kosher articles in peer-reviewed scientific journals.

So what does all this curious interweb mischief tell us?

  1. Identifying people on the web is a tricky business, more complex than most people think
  2. Googlebot needs to have its algowithms tweaked by those Google Scholars at the Googleplex. Not really surprising, what else did you expect from Beta software? (P.S. Googlebot, when you read this, I’m not Maurice Wilkins, that’s not my name. I haven’t won a Nobel prize either.  I’m sort of flattered that you’ve mistaken me for such a distinguished scientist, so I’ll enjoy my alternative identity while it lasts.)
  3. Blogs are increasingly part of the scientific conversation and are counted in various bibliometrics. Will Google Scholar (and the rest) start indexing other blogs too? Where will this trend leave more conventional bibliometrics like the impact factor?

(Note: These search results were correct at the time of writing, but may change over time, results preserved for posterity on flickr)

References

  1. Maurice Wilkins (2003) The Third Man of the Double Helix: The Autobiography of Maurice Wilkins isbn:0198606656
  2. Péter Jacsó (2008) Savvy searching – Google Scholar revisited. Online Information Review 32: 102-11 DOI:10.1108/14684520810866010 (see also Defrosting the Digital Library)
  3. Douglas Kell (2008) What’s in a name? Guest, ghost and indeed quite imaginary authorships BBSRC blogs
  4. Neil R. Smalheiser and Vetle I. Torvik Author Name Disambiguation (This is a preprint version of a chapter published in Volume 43 (2009) of the Annual Review of Information Science and Technology (ARIST) (B. Cronin, Ed.), which is available from the publisher Information Today, Inc (http://books.infotoday.com/asist/#arist)).
  5. Duncan Hull (2007) DNA mania. Nodalpoint.org
  6. Jules De Martino and Katie White (2008) That’s not my name (video)

November 17, 2008

Science blog meme: Why do we blog?

I have been virally infected by Martin Fenner’s “why do we blog” meme.

1. What is your blog about?

Science and technology, especially bioinformatics, systems biology and the Web. It is a personal laboratory notebook-cum-diary, with a few facts and many opinions that would be difficult to publish conventionally [1].

2. What will you never write about?

Banal personal trivia (“I went shopping today”), confidential work, collaborative projects before they have been published. If in doubt, I try to ask people, “is it OK if I blog this?”

3. Have you ever considered leaving science?

Already did, I left science after my undergraduate degree to work in industry, but came back after six years to do a PhD. I don’t think Science ever really leaves you, once a scientist, always a scientist. Can’t see myself “leaving” again, but you never know.

4. What would you do instead?

Tend olive trees in Greece. Sequence 10,000+ olive tree genomes, do some olive tree systems biology [2]. Subsidise scientific research with money from olive oil export business.

5. What do you think science blogging will be like in 5 years?

Pretty much the same as it is now I reckon, maybe more senior scientists will start blogging, see big boffins with blogs.

6. What is the most extraordinary thing that happened to you because of blogging?

I’m pretty sure blogging was a significant factor in being invited to Science Foo Camp (scifoo).

7. Did you write a blog post or comment you later regretted?

Non, rien de rien. Non, je ne regrette rien. Some of the posts about semantic web and molecular biology I might come to regret in the future though, but life is too short. There is an ever present temptation to write controversial blog posts (that might be regretted later) to get more visitors to your blog. Sometimes I can’t resist. Also, there is no safety net of peer-review, so you can make mistakes very quickly, even faster than by drinking tequila. I often wonder what prospective employers and/or funding bodies would make of it all – by the time I find out, it might be too late 🙂

8. When did you first learn about science blogging?

Via nodalpoint which is run by Greg Tyrelle.

9. What do your colleagues at work say about your blogging?

So far, there have been five basic responses to my blog among colleagues.

a) Great idea, carry on (see picture, top right). Can you blog this for me?

b) Bad idea, why do you waste so much time blogging? When are you going to do some “real” work?

c) Teasing: “I’m drinking a coffee, are you blogging this?”

d) Head-in-the-sand, no acknowledgment, denial, look the other way.

e) Ignorance is bliss. What is a blog? Do you have one of those interweb things on your computer?

References

  1. Michael R. Seringhaus and Mark B. Gerstein (2007). Publishing perishing? towards tomorrow’s information architecture. BMC Bioinformatics 8, 17+. DOI:10.1186/1471-2105-8-17, pmid:17239245
  2. Royston Goodacre, Douglas B Kell, Giorgio Bianchi (1992). Neural networks and olive oil. Nature 359 (6396), 594. DOI:10.1038/359594a0

[Keep Calm and Carry On via AJC1]

December 12, 2007

Mapping the Internet

As of 2007, the Internet is mostly still a wild untamed jungle. Many people have tried to chart the territory, but what should a map of the internet look like?

One of my favourite maps is “The Web Is Agreement” by Paul Downey. Paul’s map has a Tolkien-like Lord of the Rings feel to it, so instead of Microsoft we have Mordorsoft. The all seeing eye of Sauron is Google of course, helping search, but raising privacy concerns.

Paul is not the only cartographer busy drawing maps, Randall Munroe has drawn a nifty map based on Internet Protocol (IP) addresses (available as a poster, for hard-core geeks) and an online communities map, shown at the bottom of this post.

If the atoms of the Internet had numbers, you could organise them into a map like the Periodic Table, just as Mendeleev did. Hence we have The Periodic Table of the Internet by Wellington Grey, which uses PageRank (instead of atomic numbers) as a means of charting the Internet.

Periodic Table of the Internet

And of course there’s some bloke called Tim who, showing his British roots, often draws more abstract maps that look like the London Underground, shown below.

The map is not the territory but you can learn a hell of a lot by looking at the map before you head into the jungle. Using the map below, you’ll find nodalpoint down South in the warm “blogipelago”, past the “Gulf of YouTube”. Bon voyage!

November 6, 2007

What’s The Point of Blogging?

I am a hard bloggin' scientist. Read the Manifesto.
Sometimes I wonder what the point of blogging is and just how much time people (myself included) waste reading and writing blogs. Let’s face it, most leading scientists are too damn busy to pay much attention to the blogosphere, especially when it descends (as it frequently does) into “uncontrollable verbal discharge”. This unfortunate medical condition is also known as Blogorrhoea. A free-flowing blog is unlikely to directly increase a scientist’s productivity (as approximated by the infamous h-index), and might even decrease it. Now, we all know that PowerPoint can be PowerPointless, so is blogging also a pointless activity? Or to put it another way: Nodalpoint or Nodalpointless?
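For anyone who hasn’t met the h-index, Hirsch’s definition [1] is that a scientist has index h if h of their papers have at least h citations each. A minimal sketch of the calculation follows; the citation counts are invented purely for illustration.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for five papers.
print(h_index([10, 8, 5, 4, 3]))  # 4
```

Note that time spent blogging doesn’t enter into the formula at all, which is rather the point of the paragraph above.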

If you’ve ever wondered what the point of scientific blogging is, you should read the following (if you haven’t already):

So what the heck, if blogging is fun and helps you communicate ideas with people, why get all uptight about questionable metrics for measuring scientific productivity? Wherever you blog, blog hard, blog fast and enjoy it. At the very least, it will fill the gaping void left on the Web by traditional scientific publishing. Who knows what the other benefits might be?

References

  1. Jorge Hirsch An index to quantify an individual’s scientific research output Proceedings of the National Academy of Sciences. 2005 November;102(46):16569-16572 DOI:10.1073/pnas.0507655102
  2. this post originally on nodalpoint with comments

August 7, 2007

Scifoo: Geek Out! Le Geek, C’est Chic…

Deepak Singh and Euan Adie

As well as big famous superstars at Science Foo Camp (scifoo), there is a chance to meet and “geek out” with younger engineers and scientists like Vince Smith, Aaron Swartz and Vaughan Bell.

Aaron Swartz and the open library project

On Sunday at scifoo, Aaron (of archive.org) gave a quick demo of the Open Library. Currently this project is taking books that are out of print and not in other book catalogues like Amazon, and making them available online. They are intending to move into archiving scientific journals, so watch that space. I’ve always wondered how the internet archive survived financially, and managed all its interesting projects (like the open library). It’s all funded by some bloke called Brewster Kahle. They provide some great services, like hosting digital artifacts for free, see http://www.archive.org/create/.

Vince Smith, Museums and Drupal

Vince Smith is a “cyber-taxonomist” at the Natural History Museum in London. He’s a world expert on parasitic lice, and uses a multi-site installation of Drupal, see vsmith.info (Hmmm, that Drupal skin looks familiar…). Vince uses a Drupal module for bibliographic citations, called biblio, which looks handy. It’d be nice to have it on nodalpoint. Anyway, any time spent looking around Vince’s site is time well spent.

Vaughan Bell, Mind Hacker

Vaughan Bell is a clinical psychologist. We chatted about wikipedia and science, as demonstrated by Schizophrenia. He’s also a contributor to a book on MindHacks and blogs at mindhacks.com. My suitcase is full of free O’Reilly book-schwag I filled my boots with on Friday, one of which is Vaughan’s book. Looks like it will be a good read on the plane home, because my brain is in need of some serious “optimisation”.

(Two more geeks, pictured right, but regular nodalpoint readers will know all about them already, Deepak Singh and Euan Adie.)

There’s plenty more I could blog about scifoo, but I’m all foo-ked up, geeked out and mashed-up. It’s time to go home. For more scifoo blogging see www.technorati.com/tags/scifoo, www.nature.com/scifoo and network.nature.com/blogs/tag/scifoo.

References

  1. Aaaaah: Freak Out! Le Freak, C’est Chic…

Creative Commons License

This work is licensed under a

Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.


November 28, 2006

Postdoc Hell: Should I Stay Or Should I Go?

Sometimes, being a PostDoctoral researcher is a tough life. Thankfully, help is at hand in Philip Bourne and Iddo Friedberg’s guide Ten Simple Rules for Selecting a Post-Doctoral Position, published in PLOS Computational Biology. This article is part of a series of editorials [1,2,3] which discuss various aspects of the weird and wonderful world of scientific research. They are worth reading if you’re at an early stage of your career, although you may not always agree with all the advice given. For example, the article advises PostDocs to:

Think very carefully before extending your graduate work into a postdoc in the same laboratory where you are now – to some professionals this raises a red flag when they look at your resume. Almost never does it maximise your gain of knowledge and experience, but that can be offset by rapid and important publications.

Do any experienced postdocs (or post-postdocs) out there have any opinions on the importance of moving labs after a PhD? What if you’re already in a great lab and like where you work? To what extent is it important to move, just to get new experience and skills? Or as The Clash once put it [4]:

♫ If I go there will be trouble, if I stay it will be double.
So come on and let me know, should I cool it or should I blow? ♫

References

  1. Philip Bourne (2006) Ten Simple Rules for Getting Published PLOS Computational Biology
  2. Philip Bourne and Leo Chalupa (2006) Ten Simple Rules for Getting Grants PLOS Computational Biology
  3. Philip Bourne and Alon Korngreen (2006) Ten Simple Rules for Reviewers PLOS Computational Biology
  4. Joe Strummer and Mick Jones (1981) Should I stay or should I go?
  5. Jawahar Swaminathan (2006) A ten step plan for PostDoc training nodalpoint.org
  6. this post originally on nodalpoint with comments
  7. Postdoc Hell, a collection of articles describing the plight of the postdoctoral researcher on citeulike


update: Mysteriously, Nature jobs used the Clash as a theme to their careers supplement, two weeks after this post was published. See How to ask yourself questions about major career decisions and Should I Stay Or Should I Go?. Coincidence? I wonder if they read nodalpoint?

November 1, 2006

Bioinformatics Impact Factors

There are all sorts of flaws with using impact factors for judging the quality of biomedical research. Love them or hate them, just getting hold of impact factors for journals in bioinformatics and related fields is much harder than it should be, so I thought I’d reproduce some statistics I gathered here. The rankings, which you should use with caution [1,2], are correct as of June 2006 (and apply to citations in 2005) courtesy of Journal Citation Reports®, part of Thomson ISI Web of Knowledge. JCR has a pretty horrible clunky web interface when compared to some of its rivals [3,4], maybe one day they’ll make it better. Anyway, this is not a comprehensive list, just a fairly random selection of bioinformatics and computer science journals that publish articles I’ve been reading the last few years.

Journal ISI impact factor
Science 30.927
Nature Reviews Molecular Cell Biology 29.852
Cell 29.431
Nature 29.273
Nature Genetics 25.797
Nature Biotechnology 22.378
Nature Reviews Drug Discovery 18.775
PLOS Biology 14.672
PNAS 10.231
Genome Research 10.139
Genome Biology 9.712
Drug Discovery Today 7.755
Nucleic Acids Research 7.552
Bioessays 6.787
Plant Physiology 6.114
Bioinformatics (OUP) 6.019
BMC Bioinformatics 4.958
Proteins: structure, function and bioinformatics 4.684
BMC Genomics 4.092
IEEE Intelligent Systems 2.560
Journal of Computational Biology 2.446
Journal of Biomedical Informatics 2.388
IEEE Internet Computing 2.304
Artificial Intelligence in Medicine 1.882
Comparative and Functional Genomics 0.992
Concurrency and Computation: Practice and experience 0.535
Briefings in Bioinformatics (OUP) not listed
PLOS Computational Biology not listed
Journal of Web Semantics not listed

One point of interest, cheeky young upstart BioMed Central Bioinformatics (going since 2000) seems to be catching up on traditional old-school favourite OUP Bioinformatics (going since 1985), which as mentioned on nodalpoint, has been publishing some dodgy parser papers lately.
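As a reminder of what the numbers in the table measure: the standard two-year impact factor for 2005 is the number of citations received in 2005 by items a journal published in 2003 and 2004, divided by the number of citable items it published in those two years. Here is a minimal sketch of that calculation; the function name and the counts are hypothetical and not taken from JCR.

```python
def impact_factor(citations_in_year, citable_items_prev_two_years):
    """Two-year impact factor: citations this year to the previous two
    years' items, divided by the number of citable items in those years."""
    if citable_items_prev_two_years <= 0:
        raise ValueError("need at least one citable item")
    return citations_in_year / citable_items_prev_two_years


# Hypothetical journal: 1,200 citations in 2005 to its 2003-2004 articles,
# of which there were 400 citable items.
print(round(impact_factor(1200, 400), 3))  # 3.0
```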

October 27, 2006

MEDIE: MEDLINE++

Filed under: informatics — Duncan Hull @ 10:21 pm

MEDIE is an “intelligent” semantic search engine that retrieves biomedical correlations from over 14 million articles in MEDLINE. You can find abstracts and sentences in MEDLINE by specifying the semantics of correlations; for example, What activates tumour suppressor protein p53? So just how useful is MEDIE and is it at the cutting edge?

At the Manchester Interdisciplinary Biocentre (MIB) launch yesterday, Professor Jun’ichi Tsujii gave a presentation on Linking text with knowledge – challenges for Text Mining in Biology. As part of this presentation he gave a demonstration of Medie: an intelligent search engine for Medline. This tool looks quite impressive if you experiment with some sample queries. I wonder what nodalpointers, especially hardened text-miners, natural language processing (NLP) nerds and computational linguists, make of Medie?

[This post was originally published on nodalpoint, with comments]

October 20, 2006

Manchester Biocentre Launch

The Manchester Interdisciplinary Biocentre (MIB) is officially opening on 25/26th October 2006. The centre has been about a decade in the making, and aims to be a world-class research centre, with around £37 million (~$70 million) of initial funding from the Wellcome Trust charity, UK Research Councils and others. If you’re looking for a bioinformatics job, PhD, PostDoc etc in the UK, MIB is continuously hiring and looks like a good place to work, if the opening programme (which follows) is anything to go by.

Unfortunately the MIB web pages aren’t quite world class yet, the promotional launch material is only available in pdf format, *sigh*, see references below. So I’m blogging the MIB Symposium launch programme here to put the stuff online. Talks scheduled for the second day of the opening, 26th October 2006, are listed below, and these can be attended by free registration (see references):

Session 1: Bio-molecular machines, 9.00-11.00

Session chaired by Alan North, Dean of the Faculty of Life Sciences

  • John E. Walker (MRC Dunn Human Nutrition Unit, Cambridge, UK): Biomolecular rotary motors.
  • Yoshi Nakamura (Tokyo University, Japan): Aptamer as RNA-made super antibody for basic and therapeutic applications
  • John McCarthy (Manchester Interdisciplinary Biocentre): Molecular mechanisms underlying post-transcriptional gene expression.
  • Refreshment break

Session 2: Biomolecular Structure and Dynamics, 11.00-12.40

Session chaired by Bob Ford, Professor of Structural Biology, Faculty of Life Sciences.

Session 3: Systems and Information, 13.35-15.45

Session chaired by John Perkins, Dean of Faculty of Engineering and Physical Sciences.

Session 4: Biocatalysis, 16.10-17.00

Session chaired by Hans Westerhoff, Manchester Interdisciplinary Biocentre

  • Nigel Scrutton (MIB and Faculty of Life Sciences): ‘Squeezing’ barriers – a dynamical view of enzyme catalysis.
  • Gill Stephens (MIB and School of Chemical Engineering): Redox biocatalysis – the next generation of enzymes for manufacturing pharmaceutical intermediates and specialty chemicals.

Session 5: Bionanoscience and engineering: 17.00-18.00

Session chaired by Peter Fielden, Chemical Engineering

  • Joseph Wang (Arizona State University, USA): Nanomaterials for monitoring and controlling biomolecular interactions.
  • Milan Stojanovich (Columbia University Medical School, New York, USA): Deoxyribozyme-based devices.

Session 6: Postgenomic Analytical Technologies, 18.00-19.10

Session chaired by Roy Goodacre, MIB and School of Chemistry

  • Ruedi Aebersold (ETH Zürich): Quantitative Proteomics and Systems Biology
  • Simon Gaskell, MIB and School of Chemistry: New analytical science in proteomics and metabolomics.
  • Concluding remarks.