
bbc.co.uk/programmes as a QR Code by /Sizemore/ Mike Atherton on Flickr available under a creative commons licence
Over at @BBCSport and @BBC2012 there are some Olympian feats of big data wrestling going on behind the scenes for London 2012 [1]. While we all enjoy the Olympics on a range of platforms and devices, a team of twenty engineers is busy making it all happen. It’s great that the BBC, unlike other large organisations, can talk openly about their technology and share hard-won knowledge widely.
Back in 2006 the BBC published another impressive application that allowed users to search and browse over 75 years of programme data. The application was built from metadata: not the actual audio and visual data from TV and radio, but the data that comes after the data, information about the programmes from an internal database known as Infax [2,3].
The web app was published at open.bbc.co.uk/catalogue and built by a crack team of experts led by Matt Biddulph @MattB (and including Tom Coates @TomCoates, Ben Hammersley @BenHammersley, Gavin Bell @zzGavin, Tom Loosemore @TomskiTomski and several others – see comments below).
It allowed users to find weird and wonderful things. For example, you could browse all the programmes that Alan Turing or Albert Einstein had appeared in or search for all the programmes with Jennifer Ehle. You could query it as well, to list all episodes of Dr Who in the order they were aired. It wasn’t so much Big Data as Big Metadata, [4,5] potentially useful for improving the viewing and listening experience of the audience.
At the time of its launch, Dave Beckett @dajobe blogged about it, Matt Biddulph wrote some release notes, Tom Loosemore said a few words, backstage clocked it and I scribbled some notes too. Being a proof-of-concept “experimental prototype”, the app eventually disappeared into the great bit bucket in the sky. The only visible traces of the catalogue today are the blog posts above and the message below, which greets you when you visit the site:
“Thank you for your continued interest in the BBC Programme Catalogue. The BBC is now looking into how this data can be incorporated into its programme information pages.”
You can still get some BBC programme metadata from bbc.co.uk/programmes and bbc.co.uk/archive. Every programme has publicly available metadata, but only a fraction of what was in the open catalogue. Although the app has gone, lots of the data must still be there somewhere. Take, for example, the opening ceremony for the Olympic Games…
The metadata that is currently available
The metadata for each BBC programme can be found via its own page, so the opening ceremony programme has metadata available in xml and rdf which tells you several things including this synopsis:
“Coverage of the opening ceremony, which officially starts at 9.00, with the eyes of the world focused on the Olympic Stadium as the 30th Olympiad is officially declared open by Her Majesty the Queen. Film director Danny Boyle is set to produce a stunning cultural show ahead of the athletes’ parade, during which over 200 countries are expected to be represented. This is followed by the official opening, the arrival of the torch and the lighting of the cauldron.”
The metadata also tells you that this particular programme was presented by Sue Barker, Huw Edwards, Gary Lineker, Jake Humphrey and Mishal Husain. The Executive Producer was Paul Davies and there’s a bunch of other stuff: date of first broadcast, links to related information and clips but that’s about it.
The metadata that used to be available
The great thing about the open catalogue was that it went into lots more detail than above. So, for the Olympics ceremony, the participants in the programme would have been listed as Danny Boyle, Daniel Craig, Thomas Heatherwick, Elizabeth II, Paul McCartney, Rowan Atkinson, Bradley Wiggins, Kenneth Branagh, Steve Redgrave, J.K. Rowling and so on. For each contributor, you could see what other programmes they had been involved in, not just recently broadcast ones, but those going back 75 years. You could also see who had collaborated with whom, when their first broadcast was and so on. It didn’t just document the über-famous people either; it went into just as much detail about other people you might not necessarily have heard of, like Frank Cottrell Boyce, Callum Airlie and Jordan Duckitt. It was great stuff, but neither the archive nor current programmes seem to have this level of detail.
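To make the idea of browsing by contributor a bit more concrete, here is a minimal sketch of the kind of lookup tables that could sit behind it. This is only an illustration and not how the catalogue was actually built: the record layout, the function names, the placeholder titles and the placeholder person are all assumptions made up for the example, and only the real names come from the list above.

```python
from collections import defaultdict
from itertools import combinations

# Illustrative records only: the titles and the placeholder person are made up,
# and this is not the Infax schema. The real names are taken from the post above.
RECORDS = [
    {"title": "Opening ceremony (example record)",
     "contributors": ["Danny Boyle", "Frank Cottrell Boyce", "Rowan Atkinson"]},
    {"title": "Placeholder programme A",
     "contributors": ["Danny Boyle", "Placeholder Person"]},
    {"title": "Placeholder programme B",
     "contributors": ["Rowan Atkinson"]},
]


def build_contributor_index(records):
    """Map each contributor name to the titles they appear in, in record order."""
    index = defaultdict(list)
    for record in records:
        for name in record["contributors"]:
            index[name].append(record["title"])
    return dict(index)


def build_collaboration_index(records):
    """Map each alphabetically ordered pair of contributors to their shared titles."""
    pairs = defaultdict(list)
    for record in records:
        # De-duplicate and sort so each pair has one canonical (first, second) key.
        names = sorted(set(record["contributors"]))
        for first, second in combinations(names, 2):
            pairs[(first, second)].append(record["title"])
    return dict(pairs)


if __name__ == "__main__":
    by_contributor = build_contributor_index(RECORDS)
    print(by_contributor["Danny Boyle"])
    shared = build_collaboration_index(RECORDS)
    print(shared[("Danny Boyle", "Frank Cottrell Boyce")])
```

Both tables are plain inverted indexes: one pass over the programme records is enough to answer “what else was this person in?” and “who worked with whom?”, which is roughly the sort of question the open catalogue could answer.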
Meta-conclusions
It’s a bit of a mystery where all the lovely BBC metadata went; it’s probably just sitting on some servers somewhere, inaccessible to the outside world. With my licence fee paying hat on, this seems a bit of a waste. I’ve asked everyone I know, including people at the beeb, but have drawn a blank. Most have shrugged their shoulders and pointed to the useful but slightly impoverished /programmes and /archive, which is why I’m writing this post on t’interwebs.
Maybe the Olympic task of curating all that data makes it unsustainable. Perhaps somebody decided there is no point competing with Wikipedia, where wiki-nerds curate programme data for free? It’s possible you can’t justify serving big metadata without giving the actual data (programmes) too? Maybe there’s a shiny new application in the pipeline to replace the catalogue, or an upgrade to @ArchiveAtBBC & @programmes currently being worked on so that they include much more data. Could there be issues with publishing this kind of personal data on the web which meant the whole thing got canned? Nasty copyright issues could probably sink a project like this too. Who knows…
Does anyone reading this know the answers? If you do, I’d love to hear from you.
[Update: In October 2014 the Big British Castle launched their BBC Genome project at genome.ch.bbc.co.uk, which covers much of the same data as Infax, from 1923 to 2009. There’s a pretty decent Wikipedia article on the BBC programme catalogue and its ancestors, including Infax.]
References
[1] David Rogers (2012). Building the Olympic Data Services, BBC Internet blog
[2] Ant Miller (2012). Opening up the Archives: Finding content using metadata from Infax catalogue and Radio Times, BBC Research & Development blog
[3] Ant Miller (2012). Opening up the Archives: New kinds of metadata, BBC Research & Development blog
[4] Karen Loasby (2006). Changing approaches to metadata at bbc.co.uk: From chaos to control and then letting go again, Bulletin of the American Society for Information Science and Technology, 33 (1) 26. DOI: 10.1002/bult.2006.1720330109
[5] Andrius Butkus and Michael Petersen (2007). Semantic Modelling Using TV-Anytime Genre Metadata, Lecture Notes in Computer Science, 4471 234. DOI: 10.1007/978-3-540-72559-6_24
I wonder whether the cataloguers/metadata people at the British Library Sound Archive might have some idea? Or the BFI Archive? Both include many programmes from the BBC.
Comment by Frank Norman — August 3, 2012 @ 8:37 am |
It’s a shame the programme catalogue isn’t live, but I can understand, and maybe explain, why.
It was my decision to pull it offline, after it became apparent that it contained at least one really potentially very scary error. Not an embarrassing error; a really scary error. The kind of error that is not a problem when INFAX was used internally within the BBC, but is not remotely acceptable when surfaced in Google.
It was also a class of error that would prove very very hard to predict, spot and hence eliminate (needle in haystack).
So pulling the catalogue offline was a very easy decision, if a very sad one. But nothing ventured, nothing gained.
I left the BBC in 2007, but I can’t say I’m surprised it’s still not reappeared. The investment required to check such a big db would be significant, and probably be better spent starting again, since INFAX is at root an overloaded 70s era db, albeit one curated with love and care by BBC librarians.
For the record, lovely though all the people you credit are, the kudos for the catalogue belongs to Matt Biddulph – he built it all alone, while Ben Hammersley did the minimalist paint job.
The rest of us cheered from the sidelines. For the record, it cost less than £10k. Matt didn’t know his own worth in those days.
Comment by Tom Loosemore — August 3, 2012 @ 2:34 pm |
Thanks Tom, that’s a mystery solved. I’m curious to know exactly what these “really scary errors” were. I hope that the BBC Archive can put much more metadata online, it all seems a bit minimalist at the moment.
Comment by Duncan — August 3, 2012 @ 9:58 pm |
Why do you consider /programmes to be “slightly impoverished” ?
The same system provides a page for every programme the BBC has broadcast since 2007 – ranging from local radio shows to the Archers as well as some of the BBC’s biggest brands like Doctor Who.
In certain places, the effort has been made to clean up and publish archive data – for example, check out the Desert Island Discs archive complete with tracklists going all the way back to 1943:
http://www.bbc.co.uk/programmes/b006qnmr/broadcasts/1943/03
On top of that, it publishes different representations of each page: desktop, mobile, rdf, json and xml.
I agree that more could be done to enable browsing by contributor as you describe. But /programmes is a fantastic resource which deserves more love!
Comment by Patrick Sinclair — August 4, 2012 @ 11:05 am |
Hello Patrick, thanks for your comments.
“If I hadn’t seen such riches I could live with being poor”
Poor is a relative term and slightly impoverished was probably a bad choice of description. I wasn’t dissing /programmes or the archive, just pointing out ways they could be richer than they are already. I’d love to see more metadata about BBC programmes, having seen the riches that were in the Infax catalogue. Obviously there are some issues (see comments and tweets above) with Infax, but perhaps a cleaned up lightweight version of Infax could be incorporated into /programmes?
The different representations and platforms from the BBC are all great, especially when compared to what other large media organisations are providing. So my post above was mainly mourning the loss of Infax to the outside world and wondering where all the metadata had gone. I’ve found out now, and look forward to more metadata in the future.
Comment by Duncan — August 7, 2012 @ 12:44 pm |