Over at @BBCSport and @BBC2012 there are some Olympian feats of big data wrestling going on behind the scenes for London 2012 [1]. While we all enjoy the Olympics on a range of platforms and devices, a team of twenty engineers is busy making it all happen. It’s great that the BBC, unlike other large organisations, can talk openly about its technology and share hard-won knowledge widely.
Back in 2006 the BBC published another impressive application that allowed users to search and browse over 75 years of programme data. The application was built from metadata: not the actual audio and video from TV and radio, but the data about that data, information about the programmes held in an internal database known as Infax [2,3].
The web app was published at open.bbc.co.uk/catalogue and built by a crack team of experts led by Matt Biddulph @MattB (and including Tom Coates @TomCoates, Ben Hammersley @BenHammersley, Gavin Bell @zzGavin, Tom Loosemore @TomskiTomski and several others – see comments below).
It allowed users to find weird and wonderful things. For example, you could browse all the programmes that Alan Turing or Albert Einstein had appeared in, or search for all the programmes with Jennifer Ehle. You could query it as well, to list all episodes of Dr Who in the order they were aired. It wasn’t so much Big Data as Big Metadata [4,5], potentially useful for improving the viewing and listening experience of the audience.
At the time of its launch, Dave Beckett @dajobe blogged about it, Matt Biddulph wrote some release notes, Tom Loosemore said a few words, backstage clocked it and I scribbled some notes too. Being a proof-of-concept “experimental prototype”, the app eventually disappeared into the great bit bucket in the sky. The only visible traces of the catalogue today are the blog posts above and the message below, which greets you when you visit the site:
“Thank you for your continued interest in the BBC Programme Catalogue. The BBC is now looking into how this data can be incorporated into its programme information pages.”
You can still get some BBC programme metadata from bbc.co.uk/programmes and bbc.co.uk/archive. Every programme has publicly available metadata, but it is only a fraction of what was in the open catalogue. Although the app has gone, lots of the data must still be there somewhere. Take, for example, the opening ceremony for the Olympic Games…
The metadata that is currently available
“Coverage of the opening ceremony, which officially starts at 9.00, with the eyes of the world focused on the Olympic Stadium as the 30th Olympiad is officially declared open by Her Majesty the Queen. Film director Danny Boyle is set to produce a stunning cultural show ahead of the athletes’ parade, during which over 200 countries are expected to be represented. This is followed by the official opening, the arrival of the torch and the lighting of the cauldron.”
The metadata also tells you that this particular programme was presented by Sue Barker, Huw Edwards, Gary Lineker, Jake Humphrey and Mishal Husain. The Executive Producer was Paul Davies and there’s a bunch of other stuff: date of first broadcast, links to related information and clips but that’s about it.
The metadata that used to be available
The great thing about the open catalogue was that it went into lots more detail than above. So, for the Olympics ceremony, the participants in the programme would have been listed as Danny Boyle, Daniel Craig, Thomas Heatherwick, Elizabeth II, Paul McCartney, Rowan Atkinson, Bradley Wiggins, Kenneth Branagh, Steve Redgrave, J.K. Rowling and so on. For each contributor, you could see what other programmes they had been involved in, not just recently broadcast ones, but those going back 75 years. You could also see who had collaborated with whom, when their first broadcast was and so on. It didn’t just document the über-famous people either; it went into just as much detail about other people you might not necessarily have heard of, like Frank Cottrell Boyce, Callum Airlie and Jordan Duckitt. It was great stuff, but neither the archive nor the current programme pages seem to have this level of detail.
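For the curious, here is a rough sketch of the kind of cross-referencing described above, written in Python. To be clear, this is not how the catalogue was built (I have no idea what its internals looked like): the `Programme` record, the two helper functions and all of the data are made-up placeholders, there only to show how contributor lookups, collaborations and first broadcasts fall out of fairly simple metadata.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from itertools import combinations
from typing import Tuple


@dataclass(frozen=True)
class Programme:
    """Hypothetical record shape; the real Infax schema is not known here."""
    title: str
    first_broadcast: date
    contributors: Tuple[str, ...]


# Made-up placeholder records, not real catalogue data.
PROGRAMMES = [
    Programme("Example Programme A", date(1975, 3, 1), ("Contributor X", "Contributor Y")),
    Programme("Example Programme B", date(1990, 6, 15), ("Contributor Y", "Contributor Z")),
    Programme("Example Programme C", date(2012, 7, 27), ("Contributor X", "Contributor Y")),
]


def programmes_by_contributor(programmes):
    """Map each contributor to their programmes, earliest broadcast first."""
    index = defaultdict(list)
    for programme in sorted(programmes, key=lambda p: p.first_broadcast):
        for name in programme.contributors:
            index[name].append(programme)
    return dict(index)


def collaborations(programmes):
    """Map each pair of contributors to the titles they both appeared in."""
    pairs = defaultdict(list)
    for programme in sorted(programmes, key=lambda p: p.first_broadcast):
        # Sort the names so each pair always has the same key order.
        for pair in combinations(sorted(programme.contributors), 2):
            pairs[pair].append(programme.title)
    return dict(pairs)


index = programmes_by_contributor(PROGRAMMES)
first = index["Contributor Y"][0]  # earliest programme for this contributor
print(first.title, first.first_broadcast)
print(collaborations(PROGRAMMES)[("Contributor X", "Contributor Y")])
```

The point is only that once each programme carries a list of contributors and a broadcast date, questions like “what else were they in?” and “who worked with whom?” become simple lookups.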
It’s a bit of a mystery where all the lovely BBC metadata went. It’s probably just sitting on some servers somewhere, inaccessible to the outside world. With my licence-fee-paying hat on, this seems a bit of a waste. I’ve asked everyone I know, including people at the Beeb, but have drawn a blank. Most have shrugged their shoulders and pointed to the useful but slightly impoverished /programmes and /archive, which is why I’m writing this post on t’interwebs.
Maybe the Olympic task of curating all that data makes it unsustainable. Perhaps somebody decided there is no point competing with Wikipedia, where wiki-nerds curate programme data for free? It’s possible you can’t justify serving big metadata without giving the actual data (programmes) too? Maybe there’s a shiny new application in the pipeline to replace the catalogue, or an upgrade to @ArchiveAtBBC & @programmes currently being worked on so they include much more data. Could there be issues with publishing this kind of personal data on the web which meant the whole thing got canned? Nasty copyright issues could probably sink a project like this too. Who knows…
Does anyone reading this know the answers? If you do, I’d love to hear from you.
1. David Rogers (2012). Building the Olympic Data Services, BBC Internet blog
2. Ant Miller (2012). Opening up the Archives: Finding content using metadata from Infax catalogue and Radio Times, BBC Research & Development blog
3. Ant Miller (2012). Opening up the Archives: New kinds of metadata, BBC Research & Development blog
4. Karen Loasby (2006). Changing approaches to metadata at bbc.co.uk: From chaos to control and then letting go again, Bulletin of the American Society for Information Science and Technology, 33 (1) 26. DOI: 10.1002/bult.2006.1720330109
5. Andrius Butkus and Michael Petersen (2007). Semantic Modelling Using TV-Anytime Genre Metadata, Lecture Notes in Computer Science, 4471 234. DOI: 10.1007/978-3-540-72559-6_24