Reading old news in the web archive, distantly

[The substance of this post has now been published.]

One of the defining moments of Rowan Williams’ time as archbishop of Canterbury was the public reaction to his lecture in February 2008 on the interaction between English family law and Islamic shari’a law. As well as focussing attention on real and persistent issues of the interaction of secular law and religious practice, it also prompted much comment on the place of the Church of England in public life, the role of the archbishop, and on Williams personally. I tried to record a sample of the discussion in an earlier post.

Of course, a great deal of the media firestorm happened online. I want to take the episode as an example of the types of analysis that the systematic archiving of the web now makes possible: a new kind of what Franco Moretti called ‘distant reading.’

The British Library holds a copy of the holdings of the Internet Archive for the .uk top-level domain for the period 1996–2010. One of the secondary datasets that the Library has made available is the Host Link Graph. With this data, it’s possible to begin examining how different parts of the UK web space referred to one another. Which hosts linked to which others, and from when until when?

This graph shows the total number of unique hosts that were found linking at least once to the archbishop’s site in each year.

[Figure: Canterbury unique linking hosts, by year (bar chart)]

My hypothesis was that there should be more unique hosts linking to the archbishop’s site after February 2008, which is by and large borne out. The figure for 2008 is nearly 50% higher than for the previous year, and nearly 25% higher than the previous peak in 2004. This would suggest that a significant number of hosts that had not previously linked to the Canterbury site did so in 2008, quite possibly in reaction to the shari’a story.

What I had not expected was to see the total fall back to trend in 2009 and 2010. I had rather expected to see the absolute numbers rise in 2008 and then stay at similar levels – that is, to see the links persist. The drop suggests either that large numbers of sites were revised to remove links thought to be ‘ephemeral’ (that is to say, the links were actively removed), or that there is a more general effect, in that certain types of “news” content are not (in web archivists’ terms) self-archiving. [Update 02/07/2014 – see comment below.]

The next step is for me to look in detail at those domains that linked only once to Canterbury, in 2008, and to examine these questions in a more qualitative way. Here then is distant reading leading to close reading.

You can download the data, which is in the public domain, from the British Library. Be sure to have plenty of hard disk space, as the unzipped data runs to more than 120GB. The data looks like this:

2010 | | | 20

which tells you that in 2010, the Internet Archive captured 20 individual resources (usually, although not always, “pages”) in the Church Times site that linked to the archbishop’s site. My poor old laptop spent a whole night running through the dataset and extracting all the instances of the string “”.
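The extraction step can be sketched in Python. This is a minimal sketch, not the procedure actually used: it assumes the data has already been unzipped to plain text, and that each line holds four ‘|’-separated fields in the order year, linking (source) host, linked-to (target) host and number of resources, as the sample line above suggests. The names LinkRecord, parse_line, links_to and links_to_file are hypothetical. Rather than matching the string anywhere in the line, it keeps only records whose target field matches exactly.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class LinkRecord(NamedTuple):
    """One line of the host link graph (field order is an assumption)."""
    year: int
    source: str   # host containing the links
    target: str   # host being linked to
    count: int    # number of captured resources containing such links


def parse_line(line: str) -> Optional[LinkRecord]:
    """Split 'year | source | target | count'; return None if malformed."""
    fields = [field.strip() for field in line.split("|")]
    if len(fields) != 4:
        return None
    year, source, target, count = fields
    try:
        return LinkRecord(int(year), source, target, int(count))
    except ValueError:
        return None


def links_to(lines: Iterable[str], target_host: str) -> Iterator[LinkRecord]:
    """Yield only the records whose target host is exactly target_host."""
    for line in lines:
        # Cheap substring test first, so most lines are never fully parsed.
        if target_host not in line:
            continue
        record = parse_line(line)
        if record is not None and record.target == target_host:
            yield record


def links_to_file(path: str, target_host: str) -> Iterator[LinkRecord]:
    """Stream a plain-text (already unzipped) link graph file line by line."""
    with open(path, "r", encoding="utf-8", errors="replace") as lines:
        yield from links_to(lines, target_host)
```

Because the file is read one line at a time, memory use stays small even for the full dataset; on a laptop the limiting factor is more likely to be disk speed.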

Then I looked at the total numbers of unique hosts linking to the archbishop’s site in each year. In order to do so, I:

(i) stripped out those results which were outward links from a small number of captures of the archbishop’s site itself.

(ii) allowed for the occasions when the IA had captured the same host twice in a single year (which does not occur consistently from year to year).

(iii) did not aggregate results for hosts that were part of a larger domain. Such hosts would have been easy to spot in the case of the larger media organisations such as the Guardian, which has multiple hosts (society,,, etc.). However, aggregating them reliably for every such case would have required examining individual archived instances, which was not possible at this scale.


I also made two assumptions:

(i) that a host “” held the same content as “”.

(ii) that the Internet Archive were no more likely to miss hosts that linked to the Canterbury site than ones that did not; that is, if there are gaps in what the Internet Archive found, there is no reason to suppose that they systematically skew this particular analysis.
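The three counting steps can likewise be sketched as a short Python function. Again this is a minimal sketch under stated assumptions, not the script actually used: records are taken as (year, source host, target host, count) tuples in the field order inferred from the sample line, and the name unique_linking_hosts is hypothetical. It does not attempt the host-name equivalence described in assumption (i).

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# (year, source host, target host, count), as parsed from the link graph.
Record = Tuple[int, str, str, int]


def unique_linking_hosts(records: Iterable[Record], target_host: str) -> Dict[int, int]:
    """Count the distinct hosts linking to target_host in each year."""
    hosts_by_year: Dict[int, Set[str]] = defaultdict(set)
    for year, source, target, _count in records:
        # Keep only links that point at the site being studied.
        if target != target_host:
            continue
        # Step (i): drop links from captures of the target site itself.
        if source == target_host:
            continue
        # Step (ii): a set means a host captured twice in one year counts once.
        # Step (iii): hosts are kept as-is, with no roll-up to a parent domain.
        hosts_by_year[year].add(source)
    return {year: len(hosts) for year, hosts in sorted(hosts_by_year.items())}
```

Using a set per year is what makes a host count once however many times it was captured, so the result measures how many different hosts linked, not how many pages did.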

Rowan Williams and sharia

Now that the dust has settled a little, I thought it worthwhile to gather together some of the more interesting contributions to a debate generally characterised by hysteria and a steadfast refusal to engage with detail. Any other contributions to this compendium gratefully received.

The lecture itself
I happened to be in the audience for Dr Williams’ lecture, the text of which may be found on his own site, along with his subsequent address to the General Synod. It was (unfortunately) trailed by an interview on the BBC earlier that day. At the time I thought the lecture carefully argued and, whilst dense, no denser than was necessary to deal with a complex matter.

The Archbishop and the media
Much has been made of Williams’ supposed naivety, and of the quality or otherwise of his press officers’ advice. I agree with Madeleine Bunting (Guardian 9th Feb) and Giles Fraser (Guardian 12th Feb) that, whilst he and his staff were reported as being taken aback at the ferocity of the reaction, he was well aware (as he hinted in the questions after the lecture) of the possible reactions, and that he is (rightly in my view) unprepared to succumb entirely to what a Guardian editorial (9/2/08) described as the ‘simplicity complex’ in our media. Difficult issues need to be addressed, and it will take time and patience to do so properly. Not everything can be boiled down into simple slogans.

Church and state
On the issue itself, I note three commentators who detected a manoeuvre on behalf of all faiths in a lecture ostensibly about Islam, and who were each led, in various ways, to call for an equalisation downwards before the law – by means, explicitly or implicitly, of the completion of the disestablishment of the Church of England [Andrew Anthony, Guardian 12/2/08; Matthew Parris, Times 9/2/08; Janet Daley, Telegraph 11/2/08].
One of the more interesting engagements with the detail of what was proposed came from Thom Dyke on the Prospect website.

The Archbishop and the church
Much was made of the calls for Williams to resign, but it is clear that these came mostly from those who have been unhappy about him from the beginning. On the response from his predecessor: George Carey’s article for the News of the World is actually more supportive than was widely reported, but the key word he used was ‘disastrous’, and was the News of the World really the place for his intervention? (It now appears on the website under the strapline ‘Hapless prelate’s Sharia views condemned by Lord Carey’.)
There have however been some more favourable evangelical responses: see Jonathan Chaplin and more tangentially Andrew Goddard on the ‘open evangelical’ Fulcrum site.

[7 Sep 2015: some of the links in this post have been updated to refer to the Internet Archive where content has disappeared from the live web.]