What’s in a (top-level domain) name?

I think there would be general agreement amongst web archivists that the country code top-level domain alone is not the whole of a national web. Implementations of legal deposit for the web tend to rely at least in part on the ccTLD (.uk, or .fr) as the means of defining their scope, even if supplemented by other means of selection.

However, efforts to understand the scale and patterns of national web content that lies outside national ccTLDs are in their infancy. An indication of the scale of the question is given by a recent investigation by the British Library. The @UKWebArchive team found more than 2.5 million hosts that were physically located in the UK without having .uk domain names. This would suggest that as much as a third of the UK web may lie outside its ccTLD.

And this is important to scholars, because we often tend to study questions in national terms – and it is difficult to generalise about a national web if the web archive we have is mostly made up of the ccTLD. And it is even more difficult if we don’t really understand how much national content there is outside that circle, and also which kinds of content are more or less likely to be outside the circle. Day to day, we can see that in the UK there are political parties, banks, train companies and all kinds of other organisations that ‘live’ outside .uk – but we understand almost nothing about how typical that is within any particular sector. We also understand very little about what motivates individuals and organisations to register their site in a particular national space.

So as a community of scholars we need case studies of particular sectors to understand their ‘residence patterns’, as it were: are British engineering firms (say) more or less likely to have a web domain from the ccTLD than nurseries, or taxi firms, or supermarkets? And so here is a modest attempt at just such a case study.

All the mainstream Christian churches in the island of Ireland date their origins to many years before the current political division of the island in 1921. As such, all the churches are organised on an all-Ireland basis, with organisational units that do not recognise the political border. In the case of the Church of Ireland (Anglican), although Northern Ireland lies entirely within the province of Armagh (the other province being Dublin), several of the dioceses of the province span the border, such that the bishop must cross the political border on a daily basis to minister to his various parishes.

Anglican Ireland. (Church of Ireland, via Wikimedia Commons, CC BY-SA 3.0)

How is this reflected on the web? In particular, where congregations in the same church are situated on either side of the border, where do their websites live – in .uk, or in .ie, or indeed in neither?

I have been assembling lists of individual congregation websites as part of a larger modelling of the Irish religious webspace, and one of these is for the Presbyterian Church in Ireland (PCI). My initial list contains just over two hundred individual church sites, the vast majority of which are in Northern Ireland (as is the bulk of the membership of the church). Looking at Northern Ireland, the ‘residence pattern’ is:

.co.uk – 23%
.org.uk – 20%
.com – 17%
.org – 37%
Other – 3%

In sum, less than half of these sites – of church congregations within the United Kingdom – are ‘resident’ within the UK ccTLD. A good deal of research would need to be done to understand the choices made by individual webmasters. However, it is noteworthy that, for Protestant churches in a part of the world where religious and national identity are so closely identified, to have a UK domain seems not to be all that important.

1. My initial list (derived from one published by the PCI itself) represents only sites which the central organisation of the denomination knew existed at the time of compilation, and there are more than twice as many congregations as there are sites listed. However, it seems unlikely that that in itself can have skewed the proportions.

2. For the very small number of PCI congregations in the Republic of Ireland (that appear in the list), the situation is similar, with less than 30% of churches opting for a domain name within the .ie ccTLD. However, the number is too small (26 in all) to draw any conclusions from it.
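For anyone wishing to repeat the exercise for another sector, the tally above can be sketched in a few lines of Python. This is only a minimal illustration, not the method used to produce the figures: the function names (`bucket`, `residence_pattern`, `cctld_share`) are my own inventions, and the only real data in it are the published percentages for Northern Ireland.

```python
from collections import Counter

# Suffixes reported in the breakdown above; anything else counts as "other".
SUFFIXES = (".co.uk", ".org.uk", ".com", ".org")


def bucket(host: str) -> str:
    """Return the reporting bucket for a hostname, or "other"."""
    host = host.strip().lower().rstrip(".")
    for suffix in SUFFIXES:
        if host.endswith(suffix):
            return suffix
    return "other"


def residence_pattern(hosts):
    """Percentage of hosts falling in each bucket, as a dict."""
    counts = Counter(bucket(h) for h in hosts)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100 * n / total for name, n in counts.items()}


def cctld_share(pattern, cctld=".uk"):
    """Sum the percentages of every bucket that sits inside the ccTLD."""
    return sum(pct for name, pct in pattern.items() if name.endswith(cctld))


# Published Northern Ireland figures from the list above.
published = {".co.uk": 23, ".org.uk": 20, ".com": 17, ".org": 37, "other": 3}
print(cctld_share(published))  # 43
```

Note that a host registered directly under .uk, or under another second-level domain such as .me.uk, would fall into "other" in this sketch, so the bucket list would need extending for a less tidy sector.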

The vicar and the Midwich Cuckoos

After an extended break, another post on the occasional series on Anglican clergy in modern British fiction. Today, it is the turn of John Wyndham, and The Midwich Cuckoos, first published in 1957.

The Penguin edition of 1960.

The Reverend Hubert Leebody is one of the more substantial clerical characters in recent times, and the character functions as a foil to Gordon Zellaby, resident of Kyle Manor: gentleman sceptic, pragmatist, and the closest thing the novel has to a heroic character. Midwich is an archetypal English country village, in which nothing of note has seemingly occurred in a millennium. In Midwich, the old certainties about social leadership are embodied in Zellaby, Willers the doctor, and Leebody, resident of the Georgian vicarage and incumbent of the church: ‘mostly perp. and dec., but with a Norman west doorway and font.’ (ch.1) And as the bizarre events unfold, Leebody continues to be the social glue that holds the community together. In chapter 6 the village flock to the church for the funeral of the first casualties, and it is Leebody who conducts them along with a service of thanksgiving for the sparing of the remainder. As the girls of the village discover their collective pregnancy, it is to Leebody that they come in their confusion. ‘He had baptized them when they were babies; he knew them, and their parents well.’ (ch.7) As the Children arrive, it is Leebody who baptises them in turn, in a faintly desperate attempt to normalise the hideous fact of their xenogenesis. (ch.12)

Ultimately, however, it is not Leebody who grasps the depth of the moral crisis in which the village and the authorities find themselves, but Zellaby. Wyndham expounds much of the dilemma in dialogue between the two in chapter 17. How are humans to account for the existence in their midst of seemingly other beings, albeit in human form? How may they be fitted into a system of law that would allow a co-existence, and restrain the overwhelming coercive power that it is revealed that the Children have? Are they humans at all, or a dangerous other species, to wipe out which would be morally defensible in order to save humanity? Leebody confesses himself ‘in a morass’ about the matter, and the dialogue moves back and forth inconclusively until Leebody is called away to keep the peace as a lynch mob of villagers confronts the Children.

And this is the last the reader sees of the Reverend Leebody. In a manner reminiscent of H.G. Wells’s curate in The War of the Worlds, the reader is left with the impression that the vicar’s frame of reference can contribute no more to the situation. A good man, and socially important, when put under extreme pressure the vicar is found wanting. It is left to Zellaby to lead the village to the point at which a solution can be imagined; and it is the clear-sighted sceptic Zellaby – the only person in the village able and prepared to see the situation as it really is – who has the courage to act.

British blogs in the web archive: some data

While working on another project, I’ve had occasion to compile some data relating to the blog aggregator site britishblogs.co.uk (now apparently defunct), which occurs in the Internet Archive between 2006 and 2012. I am unlikely to exploit it very much myself, and so I have made it available on figshare, in case it should be of use to anyone else.

Specifically, it is data derived from the UK Host Link Graph, which states the presence of links from one host to another in the JISC UK Web Domain Dataset (1996-2010), a dataset of archived web content for the UK country code top level domain captured by the Internet Archive.

It has 19,423 individual lines, each expressing one host-to-host linkage in content from a single year.

Since the blog as a format seems to be particularly prone to disappearance over time, scholars of the British blogosphere may find this useful in locating now defunct blogs in the Internet Archive or elsewhere. My sense is that the blogs included in the aggregator were nominated by their authors as being British, and so this may be of some help in identifying British content in platforms such as WordPress or Blogger.

Some words of caution. The data is offered very much as-is, without any guarantees about robustness or significance. In particular:

(i) I have made no effort to de-duplicate where the Internet Archive crawled the site, or parts of it, more than once in a single year.

(ii) also present are a certain number of inbound links – that is to say, other hosts linking to britishblogs.co.uk. However, these are very much the minority.

(iii) there is also some analysis needed in understanding which links are to blogs, and which are to content linked to from within those blogs (and aggregated by British Blogs).
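As a starting point for cautions (i) and (ii), here is a rough sketch in Python of how one might de-duplicate the lines and separate outbound from inbound links. It rests on an assumption that should be checked against the file itself: that each line is pipe-delimited with the year, source host and target host as its first three fields. The function names and the sample hosts are hypothetical.

```python
AGGREGATOR = "britishblogs.co.uk"


def parse_line(line, sep="|"):
    """Split one line into (year, source_host, target_host).

    ASSUMPTION: fields are separated by `sep` and appear in this order;
    any further fields (such as a link count) are ignored.
    """
    parts = [p.strip() for p in line.strip().split(sep)]
    return int(parts[0]), parts[1].lower(), parts[2].lower()


def is_aggregator(host):
    """True for the aggregator itself or any of its subdomains."""
    return host == AGGREGATOR or host.endswith("." + AGGREGATOR)


def summarise(lines):
    """Return (outbound, inbound) lists of unique linkage records."""
    seen = set()
    outbound, inbound = [], []
    for line in lines:
        if not line.strip():
            continue
        record = parse_line(line)
        if record in seen:  # caution (i): repeated crawls in one year
            continue
        seen.add(record)
        _, source, target = record
        if is_aggregator(source):
            outbound.append(record)
        elif is_aggregator(target):  # caution (ii): inbound links
            inbound.append(record)
    return outbound, inbound


# Hypothetical sample lines, not taken from the dataset.
sample = [
    "2008|britishblogs.co.uk|someblog.example|3",
    "2008|britishblogs.co.uk|someblog.example|3",
    "2008|otherhost.example|www.britishblogs.co.uk|1",
]
outbound, inbound = summarise(sample)
print(len(outbound), len(inbound))  # 1 1
```

Caution (iii) is not addressed here: telling blogs apart from content merely linked to from within them needs inspection of the targets themselves.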


Church Times review of Archbishop Ramsey

Perhaps not surprisingly, the first review of my book on Michael Ramsey comes from the Church Times, in the issue of 31 July. The reviewer is Graham James, bishop of Norwich, to whom my thanks are due. Read the full review (PDF).

As is the case with most reviews, James points out a factual error, where I have indeed misnamed a theological college (or rather, applied a later name change to an earlier period). I rather think that if this was my worst mistake, I should think I had done quite well. More interesting are some points of interpretation of subsequent Anglican history, which I mention here.

James quite rightly notes a mismatch between the memoirs of archbishops and those of the politicians with whom they interacted, where evidence of influence is as absent in the latter as it is present in the former. He asks whether this is self-delusion on the part of ecclesiastics, or an attempt to downplay influence on the part of politicians, in memoirs often written at a later point in time. I suspect the answer is something of both.

One of the central burdens of the book is that, as Ramsey sought and gained greater autonomy from the state while overseeing the emptying of the moral law of its Christian content, there opened up a space and an opportunity for the Church of England to discover a more prophetic role for itself, speaking the truth to power from a greater distance. James, I think correctly, notes that “it has never emerged, except, perhaps, in the Runcie years under an archbishop rather at home with the Establishment.” There remains a whole new research project into why the Church of England didn’t grasp the opportunity.

James also suggests that one of the results of the greater autonomy of the church to make its own decisions was that “the Church of England became increasingly captive to its own internal political factions. Ramsey seems to have been innocent to this possibility… His grasp of ecclesiastical politics was immeasurably weaker, and his interest even less. We suffer from the consequences still.” Certainly the General Synod can be partisan, as more recent transactions such as those over the ordination of women bishops show. But so could the Church Assembly that preceded the Synod, which was a great deal less efficient with it. One would hope that no-one would seriously now argue that the Church of England needs Parliament to help mediate when it can’t make up its own mind (which is what this view seems to me to imply). If there is partisanship, it isn’t the fault of the Synod as an apparatus, but is about people and culture and an endemic lack of trust between members of the same church. I don’t think Ramsey could have done much to change that, at least.


When using an archive could put it in danger

Towards the end of 2013 the UK saw a public controversy seemingly made to showcase the value of web archives. The Conservative Party, in what I still think was nothing more than a housekeeping exercise, moved an archive of older political speeches to a harder-to-find part of their site, and applied the robots.txt protocol to the content. As I wrote for the UK Web Archive blog at the time:

“Firstly, the copies held by the Internet Archive (archive.org) were not erased or deleted – all that happened is that access to the resources was blocked. Due to the legal environment in which the Internet Archive operates, they have adopted a policy that allows web sites to use robots.txt to directly control whether the archived copies can be made available. The robots.txt protocol has no legal force but the observance of it is part of good manners in interaction online. It requests that search engines and other web crawlers such as those used by web archives do not visit or index the page. The Internet Archive policy extends the same courtesy to playback.

“At some point after the content in question was removed from the original website, the party added the content in question to their robots.txt file. As the practice of the Internet Archive is to observe robots.txt retrospectively, it began to withhold its copies, which had been made before the party implemented robots.txt on the archive of speeches. Since then, the party has reversed that decision, and the Internet Archive copies are live once again.”
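To make the mechanism concrete, here is a small sketch using the robots.txt parser in Python’s standard library. The robots.txt content, the paths and the `example.org` URLs are all invented for illustration, as is the `playback_allowed` function, which is only a simplified model of the retrospective policy described in the passage quoted above and not the Internet Archive’s actual code.

```python
from urllib.robotparser import RobotFileParser


def playback_allowed(current_robots_txt, url, user_agent="*"):
    """Simplified model of a retrospective robots.txt policy.

    The archived copy of `url` is shown only if the site's CURRENT
    robots.txt would allow a crawler to fetch that URL today,
    regardless of what robots.txt said when the copy was made.
    """
    parser = RobotFileParser()
    parser.parse(current_robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt, before and after the housekeeping exercise.
BEFORE = """\
User-agent: *
Disallow:
"""

AFTER = """\
User-agent: *
Disallow: /archive/speeches/
"""

speech = "https://www.example.org/archive/speeches/2009-conference.html"
news = "https://www.example.org/news/"

print(playback_allowed(BEFORE, speech))  # True
print(playback_allowed(AFTER, speech))   # False
print(playback_allowed(AFTER, news))     # True
```

The point of the sketch is that nothing in the archive changes between the first and second call: only the live site’s robots.txt does.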

Courtesy of wfryer on flickr.com, CC BY-SA 2.0 : https://www.flickr.com/photos/wfryer/

As public engagement lead for the UK Web Archive at the time, I was happily able to use the episode to draw attention to holdings of the same content in UKWA that were not retrospectively affected by a change to the robots.txt of the original site.

This week I’ve been prompted to think about another aspect of this issue by my own research. I’ve had occasion to spend some time looking at archived content from a political organisation in the UK, the values of which I deplore but which as scholars we need to understand. The UK Web Archive holds some data from this particular domain, but only back to 2005, and the earlier content is only available in the Internet Archive.

Some time ago I mused on a possible ‘Heisenberg principle of web archiving’ – the idea that, as public consciousness of web archiving steadily grows, the consciousness of that fact begins to affect the behaviour of the live web. In 2012 it was hard to see how we might observe any such trend, and I don’t think we’re any closer to being able to do so. But the Conservative Party episode highlights the vulnerability of content in the Internet Archive to a change in robots.txt policy by an organisation with something to hide and a new-found understanding of how web archiving works.

Put simply: the content I’ve been citing this week could later today disappear from view if the organisation concerned wanted it to, and were to come to understand how to make it happen. It is possible, in short, effectively to delete the archive – which is rather terrifying.

In the UK, at least, the danger of this is removed for content published after 2013, due to the provisions of Non-Print Legal Deposit. (And this is yet another argument for legal deposit provisions in every jurisdiction worldwide.) In the meantime, as scholars, we are left with the uneasy awareness that the more we draw attention to the archive, the greater the danger to which it is exposed.

New resources at Lambeth Palace Library

As in previous years, a little round-up of newly available resources at Lambeth for historians of the twentieth century, derived as usual from the Annual Review, this time for 2014.

The cataloguing of the main run of Archbishops’ Papers has reached 1984, a year which sees Robert Runcie having to deal with the controversial appointment of David Jenkins as bishop of Durham, and the miners’ strike.

Of particular interest to me are the newly catalogued papers of the Council for Foreign Relations dealing with relations with Roman Catholics in the UK (CFR RC 161-193), from the immediate post-war period until the 1980s. Also from the CFR are the papers relating to Lutheran and Reformed churches overseas for the key period from 1933 until 1981. Both series complement my own work on Michael Ramsey.

For historians of evangelicalism, the cataloguing of the papers of John Stott is also complete, including a substantial amount of printed material.

The manuscripts catalogue may be accessed here.

Welcoming the new Journal of Open Humanities Data

After some months in the making, I am delighted to be able to draw attention to the new Journal of Open Humanities Data. I’m particularly pleased to be a member of the editorial board.

Fully peer-reviewed, JOHD carries “publications describing humanities data or techniques with high potential for reuse.”

The journal accepts two kinds of papers:

“1. Metapapers, that describe humanities research objects with high reuse potential. This might include quantitative and qualitative data, software, algorithms, maps, simulations, ontologies etc. These are short (1000 word) highly structured narratives and must conform to the Metapaper template.

“2. Full length research papers that describe different methods used to create, process, evaluate, or curate humanities research objects. These are intended to be longer narratives (3,000 – 5,000 words) which give authors the ability to describe a research object and its creation in greater detail than a traditional publication.”

For more detail, see the JOHD at Ubiquity Press.