Lessons from cross-border religion in the Irish web sphere

I’m delighted to announce that I have a chapter accepted (subject to peer review) in a forthcoming book of essays on national web domains, The Historical Web and Digital Humanities: the Case of National Web domains. It is edited by Niels Brügger & Ditte Laursen and will be published by Routledge.

Here’s the abstract:

Understanding the limitations of the ccTLD as a proxy for the national web: lessons from cross-border religion in the Northern Irish web sphere

The web archiving community has known for a long while that the country-code top-level domain (.uk, .ie) only ever represents a subset (although a very substantial one) of a national web sphere. Every national sphere (when defined as content published within a national jurisdiction) includes very substantial amounts of content that resides within the various geographically non-specific domains, such as .com or .org. However, little is known with any certainty about the content that ‘lives’ outside the ccTLD, or about the factors that determine webmasters’ choice of domain registration.

The situation is particularly complicated in the island of Ireland, since two political units (the UK and the Republic of Ireland) and two ccTLDs (.uk and .ie) share (as it were) a land border. This paper takes the Christian churches in Ireland (north and south) as a case study in the mapping (or lack of it) between the nation and the ccTLD. The churches in Ireland are organised on an all-Ireland basis: a reflection of their origins that pre-date the partition of Ireland into north and south. Using link graph data provided by the British Library, it recreates the networks of links between individual church congregations on both sides of the border, and the national infrastructure of the churches into which local congregations fit. To what extent are these link networks influenced by the fact of the ccTLD – are they denser between churches within the UK ccTLD than between those inside it and those outside? Also, given the historic interlinking of religious allegiance and national identity in the north of Ireland, do these patterns differ between those within (and between) the Protestant denominations – on the one hand – and Roman Catholic churches on the other? Finally, the paper reflects on the issues presented to the scholar in working in the space between one nation with a highly developed web archiving infrastructure (the UK) and another in which web archiving is less well developed (the Republic).
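To make the density question concrete, here is a minimal sketch in Python of one way it might be computed. It is not the method of the chapter itself. It assumes the link graph has been reduced to a list of (source host, target host) pairs, that "inside" simply means a hostname ending in .uk, and that density means observed links divided by possible ordered pairs of distinct hosts. The function names and the hostnames are invented for illustration.

```python
from itertools import product


def cctld_group(host, cctld=".uk"):
    """Label a host 'inside' or 'outside' the given ccTLD (assumed rule: suffix match)."""
    return "inside" if host.lower().rstrip(".").endswith(cctld) else "outside"


def link_density(hosts, links, cctld=".uk"):
    """Directed link density for each (source group, target group) pair.

    Density = observed links / possible ordered pairs of distinct hosts.
    """
    groups = {h: cctld_group(h, cctld) for h in hosts}
    pairs = list(product(("inside", "outside"), repeat=2))
    observed = {pair: 0 for pair in pairs}
    possible = {pair: 0 for pair in pairs}

    # Count every ordered pair of distinct hosts that could carry a link.
    for src, dst in product(hosts, repeat=2):
        if src != dst:
            possible[(groups[src], groups[dst])] += 1

    # Count each distinct link once, ignoring self-links and unknown hosts.
    for src, dst in set(links):
        if src != dst and src in groups and dst in groups:
            observed[(groups[src], groups[dst])] += 1

    return {
        pair: (observed[pair] / possible[pair] if possible[pair] else 0.0)
        for pair in pairs
    }


# Hypothetical hosts and links, purely for illustration.
hosts = ["stmarys.example.org.uk", "firstchurch.example.co.uk",
         "parish.example.ie", "chapel.example.org"]
links = [
    ("stmarys.example.org.uk", "firstchurch.example.co.uk"),
    ("firstchurch.example.co.uk", "stmarys.example.org.uk"),
    ("stmarys.example.org.uk", "parish.example.ie"),
]
for pair, density in link_density(hosts, links).items():
    print(pair, density)
```

Comparing the ("inside", "inside") figure with the two cross-boundary figures gives a first, crude reading of whether the ccTLD boundary coincides with a thinning of links. A real analysis would also need to control for denomination and for the size of each group.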

Forthcoming web archive conferences

2017 offers not one but two international conferences for scholars interested in the way we use the archived web. I’m particularly pleased to promote them here as I am a member of the programme committee for both of them.

There are calls for papers open now for both.

Curation and research use of the past Web
(The Web Archiving Conference of the International Internet Preservation Consortium)
Lisbon, 29-30 March 2017
Call for Papers now open

Researchers, practitioners and the archived Web
(2nd conference of ReSAW, the Europe-wide Research Infrastructure for the Study of Archived Web Materials)
London, 14-15 June 2017
Call for Papers now open

Religion in Web history: a survey

I am currently working on a chapter contribution to the forthcoming Sage Handbook to Web History, edited by Megan Sapnar Ankerson, Niels Brügger and Ian Milligan. Although the inclusion of the paper is subject to peer review, here’s my abstract. It should appear some time in late 2017.

“This chapter seeks both to assess the state of current scholarship on online religion, and to suggest potential directions for future research. There are now 20 years of research in the field of Internet Studies in relationship to religious organisations, faith and practice. However, it is less clear that this body of work yet represents a specifically historical inquiry about religion on the Web, although it will in many cases provide the foundation of such work. Much of the research to date has concentrated on the nature of emerging communities of individuals: communities that were either an alternative or a supplement to face-to-face relations in particular localities. This chapter draws out trends emerging in this scholarship over the 25 years of Web history, as the affordances of the Web have developed. Attention has also been paid to the balance of institutional authority and individual self-expression in a religious space that is unregulated, or at least that must be regulated in new ways. The chapter asks how far this scholarship may be integrated into wider histories of offline religious authority and practice, which have themselves undergone shifts and transformations of perhaps equal significance.

“Rather less prominent in the literature so far is the institutional history of religion. Making use of the archived Web in particular, the chapter sketches the outline of a new area of inquiry: the evolution of the religious web sphere, both as a global whole, within each of the global religions and denominations, and at a national level. To what degree has the nature of the Web, a decentralised international network system which contrasts with the hierarchical nature of most religious organisations, moulded the religious web sphere into a different shape? Early studies in this area have suggested that, in certain key ways, the religious web sphere can be read as a reimplementation of older structures of influence, attention and esteem that were visible before, and remain visible offline. Insofar as the religious web does not mirror the traditional offline structure of religious organisations, the chapter also reflects on how far this changed shape may be accounted for by broader trends in religious history, in a period of rapid change. How far does it relate to the recent history of religion in the media more generally?

“At a more abstract level, the chapter will attend to the degree to which the myths of the Web, and indeed of the whole Internet – of a pluralistic, idealistic, liberating force with an agency of its own – have shaped understandings of the Web’s religious history. It examines how far the last quarter century has really been a period of rupture and discontinuity, and how much has in fact stayed the same, or continued on a path on which it was set before the Web appeared. It will also assess how far the field has so far been focussed to excess on the new, to the neglect of understanding the histories of how practices and technologies that were once new become mainstream.”

Doing (very) contemporary history with the archived Web: Oxford, June 9th

Details of a lecture I shall give next week:

Title: Doing (very) contemporary history with the archived Web: Rowan Williams, archbishop of Canterbury, and the sharia law controversy of 2008

Date: Thursday, 9th June, 1pm
Venue: Weston Library Lecture Theatre, University of Oxford
Booking details: booking is advisable but not essential. It’s free.

Abstract: The decade following the turn of the millennium may have seen an epochal shift in the nature of the discussion of religion in public life in the UK. The 9/11 attacks in the USA and the terrorist bombings in London in 2005 prompted an outpouring of anxiety concerning the place of Islam in British society. The period also saw the coming to prominence of the ‘New Atheism’ associated with figures such as Richard Dawkins and Christopher Hitchens. The uniquely privileged position of Christianity, and the Church of England in particular, was also under greater scrutiny than had been the case for decades.

Wikimedia Commons, CC BY SA 2.0, by Brian (of Toronto)

This paper examines a crucial episode of public controversy closely connected to each of these trends: a lecture given in 2008 by Rowan Williams, archbishop of Canterbury, on the accommodation of Islamic sharia law into British law. Using archived web content from the UK Web Archive, held by the British Library, it examines the controversy as it played out on the UK web. It argues that the episode prompted both a step-change in the levels of attention paid to the archbishop’s web domain and a broadening of the types of organisation which took notice of him. At the same time, it also suggests that the historic media habit of privileging the public statements of the archbishop over those of any other British faith leader was extended onto the web.

The paper uses techniques of both close and distant reading: on the one hand, aggregate link analysis of the whole .uk web domain, and on the other hand, micro analysis of individual domains and pages. In doing so, it demonstrates some of the various ways in which contemporary historians will very soon need to use the archived web to address older questions in a new way, in a new context of super-abundant data.

What do we need to know about the archived web?

A theme that emerged for me at the IIPC web archiving conference in Reykjavik last week was metadata, and specifically: precisely which metadata do users of web archives need in order to understand the material they are using?

At one level, a precise answer to this will only come from sustained and detailed engagement with users themselves; research which I would very much hope that the IIPC would see as part of its role to stimulate, organise and indeed fund. But that takes time, and at present, most users understand the nature of the web archiving process only rather vaguely. As a result, I suspect that without the right kind of engagement, scholars are likely (as Matthew Weber noted) to default to ‘we need everything’, or if asked directly ‘what metadata do you need?’ may well answer ‘well, what do you have, and what would it tell me?’

During my own paper I referred to the issue, and was asked by a member of the audience if I could say what such enhanced metadata provision might look like. What I offer here is the first draft of an answer: a five-part scheme of kinds of metadata and documentation that may be needed (or at least, that I myself would need). I could hardly imagine this would meet every user requirement; but it’s a start.

1. Institutional
At the very broadest level, users need to know something of the history of the collecting organisation, and how web archiving has become part of its mission and purpose. I hope to provide an overview of aspects of this on a world scale in this forthcoming article on the recent history of web archiving.

2. Domain or broad crawl
Periodic archiving of a whole national domain under legal deposit provisions now offers the prospect of the kind of aggregate analysis that takes us way beyond single-resource views in Wayback. But it becomes absolutely vital to know certain things at a crawl level. How was territoriality determined – by ccTLD, domain registration, Geo-IP lookup, curatorial decision? The way the national web sphere is defined fundamentally shapes the way in which we can analyse it. How big was the crawl in relation to previous years? How many domains are new, and how many have disappeared? What’s the policy on robots.txt (by default)? How deep was the crawl scope (by default)? Was there a data cap per host? Some of this will already be articulated in internal documents, some will need some additional data analysis; but it all goes to the heart of how we might read the national web sphere as a whole.

3. Curated collection level
Many web archives have extensive curated collections on particular themes or events. These are a great means of showcasing the value of web archives to the public and to those who hold the purse strings. But if not transparently documented they present some difficulties to the user trying to interpret them, as the process introduces a level of human judgment to add to the more technical decisions that I outlined above. In order to evaluate the collection as a whole, scholars really do need to know the selection criteria, and at a more detailed level than is often provided right now. In particular, in cases where permissions were requested for sites but not received, being able to access the whole list of sites selected rather than just those that were successfully archived would help a great deal in understanding the way in which a collection was made.

4. Host/domain level
This is the level at which a great deal of effort is expended to create metadata that looks very much like a traditional catalogue record: subject keywords, free-text descriptions and the like. For me, it would be important to know when the first attempt to crawl a host was, and the most recent, and whether there were 404 responses received for crawl attempts at any time in between. Was this host capped (or uncapped) at the discretion of a curator, differently from the policy for the crawl as a whole? Similarly, was the crawl scoping different, or the policy on robots.txt? If the crawl incorporates a GeoIP check, what was the result? Which other domains has it redirected to, which redirect to it, and at which times?

5. Individual resource level
Finally, there are some useful things to know about individual resources. As at the host level, information about the date of the first and last attempts to crawl, and about intervening 404s, would tell the user useful things about what we might call the career of a resource. If the resource changes, what is the profile of that: for instance, how has the file size changed over time? Were there other captures which were rejected, perhaps on a QA basis, and if so, when?

Much if not quite all of this could be based on data which is widely collected already (in policy documents, or curator tools, crawl logs or CDX) or could be with some adjustment. It presents some very significant GUI design challenges in how best to deliver these data to users. Some might be better delivered as datasets for download or via an API. What I hope to have provided, though, is a first sketch of an agenda for what the next generation of access services might disclose, that is not a default to ‘everything’ and is feasible given the tools in use.
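As an illustration of how little extra machinery the resource-level part might need, here is a minimal sketch in Python. It assumes that crawl attempts are available as simple records of URL, timestamp, HTTP status and payload length, loosely modelled on the fields of a CDX index. The `Capture` class, the field names and the `resource_career` function are hypothetical, and a real archive may not record every failed attempt in this form.

```python
from dataclasses import dataclass


@dataclass
class Capture:
    """One crawl attempt for a resource (hypothetical, CDX-like record)."""
    url: str
    timestamp: str  # assumed 14-digit YYYYMMDDhhmmss, which sorts correctly as text
    status: int     # HTTP status code returned to the crawler
    length: int     # payload size in bytes


def resource_career(captures, url):
    """Summarise the 'career' of one resource from its crawl attempts."""
    rows = sorted((c for c in captures if c.url == url), key=lambda c: c.timestamp)
    if not rows:
        return None
    return {
        "first_attempt": rows[0].timestamp,
        "last_attempt": rows[-1].timestamp,
        # Attempts that met a 404, including any between first and last.
        "not_found": [c.timestamp for c in rows if c.status == 404],
        # File size over time, for successful captures only.
        "sizes": [(c.timestamp, c.length) for c in rows if c.status == 200],
    }


# Invented records, purely for illustration.
captures = [
    Capture("http://church.example.org/", "20120301120000", 200, 5120),
    Capture("http://church.example.org/", "20100101000000", 200, 4096),
    Capture("http://church.example.org/", "20110615093000", 404, 0),
    Capture("http://other.example.com/", "20110101000000", 200, 100),
]
print(resource_career(captures, "http://church.example.org/"))
```

The same summary, grouped by host rather than by URL, would answer most of the host-level questions about first and last attempts. Whether it is served in an interface, through an API or as a downloadable dataset is a separate design choice.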

Towards a cultural history of web archiving

This week I’m writing the first draft of a chapter on the cultural history of web archiving, for a forthcoming volume of essays (details here). It is subject to peer review and so isn’t yet certain to be published, but here’s the abstract.

I should welcome comments very much, and there may also be a short opportunity for open online peer review.

Users, technologies, organisations: towards a cultural history of world web archiving

‘As systematic archiving of the World Wide Web approaches its twentieth anniversary, the time is ripe for an initial historical assessment of the patterns in which web archiving has fallen. The scene is characterised by a highly asymmetric pattern, involving a single global organisation, the Internet Archive, alongside a growing number of national memory institutions, many of which are affiliated to the International Internet Preservation Consortium. Many other organisations also engage in archiving the web, including universities and other institutions in the galleries, libraries, archives and museums sector. Alongside these is a proliferation of private sector providers of web archiving services, and a small but highly diverse group of individuals acting on their own behalf. The evolution of this ecosystem, and the consequences of that evolution, are ripe for investigation.

‘Employing evidence derived from interviews and from published sources, the paper sets out to document at length for the first time the development of the sector in its institutional and cultural aspects. In particular it considers how the relationship between archiving organisations and their stakeholders has played out in different circumstances. How have the needs of the archives themselves and their internal stakeholders and external funders interacted with the needs of the scholarly end users of the archived web? Has web archiving been driven by the evolution of the technologies used to carry it out, the internal imperatives of the organisations involved, or by the needs of the end user?’

What’s in a (top-level domain) name?

I think there would be general agreement amongst web archivists that the country code top-level domain alone is not the whole of a national web. Implementations of legal deposit for the web tend to rely at least in part on the ccTLD (.uk, or .fr) as the means of defining their scope, even if supplemented by other means of selection.

However, efforts to understand the scale and patterns of national web content that lies outside national ccTLDs are in their infancy. An indication of the scale of the question is given by a recent investigation by the British Library. The @UKWebArchive team found more than 2.5 million hosts that were physically located in the UK without having .uk domain names. This would suggest that as much as a third of the UK web may lie outside its ccTLD.

And this is important to scholars, because we often tend to study questions in national terms – and it is difficult to generalise about a national web if the web archive we have is mostly made up of the ccTLD. And it is even more difficult if we don’t really understand how much national content there is outside that circle, and also which kinds of content are more or less likely to be outside the circle. Day to day, we can see that in the UK there are political parties, banks, train companies and all kinds of other organisations that ‘live’ outside .uk – but we understand almost nothing about how typical that is within any particular sector. We also understand very little about what motivates individuals and organisations to register their site in a particular national space.

So as a community of scholars we need case studies of particular sectors to understand their ‘residence patterns’, as it were: are British engineering firms (say) more or less likely to have a web domain from the ccTLD than nurseries, or taxi firms, or supermarkets? And so here is a modest attempt at just such a case study.

All the mainstream Christian churches in the island of Ireland date their origins to many years before the current political division of the island in 1921. As such, all the churches are organised on an all-Ireland basis, with organisational units that do not recognise the political border. In the case of the Church of Ireland (Anglican), although Northern Ireland lies entirely within the province of Armagh (the other province being Dublin), several of the dioceses of the province span the border, such that the bishop must cross the political border on a daily basis to minister to his various parishes.

Anglican Ireland. (Church of Ireland, via Wikimedia Commons, CC BY-SA 3.0)

How is this reflected on the web? In particular, where congregations in the same church are situated on either side of the border, where do their websites live – in .uk, or in .ie, or indeed in neither?

I have been assembling lists of individual congregation websites as part of a larger modelling of the Irish religious webspace, and one of these covers the Presbyterian Church in Ireland (PCI). My initial list contains just over two hundred individual church sites, the vast majority of which are in Northern Ireland (as is the bulk of the membership of the church). Looking at Northern Ireland, the ‘residence pattern’ is:

.co.uk – 23%
.org.uk – 20%
.com – 17%
.org – 37%
Other – 3%

In sum, fewer than half of these sites (43%) – of church congregations within the United Kingdom – are ‘resident’ within the UK ccTLD. A good deal of research would need to be done to understand the choices made by individual webmasters. However, it is noteworthy that, for Protestant churches in a part of the world where religious and national identity are so closely identified, to have a UK domain seems not to be all that important.
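For anyone wanting to repeat the exercise for another sector, a tally of this kind can be sketched in a few lines of Python. This is only an illustration: it assumes a plain list of site URLs, each with its http:// or https:// scheme, and it buckets them by the same suffixes as the table above. The function names and example URLs are invented. In the published figures, the two .uk buckets together come to 23% + 20% = 43%.

```python
from collections import Counter
from urllib.parse import urlsplit

# Suffixes used in the table above; anything else is counted as "other".
SUFFIXES = (".co.uk", ".org.uk", ".com", ".org")


def residence(url):
    """Return the suffix bucket in which a site 'resides'."""
    # urlsplit only finds a hostname when the URL carries a scheme.
    host = (urlsplit(url).hostname or "").lower()
    for suffix in SUFFIXES:
        if host.endswith(suffix):
            return suffix
    return "other"


def residence_pattern(urls):
    """Percentage of sites in each bucket, rounded to whole numbers."""
    counts = Counter(residence(u) for u in urls)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100 * n / total) for label, n in counts.items()}


# Hypothetical congregation sites, purely for illustration.
sites = [
    "http://firstchurch.example.co.uk/",
    "http://secondchurch.example.org.uk/",
    "http://thirdchurch.example.com/",
    "http://fourthchurch.example.org/about",
]
print(residence_pattern(sites))
```

Matching on fixed suffixes is a deliberate simplification. A fuller treatment would draw on a maintained list of registrable suffixes, so that other second-level .uk registrations are counted inside the ccTLD rather than falling into ‘other’.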

1. My initial list (derived from one published by the PCI itself) represents only sites which the central organisation of the denomination knew existed at the time of compilation, and there are more than twice as many congregations as there are sites listed. However, it seems unlikely that that in itself can have skewed the proportions.

2. For the very small number of PCI congregations in the Republic of Ireland (that appear in the list), the situation is similar, with less than 30% of churches opting for a domain name within the .ie ccTLD. However, the number is too small (26 in all) to draw any conclusions from it.