Forthcoming web archive conferences

2017 offers not one but two international conferences for scholars interested in the way we use the archived web. I’m particularly pleased to promote them here as I am a member of the programme committee for both of them.

There are calls for papers open now for both.

Curation and research use of the past Web
(The Web Archiving Conference of the International Internet Preservation Consortium)
Lisbon, 29-30 March 2017
Call for Papers now open

Researchers, practitioners and the archived Web
(2nd conference of ReSAW, the Europe-wide Research Infrastructure for the Study of Archived Web Materials)
London, 14-15 June 2017
Call for Papers now open

Doing (very) contemporary history with the archived Web: Oxford, June 9th

Details of a lecture I shall give next week:

Title: Doing (very) contemporary history with the archived Web: Rowan Williams, archbishop of Canterbury, and the sharia law controversy of 2008

Date: Thursday, 9th June, 1pm
Venue: Weston Library Lecture Theatre, University of Oxford
Booking details: booking is advisable but not essential. It’s free.

Abstract: The decade following the turn of the millennium may have seen an epochal shift in the nature of the discussion of religion in public life in the UK. The 9/11 attacks in the USA and the terrorist bombings in London in 2005 prompted an outpouring of anxiety concerning the place of Islam in British society. The period also saw the coming to prominence of the ‘New Atheism’ associated with figures such as Richard Dawkins and Christopher Hitchens. The uniquely privileged position of Christianity, and the Church of England in particular, was also under greater scrutiny than had been the case for decades.

Wikimedia Commons, CC BY SA 2.0, by Brian (of Toronto)

This paper examines a crucial episode of public controversy closely connected to each of these trends: a lecture given in 2008 by Rowan Williams, archbishop of Canterbury, on the accommodation of Islamic sharia law into British law. Using archived web content from the UK Web Archive, held by the British Library, it examines the controversy as it played out on the UK web. It argues that the episode prompted both a step-change in the levels of attention paid to the archbishop’s web domain and a broadening of the types of organisation which took notice of him. At the same time, it also suggests that the historic media habit of privileging the public statements of the archbishop over those of any other British faith leader was extended onto the web.

The paper uses techniques of both close and distant reading: on the one hand, aggregate link analysis of the whole .uk web domain, and on the other hand, micro analysis of individual domains and pages. In doing so, it demonstrates some of the various ways in which contemporary historians will very soon need to use the archived web to address older questions in a new way, in a new context of super-abundant data.

Welcoming the new Journal of Open Humanities Data

After some months in the making, I am delighted to be able to draw attention to the new Journal of Open Humanities Data. I’m particularly pleased to be a member of the editorial board.

Fully peer-reviewed, JOHD carries “publications describing humanities data or techniques with high potential for reuse.”

The journal accepts two kinds of papers:

“1. Metapapers, that describe humanities research objects with high reuse potential. This might include quantitative and qualitative data, software, algorithms, maps, simulations, ontologies etc. These are short (1000 word) highly structured narratives and must conform to the Metapaper template.

“2. Full length research papers that describe different methods used to create, process, evaluate, or curate humanities research objects. These are intended to be longer narratives (3,000 – 5,000 words) which give authors the ability to describe a research object and its creation in greater detail than a traditional publication.”

For more detail, see the JOHD at Ubiquity Press.

Understanding the web of faith: forthcoming book chapter

I’m very pleased to say that an essay of mine has been accepted for a forthcoming volume: The Web as History: the first two decades. It is edited by Niels Brügger and Ralph Schroeder, and will appear Open Access with UCL Press in 2016.

Here’s my abstract:

‘Much of the discourse that historians of contemporary religion until recently tracked in correspondence, periodical publication and print ephemera has migrated online. But the task of understanding religious discourse in the UK web space has hardly begun. The task is hard to undertake at the highest level since there are no second-level domains that serve as useful units of analysis — there is no faith.uk to match nhs.uk or ac.uk.

‘This chapter represents a first step towards understanding the evolution of the UK religious web space, by means of two interrelated case studies, which between them point to the agenda and content of a larger research project. Both studies utilise the JISC UK Web Domain Dataset for the period 1996-2008, as held by the British Library.

‘Firstly, it will examine the web archive footprint left by the public controversy in 2008 over the comments made by Rowan Williams, archbishop of Canterbury, on the matter of sharia law. Using both the link graph and a direct qualitative analysis of archived content, it will explore both the shape and the content of the controversy and show the degree to which religious debate had not only migrated from print to the web, but in doing so had engaged different actors and lost others, and changed in its tone.

‘Secondly, it will consider the growing tension in religious discourse between faith groups and organisations with a secularist agenda. Again, using the link graph and some qualitative analysis, it will explore the patterns in which linkages grew and shifted between the web estates of key but opposed organisations in relation to issues including faith schools and creationism, the reform of the law on blasphemy, and the place of the bishops in the House of Lords.’

Will historians of the future be able to study Twitter?

Over the last year or so, the IHR Digital History seminar has become increasingly web-focussed, which is of course of interest to me (if not necessarily to everyone). Last week we had an excellent paper from Jack Grieve of Aston University on the tracking of newly emerging words as they appeared in large corpora of tweets from the UK and the US. By amassing very large tweet datasets, he and his colleagues are able to observe the early traces of newly emerging words, and also (when those tweets were submitted from devices which attach geo-references) to see where those new words first appear, and how they spread. Jack and his colleagues are finding that words quite often emerge first (in the US) in the east and south-east (or California) and then spread towards the centre of the continent. They don’t necessarily spread in even waves across space, or even spring between urban centres and then to rural areas (as would have been my uneducated guess). Read more at the project site, treets.net, or watch the paper.

This kind of approach is quite impossible without very large-scale natural language data of the kind that social media afford. This is particularly so as most words are (perhaps counter-intuitively) rather rare. In the corpus in question, the majority of the 67,000 most common words appear only once in 25 million words. Given this, datasets of billions of tweets are the minimum size necessary to be able to see the patterns.
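To make the point about rarity concrete, here is a minimal sketch in Python of how word frequencies in a corpus might be counted and expressed as a rate per million words. It is not the method used in Jack’s project: the tokenising rule, the function names `word_counts` and `per_million`, and the toy texts are all assumptions of mine, chosen only for illustration. I also read the figure quoted above as a rate of one occurrence per 25 million words.

```python
import re
from collections import Counter


def word_counts(texts):
    """Count lower-cased word tokens across an iterable of texts.

    The regular expression is a deliberately crude tokeniser (an assumption);
    real tweet data would need handling for hashtags, URLs and so on.
    """
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


def per_million(count, total_words):
    """Express a raw count as occurrences per million words."""
    return count * 1_000_000 / total_words


# Toy corpus, invented for illustration only.
texts = ["the cat sat on the mat", "the dog sat"]
counts = word_counts(texts)
total = sum(counts.values())

# Words that occur exactly once in the corpus.
once = [word for word, n in counts.items() if n == 1]
share_once = len(once) / len(counts)

# A word seen once in 25 million words has this rate per million words.
rare_rate = per_million(1, 25_000_000)
```

Even in this tiny example most of the distinct words occur only once, which is the pattern that makes very large datasets necessary before any rare word is seen often enough to be mapped.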

It was interesting to me as a convenor to see the rather different spread of people who came to this paper, as opposed to the more usual digital history work the seminar showcases. Jack focussed on tweets posted since 2013, a time span that even the most contemporary historian would struggle to call their own, and so not many historians came along – but we had perhaps our first mathematician instead. This was a shame, as Jack’s paper was a fascinating glimpse into the way that historical linguistics, and indeed other types of historical enquiry, might look in a couple of decades’ time.

But there is a caveat to this, which was beyond the scope of Jack’s paper, to do with the means by which this data will be accessible to scholars of 2014 working in (say) 2044. Jack and his colleagues work directly from the so-called Twitter “firehose”; they harvest every tweet coming from the Twitter API, and (on their own hardware) process each tweet and discard those that are not geo-coded to within the study area. This kind of work involves considerable local computing firepower, and (more importantly) is concerned with the now. It creates data in real time to answer questions of the very recent past.
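The filtering step described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project’s actual code: the tweet records are plain dictionaries with a hypothetical `coordinates` field holding a (longitude, latitude) pair or `None`, the bounding box `UK_BOX` is a rough rectangle of my own choosing, and the sample tweets are invented. A real harvest would read from the Twitter API rather than from a list.

```python
# Hypothetical bounding box (west, south, east, north) roughly covering the UK.
UK_BOX = (-8.7, 49.8, 1.8, 60.9)


def in_box(lon, lat, box):
    """Return True if the point lies inside the bounding box."""
    west, south, east, north = box
    return west <= lon <= east and south <= lat <= north


def keep_geocoded(tweets, box=UK_BOX):
    """Yield only tweets that carry coordinates inside the study area.

    Tweets are processed one at a time and the rest are discarded,
    so the full stream never has to be held in memory.
    """
    for tweet in tweets:
        coords = tweet.get("coordinates")  # assumed field: (lon, lat) or None
        if coords is None:
            continue  # not geo-coded: discard
        lon, lat = coords
        if in_box(lon, lat, box):
            yield tweet


# Invented sample records standing in for a live stream.
sample = [
    {"text": "a", "coordinates": (-1.26, 51.75)},  # Oxford
    {"text": "b", "coordinates": None},            # no geo-reference
    {"text": "c", "coordinates": (-74.0, 40.7)},   # New York
]
kept = list(keep_geocoded(sample))
```

The point to notice is that everything discarded at this stage is gone for good: the resulting dataset answers the question it was built for, and a scholar with a different question would need to go back to the full archive.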

Researchers working in 2044 and interested in 2014 may well be able to re-use this particular bespoke dataset (assuming it is preserved – a different matter of research data management, for another post sometime). However, they may equally well want to ask completely different questions, and so need data prepared in a quite different way. Right now, the future of the vast ocean of past tweets is not certain, and so it is not clear whether the scholar of 2044 will be able to create their own bespoke subset of data from the archive. The Library of Congress, to be sure, are receiving an archive of data from Twitter; but the arrangements for access to this data are not clear, and at present there is no access at all. So, in the same way that historians need to take some ownership of the future of the archived web, we need to become much more concerned about the future of social media: the primary sources that our graduate students, and their graduate students in turn, will need to work with two generations down the line.

Certainly, historians have always been used to working around and across the gaps in the historical record; it’s part of the basic skillset, to deal with the fragmentary survival of the record. But there is right now a moment in which major strategic decisions are to be made about that survival, and historians need to make themselves heard.

[This post also appears on the IHR Digital History Seminar blog.]

Religion, social media and the web archive

Late last year I was delighted to be invited to be one of four keynote speakers at a workshop on religion and social media at the International AAAI Conference on Web and Social Media in Oxford in May. Here are some initial thoughts on what I intend to say.

There has been an interesting upswing recently in scholarly interest in the ways in which religious people, and the organisations in which they gather together, represent themselves and communicate with others on social media. However, this work has been conducted relatively independently from the emerging body of scholarship on the archived web.

There are some reasons for this. First is the fact that much of the scholarship on social media tends to be focussed very firmly on the present. As such, data tends to be gathered directly from social media platforms “to order”, to match the particular research questions in view, and does not engage the various web archives that are in existence, whether at national libraries or the Internet Archive.

The second reason (which may indeed be the more important) is that traditional web archiving has limited success in archiving social media content. There are several well-documented reasons for this, not least the significant technical difficulties in capturing the content as it is presented in user interfaces such as that for Twitter or YouTube. Also, the data gathered is wrapped up in its presentation layer, rather than being neatly organised as a dataset for analysis. Aside from these technical challenges, the very social nature of social media – with multiple content creators co-existing and interacting on the same platform – adds considerable complexity to the task of the web archivist of determining which content can be archived under existing legal deposit frameworks.

So much for the reasons; but this gap between social media research and the archived web needs to be closed, because part of the story is missed. If we want to understand the evolution of the engagement of churches with social media, then we need to understand the ways in which traditional church websites integrated social media content within themselves, and from what point in time. As well as this, we need to be able to understand the content to which social media users were referring and linking – content which will increasingly often be found only in web archives as it disappears from the live web.

In Oxford, I shall be presenting some small case studies in the development of the web and social media presence of local churches, individuals and national church bodies in England and in Ireland. How quickly did churches begin to integrate their social media channels with their websites – which is to ask, at which point did social media become central to their communication strategies? This work is enabled by data made available from the British Library which covers the period from 1996 until 2013, the period in which social media grew from nothing to the prominence it now holds.

[Updated, 5 June 2015: here are the slides: http://www.slideshare.net/pj_webster/slideshelf]