The silence of the archive. A review

[A review that appeared first in the LSE Review of Books.]

David Thomas, Simon Fowler and Valerie Johnson.
The Silence of the Archive.
Facet, 2017.

In the past two to three decades, the archival profession has been caught between two currents of cultural and technological change: simultaneous, largely unrelated, both apparently inexorable. Largely confined to the academy, but resonating beyond it, has been a radical scepticism about the stability of meaning in language resulting from the postmodern turn in historical thinking. Coupled with this epistemological scepticism has been a hermeneutic of suspicion of the power relations that are embedded in the creation, description and accessing of archival records. This has been bound up with the emergence of a wider politics of identity, and the assertion of the experience of marginalised groups as being equally worthy of documentation and study as those more ‘official’ voices that have traditionally dominated archives.

At much the same time, the transition from paper to digital in records management and archiving has presented the profession with challenges of exceptional scale and complexity, as laid out by David Thomas, former Director of Technology at the National Archives of the UK, in Chapter Three of this fascinating book. This transformation has fundamentally changed the ways in which live records are created and managed by organisations, with the significant added risk of mis-description as frontline staff are pressed into becoming their own archivists, and also of discontinuity in working IT systems such that data is lost or rendered uninterpretable. As these records pass to the archive, new and intractable challenges of scale come into play as archivists must select content for archiving and appraise it, presenting the difficulty of finding effective ways of describing these records and designing access systems that meet the needs of users.

For most working historians, much of the ferment of the discussion that these changes have prompted amongst archivists and theorists has been largely obscure; most of the literature that the authors (all three of them present or former TNA staff) synthesise here is to be found in the journals of the archival profession, into which historians rarely look. For those scholars whose only contact with archives is in the search room, this book will likely come as something of a revelation of just how far-ranging and radical some of that thought has been in the last ten years, and should be widely read for that reason alone. One might expect it also to find its way onto reading lists for introductory courses in the methods of archival research. It is therefore a matter for regret that the book, even in paperback form, is priced at a level that makes it unlikely that it will find its way into many private collections.

As a whole, the book has two major themes, one of which is acknowledged by both the title and the back cover, and another, equally if not more important, which is everywhere implied but rarely stated (to which this review returns below). Firstly, the theme of the title: the silence of the archive. The authors, along with Anne J. Gilliland in the foreword, identify an image that has formed in the public imaginary of the archive as a comprehensive repository of all known facts about the past. Scholars will differ on how potent and pervasive that image is, but the authors set out to show that archives are neither comprehensive in this sense nor purely objective, even supposing such a state were possible.

Chapter One, by Simon Fowler, deals with ‘enforced silences’, whereby organisations conceal, amend or destroy records before they reach the archive, or where (as an unintended consequence of freedom of information legislation) records are never created as business is transacted informally. All manner of decisions are then made as the archivist selects which records should be preserved, appraises those records that are selected and removes material in the process, and then catalogues records in ways that bring certain aspects of a record to the fore while effectively silencing other voices. In addition, both the transient quality of everyday life and the lives of the majority of the population rarely come under the gaze of the state, and so leave few traces. (Chapter Six by Valerie Johnson is instructive on the ways in which marginalised communities may be intentionally brought into view and their stories documented as a result.)

Professional historians are of course accustomed to engaging critically with the ways in which their archival sources come into being, but they will benefit nonetheless from this wide-ranging survey of the particular issues. In several places, however, a strangely critical note is struck: a suppressed frustration with the users of archives and their apparent inability to understand the issues. In Chapter Two, ‘Inappropriate Expectations’, Fowler quotes the historian Nicholas Rodger on the distaste of staff of the Public Record Office when asked to provide subject indexes: to do so ‘would imply that the Office had a duty to provide something the public wanted, instead of the public having a duty to shift for itself and leave the archivists in peace’ (54).

While this whole book is a testament to how far those kinds of attitudes have been eclipsed, glimpses occasionally show through. Archivists, we are told on page 45, are familiar with being ‘bombarded’ with questions which cannot be answered, by users who ‘struggle to understand’ the issues (60). Johnson writes (after Lisa Jardine) of the ‘longing of historians and researchers to find that golden key which will unlock the secret they are investigating’, which in some cases leads to false assumptions about evidence that does not in fact exist and (at the extreme) to the sorts of conspiracy theorising, fictionalisation and fabrication that Thomas explores in Chapter Five. Whilst some researchers can and do cross this line, the experience of this reviewer, at least, is that such cases are rare, and are perhaps overstressed here. Most historians are able to control their longing. That said, archive users, for their part, have no doubt been guilty of failing to appreciate the role of the archivist as something more than a mere fetcher and carrier of files, as Johnson notes (146): there is work to do perhaps on both sides of the relationship.

To a certain extent, the book is let down by its title and chapter headings, since the focus on what is not possible obscures a more hopeful and arguably more important thread which appears explicitly only on page 141. Johnson asks where the responsibility for the documentation of society lies, and answers: ‘it has been the implicit argument of this book that we are all responsible, whether as creators of records or professional curators of those papers, or as users, researchers, historians and informed citizens.’ At this point, this reviewer must declare an interest, as one working to facilitate precisely this better working between archives and the users of their digital services.

Nonetheless, The Silence of the Archive is throughout a call for a new relationship between archivists, the ‘archival subjects’ (those whose lives are documented) and those who use the archived record. Johnson writes of the process whereby those archival subjects are engaged in the process of creating the archive of their existence, thus becoming co-creators with the archivist (149-53). Thomas points out the acute need in a digital archive for close engagement with end users, both in the selection of material and in the design of the interfaces that make those records first discoverable and then usable (70-72). It is a shame, then, that this call for change – necessary and urgent – is somewhat muted here; indeed, in general, the authors have a tendency to quote and expound the work of others rather than elaborate an argument, and could have been bolder. However, it is a case that should be widely heard. Records managers, archivists, historians and other users of archives should read this timely and important book.

New article: Users, technologies, organisations – towards a cultural history of Web archiving

This article is now published, in Web 25. Histories from 25 Years of the World Wide Web, edited by Niels Brügger. It is published by Peter Lang, in hardback, paperback and ebook formats. A postprint version is available to download here.

From the Introduction:

If 2015 marked the elapse of 25 years since the birth of the web, 2016 marked the 20th anniversary of web archiving: of systematic attempts to preserve web content and make it accessible to scholars and the public. As such, the time is ripe to make an initial assessment of the history of the movement, and the patterns into which it has already fallen. This chapter represents the first attempt to document the subject at length. It concentrates on what might be termed the cultural history of the movement. It does not address the question of how web archiving has been carried out, but why, by whom, and on whose behalf.

Historians have long known that, in order to interpret archival materials properly, it is first necessary to understand how that archive came into being. Why is a particular object to be found, and not another? What does the archive seek to document, and whose interests does it serve? The last few years have seen a very welcome growth in interest in the archived web among scholars. However, that interest is not yet accompanied by the necessary familiarity with how the archived web came into being, and to be thus familiar is arguably even more important in this context than for traditional paper-based archives. Older distinctions with which historians are familiar – between published document, ‘grey literature’ and institutional records – have become blurred, as have those between personal and institutional publication. As a result, it has become less clear where the responsibility for preserving which types of content lies among the established institutions in the library and archives field. In addition, the archived web resource is unlike the live version from which it was derived in subtle and complex ways that do not apply to print publications or to manuscripts. If this chapter serves to orient users as to some of the questions they should be asking of their sources, and of the institutions that provide them, it will have achieved its aim.

It falls into the following sections: The Internet Archive / National libraries / The corporate record / Research-driven archiving / Activist archiving / Users and the future

Reflections on Web Archiving Week 2017

Once in a while, the unplanned turns out to be as good as, if not better than, the planned. It had not been the intention that the annual Web Archiving Conference of the IIPC should be combined with the second conference of ReSAW (the Research Infrastructure for the Study of Archived Web Materials). However, they came together in London last week, with intriguing results.

One of the great pleasures of the event was the diversity of both speakers and delegates: the institutions represented by the IIPC were there in strength, but also present was the largest assemblage of researchers I have yet seen. These included not only people from computer science and related fields – a group that has been engaged in this space for a while – but also an enlarged contingent of scholars of media and communications and several of the humanities disciplines. At the Archives Unleashed datathon on Monday and Tuesday there was a particularly creative meeting of scholars, technologists and archivists – the crucial nexus of relationships for making successful tools and services. The whole week was marked (for me) by a refreshing openness to the perspectives of others, a frankness about difference, and a collegiality without hierarchy which (if it can be sustained) bodes very well for the future.

If I compare the discussions last week with those in this community perhaps three or four years ago, a number of differences stand out. As I’ve tried to show in my short history of Web archiving, direct engagement between archiving institutions and researchers came relatively late in that twenty-year history, and even four years ago there was a sense that researcher engagement was still only very exploratory. We now seem to have reached the stage where substantial attention is being paid to understanding the needs of users as a preliminary step to developing new tools and services (of which there were also many exciting examples). Here I’m bound to mention the research study that I (as Webster Research and Consulting) carried out for the Parliamentary Archives, which Chris Fryer and I presented, but I also have in mind papers on citation practice (Nyvang et al), the research data management issues involved (Zierau and Jurik), and what users need to know about the materials they use (i.e. what to do about descriptive metadata), a theme taken up variously by Bingham, Dooley et al, Maemura et al. The variety of different use cases, both discussed in the abstract and demonstrated in the concrete, reminded me of how varied the user base for web archives is (or could be) and how much we need as fine-grained an understanding of those different users as possible. As Ben Steinberg of Harvard noted, ‘How we [i.e. the providers of services] think archives should or could be used may not be as pertinent as we imagine…’

Another theme for researchers that surfaced several times at the first ReSAW conference in Aarhus two years ago was the need to understand the offline as context for the online. In Aarhus the particular point was about the need for oral history and for analysis of print and manuscript sources to understand how web materials make it online to begin with, and the theme was taken up last week by Federico Nanni and (in passing) by Gareth Millward and Richard Deswarte. There were also reminders here that a full history of the Web will need to take account of the history of computing more generally (Baker and Geiringer), the interaction between the Web proper and other content delivered online, notably social media (Castex, Schafer et al, Day Thomson), as well as the wider social and intellectual context in which the Web is embedded (Schroeder, and my own paper on the religious language of the Web).

What of the future? Delegates who followed the same tracks as me may have come away with a sense of the diversity of analytic approaches to the study of the Web, and impressed by the depth at which scholars are now seeking to understand the methodological challenges they face. The aim, however, must be to build on this reflection to a point at which the Web archive becomes simply one type of scholarly source amongst many in the production of substantive scholarly insight on history, sociology or literature, as Gareth Millward noted. I look forward to the day when I can go to mainstream historical conferences and hear contemporary history written using the archived Web.

There is also, I think, a challenge to the community at large in navigating a path through the diversity of new technical development and analytical need on display here, to decide which elements best serve users in particular situations, and so should be brought forward and made part of ‘business as usual’ operations. Some will be incorporated by web archives themselves, others maintained by communities of interested scholars, others probably commercialised. The IIPC has a part to play here, while remembering that a significant part of this new thinking is taking place outside the membership. At least one person on Twitter thought a combined conference like this was worth repeating, and it would certainly be a way of developing the listening process between archives, users and developers that is required.

Finally: I celebrated the diversity of the conference when viewed in terms of professional background, but in another sense there is still much to do in terms of geography. I counted some 17 or 18 nationalities represented here, a joyous thing in a fragmenting world, but nonetheless overwhelmingly from Europe and North America. The archiving and study of the Web, a global medium, still remains dominated by certain countries.

My thanks are due to all those involved in organising such an excellent event: Jane Winters as host at the School of Advanced Study (University of London), Olga Holownia of the IIPC, and my former colleagues at the British Library, which also contributed most significantly. It was my pleasure to be a part of both the IIPC and the ReSAW programme committees, and to hear such a fine set of papers.

Religion, law and national identity in the archived Web: new article

I’m delighted to say that an article of mine has appeared this week in a new collection of essays, edited by Niels Brügger and Ralph Schroeder: The Web as History (London: UCL Press, 2017, ISBN: 9781911307563).

My article is ‘Religious discourse in the archived web: Rowan Williams, Archbishop of Canterbury, and the sharia law controversy of 2008’ (pp. 190-203). It examines the controversy over a public lecture given by the archbishop on the interaction of civil and religious law, but from a new angle: the imprint the controversy left in the archive of the UK web. It makes particular use of British Library data documenting the link structure of the .uk country code top level domain for the period 1996-2010.

The whole thing is available as an Open Access PDF, but here’s my conclusion.

It is a brave historian who attempts to interpret the very recent past, as opposed to merely documenting it. As with most aspects of very recent history, the full significance of Rowan Williams’ lecture about sharia law will only become clear as the passage of time grants the historian a sufficiently long perspective from which to view it. An exhaustive qualitative examination of both the published record, and memoirs and private papers that are as yet inaccessible (not least the papers of the archbishop himself, not due to be released until 2038) will be needed to place the episode in its fullest context. Without these, we cannot yet know how changes in patterns of communication that are observable in the archived web were motivated, or how opinions expressed online related to broader patterns of social and intellectual change. However, even if it is difficult to explain changing patterns of religious discourse on the web, we may nonetheless document those changes.

First, the sharia law episode prompted a step-change in the levels of attention paid to the domain of the archbishop of Canterbury, as evidenced by the incidence of inbound links, and also a broadening of the types of hosts that contained those links. Second, a comparison of the inbound links to the Canterbury domain to that of the archbishop of York suggests that the historic privilege given to the views of Canterbury over those of York was extended onto the web. Regardless of their actual status in relation to each other within the Church of England, the media and the public at large seemed only to pay attention to Canterbury. Finally, a qualitative examination of the site of the British National Party shows that at least one organization, with a very particular concern with the place of Islam in British life, certainly took new account of the person of the archbishop as a result of the 2008 controversy.

This chapter has also sought to use the episode as a means of demonstrating both the potential for historians to utilize the archived web to address older questions in a new way, and some of the particular issues of method that web archives present. At one level, the methodological complications presented here – understanding the meaning of a link from one resource to another, say – are peculiar to the archived web and must be understood anew. As with all other born-digital sources, there is work to be done amongst historians in understanding these issues of method, and in acquiring the skills needed to handle data at scale. At the same time, it is part of the historian’s stock-in-trade to assess the provenance of a body of sources, its completeness and the contexts in which those sources were transmitted and received. The task at hand is in fact the application of older critical methods to a new kind of source: a challenge which historians have confronted and overcome before.

This chapter has also tried to show some of the potential available to historians, should they accept the challenge. In the study of public controversy, the archived web allows the detection of changing communication patterns at scale that would be impossible using a traditional qualitative method. It also enables the detection of attention being paid online in places where a scholar would not think to look. More generally, the chapter has attempted to outline an approach that combines quantitative readings of the links in web archives with qualitative examination of particular subsets of resources. When dealing with a new superabundance of historical sources, a combination of distant and close reading will be required to understand the archived web.
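
For readers who want a concrete sense of what combining distant and close reading of link data might involve, here is a minimal sketch in Python. It is not the method used in the chapter. It assumes a link dataset already reduced to records of year, linking host, target host and link count; the hostnames and numbers are invented for illustration, and the real British Library data may well be structured differently.

```python
from collections import defaultdict

# Hypothetical records: (year, source_host, target_host, link_count).
# Every host and count here is invented for illustration only.
LINKS = [
    (2007, "news.example.co.uk", "archbishop.example.org", 3),
    (2007, "blog.example.org.uk", "archbishop.example.org", 1),
    (2008, "news.example.co.uk", "archbishop.example.org", 9),
    (2008, "party.example.org.uk", "archbishop.example.org", 4),
    (2008, "forum.example.com", "archbishop.example.org", 2),
    (2008, "news.example.co.uk", "york.example.org", 1),
]


def inbound_by_year(links, target):
    """Distant reading: total inbound links and distinct linking hosts per year."""
    totals = defaultdict(int)
    hosts = defaultdict(set)
    for year, source, dest, count in links:
        if dest == target:
            totals[year] += count
            hosts[year].add(source)
    return {year: (totals[year], len(hosts[year])) for year in sorted(totals)}


def new_linking_hosts(links, target, year):
    """Shortlist for close reading: hosts linking in `year` but in no earlier year."""
    earlier = {s for y, s, d, _ in links if d == target and y < year}
    current = {s for y, s, d, _ in links if d == target and y == year}
    return sorted(current - earlier)


summary = inbound_by_year(LINKS, "archbishop.example.org")
shortlist = new_linking_hosts(LINKS, "archbishop.example.org", 2008)
print(summary)    # per-year (total links, distinct linking hosts)
print(shortlist)  # hosts a historian might then go and read closely
```

The first function gives the aggregate view, in which a step-change in attention or a broadening of linking hosts would show up; the second produces a short list of newly linking hosts, which is where qualitative examination of individual archived sites would begin.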

Lessons from cross-border religion in the Irish web sphere

[UPDATED: 21 June 2018. This book is now in production and should appear before the end of 2018.]

I’m delighted to announce that I have a chapter accepted (subject to peer review) in a forthcoming book of essays on national web domains, The Historical Web and Digital Humanities: the Case of National Web domains. It is edited by Niels Brügger & Ditte Laursen and will be published by Routledge.

Here’s the abstract.

Understanding the limitations of the ccTLD as a proxy for the national web: lessons from cross-border religion in the Northern Irish web sphere

The web archiving community has known for a long while that the country-code top level domain (.uk, .ie) only ever represents a subset (although a very substantial one) of a national web sphere. Every national sphere (when defined as content published within a national jurisdiction) includes very substantial amounts of content that resides within the various geographically non-specific domains, such as .com or .org. However, little is known with any certainty about the content that ‘lives’ outside the ccTLD, or about the factors that determine webmasters’ choice of domain registration.

The situation is particularly complicated on the island of Ireland, since two political units (the UK and the Republic of Ireland) and two ccTLDs (.uk and .ie) share (as it were) a land border. This chapter takes the Christian churches in Ireland (north and south) as a case study in the mapping (or lack of it) between the nation and the ccTLD. The churches in Ireland are organised on an all-Ireland basis: a reflection of their origins, which pre-date the partition of Ireland into north and south.

The chapter makes two distinct but related points. It investigates the degree to which the differing historic attitudes of Protestant and Roman Catholic churches in Northern Ireland to national identity are reflected in patterns of domain registration. Based on data for 2015 and 2016, it shows that Roman Catholic congregations were more likely to register domains outside the .uk ccTLD. However, there was no corresponding prioritisation of registration within .uk for the several Protestant denominations. If organisations which might be expected to register their web estate within a particular national domain do not in fact do so, it suggests that the ‘gravitational pull’ of the ccTLD is weak. Focussing on the Baptist churches in particular, the chapter also shows that the network of links between the individual Baptist church congregations on both sides of the border between 1996 and 2010 was both tightly focussed around the churches in Northern Ireland, and also highly localised within one part of the province, whilst at the same time being spread across four TLDs. While offline patterns of numeric strength and geographic concentration are reflected online, in this case at least, they map only very loosely to the ccTLD. As such, it would seem that the ccTLD on its own is a weak proxy for the national web.
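
As a rough illustration of the kind of tally that sits behind a comparison of registration patterns like the one described above, here is a minimal sketch in Python. It is an assumption about how such a count could be made, not the procedure used in the chapter. The denominations are real categories but every hostname is invented, and the resulting figures say nothing about the chapter's actual findings.

```python
from collections import Counter, defaultdict

# Hypothetical sample: (denomination, registered hostname).
# All hostnames are invented for illustration only.
CONGREGATIONS = [
    ("Roman Catholic", "parish-one.example.com"),
    ("Roman Catholic", "parish-two.example.ie"),
    ("Roman Catholic", "parish-three.example.org.uk"),
    ("Baptist", "church-one.example.org"),
    ("Baptist", "church-two.example.co.uk"),
    ("Baptist", "church-three.example.com"),
    ("Baptist", "church-four.example.org.uk"),
]


def tld(hostname):
    """Return the final label of a hostname, e.g. 'uk' for 'a.example.co.uk'."""
    return hostname.rstrip(".").rsplit(".", 1)[-1].lower()


def tld_profile(records):
    """Count hosts per top level domain within each denomination."""
    counts = defaultdict(Counter)
    for denomination, host in records:
        counts[denomination][tld(host)] += 1
    return counts


def share_outside(counts, cctld="uk"):
    """Proportion of each group's hosts registered outside the given ccTLD."""
    return {
        group: 1 - tally[cctld] / sum(tally.values())
        for group, tally in counts.items()
    }


profile = tld_profile(CONGREGATIONS)
shares = share_outside(profile)
for group, share in shares.items():
    print(f"{group}: {share:.0%} of hosts outside .uk")
```

Taking only the final label of the hostname is a deliberate simplification: it treats .co.uk and .org.uk alike as .uk, which is the level at which the ccTLD question is posed here.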