New DPC guidance note on researcher use of the archived Web

My new Technology Watch Guidance Note is now available, entitled How researchers use the archived Web. I was delighted to be commissioned (through Webster Research and Consulting) to write this short note on the current state of the art for the good folk at the Digital Preservation Coalition.

In a context of both novelty and diversity, the Guidance Note is designed to orient DPC member organisations, and others engaged in Web archiving (or intending to be), as to the kinds of uses researchers might expect to make of the content they collect. It is hoped that it may support the development of programmes of user research and engagement, and (in turn) inform collection development policies and the design of discovery and access services.

It deals briefly with two questions: what in the archived Web are scholars studying, and how are they studying it?

It is freely available from the DPC’s Knowledge Base (PDF).

Read more details of Webster Research and Consulting’s research services.

Research data management and Web archive studies: a missing piece

A short presentation I gave today to the inaugural meeting of WARCnet, a research network studying web domains and events. It is funded by the Independent Research Fund Denmark. (Due to the COVID-19 outbreak, the meeting occurred online rather than in Aarhus, and I recorded my presentation in my office in the UK.)

I briefly survey the development of ‘Web archive studies’ in recent years, pointing out that the management and reuse of research data is a missing piece in the makeup of the discipline. Though I do not have a solution to propose, I call on the WARCnet network to consider the issue, as one that poses a significant systemic risk.

Digital archaeology in the web of links: reconstructing a late Nineties web sphere

At the moment I have a chapter contribution to a book of essays working its way through the publication process. It isn’t yet formally accepted for publication, but I thought I would share details of it now. The abstract is below; I’d be very happy to share the paper privately, if people would care to contact me.

Abstract
One unit of analysis within the archived Web is the ‘web sphere’, a body of material from different hosts that is related in some meaningful sense (following, broadly, the definition coined by Niels Brügger). This chapter outlines a method of reconstructing such a web sphere from the late 1990s, that of conservative British Christians as they interacted with each other and with others in the United States in relation to issues of morality, domestic and international politics, law and the prophetic interpretation of world events.

Using an iterative method of interrogation of the graph of links for the archived UK web, it shows the potential for the reconstruction of a web sphere from what is in effect an archive that has a finding aid, but one with only classmarks and without descriptions. It also demonstrates the kind of multi-source investigation necessary to uncover the archaeology of the early Web. Big data and small, printed sources, the traces of previous Web archiving efforts (even when unsuccessful), and echoes in the scholarly record itself: all these come into play.

I also propose a conceptual division of Brügger’s web spheres into two kinds, ‘hard’ and ‘soft’, as distinguished by the ease with which their boundaries can be identified, and the speed with which they change.
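By way of illustration only, the general idea of iteratively interrogating a link graph can be sketched in a few lines of Python. This is not the procedure used in the chapter, which relied on human judgement at each step. It is a minimal sketch, assuming a host-level list of (source, target) link pairs and a small set of known seed hosts. The function name, the `min_links` threshold and the example hosts are all hypothetical.

```python
from collections import defaultdict


def expand_web_sphere(edges, seeds, min_links=2, max_rounds=5):
    """Grow a set of hosts outwards from a seed list, round by round.

    edges: iterable of (source_host, target_host) pairs (assumed format).
    A candidate host is admitted when it is linked, in either direction,
    with at least `min_links` distinct hosts already in the sphere.
    The threshold is a crude stand-in for a researcher's manual review.
    """
    # Build an undirected neighbour map, ignoring self-links.
    neighbours = defaultdict(set)
    for src, dst in edges:
        if src != dst:
            neighbours[src].add(dst)
            neighbours[dst].add(src)

    sphere = set(seeds)
    for _ in range(max_rounds):
        # Count, for each outside host, how many sphere members it touches.
        counts = defaultdict(int)
        for host in sphere:
            for other in neighbours.get(host, ()):
                if other not in sphere:
                    counts[other] += 1
        admitted = {h for h, n in counts.items() if n >= min_links}
        if not admitted:
            break  # nothing new met the threshold, so stop iterating
        sphere |= admitted
    return sphere


# Hypothetical example data: not drawn from any real archive.
example_edges = [
    ("a.org.uk", "b.org.uk"),
    ("a.org.uk", "c.org.uk"),
    ("b.org.uk", "c.org.uk"),
    ("c.org.uk", "d.com"),
    ("b.org.uk", "d.com"),
    ("d.com", "e.com"),
    ("x.co.uk", "a.org.uk"),
]
example_sphere = expand_web_sphere(example_edges, {"a.org.uk", "b.org.uk"})
```

In practice each round's candidates would be inspected by hand, and checked against other sources, before being admitted. A numerical threshold alone says nothing about whether a link is meaningful.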

On digitisation and the visibility of historic journals

Here follows a tale of two journals, a cautionary tale of the degree to which the historical record is conditioned by the interaction of technology and the economics of publishing.

Firstly, the journal Theology, perhaps the leading general theology journal in the UK. It was founded in 1920, published by the Society for Promoting Christian Knowledge (SPCK), already a leading publisher of books with a particular focus on the Church of England. Although its tone and content changed over time, it has always tended to provide a forum for the publication of theological writing of a breadth of concern that would interest both professional theologians and clerical and lay readers. In it one finds work on the perennial themes of the discipline alongside writing that reflected on the issues of the day, as the Anglican Communion encountered radical theological change and the pressing practical issues raised by the ecumenical movement.

Secondly, the Church Quarterly Review. Though it began life in 1875 as a privately published journal for one party within the Church of England, the CQR became a more general journal, and it too was published by SPCK from 1920. It occupied a similar space to Theology, with substantial articles aimed at both professional theologians and the wider church, and on issues old and contemporary. From the Anglican scholar Eric Mascall (one of my particular preoccupations), the CQR carried articles on topics as varied as the Eucharist, the prospects of reunion with the Church of Scotland, and the impact of the Second Vatican Council, along with dozens of book reviews. (His work also appeared regularly in Theology.) But where Theology survives to the present, the CQR does not. In 1968, the journal merged with the London Quarterly and Holborn Review (a Methodist title), but the resulting Church Quarterly ran only until 1971 and was not succeeded by another title.

Although Theology is published by SPCK, its online distribution was taken over in 2011 by SAGE Publications, and the entire back run has been digitised and made available via the SAGE site. As such, scholars may now access complete metadata and the full text of the journal back to its inception. By contrast, the CQR has no public online presence whatever. Unsurprisingly, a defunct journal held little attraction for potential buyers in the great consolidation of online journal publishing of the last twenty years. And, although several SAGE journals are included in JSTOR, Theology (like other SPCK titles) is not. As such, the CQR was not swept up in retrospective digitisation as other defunct titles from publishers involved in JSTOR have been. As it is, to read the CQR I must trouble the staff at my nearest university library to walk across to a store in a separate building and fetch the volumes for me.

There is, I think, an issue here that sits in the intersection of other questions of technology and practice which are better known. It is abundantly clear that current (or very recent) issues of journals that are available online have an advantage over those available only in print, and that the advantage is compounded when the journal is available Open Access. There is now also a great deal of stimulating reflection on the impact of digitised historic sources on historical practice. Within that, it has been observed that the digitisation of newspapers such as The Times earlier than other, equally prominent national newspapers risked skewing readers’ attention towards one source at the expense of another. Despite scholars’ best intentions – of leaving no stone unturned to get to the truth, no matter how heavy the stone – it is at least plausible that more easily accessible sources will be privileged. And the cases of Theology and the CQR suggest that the same might be true in certain fields of modern intellectual history, as the back issues of some current journals are digitised as a byproduct of current needs and others are not. That process of digitisation has tended to favour journals that survive over those that do not, and (in the case of JSTOR), defunct titles seem to stand a better chance if they were absorbed by one that survives.

Of course, it may well be that the CQR is in fact a less significant journal for twentieth century religious history than is Theology. But historical matters become perceived as significant partly as a result of the attention they are paid. It is at least possible that the relative ease of access to Theology will in itself (over time) give it a significance greater than the CQR by a kind of default. If this pattern is repeated in other areas of twentieth century intellectual history, then it perhaps deserves more attention than it has received so far.

References

Josef L. Altholz, ‘The Church Quarterly Review, 1875-1900: a marked file and other sources’, Victorian Periodicals Review 17 (1.2), 1984, 52-7.

Adrian Bingham, ‘The digitization of newspaper archives: opportunities and challenges for historians’, Twentieth Century British History 21(2), 2010, 225-31.

Lara Putnam, ‘The trans-national and the text-searchable: digitised sources and the shadows they cast’, American Historical Review 121(2), 2016, 376-402.

SAGE Publications, press release on the digitisation of Theology, 2010.

Understanding national Web domains

Yesterday I was delighted to find in the mail my copy of an important new book of essays: The Historical Web and Digital Humanities: the case of national Web domains. It is published by Routledge and edited by Niels Brügger and Ditte Laursen.

I say it is important because it investigates for the first time a particular issue that is of immediate practical concern for two quite distinct groups. The first – Web archivists in the world’s national libraries, and particularly those who work within a legal deposit framework – have sometimes to define and then certainly to work within a definition of the ‘national’ Web, and to understand how much of it they are able to archive. As the volume amply demonstrates, that task of definition is not straightforward, and has been dealt with in widely varying ways.

The second group is scholars: beyond the small but growing community of Web historians, there are many others (not least contemporary historians) who are not primarily interested in the Web itself, but in what a study of it can tell us about everything else. And the definition of the nation, of a shared but bounded space in which a political community speaks together, is the kind of question which has exercised historians of many periods and of other ‘new’ media. As I wrote in my own chapter:

The advent of the web presents historians with a new and somewhat perplexing question: where is it? What does it mean to think of the web in spatial and quasi-geographic terms? How may we write national histories of the web? Where did a particular website ‘live’? Of where was it a resident or citizen, so to speak?

The volume is important, too, because it explicitly tries to connect Web history with the larger field of digital humanities, where hitherto the two fields have been in only the loosest contact (rather to my surprise, I might add). It is good to see the volume appear in Routledge’s series on Digital Research in the Arts and Humanities, which also carries work in more ‘traditional’ digital humanities areas.

Finally, the volume marks an important moment in the development of the discipline of Web history. Previous collections (in which my own work also appeared), all of them crucial in their way, have been more specifically methodological in focus, and have been designed to make the case for the importance and the integrity of the discipline. Although each chapter made a contribution to its own particular field, those previous volumes did not contribute as a group to particular questions of history, or religious studies, or sociology. (See, for instance, The Web as History (2017) or Web25 (2017), and the Sage Handbook of Web History (2018).) This volume is the first for several years which speaks to a substantive issue of politics, history and sociology, as well as to archival science and the methodology of studying the archived Web.

The chapters fall into three sections: collecting and preserving national domains; methodological issues; and results and dissemination. I won’t try to summarise them here, save to say that each group of readers – archivists and scholars – should read each section, since their concerns overlap. As I’ve argued elsewhere, scholars need to understand more than they do about how archives come into existence, and (in this case) about the administrative histories of particular ccTLDs. Archivists will similarly gain a great deal from the discussions of method and dissemination in the second and third parts, since those questions go to the heart of both archiving policy and the design of effective systems for discovery, playback and analysis of the archived Web.

Part One: Collecting and preserving a national Web domain
Kees Teszelszky on ‘reconstructing and saving the Dutch national web using historical methods’.
Sally Chambers, Peter Mechant, & Friedel Geeraert, on the PROMISE project in Belgium: ‘Towards a national web archive in a federated country’.
Ian Milligan and Tom Smyth on the Canadian .ca domain, and studying the web ‘in the shadow of Uncle Sam’.
Helen Hockx-Yu, Ditte Laursen, & Daniel Gomes on the curious case of the .eu domain.

Part Two: Methodological challenges
Jane Winters on the many archives of the UK web space.
Anat Ben-David on Palestine, Kosovo and the quest of national self-determination on the fringe of the Web.
My own chapter on Northern Ireland and the limitations of the ccTLD as proxy for the nation.
Niels Brügger, Ditte Laursen, & Janne Nielsen on establishing a corpus of the Danish web.

Part Three: Results and dissemination
Valérie Schafer explores the French web of the 1990s.
Rebecca Kahn on locating a national museum online (the British Museum).
Niels Brügger proposes a way towards the creation of a national web trend index.