Archiving the “Jewish internet”: some opening questions

[Some words addressed to the fourth world Judaica Curators Conference, held on 4th May 2021 under the aegis of the National Library of Israel. My thanks are due to Raquel Ukeles, head of collections at the NLI, for the invitation.]

I’m not sure how generally familiar you are with Web archiving, but this year marks 25 years since the generally accepted beginning of systematic archiving of the Web on a large scale. And so, there is now a substantial literature on the subject, from the practicalities of the process to the growing and remarkably diverse uses to which archived Web materials are now being put by researchers. Some suggested ways into this reading are on the slide. [Links are given at the end of this post.]
[Slide: some selected reading on the subject under discussion, also listed at the end of the post.]
What I can most usefully do to set up our discussion today is to raise some initial broad questions as to what a collaborative archiving project for the “Jewish internet” might look like: to, if you like, set up the sheets on the flipchart into which this group may start to add some thoughts.

There is here a set of four basic questions, which are interdependent in several ways. In one sense, the easiest of these is the question of how. Web archiving is a process which tends to cross the departmental divisions in most libraries and archives: it requires curatorial expertise in the selection of material; technical expertise (and also curatorial skills of a particular kind) in managing the harvesting of the materials and maintaining quality; further (and different) technical skills in managing storage, preservation and access; and different skills again in orientating and engaging end users. But within the small but worldwide (and well-integrated) community of people in this space, these skills and technologies are well understood and documented. So, though I’ve listed this question first, it is (I suggest) the one that needs answering last.

The question also arises of which institutions should be involved, and how any collaborative project should be configured. Several models are possible:
(i) first of these might be a situation where several institutions agree to parcel out sections of the Jewish internet, and each works largely independently to archive that material in line with their existing collection remits and resources;
(ii) second might be a rather tighter model, in which institutions collaborate directly in the selection of material to archive, but delegate the harvesting, description, preservation and access to a single institution. And that institution might be one of the curating institutions, or (alternatively) an outsourcing partner.

To give some examples, in the UK, the responsibility for web archiving under non-print legal deposit is shared between six institutions, one of which (the British Library) deals with much of the process. Also in the UK Web Archive is the Quaker collection, the content of which was selected by a specialist library but collected, again, by the British Library. Globally, several libraries have collaborated (through the International Internet Preservation Consortium) on collections on the Olympic Games and most recently Covid-19, delivered via the Internet Archive’s Archive-It service. Within either of these models, there is arguably much to be gained from direct engagement with faith communities themselves: to build consent to archiving, to guide selection, and to bring forward users and advocates for the final archive.

The third basic question relates to the law, particularly surrounding copyright (but not only that). Some nations have legal deposit frameworks in place for non-print materials, including the UK, France and Denmark, although only a small minority of countries have such a thing, and these dispensations are usually accompanied by very significant restrictions on access. Nations vary in how they construe the notion of fair use in such things, and it is within the relatively liberal regime in the US that the Internet Archive is able to work. Various ways of working are available. Firstly, there is the direct approach to site owners for explicit permission to archive, a slow, resource-intensive and only intermittently successful endeavour. Other institutions have operated on a notice-and-takedown basis, where the owner is contacted and the lack of any expressed objection is taken as sufficient basis to continue; others again archive widely and take material down on request.

The right option for any project has to be determined by a weighing of available resources against the general attitude of each institution to risk. And that risk very much depends on what it is that is to be archived, as different kinds of materials have very different risks associated with them. This is why, I suggest, the first question with which this whole discussion needs to begin is the last on my slide: what *exactly* is to be included and excluded; in other words, what exactly *is* the “Jewish internet”?

[Slide: a schema for understanding religions online, taken from the chapter ‘Religion in Web history’, published in The SAGE Handbook of Web History. A link to the full text of this paper is given at the end of the post.]

I’m not about to begin giving specific answers to this question to an expert audience such as this. But I offer this organising schema, written from the perspective of an historian of modern Christianity, with now several years’ experience both on the archiving side and as a research user of the archived Web. What I set out in this particular article was quite tightly restricted to specifically religious texts, activities and organisations. But right away, further questions arise.

First is the question of how far the “Jewish internet” encompasses the Israeli web, given the (I think) unique relationship between state, ethnicity and religion that the state of Israel represents. What does that particular Venn diagram look like? Secondly, there is the broader question of culture, and the issue of language. The Danish legal deposit framework allows for archiving of content in the Danish language, regardless of subject and of where it was published. This kind of framework clearly makes little sense in a British, Spanish or French context, where the language has long since ceased to be the possession of the country in which it originated; what about material in Hebrew published outside Israel?

Consider, too, how far the net should be spread into areas such as the arts. Would an archive of the “Jewish internet” include material relating to the artist Marc Chagall? If so, should it include only his religious works, or the portraits and street scenes too? Just the works created to public commissions, or also the private works? Just those in Israel, or in other countries too, not least the window in Chichester cathedral on the south coast of England? In the case of Leonard Bernstein, should West Side Story – a fairly “secular” work – be set aside, but the ‘Kaddish’ Symphony included? What about Chichester Psalms, settings of the Hebrew scriptures commissioned for an Anglican cathedral in England? I don’t have ready answers for these questions, but I hope to have helped to set them up as a useful place to start your deliberations.

Further reading

The SAGE Handbook of Web History (2018, eds. Brügger and Milligan)

The Past Web: Exploring Web Archives (Springer, 2021, eds. Gomes/Demidova/Winters/Risse)

Webster, ‘Users, technologies, organisations: towards a cultural history of world Web archiving’, in Brügger (ed.), Web 25 (Peter Lang, 2017). Download the PDF.

Webster, ‘Religion in Web history’ in The SAGE Handbook of Web History. Download the PDF.


Reconstructing a late-Nineties web sphere

I was very pleased to take part this week in ‘Engaging with Web Archives’, originally supposed to take place at NUI Maynooth in April, but which instead took place online. My thanks to Sharon Healy and Michael Kurzmeier for the original idea, and for their remarkable perseverance in making the final event take place. It’s very good to see an event of this kind happening in Ireland, and I expect it will come to be regarded as a highly significant moment in Irish engagement with the archived Web.

My presentation is available on YouTube. It’s a shortened version of ‘Digital archaeology in the web of links: reconstructing a late-90s web sphere’, forthcoming in Gomes, Demidova, Winters and Risse (eds), The Past Web: Exploring Web Archives (Springer, 2021).


New DPC guidance note on researcher use of the archived Web

My new Technology Watch Guidance Note is now available, entitled How researchers use the archived Web. I was delighted to be commissioned (through Webster Research and Consulting) to write this short note on the current state of the art for the good folk at the Digital Preservation Coalition.

In a context of both novelty and diversity, the Guidance Note is designed to orient DPC member organisations, and others engaged in Web archiving (or intending to be), as to the kinds of uses researchers might expect to make of the content they collect. It is hoped that it may support the development of programmes of user research and engagement, and (in turn) inform collection development policies and the design of discovery and access services.

It deals briefly with two questions: what in the archived Web are scholars studying, and how are they studying it?

It is freely available from the DPC’s Knowledge Base (PDF).

Read more details of Webster Research and Consulting’s research services.

Digital archaeology in the web of links: reconstructing a late Nineties web sphere

At the moment I have a chapter contribution to a book of essays working its way through the publication process. The abstract is below; I’d be very happy to share the paper privately, if people would care to contact me.

A shortened version of the paper is available as a video presentation.

One unit of analysis within the archived Web is the ‘web sphere’, a body of material from different hosts that is related in some meaningful sense (following, broadly, the definition coined by Niels Brügger). This chapter outlines a method of reconstructing such a web sphere from the late 1990s, that of conservative British Christians as they interacted with each other and with others in the United States in relation to issues of morality, domestic and international politics, law and the prophetic interpretation of world events.

Using an iterative method of interrogation of the graph of links for the archived UK web, it shows the potential for the reconstruction of a web sphere from what is in effect an archive that has a finding aid, but one with only classmarks and without descriptions. It also demonstrates the kind of multi-source investigation necessary to uncover the archaeology of the early Web. Big data and small, printed sources, the traces of previous Web archiving efforts (even when unsuccessful), and echoes in the scholarly record itself: all these come into play.

I also propose a conceptual division of Brügger’s web spheres into two kinds, ‘hard’ and ‘soft’, as distinguished by the ease with which their boundaries can be identified, and the speed with which they change.
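For readers who think more readily in code, the general idea of iteratively interrogating a graph of links can be sketched as follows. This is only an illustration under stated assumptions, not the method used in the chapter: the function name `expand_web_sphere`, the edge-list format, the `min_links` threshold and the example hosts are all hypothetical, and the numeric threshold stands in for what would in practice be a scholar's judgement about each candidate host.

```python
from collections import defaultdict


def expand_web_sphere(edges, seeds, min_links=2, max_rounds=5):
    """Grow a set of hosts outwards from some seed hosts.

    edges: iterable of (source_host, target_host) pairs (assumed format).
    A candidate host is admitted when it is linked, in either direction,
    with at least `min_links` distinct hosts already in the sphere.
    """
    # Build an undirected neighbour index from the host-level link graph.
    neighbours = defaultdict(set)
    for source, target in edges:
        if source != target:
            neighbours[source].add(target)
            neighbours[target].add(source)

    sphere = set(seeds)
    for _ in range(max_rounds):
        # Candidates are hosts adjacent to the sphere but not yet in it.
        candidates = set()
        for host in sphere:
            candidates |= neighbours[host]
        candidates -= sphere

        admitted = {
            host for host in candidates
            if len(neighbours[host] & sphere) >= min_links
        }
        if not admitted:
            break  # nothing new qualified, so the sphere is stable
        sphere |= admitted
    return sphere


# Hypothetical hosts, purely for illustration.
example_edges = [
    ("a.example", "b.example"),
    ("a.example", "c.example"),
    ("b.example", "c.example"),
    ("c.example", "d.example"),
    ("b.example", "d.example"),
    ("d.example", "e.example"),
]
sphere = expand_web_sphere(example_edges, {"a.example", "b.example"})
print(sorted(sphere))
```

In this toy graph the host `e.example` is left out because it is linked to only one member of the sphere. A real investigation would replace the threshold with manual inspection of each candidate, drawing on the other kinds of source described above.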

On digitisation and the visibility of historic journals

Here follows a tale of two journals, a cautionary tale of the degree to which the historical record is conditioned by the interaction of technology and the economics of publishing.

Firstly, the journal Theology, perhaps the leading general theology journal in the UK. It was founded in 1920, published by the Society for Promoting Christian Knowledge (SPCK), already a leading publisher of books with a particular focus on the Church of England. Although its tone and content changed over time, it has always tended to provide a forum for the publication of theological writing of a breadth of concern that would interest both professional theologians and clerical and lay readers. In it one finds work on the perennial themes of the discipline alongside writing that reflected on the issues of the day, as the Anglican Communion encountered radical theological change and the pressing practical issues raised by the ecumenical movement.

Secondly, the Church Quarterly Review. Though it began life in 1875 as a privately published journal for one party within the Church of England, the CQR became a more general journal, and it too was published by SPCK from 1920. It occupied a similar space to Theology, with substantial articles aimed at both professional theologians and the wider church, and on issues old and contemporary. From the Anglican scholar Eric Mascall (one of my particular preoccupations), the CQR carried articles on topics as varied as the Eucharist, the prospects of reunion with the Church of Scotland, and the impact of the Second Vatican Council, along with dozens of book reviews. (His work also appeared regularly in Theology.) But where Theology survives to the present, the CQR does not. In 1968, the journal merged with the London Quarterly and Holborn Review (a Methodist title), but the resulting Church Quarterly ran only until 1971 and was not succeeded by another title.

Although Theology is published by SPCK, its online distribution was taken over in 2011 by SAGE Publications, and the entire back run has been digitised and made available via the SAGE site. As such, scholars may now access complete metadata and the full text of the journal back to its inception. By contrast, the CQR has no public online presence whatever. Unsurprisingly, a defunct journal held little attraction for potential buyers in the great consolidation of online journal publishing of the last twenty years. And, although several SAGE journals are included in JSTOR, Theology (and other SPCK titles) are not. As such, the CQR was not swept up in retrospective digitisation as other defunct titles from publishers involved in JSTOR have been. As it is, to read the CQR I must trouble the staff at my nearest university library to walk across to a store in a separate building and fetch the volumes for me.

There is, I think, an issue here that sits in the intersection of other questions of technology and practice which are better known. It is abundantly clear that current (or very recent) issues of journals that are available online have an advantage over those available only in print, and that the advantage is compounded when the journal is available Open Access. There is now also a great deal of stimulating reflection on the impact of digitised historic sources on historical practice. Within that, it has been observed that the digitisation of newspapers such as The Times earlier than other, equally prominent national newspapers risked skewing readers’ attention towards one source at the expense of another. Despite scholars’ best intentions – of leaving no stone unturned to get to the truth, no matter how heavy the stone – it is at least plausible that more easily accessible sources will be privileged. And the cases of Theology and the CQR suggest that the same might be true in certain fields of modern intellectual history, as the back issues of some current journals are digitised as a byproduct of current needs and others are not. That process of digitisation has tended to favour journals that survive over those that do not, and (in the case of JSTOR), defunct titles seem to stand a better chance if they were absorbed by one that survives.

Of course, it may well be that the CQR is in fact a less significant journal for twentieth century religious history than is Theology. But historical matters become perceived as significant partly as a result of the attention they are paid. It is at least possible that the relative ease of access to Theology will in itself (over time) give it a significance greater than the CQR by a kind of default. If this pattern is repeated in other areas of twentieth century intellectual history, then it perhaps deserves more attention than it has received so far.


Josef L. Altholz, ‘The Church Quarterly Review, 1875-1900: a marked file and other sources’, Victorian Periodicals Review 17 (1.2), 1984, 52-7.

Adrian Bingham, ‘The digitization of newspaper archives: opportunities and challenges for historians’, Twentieth Century British History 21(2), 2010, 225-31.

Lara Putnam, ‘The trans-national and the text-searchable: digitised sources and the shadows they cast’, American Historical Review 121(2), 2016, 376-402.

SAGE Publications, press release from 2010 on the digitisation of Theology.