Archiving the “Jewish internet”: some opening questions

[Some words addressed to the fourth World Judaica Curators Conference, held on 4th May 2021 under the aegis of the National Library of Israel. My thanks are due to Raquel Ukeles, head of collections at the NLI, for the invitation.]

I’m not sure how familiar you all are with Web archiving, but this year marks 25 years since what is generally accepted as the beginning of systematic, large-scale archiving of the Web. There is now, therefore, a substantial literature on the subject, from the practicalities of the process to the growing and remarkably diverse uses to which researchers are now putting archived Web materials. Some suggested ways into this reading are on the slide. [Links are given at the end of this post.]

What I can most usefully do to set up our discussion today is to raise some broad initial questions as to what a collaborative archiving project for the “Jewish internet” might look like: to set up, if you like, the sheets on the flipchart onto which this group may start to add its thoughts.

There are four basic questions here, and they are interdependent in several ways. In one sense, the easiest of these is the question of how. Web archiving is a process which tends to cut across the departmental divisions of most libraries and archives: it requires curatorial expertise in the selection of material; technical expertise (and also curatorial skills of a particular kind) in managing the harvesting of the materials and maintaining quality; further (and different) technical skills in managing storage, preservation and access; and different skills again in orientating and engaging end users. But within the small but worldwide (and well-integrated) community of people in this space, this set of skills and technologies is well understood and documented. So, though I’ve listed this question first, it is (I suggest) the one that needs answering last.

The second question is which institutions should be involved, and how any collaborative project should be configured. Several models are possible; two in particular suggest themselves:
(i) in the first, several institutions agree to parcel out sections of the Jewish internet, and each works largely independently to archive that material in line with its existing collection remit and resources;
(ii) the second is a rather tighter model, in which institutions collaborate directly in selecting material to archive, but delegate harvesting, description, preservation and access to a single institution. That institution might be one of the curating institutions or, alternatively, an outsourcing partner.

To give some examples: in the UK, responsibility for Web archiving under non-print legal deposit is shared between six institutions, one of which (the British Library) carries out much of the process. Also in the UK Web Archive is the Quaker collection, the content of which was selected by a specialist library but collected by the British Library. Globally, several libraries have collaborated (through the International Internet Preservation Consortium) on collections about the Olympic Games and, most recently, Covid-19, delivered via the Internet Archive’s Archive-It service. Within either of these models, there is arguably much to be gained from direct engagement with faith communities themselves: to build consent for archiving, to guide selection, and to bring forward users and advocates for the final archive.

The third basic question relates to the law, particularly (but not only) copyright. Some nations, including the UK, France and Denmark, have legal deposit frameworks in place for non-print materials, but only a small minority of countries have such a thing, and these dispensations usually come with very significant restrictions on access. Nations also vary in how they construe the notion of fair use in such matters, and it is within the relatively liberal regime of the US that the Internet Archive is able to work. Various ways of working are available. Firstly, there is the direct approach to site owners for explicit permission to archive: a slow, resource-intensive and only intermittently successful endeavour. Secondly, some institutions have operated on an opt-out basis, where the owner is contacted and the lack of any expressed objection is taken as sufficient basis to continue; others again archive widely and take material down on request, on a notice-and-takedown basis.

The right option for any project has to be determined by weighing the available resources against each institution’s general attitude to risk. And that risk very much depends on what is to be archived, as different kinds of material carry very different risks. This is why, I suggest, the question with which this whole discussion needs to begin is the last on my slide: what *exactly* is to be included and excluded; in other words, what exactly *is* the “Jewish internet”?

I won’t presume to give specific answers to this question to an expert audience such as this. But I offer this organising schema, written from the perspective of an historian of modern Christianity with several years’ experience both on the archiving side and as a research user of the archived Web. What I set out in this particular article was quite tightly restricted to specifically religious texts, activities and organisations. But right away, further questions arise.

First is the question of how far the “Jewish internet” encompasses the Israeli web, given the (I think) unique relationship between state, ethnicity and religion that the state of Israel represents. What does that particular Venn diagram look like? Second, there is the broader question of culture, and the issue of language. The Danish legal deposit framework allows for the archiving of content in the Danish language, regardless of subject and of where it was published. This kind of framework clearly makes little sense in a British, Spanish or French context, where the language has long since ceased to be the possession of the country in which it originated. So what about material in Hebrew published outside Israel?

Consider, too, how far the net should be spread into areas such as the arts. Would an archive of the “Jewish internet” include material relating to the artist Marc Chagall? If so, should it include only his religious works, or the portraits and street scenes too? Just the works made for public commissions, or also the private works? Just those in Israel, or in other countries too, not least the window in Chichester Cathedral on the south coast of England? In the case of Leonard Bernstein, should West Side Story – a fairly “secular” work – be set aside, but the ‘Kaddish’ Symphony included? What about the Chichester Psalms, settings of the Hebrew scriptures commissioned for an Anglican cathedral in England? I don’t have ready answers to these questions, but I hope to have helped set them up as a useful place to start your deliberations.

Reconstructing a late-Nineties web sphere

I was very pleased to take part this week in ‘Engaging with Web Archives’, which was originally to have taken place at NUI Maynooth in April but was instead held online. My thanks to Sharon Healy and Michael Kurzmeier for the original idea, and for their remarkable perseverance in making the final event happen. It’s very good to see an event of this kind happening in Ireland, and I expect it will come to be regarded as a highly significant moment in Irish engagement with the archived Web.

My presentation is available on YouTube. It’s a shortened version of ‘Digital archaeology in the web of links: reconstructing a late-90s web sphere’, forthcoming in Gomes, Demidova, Winters and Risse (eds), The Past Web: Exploring Web Archives (Springer, 2021).


New DPC guidance note on researcher use of the archived Web

My new Technology Watch Guidance Note is now available, entitled How researchers use the archived Web. I was delighted to be commissioned (through Webster Research and Consulting) to write this short note on the current state of the art for the good folk at the Digital Preservation Coalition.

Researcher use of the archived Web is both new and diverse, and so the Guidance Note is designed to orient DPC member organisations, and others engaged in Web archiving (or intending to be), to the kinds of use that researchers may make of the content those organisations collect. It is hoped that it may support the development of programmes of user research and engagement, and (in turn) inform collection development policies and the design of discovery and access services.

It deals briefly with two questions: what in the archived Web are scholars studying, and how are they studying it?

It is freely available from the DPC’s Knowledge Base (PDF).

Research data management and Web archive studies: a missing piece

This is a short presentation I gave today to the inaugural meeting of WARCnet, the Web ARChive studies network researching web domains and events, which is funded by the Independent Research Fund Denmark. (Due to the COVID-19 outbreak, the meeting took place online rather than in Aarhus, and I recorded my presentation in my office in the UK.)

I briefly survey the development of ‘Web archive studies’ in recent years, pointing out that the management and reuse of research data is a missing piece in the makeup of the discipline. Though I do not have a solution to propose, I call on WARCnet to consider the issue, as one that poses a significant systemic risk to the field.

Digital archaeology in the web of links: reconstructing a late Nineties web sphere

I have a chapter in a book of essays that is currently working its way through the publication process. The abstract is below; I’d be very happy to share the paper privately with anyone who would care to contact me.

A shortened version of the paper is available as a video presentation.

One unit of analysis within the archived Web is the ‘web sphere’, a body of material from different hosts that is related in some meaningful sense (following, broadly, the definition coined by Niels Brügger). This chapter outlines a method of reconstructing such a web sphere from the late 1990s, that of conservative British Christians as they interacted with each other and with others in the United States in relation to issues of morality, domestic and international politics, law and the prophetic interpretation of world events.

Using an iterative method of interrogating the graph of links in the archived UK web, it shows the potential for reconstructing a web sphere from what is in effect an archive with a finding aid, but one with only classmarks and no descriptions. It also demonstrates the kind of multi-source investigation necessary to uncover the archaeology of the early Web. Big data and small data, printed sources, the traces of previous Web archiving efforts (even when unsuccessful), and echoes in the scholarly record itself: all these come into play.
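The chapter does not set out its procedure as code, so what follows is only an illustration of the general kind of iterative link-graph interrogation described above: a simple ‘snowball’ expansion outward from seed hosts, written in Python. It assumes that the link graph is available as a list of (linking host, linked-to host) pairs, and that the researcher’s judgement about whether a newly found host belongs in the sphere (in practice informed by the archived pages and other sources) can be represented as a function. The function names, the `is_in_sphere` parameter and the example hosts are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Iterable


def build_adjacency(edges: Iterable[tuple[str, str]]) -> dict[str, set[str]]:
    """Index host-to-host links in both directions, so a host can be found
    whether it links to the sphere or is linked from it."""
    adjacency: dict[str, set[str]] = defaultdict(set)
    for source, target in edges:
        if source != target:  # ignore links within a single host
            adjacency[source].add(target)
            adjacency[target].add(source)
    return adjacency


def expand_sphere(
    edges: Iterable[tuple[str, str]],
    seeds: set[str],
    is_in_sphere: Callable[[str], bool],
    max_rounds: int = 10,
) -> tuple[set[str], list[set[str]]]:
    """Grow a web sphere outward from seed hosts, one round at a time.

    Each round gathers the not-yet-examined neighbours of the current sphere,
    keeps those that the (hypothetical) judgement function accepts, and
    records them. Expansion stops when a round accepts nothing new, or after
    max_rounds rounds.
    """
    adjacency = build_adjacency(edges)
    sphere = set(seeds)
    examined = set(seeds)
    rounds: list[set[str]] = []
    for _ in range(max_rounds):
        candidates = {n for host in sphere for n in adjacency[host]} - examined
        examined |= candidates  # each host is judged only once
        accepted = {host for host in candidates if is_in_sphere(host)}
        if not accepted:
            break
        rounds.append(accepted)
        sphere |= accepted
    return sphere, rounds


# Hypothetical link graph: (linking host, linked-to host) pairs.
edges = [
    ("church-a.example", "prophecy-news.example"),
    ("prophecy-news.example", "us-ministry.example"),
    ("church-a.example", "news-site.example"),
    ("us-ministry.example", "shop.example"),
]
# Hypothetical stand-in for the researcher's judgement on each candidate host.
judged_relevant = {"prophecy-news.example", "us-ministry.example"}

sphere, rounds = expand_sphere(edges, {"church-a.example"}, judged_relevant.__contains__)
print(sorted(sphere))  # ['church-a.example', 'prophecy-news.example', 'us-ministry.example']
print(rounds)
```

Treating links as undirected is a simplifying choice made here, so that a host can enter the sphere whether it links in or is linked to; stopping when a round accepts nothing new is likewise just one possible convergence rule.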

I also propose a conceptual division of Brügger’s web spheres into two kinds, ‘hard’ and ‘soft’, distinguished by the ease with which their boundaries can be identified and the speed with which those boundaries change.