Archiving the “Jewish internet”: some opening questions

[Some words addressed to the fourth world Judaica Curators Conference, held on 4th May 2021 under the aegis of the National Library of Israel. My thanks are due to Raquel Ukeles, head of collections at the NLI, for the invitation.]

I’m not sure how generally familiar you are with Web archiving, but this year marks 25 years since the generally accepted beginning of systematic archiving of the Web on a large scale. And so, there is now a substantial literature on the subject, from the practicalities of the process to the growing and remarkably diverse uses to which archived Web materials are now being put by researchers. Some suggested ways into this reading are on the slide. [Links are given at the end of this post.]
Some selected reading on the subject under discussion, also listed at the end of the post.
What I can most usefully do to set up our discussion today is to raise some initial broad questions as to what a collaborative archiving project for the “Jewish internet” might look like: to, if you like, set up the sheets on the flipchart into which this group may start to add some thoughts.

There is here a set of four basic questions, which are interdependent in several ways. In one sense, the easiest of these is the question of how. Web archiving is a process which tends to cross the departmental divisions in most libraries and archives: it requires curatorial expertise in the selection of material; technical expertise (and also curatorial skills of a particular kind) in managing the harvesting of the materials and maintaining quality; further (and different) technical skills in managing storage, preservation and access; and different skills again in orientating and engaging end users. But within the small but worldwide (and well-integrated) community of people in this space, this set of skills and technologies is well understood and documented. So, though I’ve listed this question first, it is (I suggest) the one that needs answering last.

The question also arises of which institutions should be involved, and how any collaborative project should be configured. Several models are possible:
(i) first of these might be a situation where several institutions agree to parcel out sections of the Jewish internet, and each works largely independently to archive that material in line with their existing collection remits and resources;
(ii) second might be a rather tighter model, in which institutions collaborate directly in the selection of material to archive, but delegate the harvesting, description, preservation and access to a single institution. And that institution might be one of the curating institutions, or (alternatively) an outsourcing partner.

To give some examples, in the UK, the responsibility for web archiving under non-print legal deposit is shared between six institutions, one of which (the British Library) deals with much of the process. Also in the UK Web Archive is the Quaker collection, the content of which was selected by a specialist library but collected again by the British Library. Globally, several libraries have collaborated (through the International Internet Preservation Consortium) on collections on the Olympic Games and most recently Covid-19, delivered via the Internet Archive’s Archive-It service. Within either of these models, there is arguably much to be gained from direct engagement with faith communities themselves: to build consent to archiving, to guide selection, and to bring forward users and advocates for the final archive.

The third basic question relates to the law, particularly surrounding copyright (but not only that). Some nations have legal deposit frameworks in place for non-print materials, including the UK, France and Denmark, although only a small minority of countries have such a thing, and these dispensations are usually accompanied by very significant restrictions on access. Nations vary in how they construe the notion of fair use in such things, and it is within the relatively liberal regime in the US that the Internet Archive is able to work. Various ways of working are available. Firstly, there is the direct approach to site owners for explicit permission to archive, a slow, resource-intensive and only intermittently successful endeavour. Other institutions have operated on a notice-and-takedown basis, where the owner is contacted and the lack of any expressed objection is taken as sufficient basis to continue; others again archive widely and take material down on request.

The right option for any project has to be determined by a weighing of available resources against the general attitude of each institution to risk. And that risk very much depends on what it is that is to be archived, as different kinds of materials have very different risks associated with them. This is why, I suggest, the first question with which this whole discussion needs to begin is the last on my slide: what *exactly* is to be included and excluded; in other words, what exactly *is* the “Jewish internet”?

A schema for understanding religions online, taken from the chapter 'Religion in Web history', published in The SAGE Handbook of Web History. A link to the full text of this paper is given at the end of the post.

I’m not about to begin giving specific answers to this question to an expert audience such as this. But I offer this organising schema, written from the perspective of an historian of modern Christianity, with now several years’ experience both on the archiving side and as a research user of the archived Web. What I set out in this particular article was quite tightly restricted to specifically religious texts, activities and organisations. But right away, further questions arise.

First is the question of how far the “Jewish internet” encompasses the Israeli web, given the (I think) unique relationship between state, ethnicity and religion that the state of Israel represents. What does that particular Venn diagram look like? Secondly, there is the broader question of culture, and the issue of language. The Danish legal deposit framework allows for archiving of content in the Danish language, regardless of subject and of where it was published. This kind of framework clearly makes little sense in a British, Spanish or French context, where the language has long since ceased to be the possession of the country in which it originated; what about material in Hebrew published outside Israel?

Consider, too, how far the net should be spread into areas such as the arts. Would an archive of the “Jewish internet” include material relating to the artist Marc Chagall? If so, should it include only his religious works, or the portraits and street scenes too? Just the works created to public commissions, or also the private works? Just those in Israel, or in other countries too, not least the window in Chichester cathedral on the south coast of England? In the case of Leonard Bernstein, should West Side Story – a fairly “secular” work – be set aside, but the ‘Kaddish’ Symphony included? What about Chichester Psalms, settings of the Hebrew scriptures commissioned for an Anglican cathedral in England? I don’t have ready answers for these questions, but I hope to have helped to set them up as a useful place to start your deliberations.

Further reading

The SAGE Handbook of Web History (SAGE, 2018, eds. Brügger and Milligan)

The Past Web: Exploring Web Archives (Springer, 2021, eds. Gomes, Demidova, Winters and Risse)

Webster, ‘Users, technologies, organisations: towards a cultural history of world Web archiving’, in Brügger (ed.), Web 25 (Peter Lang, 2017). Download the PDF.

Webster, ‘Religion in Web history’ in The SAGE Handbook of Web History. Download the PDF.


Reconstructing a late-Nineties web sphere

I was very pleased to take part this week in ‘Engaging with Web Archives’, originally supposed to take place at NUI Maynooth in April, but which instead took place online. My thanks to Sharon Healy and Michael Kurzmeier for the original idea, and for their remarkable perseverance in making the final event take place. It’s very good to see an event of this kind happening in Ireland, and I expect it will come to be regarded as a highly significant moment in Irish engagement with the archived Web.

My presentation is available on YouTube. It’s a shortened version of ‘Digital archaeology in the web of links: reconstructing a late-90s web sphere’, forthcoming in Gomes, Demidova, Winters and Risse (eds), The Past Web: Exploring Web Archives (Springer, 2021).


New DPC guidance note on researcher use of the archived Web

My new Technology Watch Guidance Note is now available, entitled How researchers use the archived Web. I was delighted to be commissioned (through Webster Research and Consulting) to write this short note on the current state of the art for the good folk at the Digital Preservation Coalition.

In a context of both novelty and diversity, the Guidance Note is designed to orient DPC member organisations, and others engaged in Web archiving (or intending to be), as to the kinds of uses researchers might expect to make of the content they collect. It is hoped that it may support the development of programmes of user research and engagement, and (in turn) inform collection development policies and the design of discovery and access services.

It deals briefly with two questions: what in the archived Web are scholars studying, and how are they studying it?

It is freely available from the DPC’s Knowledge Base (PDF).

Read more details of Webster Research and Consulting’s research services.

Digital archaeology in the web of links: reconstructing a late Nineties web sphere

At the moment I have a chapter contribution to a book of essays working its way through the publication process. The abstract is below; I’d be very happy to share the paper privately, if people would care to contact me.

A shortened version of the paper is available as a video presentation.

One unit of analysis within the archived Web is the ‘web sphere’, a body of material from different hosts that is related in some meaningful sense (following, broadly, the definition coined by Niels Brügger). This chapter outlines a method of reconstructing such a web sphere from the late 1990s, that of conservative British Christians as they interacted with each other and with others in the United States in relation to issues of morality, domestic and international politics, law and the prophetic interpretation of world events.

Using an iterative method of interrogation of the graph of links for the archived UK web, it shows the potential for the reconstruction of a web sphere from what is in effect an archive that has a finding aid, but one with only classmarks and without descriptions. It also demonstrates the kind of multi-source investigation necessary to uncover the archaeology of the early Web. Big data and small, printed sources, the traces of previous Web archiving efforts (even when unsuccessful), and echoes in the scholarly record itself: all these come into play.

I also propose a conceptual division of Brügger’s web spheres into two kinds, ‘hard’ and ‘soft’, as distinguished by the ease with which its boundaries can be identified, and the speed with which they change.
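The iterative interrogation of the link graph described in the abstract can be illustrated with a minimal sketch. This is not the method used in the chapter, only a toy version of the general idea: start from a few known seed hosts, look for hosts that are linked to or from several current members, admit them, and repeat. The function name `expand_web_sphere`, the `min_links` threshold and the `.example` hosts are all hypothetical, and the fixed threshold stands in for what in practice would be the researcher's own judgement on each candidate.

```python
from collections import defaultdict


def expand_web_sphere(links, seeds, min_links=2, max_rounds=5):
    """Grow a candidate web sphere from seed hosts (illustrative only).

    links: iterable of (source_host, target_host) pairs from a link graph.
    A host is admitted when at least `min_links` distinct current members
    link to it or are linked from it. The threshold is an assumption made
    for this sketch, standing in for manual review of each candidate.
    """
    # Build an undirected neighbour map, ignoring self-links.
    neighbours = defaultdict(set)
    for source, target in links:
        if source != target:
            neighbours[source].add(target)
            neighbours[target].add(source)

    sphere = set(seeds)
    for _ in range(max_rounds):
        # Count, for each outside host, how many members it is connected to.
        counts = defaultdict(int)
        for member in sphere:
            for other in neighbours.get(member, ()):
                if other not in sphere:
                    counts[other] += 1
        admitted = {host for host, n in counts.items() if n >= min_links}
        if not admitted:
            break  # nothing new: the boundary has stabilised
        sphere |= admitted
    return sphere


# Hypothetical hosts, invented purely for illustration.
links = [
    ("a.example", "b.example"),
    ("a.example", "c.example"),
    ("b.example", "c.example"),
    ("c.example", "d.example"),
    ("b.example", "d.example"),
    ("d.example", "e.example"),
]
sphere = expand_web_sphere(links, seeds={"a.example", "b.example"})
print(sorted(sphere))
```

With this toy data, `e.example` stays outside the sphere because only one member links to it. A sphere whose membership settles after a round or two would be closer to the ‘hard’ kind, and one that keeps growing closer to the ‘soft’ kind.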

Understanding national Web domains

Yesterday I was delighted to find in the mail my copy of an important new book of essays: The Historical Web and Digital Humanities: the case of national Web domains. It is published by Routledge and edited by Niels Brügger and Ditte Laursen.

I say it is important because it investigates for the first time a particular issue that is of immediate practical concern for two quite distinct groups. The first – Web archivists in the world’s national libraries, and particularly those who work within a legal deposit framework – have sometimes to define and then certainly to work within a definition of the ‘national’ Web, and to understand how much of it they are able to archive. As the volume amply demonstrates, that task of definition is not straightforward, and has been dealt with in widely varying ways.

The second group lies outside the small but growing community of Web historians: the many others (not least contemporary historians) who are not primarily interested in the Web itself, but in what a study of it can tell us about everything else. And the definition of the nation, of a shared but bounded space in which a political community speaks together, is the kind of question which has exercised historians of many periods and of other ‘new’ media. As I wrote in my own chapter:

The advent of the web presents historians with a new and somewhat perplexing question: where is it? What does it mean to think of the web in spatial and quasi-geographic terms? How may we write national histories of the web? Where did a particular website ‘live’? Of where was it a resident or citizen, so to speak?

The volume is important, too, because it explicitly tries to connect Web history with the larger field of digital humanities, where hitherto the two fields have been in only the loosest contact (rather to my surprise, I might add.) It is good to see the volume appear in Routledge’s series on Digital Research in the Arts and Humanities, which also carries work in more ‘traditional’ digital humanities areas.

Finally, the volume marks an important moment in the development of the discipline of Web history. Previous collections (in which my own work also appeared), all of them crucial in their way, have been more specifically methodological in focus, and have been designed to make the case for the importance and the integrity of the discipline. Although each chapter made a contribution to its own particular field, those previous volumes did not contribute as a group to particular questions of history, or religious studies, or sociology. (See, for instance, The Web as History (2017), Web 25 (2017), and The SAGE Handbook of Web History (2018).) This volume is the first for several years which speaks to a substantive issue of politics, history and sociology, as well as to archival science and the methodology of studying the archived Web.

The chapters fall into three sections: collecting and preserving national domains; methodological issues; and results and dissemination. I won’t try to summarise them here, save to say that each group of readers – archivists and scholars – should read each section, since their concerns overlap. As I’ve argued elsewhere, scholars need to understand more than they do about how archives come into existence, and (in this case) about the administrative histories of particular ccTLDs. Archivists will similarly gain a great deal from the discussions of method and dissemination in the second and third parts, since those questions go to the heart of both archiving policy and the design of effective systems for discovery, playback and analysis of the archived Web.

Part One: Collecting and preserving a national Web domain
Kees Teszelszky on ‘reconstructing and saving the Dutch national web using historical methods’.
Sally Chambers, Peter Mechant, & Friedel Geeraert, on the PROMISE project in Belgium: ‘Towards a national web archive in a federated country’.
Ian Milligan and Tom Smyth on the Canadian .ca domain, and studying the web ‘in the shadow of Uncle Sam’.
Helen Hockx-Yu, Ditte Laursen, & Daniel Gomes on the curious case of the .eu domain.

Part Two: Methodological challenges
Jane Winters on the many archives of the UK web space.
Anat Ben-David on Palestine, Kosovo and the quest for national self-determination on the fringe of the Web.
My own chapter on Northern Ireland and the limitations of the ccTLD as proxy for the nation.
Niels Brügger, Ditte Laursen, & Janne Nielsen on establishing a corpus of the Danish web.

Part Three: Results and dissemination
Valérie Schafer explores the French web of the 1990s.
Rebecca Kahn on locating a national museum online (the British Museum).
Niels Brügger proposes a way towards the creation of a national web trend index.