Reconstructing a late-Nineties web sphere

I was very pleased to take part this week in ‘Engaging with Web Archives’, originally due to take place at NUI Maynooth in April but held online instead. My thanks to Sharon Healy and Michael Kurzmeier for the original idea, and for their remarkable perseverance in making the final event happen. It’s very good to see an event of this kind happening in Ireland, and I expect it will come to be regarded as a highly significant moment in Irish engagement with the archived Web.

My presentation is available on YouTube. It’s a shortened version of ‘Digital archaeology in the web of links: reconstructing a late-90s web sphere’, forthcoming in Gomes, Davidova, Winters and Risse (eds), The Past Web. Exploring Web archives (Springer, 2021).

 

New DPC guidance note on researcher use of the archived Web

My new Technology Watch Guidance Note is now available, entitled How researchers use the archived Web. I was delighted to be commissioned (through Webster Research and Consulting) to write this short note on the current state of the art for the good folk at the Digital Preservation Coalition.

In a context of both novelty and diversity, the Guidance Note is designed to orient DPC member organisations, and others engaged in Web archiving (or intending to be), as to the kinds of uses researchers might expect to make of the content they collect. It is hoped that it may support the development of programmes of user research and engagement, and (in turn) inform collection development policies and the design of discovery and access services.

It deals briefly with two questions: what in the archived Web are scholars studying, and how are they studying it?

It is freely available from the DPC’s Knowledge Base (PDF).

Read more details of Webster Research and Consulting’s research services.

Research data management and Web archive studies: a missing piece

A short presentation I gave today to the inaugural meeting of WARCnet, a research network researching web domains and events. It is funded by the Independent Research Fund Denmark. (Due to the COVID-19 outbreak, the meeting occurred online rather than in Aarhus, and I recorded my presentation in my office in the UK.)

I briefly survey the development of ‘Web archive studies’ in recent years, pointing out that the management and reuse of research data is a missing piece in the makeup of the discipline. Though I do not have a solution to propose, I call on the WARCnet network to consider the issue, as one that poses a significant systemic risk.

Understanding national Web domains

Yesterday I was delighted to find in the mail my copy of an important new book of essays: The Historical Web and Digital Humanities: the case of national Web domains. It is published by Routledge and edited by Niels Brügger and Ditte Laursen.

I say it is important because it investigates for the first time a particular issue that is of immediate practical concern for two quite distinct groups. The first – Web archivists in the world’s national libraries, and particularly those who work within a legal deposit framework – have sometimes to define and then certainly to work within a definition of the ‘national’ Web, and to understand how much of it they are able to archive. As the volume amply demonstrates, that task of definition is not straightforward, and has been dealt with in widely varying ways.

Outside the small but growing community of Web historians, there are many others (not least contemporary historians) who are not primarily interested in the Web itself, but in what a study of it can tell us about everything else. And the definition of the nation, of a shared but bounded space in which a political community speaks together, is the kind of question which has exercised historians of many periods and of other ‘new’ media. As I wrote in my own chapter:

The advent of the web presents historians with a new and somewhat perplexing question: where is it? What does it mean to think of the web in spatial and quasi-geographic terms? How may we write national histories of the web? Where did a particular website ‘live’? Of where was it a resident or citizen, so to speak?

The volume is important, too, because it explicitly tries to connect Web history with the larger field of digital humanities, where hitherto the two fields have been in only the loosest contact (rather to my surprise, I might add). It is good to see the volume appear in Routledge’s series on Digital Research in the Arts and Humanities, which also carries work in more ‘traditional’ digital humanities areas.

Finally, the volume marks an important moment in the development of the discipline of Web history. Previous collections (in which my own work also appeared), all of them crucial in their way, have been more specifically methodological in focus, and have been designed to make the case for the importance and the integrity of the discipline. Although each chapter made a contribution to its own particular field, those previous volumes did not contribute as a group to particular questions of history, or religious studies, or sociology. (See, for instance, The Web as History (2017), Web25 (2017), and the Sage Handbook of Web History (2018).) This volume is the first for several years which speaks to a substantive issue of politics, history and sociology, as well as to archival science and the methodology of studying the archived Web.

The chapters fall into three sections: collecting and preserving national domains; methodological issues; and results and dissemination. I won’t try to summarise them here, save to say that each group of readers – archivists and scholars – should read each section, since their concerns overlap. As I’ve argued elsewhere, scholars need to understand more than they do about how archives come into existence, and (in this case) about the administrative histories of particular ccTLDs. Archivists will similarly gain a great deal from the discussions of method and dissemination in the latter two parts, since those questions go to the heart of both archiving policy and the design of effective systems for discovery, playback and analysis of the archived Web.

Part One: Collecting and preserving a national Web domain
Kees Teszelszky on ‘reconstructing and saving the Dutch national web using historical methods’.
Sally Chambers, Peter Mechant, & Friedel Geeraert, on the PROMISE project in Belgium: ‘Towards a national web archive in a federated country’.
Ian Milligan and Tom Smyth on the Canadian .ca domain, and studying the web ‘in the shadow of Uncle Sam’.
Helen Hockx-Yu, Ditte Laursen, & Daniel Gomes on the curious case of the .eu domain.

Part Two: Methodological challenges
Jane Winters on the many archives of the UK web space.
Anat Ben-David on Palestine, Kosovo and the quest of national self-determination on the fringe of the Web.
My own chapter on Northern Ireland and the limitations of the ccTLD as proxy for the nation.
Niels Brügger, Ditte Laursen, & Janne Nielsen on establishing a corpus of the Danish web.

Part Three: Results and dissemination
Valérie Schafer explores the French web of the 1990s.
Rebecca Kahn on locating a national museum online (the British Museum).
Niels Brügger proposes a way towards the creation of a national web trend index.

Where is the national Web, exactly? A case study

[A summary of my chapter in The Historical Web and Digital Humanities. The case of national web domains, edited by Niels Brügger and Ditte Laursen.
The full title is ‘Understanding the limitations of the ccTLD as a proxy for the national web: lessons from cross-border religion in the Northern Irish web sphere’
See also the full text (PDF)]

The writing of modern history has often depended on a stable idea of the state; on the idea that persons have some form of citizenship, a legal identification with a political unit. Even if they may hold more than one, each citizenship may stand on its own without legal ambiguity. Another fundamental assumption is that geographical space (at least on land) can usually be clearly divided into units under unified and monopolistic systems of law and government. To elaborate an insight of Max Weber, in order for a state successfully to enforce a monopoly on the use of violence, it must first know where its boundaries are.

Scholars have also been interested in the interactions between states and their peoples across borders, but still (by and large) supposing a fixity in those states at any one point in time. Studies of migration presuppose a point of origin and a point of arrival. Printed publications may circulate freely, but their publication is still governed by a national legal framework; something similar may be said of television and other broadcast media.

The advent of the web presents historians with a new and somewhat perplexing question: where is it? What does it mean to think of the web in spatial and quasi-geographic terms? How may we write national histories of the web? Where did a particular website ‘live’? Of where was it a resident or citizen, so to speak?

In most cases, the task of defining a national web domain has begun with one or more country code top-level domains (ccTLD) even if it has not ended with them. Here I examine the nature of the .uk ccTLD as a proxy for the UK web by means of a case study of the web estate of the Christian churches in Northern Ireland.

The society of Northern Ireland is marked by an interlinking of religious and national identity, which may be unique in Europe if not in the world. The chapter uses publicly available data, including that provided by the British Library, to reconstruct the link relationships between churches in Northern Ireland, examining the regional, national, and cross-border relationships that they imply.

Due to its very particular religious and political history, Northern Irish society has been characterised by an exceptional sensitivity to symbols, to history, and to place. How far has that sensitivity to space and symbol been transferred online? Amongst the churches, Catholic and Protestant, in a province where the symbols of national identity have such prominence, does the location of a website within or outside the .uk domain carry any symbolic weight? Might those churches most associated with unionism be more likely to register in the UK ccTLD than Roman Catholic churches?

Based on the patterns of domain registration for the churches of Northern Ireland in 2015 and 2016, it would seem that Roman Catholic congregations were likely to register domains outside the .uk ccTLD, a finding broadly in line with the initial hypothesis. However, the converse – in relation to the Protestant churches – is not borne out; no particular prioritisation of registration within the UK ccTLD is evident in the data. Both conclusions point to important areas of future research on the nature of national webs, and the limitations of the ccTLD as a proxy for them. If organisations that might be expected to want their web estate to reside within a particular national domain do not in fact register their domains there, it suggests that the ‘gravitational pull’ of the ccTLD is weaker than might be supposed.
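For readers curious how a comparison of registration patterns might be set up, here is a minimal sketch in Python. It is not the method used in the chapter: the function names, the two group labels and the sample URLs are all invented for illustration, and a real study would start from a hand-compiled list of church websites. It simply tallies the final label of each hostname (the TLD) for each group.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def tld_of(url):
    """Return the last label of the URL's hostname, e.g. 'uk' or 'ie'."""
    host = urlparse(url).hostname or ""
    return host.rsplit(".", 1)[-1].lower() if "." in host else ""


def tld_counts_by_group(sites):
    """Tally TLDs per group from (group, url) pairs."""
    counts = defaultdict(Counter)
    for group, url in sites:
        counts[group][tld_of(url)] += 1
    return counts


def share_in_tld(counter, tld):
    """Proportion of a group's sites registered under the given TLD."""
    total = sum(counter.values())
    return counter[tld] / total if total else 0.0


# Invented placeholder data, not real churches or real findings.
SAMPLE = [
    ("catholic", "http://www.parish-one.ie/"),
    ("catholic", "http://parish-two.com/"),
    ("catholic", "http://www.parish-three.org.uk/"),
    ("protestant", "http://www.church-one.org.uk/"),
    ("protestant", "http://church-two.org/"),
    ("protestant", "http://www.church-three.com/"),
    ("protestant", "http://church-four.co.uk/"),
]

counts = tld_counts_by_group(SAMPLE)
for group, counter in sorted(counts.items()):
    print(group, dict(counter), round(share_in_tld(counter, "uk"), 2))
```

Counting only the final label treats .co.uk and .org.uk alike as .uk, which suits a question about the ccTLD as a whole; a study interested in second-level domains would need a finer split.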

The second half of the chapter takes the case of one of the Protestant denominations in Ireland in order to investigate the mapping (or lack of it) between the nation and the ccTLD. It recreates the networks of links between individual Baptist churches on both sides of the border, and asks: are these link networks influenced by the fact of the ccTLD, or are there more geographic and cultural factors in play that determine their shape? It is based on an analysis of the .uk link graph for the period 1996-2010.

I conclude that although less than half of the Baptist web in Northern Ireland is registered in the UK ccTLD, the links between churches show in fact a very tight geographic concentration on the domains of churches in the eastern counties of Antrim, Down and (to a lesser extent) Armagh. Detailed local studies are needed to establish why this might be the case, although some lines of enquiry might be advanced. Is this a representation of a wider divide between rural and urban churches, or a reflection of the greater resources or perceived influence of churches in certain areas, particularly Belfast? Or is the prominence of certain individual churches merely the product of their particular local circumstances and understanding of their role? For whatever reason, the link graph shows little sign of sentiment regarding the common identity of all the Baptist churches in Northern Ireland.

These churches are linked together in a single organisation, the Association of Baptist Churches in Ireland: what evidence is there of link networks in the archived web that might reflect a sense of an all-Ireland identity? Approximately a quarter of Irish Baptist congregations are located in the Republic. What of the links from churches in the north to those in the south? The link graph connects only four Northern Irish congregations to twelve in the Republic, a very small proportion. Little all-Ireland sentiment is to be detected in the Northern Irish Baptist web.
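The kind of count given above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the chapter's actual procedure: it supposes that the link graph has already been reduced to pairs of hostnames, and that each host has been assigned by hand to a jurisdiction (since, as the chapter argues, the TLD alone does not tell us where a church is). The hostnames, the region labels "NI" and "ROI", and the function name are all hypothetical.

```python
def cross_border_links(edges, location, src_region="NI", dst_region="ROI"):
    """Return the distinct source and target hosts of links that run
    from a host in src_region to a host in dst_region."""
    sources, targets = set(), set()
    for src, dst in edges:
        if location.get(src) == src_region and location.get(dst) == dst_region:
            sources.add(src)
            targets.add(dst)
    return sources, targets


# Invented placeholder data: a hand-compiled host-to-jurisdiction map
# and a list of (linking host, linked host) pairs.
LOCATION = {
    "ni-a.example": "NI",
    "ni-b.example": "NI",
    "ni-c.example": "NI",
    "roi-x.example": "ROI",
    "roi-y.example": "ROI",
    "roi-z.example": "ROI",
}
EDGES = [
    ("ni-a.example", "roi-x.example"),
    ("ni-a.example", "roi-y.example"),
    ("ni-b.example", "roi-z.example"),
    ("ni-b.example", "ni-c.example"),      # north-to-north, ignored
    ("roi-x.example", "ni-a.example"),     # south-to-north, ignored
    ("ni-c.example", "unknown.example"),   # unlocated target, ignored
]

sources, targets = cross_border_links(EDGES, LOCATION)
print(len(sources), "northern hosts link to", len(targets), "southern hosts")
```

Hosts missing from the location map are skipped rather than guessed at, so the counts are a lower bound on the cross-border links in whatever data is supplied.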

Why might that be? Is the weakness of link connections between north and south characteristic of all churches in Northern Ireland, or only the Protestant churches, or is it unique to the Baptists? Is the network particularly weak in the Baptist case because of the relative weakness of its national organisational structure? These questions could in part be answered by the application of the approach used here to the web estate of the other churches.

More generally, a history of the web is required that also asks what it is that causes the human actors in control of websites to link to others. A substantial project of oral history interviews and fine-grained examination of individual websites is needed to understand the communicative strategies organisations adopt and their evolution over time. That said, I show what may be observed at a distance with a new kind of data. Macro-level analysis of the web such as this offers an additional tool for historians and other scholars to deploy alongside their existing methods.

The chapter has also pointed out a particular challenge that historians and analysts of national webs face. In the Baptist case, a network of links that is very tightly geographically concentrated is at the same time spread across four different TLDs. Studies of particular web spheres such as this are so far very few. However, if the kind of pattern I have outlined is at all typical of other web spheres, it suggests that for web archivists and scholars alike the ccTLD is a weak proxy indeed for the national web.

In addition, it brings into sharp relief one of the structural disadvantages of the division of world web archiving activities into national programmes. Though many web archives collect national material beyond their ccTLD, no organisation has any statutory responsibility to archive the non-geographic domains such as .com and .org as a whole. Unless and until it becomes possible to access web archives on a transnational basis, scholars will continue to work with fragmentary and non-commensurable data from several archives to reconstruct the national web.