Cultural Heritage Infrastructures in Digital Humanities: a review

Cultural Heritage Infrastructures in Digital Humanities.
Agiatis Benardou, Erik Champion, Costis Dallas and Lorna M. Hughes (eds). Routledge, 2017.

[This review first appeared in the LSE Review of Books.]

The digital turn in humanities research over the last three decades has enabled the asking of new research questions: the availability of fresh tools and techniques, as well as digitised objects to which to apply them, has opened up angles of enquiry on almost any subject. This was already deeply felt in the pioneering humanities computing projects of the 1980s, but the pervasiveness of the internet has prompted thinking at national and international levels, particularly in the 2000s, about how those new kinds of research might best be enabled in a networked, geographically dispersed context.

At the same time, the policy environment for humanities research has been inflected by external forces: the energy directed towards ‘cyber-infrastructure’ for the ‘hard’ sciences, and the increasing pressure on the custodians of cultural heritage, particularly publicly funded galleries, libraries, archives and museums (the GLAM sector), to maximise the use of their holdings. Also, in Europe especially, there has been an emphasis on a common heritage of European civilisation that ought to be studied comparatively as part of a more general projection of the European ideal. As a result, the European Union has funded, and continues to fund, ‘research infrastructures’ for the humanities in varying shapes and sizes, although it is not alone in doing so.

However, as the editors of Cultural Heritage Infrastructures in Digital Humanities point out, there is as yet little reflection available on the impact these research infrastructures have had both on the academy and the wider public (5). This collection of essays is a very valuable contribution to that process of assessment, and deserves to be widely read. It will be of interest not only to humanities scholars, but also to those in the GLAM sector concerned with user engagement and access, as well as policymakers in and around government.

The eleven chapters presented fall into several kinds. Readers chiefly concerned with the broad strategic issues of infrastructure provision – i.e. which services these infrastructures should provide, to whom and using which technologies – will be best served by the contributions from Seamus Ross; Veerle Vanden Daelen on the European Holocaust Research Infrastructure (EHRI); Agiatis Benardou and Alastair Dunning on Europeana; Sharon Webb and Aileen O’Carroll on digital tools in Ireland; and Tobias Blanke, Conny Kristel and Laurent Romary, as well as by the editorial introduction. Other contributions focus not so much on these wider issues as on individual disciplines or on particular tools and services. Although edited by scholars from Australia, Canada, Greece and the UK, the focus of the collection is weighted towards Europe; while the collection is none the worse for that, there is still room for reflection on the situation in other contexts.

Some common themes stand out. The collection is shot through with a refreshing awareness of how crucial engagement with the user is in creating a service that meets their needs. This is welcome indeed, since there is no shortage of digital services that serve their users less well than they might, largely because those commissioning, designing and building them did not stop to ask what users required. The well-documented example of Project Bamboo in the United States disappointed the hopes placed on it largely for this reason, and is acknowledged here.

This reviewer was also reminded once again of the sheer particularity of humanities scholarship both in terms of method and the kinds of materials in use, which points up the difficulty of creating infrastructures that suit more than a small number of scholars. A juxtaposition of Gertraud Koch’s essay on anthropology and that of Christina Kamposiori, Claire Warwick and Simon Mahony on art history serves to make the point: different humanities disciplines often make use of very different kinds of digital objects, and where they do use the same materials, quite distinct working assumptions are made about them. Furthermore, disciplines are also marked by varying levels of digital skills amongst their practitioners. Given all this, the challenges in designing services that meet all these needs are formidable. The experience of EHRI – providing a service that allowed scholars to discover materials in many archives relating to the Holocaust – shows the difficulty of creating such a service even for what, on the face of it, is a relatively clearly defined class of resources.

Despite all the stimulating and useful material this collection provides, a question mark remains over whether persistent organisational and technical structures are the best way of fostering research in the digital humanities. The question is met head on by Blanke, Kristel and Romary, who rightly acknowledge the complexity of creating large semi-permanent distributed digital services to connect very diverse individual tools and resources, especially as both the needs of users and the technologies available change rapidly and asymmetrically with each other. Beyond this book, however, the wider debate about how to enable distributed humanities scholarship is still often framed in terms of the shape that such infrastructures should take; their desirability in principle is not often stated as such, but is assumed. Andrew Prescott has rightly taken issue with the whole metaphor of infrastructure as an unhelpful way of imagining what is required. To envisage things in terms of infrastructure implies permanence, rigidity, standardisation. An alternative case might be made for the metaphor of the ecosystem, with which Blanke and colleagues also make play.

One can imagine a scholarly ecosystem in which individual libraries and archives concentrate on understanding their own users and designing services to meet their specific needs, whilst exposing data in a maximally open but passive way for others to access as and when they need to. Some of the funding currently directed at intermediate technical structures might instead be used to develop this local capacity. At the same time, funders might also invest in three other things: observing and reporting on the directions in which research in each community is heading (in the manner of the now defunct AHRC Digital Methods Network and other projects, as noted on pages 4-5); developing the individual technical skills of researchers in exploiting those resources and in making their own tools; and supporting the development of many community-specific tools and projects in response to demand, on the condition that those same tools are openly available for reuse and adaptation as needs change. To be sure, several of the existing infrastructures do some or all of these things, but it may be that that is all they should do.

This is not necessarily to advocate such a way of working, but merely to pose the question of whether the infrastructural paradigm is necessarily the only way available. This reviewer, at least, remains to be convinced that the demand for infrastructures that federate access and analysis has been shown to be present amongst end users. But while Webb and O’Carroll rightly suggest that the idea that ‘if you build it, they will come’ (129) has had its day amongst those who create individual tools and services, is it not alive and well at the infrastructural level? As a minimum, we ought to ask whether the dominance of the infrastructural paradigm is not due to its appeal to large providers of content and its comparative simplicity to fund and administer, rather than its intrinsic rightness as a way of fostering research. Readers will differ on the answers to this question, but anyone concerned with the future of digital humanities research will find much to ponder in this timely and important collection of essays.

Web 25: Histories from 25 years of the World Wide Web

Niels Brügger (editor)
Web 25. Histories from the First 25 Years of the World Wide Web
New York, Peter Lang, 2017. Paperback at £36.

It’s always a great pleasure to have sight of a book in which some of your own work appears. In the case of Web 25, it contains my short cultural history of the first 20 years of world Web archiving. But the book as a whole is full of intriguing other things, some of which I draw out here.

One of the most interesting areas (for me) in the emerging field of Web history is that of the early intellectual history of the Web: the modes in which people told stories about how the Web came into being and what it was good for (and the dangers it held). It was just this kind of research that my own paper at the ReSAW conference in June was aiming at (‘Utopia, dystopia and Christian ethics in the history of the Web’, available as a podcast), and there are several points of contact with two papers here: Marguerite Barry on the ways in which the Web entered general public conversation, and Simone Natale and Paolo Bory on understanding the early history of the Web as one instance of a ‘biography of media’.

There are also several intriguing chapters that examine the concrete histories of particular parts of the Web: Sybil Nolan on one particular news site (the Australian The Age Online); Elisabetta Locatelli on the genre of the blog in an Italian context; Michel Hockx on the development of the Chinese Web; Jean Marie Deken on one particular organisation, the Stanford Linear Accelerator Centre. Here we have case studies at every level of magnification: organisations, particular kinds of content, whole nations.

There is also methodological reflection: from Matthew S. Weber (‘The challenges of 25 years of data: an agenda for Web-based research’); Federico Nanni and Anwesha Chakraborty on integrating archived Web materials with other sources including interviews to build diachronic accounts of the evolution of a particular site; Anne Helmond on the importance of embedded third-party code as a means of understanding what she terms ‘historical website ecology’. It’s a potentially very fruitful approach that complements the kind of analysis of link relations between sites that I’ve attempted here and here. It also connects with Niels Brügger’s own chapter, a short history of the hyperlink.

Finally, in the same section as my own there are chapters on the experience of creating and managing Web archives themselves: in national library contexts (Paul Koerbin on Australia, and Ditte Laursen and Per Møldrup-Dalum on Denmark), and, from Camille Paloque-Berges, on Usenet as an archive that falls outside the more established patterns into which Web archiving has fallen.

All in all, the volume is another part of an exciting upswing in interest in the idea of Web history, represented by The Web as History, the new journal Internet Histories and the forthcoming Sage Handbook to Web History.

Reading old news in the web archive, distantly

One of the defining moments of Rowan Williams’ time as archbishop of Canterbury was the public reaction to his lecture in February 2008 on the interaction between English family law and Islamic shari’a law. As well as focussing attention on real and persistent issues of the interaction of secular law and religious practice, it also prompted much comment on the place of the Church of England in public life, the role of the archbishop, and on Williams personally. I tried to record a sample of the discussion in an earlier post.

Of course, a great deal of the media firestorm happened online. I want to take the episode as an example of the types of analysis that the systematic archiving of the web now makes possible: a new kind of what Franco Moretti called ‘distant reading.’

The British Library holds a copy of the holdings of the Internet Archive for the .uk top level domain for the period 1996-2010. One of the secondary datasets that the Library has made available is the Host Link Graph. With this data, it’s possible to begin examining how different parts of the UK web space referred to others. Which hosts linked to others, and from when until when?

This graph shows the total number of unique hosts that were found linking at least once to archbishopofcanterbury.org in each year.

[Figure: bar chart of unique hosts linking to archbishopofcanterbury.org in each year]

My hypothesis was that there should be more unique hosts linking to the archbishop’s site after February 2008, which is by and large borne out. The figure for 2008 is nearly 50% higher than for the previous year, and nearly 25% higher than the previous peak in 2004. This would suggest that a significant number of hosts that had not previously linked to the Canterbury site did so in 2008, quite possibly in reaction to the shari’a story.

What I had not expected to see was the total number fall back to trend in 2009 and 2010. I had rather expected to see the absolute numbers rise in 2008 and then stay at similar levels – that is, to see the links persist. The drop suggests either that large numbers of sites were revised to actively remove links that were thought to be ‘ephemeral’, or that there is a more general effect in that certain types of “news” content are not (in web archivist terms) self-archiving. [Update 02/07/2014 – see comment below]

The next step is for me to look in detail at those domains that linked only once to Canterbury, in 2008, and to examine these questions in a more qualitative way. Here then is distant reading leading to close reading.

Method
You can download the data, which is in the public domain, from here. Be sure to have plenty of hard disk space, as when unzipped the data is more than 120GB. The data looks like this:

2010 | churchtimes.co.uk | archbishopofcanterbury.org | 20

which tells you that in 2010, the Internet Archive captured 20 individual resources (usually, although not always, “pages”) in the Church Times site that linked to the archbishop’s site. My poor old laptop spent a whole night running through the dataset and extracting all the instances of the string “archbishopofcanterbury.org”.
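As a rough sketch of that extraction step, the following Python reads the dataset one line at a time and keeps only the rows that mention the archbishop's site. It assumes the pipe-delimited layout shown above (year, linking host, linked-to host, number of captured resources); the names `LinkRecord`, `parse_line` and `records_mentioning` are illustrative only, and this is not the script that was actually run.

```python
from typing import Iterable, Iterator, NamedTuple

TARGET = "archbishopofcanterbury.org"


class LinkRecord(NamedTuple):
    year: int
    source: str   # host on which the links were found
    target: str   # host being linked to
    count: int    # number of captured resources containing such a link


def parse_line(line: str) -> LinkRecord:
    """Parse one 'year | source | target | count' row (assumed layout)."""
    year, source, target, count = (field.strip() for field in line.split("|"))
    return LinkRecord(int(year), source, target, int(count))


def records_mentioning(lines: Iterable[str], needle: str = TARGET) -> Iterator[LinkRecord]:
    """Yield parsed rows whose text contains `needle`, skipping malformed rows."""
    for line in lines:
        if needle not in line:      # cheap substring test before parsing
            continue
        try:
            yield parse_line(line)
        except ValueError:          # wrong number of fields, or non-numeric values
            continue


if __name__ == "__main__":
    sample = [
        "2010 | churchtimes.co.uk | archbishopofcanterbury.org | 20\n",
        "2010 | example.co.uk | another.example.co.uk | 3\n",
    ]
    for record in records_mentioning(sample):
        print(record)
```

Because a file object can be passed in place of the sample list, the data is streamed rather than loaded into memory, which matters for a dataset of this size.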

Then I looked at the total numbers of unique hosts linking to the archbishop’s site in each year. In order to do so, I:

(i) stripped out those results which were outward links from a small number of captures of the archbishop’s site itself.

(ii) allowed for the occasions when the IA had captured the same host twice in a single year (which does not occur consistently from year to year.)

(iii) did not aggregate results for hosts that were part of a larger domain. This would have been easy to spot in the case of the larger media organisations such as the Guardian, which has multiple hosts (society.guardian.co.uk, education.guardian.co.uk, etc.). However, it is much harder to do reliably for all such cases without examining individual archived instances, which was not possible at this scale.

Assumptions

(i) that a host “abc.co.uk” held the same content as “www.abc.co.uk”.

(ii) that the Internet Archive were no more likely to miss hosts that linked to the Canterbury site than ones that did not – i.e., if there are gaps in what the Internet Archive found, there is no reason to suppose that they systematically skew this particular analysis.
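The counting steps and the first assumption can be sketched in Python as follows. This is a minimal illustration rather than the code actually used: the function names are hypothetical, and the input is assumed to be an iterable of (year, linking host, linked-to host, count) tuples already extracted from the dataset.

```python
from collections import defaultdict

TARGET = "archbishopofcanterbury.org"


def normalise(host):
    """Treat 'www.abc.co.uk' and 'abc.co.uk' as the same host (assumption (i))."""
    host = host.strip().lower()
    return host[4:] if host.startswith("www.") else host


def unique_linking_hosts(records, target=TARGET):
    """Return {year: number of distinct hosts linking to `target`}.

    `records` is an iterable of (year, source_host, target_host, count) tuples.
    """
    hosts_by_year = defaultdict(set)
    for year, source, dest, _count in records:
        source, dest = normalise(source), normalise(dest)
        if dest != target:
            continue                      # drops outward links from the site itself
        if source == target:
            continue                      # drops the site's links to itself
        hosts_by_year[year].add(source)   # a set absorbs repeat captures of a host
    # Hosts under a larger domain are deliberately not merged (method step (iii)).
    return {year: len(hosts) for year, hosts in sorted(hosts_by_year.items())}


if __name__ == "__main__":
    rows = [
        (2008, "www.abc.co.uk", "archbishopofcanterbury.org", 2),
        (2008, "abc.co.uk", "archbishopofcanterbury.org", 1),
        (2008, "archbishopofcanterbury.org", "churchtimes.co.uk", 5),
        (2007, "churchtimes.co.uk", "archbishopofcanterbury.org", 20),
    ]
    print(unique_linking_hosts(rows))
```

Using one set per year means a host is counted once no matter how many rows it contributes, which is one simple way of allowing for duplicate captures within a year.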

Where should the digital humanities live?

Don’t get me wrong. The cluster of work that bears the label ‘digital humanities’ is important; very important. I’ve spent the last decade or so of my working life in the gap between historians and application developers, trying to make sure that digital tools get designed in the ways historians need them to be designed. Projects digitising books; collaborative editing platforms; institutional repositories; Open Access journal platforms; web archives: I’ve done a similar job, more or less well, in each case. As well as that, I was (and remain) founding co-convener of the Digital History seminar at the Institute of Historical Research, which looks to showcase finished historical scholarship that would have been impossible without the digital, broadly defined.

But there is a problem with how we understand the term, I think. I receive the term as signifying a community of practice, of scholars employing new technological means to achieve the same ends as they did before ‘the digital’. And as that community of practice grows, one would naturally expect a degree of self-consciousness within it as to the distinctiveness of what we’re all doing. This is inevitable, and almost certainly helpful, as new journals, conferences and online spaces appear in which work can get published that might be too innovative for traditional channels to handle, and in which discussions about method can take place safely.

My worry is over the institutional location of this activity. Several universities have spotted the potential of locating DH people together, and so there are several Schools or Faculties or Departments of Digital Humanities, all centres of real excellence, in universities in the UK and elsewhere. It’s an institutional means of nurturing something important, and it seems to work. My concern is with the long-term.

As in all large organisations, the internal structures of universities have their own force in determining the shape of the work that goes on within them. Structures shape cultures and cultures influence behaviours. It’s nobody’s doing, but the effect is real.

A department has a head, who usually sits at the same table as the head of History, or Philosophy; and funds run down these channels, and reporting lines back up. And my concern is that this Digital Humanities, this enterprise that starts to be treated (in institutional terms) as a discipline in its own right, could become a silo. The unintended consequence of creating a permanent space in which to foster the new approach is that Dr So-and-So in English, or Philosophy, can say “Oh, a digital approach, you say? You want DH – they’re over in the Perkins Building.” Enterprising individuals and projects can and do bridge these gaps between departments; but the effect of the existence of the silo on the general consciousness has to be reckoned with, and mitigating the effect takes time and effort.

Put it this way. When Microsoft Word came within the reach of university budgets, no-one proposed that a Department of Word-Processed Humanities be set up – although word-processing was a technology that became ubiquitous in a short space of time, and had profound and widespread and general effects on a crucial element of academic practice – just like the digital humanities. And right now, there are not Schools of Social Humanities, to foster communities of practice in the most effective use of Twitter for dissemination and impact. Both these were disruptive technologies which were (and are) promoted across departments, faculties and whole institutions until they needed (or need) promoting no longer.

The end game for a Faculty of DH should be that the use of the tools becomes so integrated within Classics, French and Theology that it can be disbanded, having done its job. DH isn’t a discipline; it’s a cluster of new techniques that give rise to new questions; but they are still questions of History, or Philosophy, or Classics; and it is in those spaces that the integration needs eventually to take place.