What do we need to know about the archived web?

A theme that emerged for me in the IIPC web archiving conference in Reykjavik last week was metadata, and specifically: precisely which metadata do users of web archives need in order to understand the material they are using?

At one level, a precise answer to this will only come from sustained and detailed engagement with users themselves; research which I would very much hope that the IIPC would see as part of its role to stimulate, organise and indeed fund. But that takes time, and at present, most users understand the nature of the web archiving process only rather vaguely. As a result, I suspect that without the right kind of engagement, scholars are likely (as Matthew Weber noted) to default to ‘we need everything’, or if asked directly ‘what metadata do you need?’ may well answer ‘well, what do you have, and what would it tell me?’

During my own paper I referred to the issue, and was asked by a member of the audience if I could say what such enhanced metadata provision might look like. What I offer here is the first draft of an answer: a five-part scheme of kinds of metadata and documentation that may be needed (or at least, that I myself would need). I could hardly imagine this would meet every user requirement; but it’s a start.

1. Institutional
At the very broadest level, users need to know something of the history of the collecting organisation, and how web archiving has become part of its mission and purpose. I hope to provide an overview of aspects of this on a world scale in this forthcoming article on the recent history of web archiving.

2. Domain or broad crawl
Periodic archiving of a whole national domain under legal deposit provisions now offers the prospect of the kind of aggregate analysis that takes us well beyond single-resource views in Wayback. But it becomes absolutely vital to know certain things at a crawl level. How was territoriality determined – by ccTLD, domain registration, GeoIP lookup, curatorial decision? The way the national web sphere is defined fundamentally shapes the way in which we can analyse it. How big was the crawl in relation to previous years? How many domains are new, and how many have disappeared? What’s the policy on robots.txt (by default)? How deep was the crawl scope (by default)? Was there a data cap per host? Some of this will already be articulated in internal documents, some will need some additional data analysis; but it all goes to the heart of how we might read the national web sphere as a whole.

3. Curated collection level
Many web archives have extensive curated collections on particular themes or events. These are a great means of showcasing the value of web archives to the public and to those who hold the pursestrings. But if not transparently documented they present some difficulties to the user trying to interpret them, as the process introduces a level of human judgment to add to the more technical decisions that I outlined above. In order to evaluate the collection as a whole, scholars really do need to know the selection criteria, and at a more detailed level than is often provided right now. In particular, in cases where permissions were requested for sites but not received, being able to access the whole list of sites selected rather than just those that were successfully archived would help a great deal in understanding the way in which a collection was made.

4. Host/domain level
This is the level at which a great deal of effort is expended to create metadata that looks very much like a traditional catalogue record: subject keywords, free-text descriptions and the like. For me, it would be important to know when the first attempt to crawl a host was, and the most recent, and whether there were 404 responses received for crawl attempts at any time in between. Was this host capped (or uncapped) at the discretion of a curator, differently from the policy for the crawl as a whole? Similarly, was the crawl scoping different, or the policy on robots.txt? If the crawl incorporates a GeoIP check, what was the result? Which other domains has it redirected to, and which redirect to it, and at which times?

5. Individual resource level
Finally, there are some useful things to know about individual resources. As at the host level, information about the date of the first and last attempts to crawl, and about intervening 404s, would tell the user useful things about what we might call the career of a resource. If the resource changes, what is the profile of that: for instance, how has the file size changed over time? Were there other captures which were rejected, perhaps on a QA basis, and if so, when?

Much if not quite all of this could be based on data which is widely collected already (in policy documents, or curator tools, crawl logs or CDX) or could be with some adjustment. It presents some very significant GUI design challenges in how best to deliver these data to users. Some might be better delivered as datasets for download or via an API. What I hope to have provided, though, is a first sketch of an agenda for what the next generation of access services might disclose, that is not a default to ‘everything’ and is feasible given the tools in use.
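To make the resource-level part of this a little more concrete, here is a minimal sketch in Python of how the "career" of a single resource might be summarised from CDX index lines. It rests on several assumptions that are not in the text above: that each line has seven space-separated fields in the order urlkey, timestamp, original URL, MIME type, status code, digest and length (real CDX files vary, and declare their own field order), and that the function and field names are purely illustrative. It also sees only captures that made it into the index, so rejected captures and failed attempts that left no record would need another source, such as crawl logs.

```python
from collections import namedtuple

# Assumed field order; a real CDX file may use a different layout.
Capture = namedtuple(
    "Capture", "urlkey timestamp original mimetype status digest length"
)


def parse_cdx_line(line):
    """Split one space-separated CDX line into a Capture (hypothetical helper)."""
    parts = line.split()
    if len(parts) != 7:
        raise ValueError(f"expected 7 fields, got {len(parts)}: {line!r}")
    return Capture(*parts)


def resource_career(lines, original_url):
    """Summarise the indexed captures of one URL, or return None if there are none."""
    captures = [
        capture
        for capture in (parse_cdx_line(l) for l in lines if l.strip())
        if capture.original == original_url
    ]
    if not captures:
        return None
    # 14-digit timestamps (YYYYMMDDhhmmss) sort correctly as strings.
    captures.sort(key=lambda c: c.timestamp)
    return {
        "first_capture": captures[0].timestamp,
        "last_capture": captures[-1].timestamp,
        # Timestamps at which the crawler received a 404 for this URL.
        "not_found": [c.timestamp for c in captures if c.status == "404"],
        # (timestamp, size) pairs showing how the record size changed over time.
        "size_profile": [
            (c.timestamp, int(c.length)) for c in captures if c.length.isdigit()
        ],
    }
```

A summary of this kind could be shown alongside a resource in an access interface, or offered as a downloadable dataset or via an API, which is where the design questions above come in.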

Towards a cultural history of web archiving

This week I’m writing the first draft of a chapter on the cultural history of web archiving, for a forthcoming volume of essays (details here). It is subject to peer review and so isn’t yet certain to be published, but here’s the abstract.

I should welcome comments very much, and there may also be a short opportunity for open online peer review.

Users, technologies, organisations: towards a cultural history of world web archiving

‘As systematic archiving of the World Wide Web approaches its twentieth anniversary, the time is ripe for an initial historical assessment of the patterns in which web archiving has fallen. The scene is characterised by a highly asymmetric pattern, involving a single global organisation, the Internet Archive, alongside a growing number of national memory institutions, many of which are affiliated to the International Internet Preservation Consortium. Many other organisations also engage in archiving the web, including universities and other institutions in the galleries, libraries, archives and museums sector. Alongside these is a proliferation of private sector providers of web archiving services, and a small but highly diverse group of individuals acting on their own behalf. The evolution of this ecosystem, and the consequences of that evolution, are ripe for investigation.

‘Employing evidence derived from interviews and from published sources, the paper sets out to document at length for the first time the development of the sector in its institutional and cultural aspects. In particular it considers how the relationship between archiving organisations and their stakeholders has played out in different circumstances. How have the needs of the archives themselves and their internal stakeholders and external funders interacted with the needs of the scholarly end users of the archived web? Has web archiving been driven by the evolution of the technologies used to carry it out, the internal imperatives of the organisations involved, or by the needs of the end user?’

Evangelicalism and the Church of England: a review

It’s very good to see on the Fulcrum site an extended review article of Evangelicalism and the Church of England in the Twentieth Century, edited by Andrew Atherstone and John Maiden, and featuring an article of mine on Michael Ramsey. The reviewer, Andrew Goddard, has some very kind things to say about that piece, which I reproduce below. (There’s an extended summary of the article here.)
“Evangelicals in the Church of England are often remarkably confused and ignorant about their recent past. The wider church knows even less about who we are and where we come from as evangelicals despite our growing significance at every level of the church. Often as evangelicals we tell each other a story which fits our particular form of evangelicalism and fails to recognize the complexity and diversity. This volume, the fruit of a conference at Wycliffe Hall, is a wonderful (if sadly expensive) resource which ably rectifies such failings. After a fascinating introductory essay by the editors it presents ten papers from scholarly experts who both distil their previous work and offer new insights and material.
[…]
“Peter Webster focusses on the varied evangelical responses to Michael Ramsey, on whose archepiscopate he has recently published a significant study. One of his most interesting arguments is to challenge “a common conservative evangelical self-image, of a remnant in a hostile church which sought systematically to exclude them, with little alternative than to contend vigorously for truth” (182). In reading this chapter it was impossible not to think of what had changed and what was similar roughly forty years later in evangelical responses to an Archbishop very similar to Ramsey – Rowan Williams – and what lessons we still need to learn as evangelicals in relation to non-evangelical bishops and Archbishops.
[…]
“A constant theme [of the whole volume] is the diversity and sometimes consequent divisions and tensions among self-identified evangelicals revealing a history where “the ability of evangelicals to co-exist should not be overstated, but neither should it be overlooked” (38). Its various accounts raise the question as to whether we need to escape the myth of a golden age where we were all in broad agreement with one another (with the supposedly crucial role of John Stott in securing this consensus) and instead learn the importance of recognizing that last century there were a number of leading evangelical figures (most of them now forgotten to us) and various places of meeting across different groupings that now need to be re-created in order to share in fellowship, discussion and discernment.

“We will undoubtedly face the future better as evangelicals in the Church of England if we know our past – including our recent past – better and so overcome ignorance and misleading, sometimes polemical and self-justifying, narratives. This collection of papers is an indispensable guide which enables us to do just that.”

Reviews in History on Michael Ramsey

The latest review of my book on Michael Ramsey is now in, this time in the online Reviews in History, to which I myself have frequently contributed. It is by Sam Brewitt-Taylor of Lincoln College Oxford, to whom my thanks are due for an engaged, critical and constructive review.

I am of course very pleased that the reviewer thinks that the book:
‘is the best introduction to Michael Ramsey’s archiepiscopacy at Canterbury currently available, and should be read by everyone interested in the state of the Church of England in the 1960s. … As a report from the archives, The Shape of the Church is highly successful. It is eminently readable, it covers a very good range of issues, and it does so using an excellent level of detail. It makes a valuable contribution to a complicated subject, and it opens up some of the Ramsey archive to a wider readership. It should certainly be included in relevant undergraduate and graduate reading lists … a fine addition to the literature.’

Brewitt-Taylor makes a number of detailed criticisms, which are (as he states) suggestions for further work in situating Ramsey in his historical context. These are all welcome, and I may well return to them in print at a later date.

Jewish artists and the Bible in America

[Extracts of a review recently published in Reviews in History.]

Samantha Baskind, Jewish artists and the Bible in twentieth-century America
(Pennsylvania State UP, 2014: 978-0-271-05983-9)

Scholars of contemporary religious history, of art history, and of the immigrant experience will find much to interest them in this fine volume from Samantha Baskind of Cleveland State University, Ohio. Specialists in British art of the 20th century have long needed to reckon with the work of Jewish artists such as Jacob Epstein or Hans Feibusch. The England in which these immigrants arrived had an established tradition of religious painting and sculpture. Not so in the United States. Within American art, such a tradition of historical and religious painting and sculpture was almost non-existent; landscape, domestic scenes and portraiture were dominant. The question arises, then (which Baskind answers definitively), of why Jewish artists – recent arrivals and unsure of their place and status in a new society – should wish to adopt artistic subject matter that was not part of the common stock of that society.

There is a striking but convincing paradox in Baskind’s answer to the question as to why these images were adopted, which offers a suggestive angle from which to view the immigrant experience more generally. Young immigrant Jews to America and the first generation born there soon found themselves without any connection to the lived experience of a homeland. ‘For these younger American Jews, their native land, their homeland, was the Hebrew Bible. Their sense of locale was not the towns around them but biblical geography – the only Jewish soil they knew’. (p. 3) This tended, however, to produce art that was certainly not for use in public worship (a Christian idea), or for private devotion, and only very loosely intended for use in the religious education of the devout. Instead, it functioned as a means of reflecting on and making sense of contemporary events and of recent history at large, and of personal circumstance: a secularised form of the ancient exegetical technique of midrash.

Some of the biblical subjects under discussion are those that might be expected from a Jewish artist: those from the Hebrew scriptures, the Christian Old Testament. Of particular interest to this reviewer were the examples where these Jewish artists addressed themes from the Christian New Testament: appropriations of Christian themes, refracted through a Jewish lens and presented back to Christian America.

All this is fine work in its own disciplinary terms, but readers who are first and foremost historians may have wished for more on the critical and public reception that these works received, precisely to illuminate some of the questions Baskind raises. How did Jewish observers understand these works as midrashic reflections on the lot of American Jewry? How did Christian commentators receive these ‘foreign’ appropriations of New Testament themes? None of this is to criticise this volume for not achieving that which it does not set out to achieve (a besetting sin of reviewers); but Baskind has opened up several fresh and important lines of enquiry for others to pursue.

The press, Pennsylvania State University Press, are to be congratulated for a lavishly produced volume which is a pleasure to hold, with copious reproductions of works of art, and at an improbably low price of $40. The writing is clear and concise, and often elegant, and the work as a whole is admirably brief. It should find an appreciative readership amongst art historians, but also amongst scholars of identity and the immigrant experience, and of the religious history of modern America.

TLS review of Michael Ramsey book

This week I was delighted to notice a review of my book on Michael Ramsey, in the Times Literary Supplement (October 30th, p.13; online, but not freely so – scanned PDF here). It is by Peter Sedgwick, retired principal of St Michael’s College, Llandaff, the Anglican theological college. It is the second review from a scholar from within the institutions of the Anglican church, rather than from an historian in the universities (see the first, from Graham James, bishop of Norwich).
It was particularly pleasing that Sedgwick thought it a ‘fine book’, and that he stresses the distinctiveness of it from Owen Chadwick’s magisterial 1990 biography of Ramsey. This is in part to do with the edited sources it includes, but also because it concentrates on the central issues facing Ramsey, which the biographical structure of Chadwick’s study tended to obscure; complex and contentious issues through which Ramsey had to chart a course and which I document ‘excellently’. ‘As you read Webster, the debates and challenges become contemporary, and you wonder how the Archbishop’s staff will swerve around the next pothole in the road.’ Sedgwick concludes that the book ‘has brought [its] in some ways unworldly subject alive in a vivid and well-documented way. It is good to hear Ramsey’s voice again. His vision of a Reformed Catholicism lives on, despite everything.’

An English priest in the beloved country

[Another post related to my occasional series on clergy in fiction. This time, not an English author, but an English character working overseas.]

I can think of no other novel in years that has struck me so forcefully as Cry, the beloved country, by Alan Paton. The book was first published in the UK in 1948 by Jonathan Cape; issued as a Penguin Modern Classic in 1958, and subsequently reprinted almost every year until at least 1982, the year in which my copy was printed. Paton was an educationalist, and campaigner for the rights of the native South African population. He was also a friend of Geoffrey Clayton, archbishop of Cape Town, whose biography he published in 1973.

Why am I so struck by it? Fundamentally it is because the plot has an intense humanity, intertwining themes of place and home, familial loyalty and parental loss, individual moral responsibility and racial injustice. Part of its achievement is that the novel presents the full range of thought and feeling about the ‘native question’, but is not subsumed by it, as political novels sometimes are.

The Penguin Modern Classics edition, with a cover design by Germano Facetti from an original by Marianne Podlashuc.

What is also surprising to a modern reader is the style. To readers accustomed to a prosodic palette of Orwellian plainness and the crispness of Evelyn Waugh, Paton’s elevation of style is reminiscent of the fiction of the nineteenth century and seems somehow marooned, out of time. Yet it achieves this heightened register without pomposity; the elevation of the sentiment is always brought low by the brute tragedy of the matter at hand. And this height is achieved by means which are fast becoming inaccessible to modern readers, in that Paton draws freely not only on explicit Biblical images, but also on the rhythm of Biblical prose. In this, the narrator takes on the voice of the preacher, although this kind of preaching is in eclipse in the modern churches.

The plot centres on Kumalo, a black Anglican priest from the country who comes to Johannesburg in search of his son who (it transpires) has been involved in a botched burglary that resulted in the shooting dead of a white man. The dead man, Arthur Jarvis, was himself a vigorous supporter of change in the lot of the black majority, and an active and young Anglican layman. Kumalo is at the Mission House in Johannesburg when the news breaks, at which point it is not known that it is his son who is the culprit, only that Jarvis grew up in the same part of the country as Kumalo.

The reader is told very little of Father Vincent, ‘the rosy-cheeked priest’ of the Mission House who was also there, save that he is from England. The two had been talking of their respective homes in the countryside: the white man of ‘the hedges and the fields, and Westminster Abbey, and the great cathedrals up and down the land.’ (p.65) After it becomes clear that Kumalo’s son is under arrest, Father Vincent promises whatever aid he can give. It is Father Vincent who marries Absalom Kumalo and the girl who carries his child in the chapel of his prison as he awaits execution, in order to secure the future of the girl and her child, the senior Kumalo’s grandchild. The words of the service are those of the Book of Common Prayer. In the hands of another novelist the scene might be desperate, even horrific; but in Paton’s handling it emerges as dignified, as the couple promise to be faithful for better, for worse, till death should part them.

It is also significant that it is the white priest, an Englishman, who is able to uphold Kumalo, the priest who is also the loser of a son, in a scene of great pastoral sensitivity between two men of the same calling, of which there can surely be very few in modern fiction (Book 1, chapter 15). Despite himself, Vincent manages to resist the temptation to offer facile words in the face of Kumalo’s desolation. Instead, he allows Kumalo to voice his bewilderment at his situation, in which God seems to have turned from him. He then leads Kumalo out of his focus on self to the need to see repentance on the part of his son. Finally he is able to send Kumalo away to prayer, again not for himself, or for some explanation as to why, or for his son alone, but for everyone else touched by the tragedy: for the bereaved family, for the girl soon to be left a single mother and for her child, for Vincent and his colleagues ‘who try to rebuild in a place of destruction’, and ‘for all white people, those who do justice, and those who would do justice if they were not afraid.’ It is part of the priestly calling to remember, and to model to others, that ‘it is Christ in us, crying that men may be succoured and forgiven, even when He Himself is forsaken.’