On historians’ electronic ‘papers’

[This post was written for inclusion in the blog of the Institute of Historical Research’s winter conference on History and Biography]

I should say straight away that I am neither an archivist, nor a specialist in digital preservation (in its strict sense). But I am an historian, and professionally interested in the impact of the digital on our working practices; and during the working day I am on the staff of one of the UK’s main memory institutions. And I’m pleased to have been asked to write this piece by my former colleagues at the IHR, as while there is much going on at present relating to the management of research data, there is much less (that I know of) about the private papers of scholars. What is the infrastructure for preserving these materials, of historians, for historians? Is there even an infrastructure worth the name?

Straight away, there is a problem of definition – of distinguishing between what we might call research data and private papers. In the physical sciences, it is easier to spot the data: lots of numbers in tables, on computers, as opposed to the reams of transcribed or part-transcribed primary sources that I still have from my own Ph.D. And in the physical sciences there has been a much stronger culture of the re-use of data by other scholars. In order to test and refine a hypothesis, it helps to be able to repeat experiments, and for that you need the data. And so that data tends to be ‘cleaner’ – well-defined and structured, with appropriate documentation – and thus easier to share. Hence services such as Dryad, a discipline-specific data repository designed specifically for this purpose.

Historians have been much less accustomed to this way of working. This is partly because our ‘data’ tends to be angular, asymmetric texts that resist being squashed into anything so restrictive as a table. And there is an attachment among many to the thick description of each source and all its meaning, particular to a time, a place and an individual, and a resistance to abstraction. (To paraphrase J. H. Hexter, the splitters tend to dominate the lumpers.) There are exceptions, but the cliometric urge is not as strong as once it was.

This attachment to the particular is something to be cherished, but I would argue that there is yet more scope for historians to think of their working materials as data, and thus as something that may be shared and re-used. The Old Bailey Proceedings Online is a fine example of a corpus of freely composed texts that has within it a dataset. Not all sources have the degree of regularity of structure that a set of court records has; but there is still much material that languishes on desktop machines that might be set free. But it would require us to think about re-use at the beginning of a project, rather than at the end.

And as well as primary sources that might be shared and re-used, there is the question of an historian’s intermediate working materials, which mark the stages by which primary sources are digested and turned into writing. The London Review of Books recently published Keith Thomas’ account of his own working method, thousands of bulging white envelopes full of notes; Christopher Hill was famous for his system of index cards. As evidence of the working practices of a discipline, these paper systems are artefacts to be preserved. As scholars increasingly move to digital systems of managing notes and bibliography, some using proprietary software and some the cloud, we also need to think about how these are best preserved as evidence of how the discipline worked at a particular point in time.

And finally, there is writing. Historians of a certain age will remember a device commonly known as a ‘typewriter’ which impressed characters on a sheet of paper, by a mechanism operated by the pressing of keys. (You can see examples in museums sometimes.) And the use of the typewriter meant that, for every iteration of a piece of writing, there was a physical record. (The typewriter was, as it were, sub-optimally featured for corrections.) The ease of emendation of a word-processed document probably means that these intermediate versions no longer exist. But where they do (and I myself tend to keep numbered versions of articles to reflect each revision), they are a valuable record of the evolution of a piece of writing and the thinking that supports it, and part of intellectual biography.

But who should be preserving these materials? In the past, for the most prominent, an existing connection with an institution tended to lead to their papers being held there: the papers of Noel Annan now reside at King’s College, Cambridge, of which he was Fellow and later Provost; those of E. H. Gombrich at the Warburg Institute, at which he spent most of his career. The British Library also receives a certain number of digital archives, but mostly from prominent literary figures, such as the recent deposit from the poet Wendy Cope. But there is a need for a more scalable solution. Part of this is certainly the recent ventures in services that enable personal digital archiving. But these tend to require a certain level of skill in the issues involved (and for one to be not yet dead) and so there is a place in this new ecology of preservation for organisations, such as the IHR, with an established presence as a repository and clearing-house for a discipline. And as collections of discipline-specific materials grow over time, those collections would become in themselves more than the sum of their parts – part of the stuff of a laboratory for the history of history.

[Picture via matthewtlynch on Flickr, CC BY-NC-SA]