What use is a personal tweet archive?

A little while ago I wrote a post about the need to plan for archiving the digital “papers” of historians. In that post I talked about research data (what we used to call “notes”); about the systems that form the bridge between that data and the writing process; and about written outputs themselves, and their various iterations. It looked forward to a time when all these digital objects, in multiple formats but from one mind, are available to future students of the way the discipline has developed.

What that post neglected was data about the way I publicise my work. Perhaps one of the reasons we’ve been slow to think about this is that, at one time, most academics didn’t need to. Apart from giving papers at gatherings of the learned, the task of publicising one’s work belonged to the publisher. And if one’s publisher was the right one, then the work would inevitably end up in the hands of the small group of people who needed to know about it. And whilst the media don is not a new phenomenon, most historians might have thought such self-publicity outside the academy something of an embarrassment, even rather vulgar.

How times change. Universities are training their staff in dealing with the traditional media and in the most effective way of using social media. And this opens up a new category of data that ought to be archived, if only to understand how the push for ‘impact’ actually played out in these early years. And some of it is being archived. The Library of Congress are archiving every tweet, although it isn’t yet clear how that archive may be made available for use. The UK Web Archive, along with other national web archives, have been archiving selected blogs (including this one) for several years, and the EU-funded BlogForever project is looking to join those efforts up. But this approach, valuable though it is, separates the content from the author, and from the rest of their digital archive. Whilst that link might be retrievable at a higher discovery layer, something important is still lost.

But now the helpful folk at Twitter, in a move that ought to be applauded, have made it very quick and easy to download an archive of one’s own tweets, right back to the beginning. And so I did: 1682 tweets, over 14.5 months. But what to do with it?

Straight away, scrolling through a long CSV file starts to tell the story of the making of other things: the first retweet of someone else’s work which was subsequently to influence my own; the first traces of an idea, or even of a question I was beginning to ask, which spawned a blog post, and then a paper. I also find that I shared at least one link in more than two thirds of my tweets, which sounds public-spirited until I add that a good proportion were my own posts. I can start mining the data for key terms and themes, and how they ebbed and flowed.
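For anyone wanting to try the same kind of counting on their own archive, here is a minimal Python sketch. It assumes the downloaded CSV has a header row with a column named "text" holding the body of each tweet (check the header of your own file, as this is an assumption), and the three sample rows are invented purely for illustration. It works out what share of tweets contain a link and tallies hashtags, a crude first step towards seeing which themes ebbed and flowed.

```python
import csv
import io
import re
from collections import Counter

# Crude patterns: any http(s) URL counts as a shared link, and any
# "#word" counts as a hashtag. Real tweets may need something more careful.
URL_PATTERN = re.compile(r"https?://\S+")
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def summarise(rows):
    """Return (share of tweets containing a link, Counter of hashtags).

    `rows` is any iterable of dicts with a "text" key; the column name
    "text" is an assumption about the archive's CSV layout.
    """
    total = 0
    with_link = 0
    hashtags = Counter()
    for row in rows:
        text = row["text"]
        total += 1
        if URL_PATTERN.search(text):
            with_link += 1
        # Lower-case tags so #DigitalHistory and #digitalhistory are merged.
        hashtags.update(tag.lower() for tag in HASHTAG_PATTERN.findall(text))
    share = with_link / total if total else 0.0
    return share, hashtags


# Invented sample rows standing in for a real tweets.csv download.
sample = """tweet_id,timestamp,text
1,2012-01-01,New post on #digitalhistory http://example.com/a
2,2012-01-02,Reading about #archives today
3,2012-01-03,RT worth a look https://example.org/b #DigitalHistory
"""

share, tags = summarise(csv.DictReader(io.StringIO(sample)))
print(f"{share:.0%} of tweets contain a link")
print(tags.most_common(2))
```

On the sample this reports that 67% of tweets contain a link, with "digitalhistory" the most frequent hashtag. To run it over a real download, open the file with open("tweets.csv", newline="", encoding="utf-8") (the filename is again an assumption) and pass a csv.DictReader over it to summarise; grouping the rows by the timestamp column before counting would show how terms rose and fell over time.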

It would be useful if there were a way to keep this data fresh, of course, to avoid going back to Twitter for a new download every so often. And, thanks to @mhawksey, there is a simple way of doing this, using Google Drive. Martin explains all here, with a handy video set-up guide.

And so I now have a cloud-based archive of my tweets, complete with a basic search and browse web interface. This is now a lazy man’s look-up of old tweets and the resources they pointed to, searchable by handle, hashtag or key term.

But perhaps this is something about which most people are lazy. Social media provides us with an overwhelming stream of quite-interesting things, in amongst which are nuggets of gold. Those nuggets I can manage in the old way, by recording them properly, perhaps in a bibliography. I might even read them, one day. But the quite-interesting stuff, whilst being too much ever to record properly, will probably remain quite interesting. And so this provides a middle way between formal curation of a webliography and just searching the live web (which assumes I can remember enough about what I’m looking for).

Might this archive now change my future tweeting? Early days to judge, perhaps. But I think it may, since I may now retweet and share rather than favourite, so that the link to a resource gets into the archive. I can also imagine starting to use personal hashtags, as a way of structuring my own archive at the same time as I tweet. Real-time curation, perhaps?

And I might share it too. Since this is now unambiguously my own data, rather than Twitter’s, I can license it for reuse by others in larger corpora for analysis. Imagine a pooled archive of the tweets of many historians. Now that would be interesting.

On historians’ electronic ‘papers’

[This post was written for inclusion in the blog of the Institute of Historical Research’s winter conference on History and Biography]

I should say straight away that I am neither an archivist nor a specialist in digital preservation (in its strict sense). But I am an historian, and professionally interested in the impact of the digital on our working practices; and during the working day I am on the staff of one of the UK’s main memory institutions. And I’m pleased to have been asked to write this piece by my former colleagues at the IHR since, while there is much going on at present relating to the management of research data, there is much less (that I know of) about the private papers of scholars. What is the infrastructure for preserving these materials, of historians, for historians? Is there even an infrastructure worth the name?

At once there is a problem of definition – of distinguishing between what we might call research data and private papers. In the physical sciences, it is easier to spot the data; lots of numbers in tables, on computers, as opposed to the reams of transcribed or part-transcribed primary sources that I still have from my own Ph.D. And in the physical sciences there has been a much stronger culture of the re-use of data by other scholars. In order to test and refine a hypothesis, it helps to be able to repeat experiments, and for that you need the data. And so that data tends to be ‘cleaner’ – well-defined and structured, with appropriate documentation – and thus easier to share. And so there are services such as Dryad, a discipline-specific data repository designed specifically for this purpose.

Historians have been much less accustomed to this way of working. This is partly because our ‘data’ tends to be angular, asymmetric texts that resist being squashed into anything so restrictive as a table. And there is an attachment among many to the thick description of each source and all its meaning, particular to a time, a place and an individual, and a resistance to abstraction. (To paraphrase J. H. Hexter, the splitters tend to dominate the lumpers.) There are exceptions, but the cliometric urge is not as strong as once it was.

This attachment to the particular is something to be cherished, but I would argue that there is yet more scope for historians to think of their working materials as data, and thus as something that may be shared and re-used. The Old Bailey Proceedings Online is a fine example of a corpus of freely composed texts that has within it a dataset. Not all sources have the degree of regularity of structure that a set of court records has; but there is still much material that languishes on desktop machines that might be set free. But it would require us to think about reuse at the beginning of a project, rather than at the end.

And as well as primary sources that might be shared and reused, there is the question of an historian’s intermediate working materials, which mark the stages by which primary sources are digested and turned into writing. The London Review of Books recently published Keith Thomas’ account of his own working method, thousands of bulging white envelopes full of notes; Christopher Hill was famous for his system of index cards. As evidence of the working practices of a discipline, these paper systems are artefacts to be preserved. As scholars increasingly move to digital systems of managing notes and bibliography, some using proprietary software and some the cloud, we also need to think about how these are best preserved as evidence of how the discipline worked at a particular point in time.
And finally, there is writing. Historians of a certain age will remember a device commonly known as a ‘typewriter’ which impressed characters on a sheet of paper, by a mechanism operated by the pressing of keys. (You can see examples in museums sometimes.) And the use of the typewriter meant that, for every iteration of a piece of writing, there was a physical record. (The typewriter was, as it were, sub-optimally featured for corrections.) The ease of emendation of a word-processed document probably means that these intermediate versions no longer exist. But where they do (and I myself tend to keep numbered versions of articles to reflect each revision), they are a valuable record of the evolution of a piece of writing and the thinking that supports it, and part of intellectual biography.

But who should be preserving these materials? In the past, for the most prominent, an existing connection with an institution tended to lead to their papers being held there: the papers of Noel Annan now reside at King’s College, Cambridge, of which he was Fellow and later Provost; those of E. H. Gombrich at the Warburg Institute, where he spent most of his career. The British Library also receives a certain number of digital archives, but mostly from prominent literary figures, such as the recent deposit from the poet Wendy Cope. But there is a need for a more scalable solution. Part of the answer certainly lies in recent ventures in services that enable personal digital archiving. But these tend to require a certain level of skill in the issues involved (and for one to be not yet dead), and so there is a place in this new ecology of preservation for organisations, such as the IHR, with an established presence as a repository and clearing-house for a discipline. And as collections of discipline-specific materials grow over time, those collections would become in themselves more than the sum of their parts – part of the stuff of a laboratory for the history of history.

[Picture via matthewtlynch on Flickr, CC BY-NC-SA]