In the last few weeks I’ve been to several conferences on the issue of the preservation of online content for research, and in particular social media. This is an issue that is attracting a lot of attention at the moment: for examples, see Helen Hockx-Yu’s paper for last year’s IFLA conference, or the forthcoming TechWatch report from the Digital Preservation Coalition. As I myself blogged a little while ago, and (obliquely) suggested in this presentation on religion and social media, there’s growing interest from social scientists in using social media data – most typically Twitter or Facebook – to understand contemporary social phenomena. But whereas users of the archived web (such as myself) can rely on continued access to the data we use, and can expect to be able to point to that data such that others may follow and replicate our results, this isn’t the case with social media.
Commercial providers of social media platforms impose several different kinds of barriers: These can include: limits on the amount of data that may be requested in any one period of time; provision of samples of data created by proprietary algorithms which may not themselves be scrutinised; limits on how much of and/or which fields in a dataset may be shared with other researchers. These issues are well-known, and aren’t my main concern here. My concern is with how these restrictions are being discussed by scholars, librarians and archivists.
I’ve noticed an inability to imagine why it is that these restrictions are made, and as a result, a struggle to begin to think what the solutions might be. There has been a similar trend amongst the Open Access community, to paint commercial academic publishers as profit-hungry dinosaurs, making money without regard to the public good element of scholarly publishing happens. Regarding social media, it is viewed as simply a failure of good manners when a social media firm shuts down a service without providing for scholarly access to its archive, or does not allow free access to and reuse of its data to scholars. Why (the question is implicitly posed) don’t these organisations do the Right Thing? Surely everyone thinks that preserving this stuff is worthwhile, and that it is a duty of all providers?
But private corporations aren’t individuals, endowed with an idea of duty and a moral sense. Private corporations are legal abstractions: machines designed for the maximisation of return on capital. If they don’t do the Right Thing, it isn’t because the people who run them are bad people. No; it’s because the thing we want them to do (or not do) impacts adversely on revenue, or adds extra cost without corresponding additional revenue.
Fundamentally, a commercial organisation is likely to shut down an unprofitable service without regard to the archive unless (i) providing access to the archive is likely to yield research findings which will help future service development, or; (ii) it causes positive harm to the brand to shut it down (or helps the brand to be seen *not* to do so.) Similarly, they are unlikely to incur costs to run additional services for researchers, or to share valuable data unless (again) they stand to gain something from the research, however obliquely, or by doing so they either help or protect the brand.
At this point, readers may despair of getting anywhere in this regard, which I could understand. One way through this might be an enlargement of the scope of legal deposit legislation such that some categories of data (politicians’ tweets, say, given the recent episode over Politwoops) are deemed sufficiently significant to be treated as public records. There will be lobbying against, surely, but once such law is passed, companies will adapt business models to a changed circumstance, as they always have done. An even harder task is so to shift the terms of public discourse such that a publicly accessible record of this data is seen by the public as necessary. Another way is to build communities of researchers around particular services such that generalisable research about a service can be absorbed by the providers, thus showing that openness with the data leads to a gain in terms of research and development.
All of these are in their ways Herculean tasks, and I have no blueprint for them. But recognising the commercial realities of the situation would get us further than vague pieties about persuading private firms to do the Right Thing. It isn’t how they work.