While working on another project, I’ve had occasion to make some data relating to the blog aggregator site britishblogs.co.uk (now apparently defunct) which occurs in the Internet Archive between 2006 and 2012. I am unlikely to exploit it very much myself, and so I have made it available in figshare, in case it should be of use to anyone else.
Specifically, it is data derived from the UK Host Link Graph, which states the presence of links from one host to another in the JISC UK Web Domain Dataset (1996-2010), a dataset of archived web content for the UK country code top level domain captured by the Internet Archive.
It has 19,423 individual lines, each expressing one host-to-host linkage in content from a single year.
Since the blog as a format seems to be particularly prone to disappearance over time, scholars of the British blogosphere may find this useful in locating now defunct blogs in the Internet Archive or elsewhere. My sense is that the blogs included in the aggregator were nominated by their authors as being British, and so this may be of some help in identifying British content in platforms such as WordPress or Blogger.
Some words of caution. The data is offered very much as-is, without any guarantees about robustness or significance. In particular:
(i) I have made no effort to de-duplicate where the Internet Archive crawled the site, or parts of it, more than once in a single year.
(ii) also present are a certain number of inbound links – that is to say, other hosts linking to britishblogs.co.uk. However, these are very much the minority.
(iii) there is also some analysis needed in understanding which links are to blogs, and which are to content linked to from within those blogs (and aggregated by British Blogs).