The ethics of search filtering and big data: who decides?

[Reflecting on discussions at the recent UK Internet Policy Forum, this post argues that societies as moral communities need to take a greater share in the decision-making about controversial issues on the web, such as search filtering and the use of open data. It won’t do to expect tech companies and data collectors to settle questions of ethics.]

Last week I was part of the large and engaged audience at the UK Internet Policy Forum meeting, convened by Nominet. The theme was ‘the Open Internet and the Digital Economy’, and the sessions I attended were on filtering and archiving, and on the uses of Big Data. And the two were bound together by a common underlying theme.

That theme was the relative responsibilities of tech providers, end users and government (and regulators, and legislators) to solve difficult issues of principle: of what should (and should not) be available through search; and which data about persons should truly be regarded as personal, and how they should be used.

On search: last autumn there was a wave of public, and then political, concern about the risk of child pornography being available via search engine results. Something Should Be Done, it was said. But the issue – child pornography – was so emotive, and legally so clear-cut, that important distinctions were not clearly articulated. The production and distribution of images of this kind would clearly be in contravention of the law, even if no-one were ever to view them. And a recurring theme during the day was that these cases were (relatively) straightforward – if someone shows up with a court order, search engines will remove that content from their results, for all users, and the British Library will likewise remove archived versions of that content from the UK Legal Deposit Web Archive.

But there are several classes of other web content about which no court order could be obtained. Content may well directly or indirectly cause harm to those who view it. But because that chain of causation is so dependent on context and so individual, no parliament could legislate in advance to stop the harm occurring, and no algorithm could hope to predict that harm would be caused. I myself am not harmed by a site that provides instructions on how to take one’s own life; but others may well be. There is also another broad category of content which causes no immediate and directly attributable harm, but might in the longer term conduce to a change in behaviour (violent movies, for instance). There is also content which may well cause distress or offence (but not harm); on religious grounds, say. No search provider can be expected to intuit which elements of this content should be removed entirely from search, or suggest to end users as the kind of thing they might not want to see.

These decisions need to be taken at a higher level and in more general terms. Such decision-making depends on the existence of the kind of moral consensus which was clearly visible at earlier times in British history, but which has been weakened if not entirely destroyed since the ‘permissive’ legislation of the Sixties. The system of theatre censorship was abolished in the UK in 1968 because it had become obvious that there was no public consensus that it was necessary or desirable. A similar story could be told about the decriminalisation of male homosexuality in 1967, or the reform of the law on blasphemy in 2008. As Dave Coplin of Microsoft put it, we need to decide collectively what kind of society we want; once we know that, we can legislate for it, and the technology will follow.

The second session revolved around the issue of big data and privacy. Much can be dealt with by getting the nature of informed consent correct, although it is hard to know what ‘informed’ means; it is difficult to imagine in advance all the possible uses to which data might be put, in order both to pose and to answer the question ‘Do you consent?’.

But once again, the issues are wider than this, and it isn’t enough to declare that privacy must come first, as if this settled the issue. As Gilad Rosner suggested, the notion of personal data is not stable over time, or consistent between cultures. The terms of use of each of the world’s web archives are different, because different cultures have privileged different types of data as being ‘private’ or ‘personal’ or ‘sensitive’. Some cultures focus more on data about one’s health, or sexuality, or physical location, or travel, or mobile phone usage, or shopping patterns, or trade union membership, or religious affiliation, or postal address, or voting record and political party membership, or disability. None of these categories is self-evidently more or less sensitive than any of the others, and – again – these are decisions that need to be determined by society at large.

Tech companies and data collectors have responsibilities – to be transparent about the data they do have, and to co-operate quickly with law enforcement. They must also be part of the public conversation about where all these lines should be drawn, because public debate will never spontaneously anticipate all the possible use cases which need to be taken into account. In this we need their help. But ultimately, the decisions about what we do and don’t want must rest with us, collectively.