Automated "hate speech" suppression probably means institutionalized social engineering
Writeup of my Tokyo presentation
Yes. I’m in Tokyo again.
This city is like totally massive OCD coupled with a kitsch, super-commercialized aesthetic that’s almost unconsciously ironic. It’s as if everyone here, in their heart of hearts, hates the entire consumerist spectacle and constantly suffers under tremendous strains of repression, but because of some weird double-bind can’t engage with it explicitly and directly, so everything has to come out in these passive-aggressive ways.
It’s an amazing Byzantine horrorshow and the toilets are totally fascist.
Anyway, hate speech. So what is hate speech?
We’re actually not really sure – or rather, the regulators and the proprietors of the online platforms for whom hate speech suppression is a priority aren’t really sure.
At the moment, there’s no really coherent definition of the concept that’s generally accepted, and those predominantly in use are vague and to some extent arbitrary – something which invites a whole host of problems, especially when it comes to automatic moderation and censorship.
That doesn’t mean that hate speech, or similar forms of disruptive discourses, are particularly difficult to define in principle, qualitatively speaking. In the Roman Empire, for example, to publicly address someone in contravention of good morals was punishable by law. Similar sorts of injunctions against insults and defamations have been commonplace throughout the world’s legal systems, and the very notion of speech acts that are so disruptive that they need to be suppressed is relatively straightforward.
Thus, there are many ways we could in principle establish a coherent and non-vague definition of something like hate speech. One obvious approach is to establish a set of explicit moral principles that allow us to define which speech acts are to be prohibited. Another, rather blunt, way to go about this is simply to establish by consensus a set list of prohibited and clearly delineated speech acts, such as a set of ideas which you are forbidden to express in public.
But if we go with the first option, i.e. if we begin with a set of moral principles, there are then basically three ways we can move forward and apply these principles to actual speech – three ways in which we can frame and ethically judge the character of the speech act – which all have somewhat different consequences in practice.
First of all, you can focus on the consequences of the speech act and/or the ideas involved. If I make an utterance to you, the effect in context is then what renders the speech act permissible or not (according to one’s chosen set of normative ethics).
Secondly, we can focus on the teleology of the speech act, as per Aristotelian metaphysics. On this perspective, it’s the intentionality and tendency of the complete speech act that’s in focus: the way in which the act is directed to certain ends, irrespective of any actual consequences. So if I address you with a slur, what renders the speech act impermissible or not is my intention PLUS the actual character of what is being said and the way it will tend to impact a certain audience in a particular communicative setting. But since we focus on the teleology, i.e. the inherent tendency of the act, it doesn’t matter if no negative results actually follow. It’s hate speech if it was intended as such, and/or if the speech act had the inherent tendency of hate speech – if it was the kind of statement that would have been received as hate speech even if nobody hears the tree fall in the forest, so to speak.
Third, we have the purely formal approach. Here, the essential character of the ideas expressed or speech act performed is in focus. So what’s the difference in relation to the other two positions? Well, on this model, it’s strictly the meaning of the ideas expressed that counts. It doesn’t matter what effects the expressions have in practice, or what the inherent tendencies of the ideas or speech acts are – what’s in focus is the objective assertion as such. If the meaning of what’s being said is immoral or impermissible, it gets designated as hate speech.
This is similar to the notion of formal heresy of Catholic Christianity or the notion of shirk within Islam.
So. In the context of our contemporary global digital framework, which has a decidedly secularized character and spans a vast number of cultures, traditions and worldviews, there’s a significant problem with the approaches towards a comprehensive definition of hate speech that I’ve just discussed, all of which are based on the application of moral principles. What is that problem?
There’s no real foundation to build on. What are good morals? Which tradition will supply the unambiguous foundations of ethics to guide the automated mass censorship and moderation taking place on the digital platforms throughout the world? I personally have my own set of religious values that I consider normative, for you as well as for me, but representatives of other traditions will not always agree with me – and my values do stand in sharp contrast to those of secular society.
Still, finding common ground is not an insuperable obstacle, at least not between different religious traditions. So I think there’s plenty of room for a general agreement between, say, Catholics, Daoists, Hindus, Muslims and Buddhists as to what constitutes good morals for interpersonal communication, and it’s not inconceivable that we could establish an ongoing interreligious dialogue to support the moderation of communications in the digital sphere. The Daoist emphasis on harmony and balance is perfectly compatible with the Catholic principle of inherent human dignity, as well as with the Islamic principle of the essential unity of the good of all creation inherent in the notion of tawhid; when you suffer, we all suffer.
All of that is well and good. However, I don’t think secular morals by themselves can provide a lasting foundation for this sort of applied ethics, which is probably one of the key reasons as to why the hate speech injunctions so far have been incoherent and arbitrary.
The emergence of a normative structure for global digital communications might in that sense catalyze a return of religion to the public sphere in a quite direct sense.
Still, I think the opposite tendency is more probable – that an arbitrary set of secular ethics will become normative through these systems of social and narrative control, and that this process will tend to push aside the values and worldviews of religious traditions, not least the non-Western ones.
The limits of suppression
So these were more general remarks on the issues surrounding automated hate speech injunctions and moderation as such. But as Professor Mika Hietanen and I have seen in our research, people are for the most part pretty good at strategizing around these prohibitions. Keyword filters are more or less useless, and even more advanced algorithms trained by internet users to recognize and flag instances of hate speech are relatively easy to circumvent – for instance by the use of indirect speech, or communication through evolving code words and symbols. The filter we trained did a decent job of flagging direct hate speech, but basically only flagged more tacit material by chance.
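To make that failure mode concrete, here’s a minimal sketch of a keyword filter – the blocklist and example messages are hypothetical placeholders, not our actual material – showing why verbatim matching breaks down against obfuscation and indirect speech:

```python
import re

# Hypothetical blocklist; a real one would be far larger and still fail the same way.
BLOCKED_TERMS = {"slur_a", "slur_b"}

def keyword_filter(message: str) -> bool:
    """Flag a message if it contains any blocked term verbatim."""
    tokens = re.findall(r"\w+", message.lower())
    return any(token in BLOCKED_TERMS for token in tokens)

# Direct usage is caught...
assert keyword_filter("you are a slur_a") is True
# ...but trivial obfuscation and indirect speech slip straight through.
assert keyword_filter("you are a s1ur_a") is False           # character substitution
assert keyword_filter("we all know what you are") is False   # indirect speech
```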
So, an image of Pepe – the green frog of the American Alt-Right movement – stating “it’s ok to be white”: is this an instance of hate speech? Of course not.
And to suppress something like that would mean the end of liberal democracy as we know it. Imagine what happens to freedom of speech and the press if dominant platforms for communication were to begin suppressing content due to vague associations with politically undesirable ideas.
Nonetheless, through the contingently established associations of this particular cartoon and the slogan, its dissemination does indirectly support discourses that, taken as a whole, fulfill the same function as immediate hate speech. This is a problem.
Then follows the question, which was raised by Mika and me during our work last year: whether there could be more precise ways to automatically filter these complex and indirect acts of communication to minimize arbitrariness – and to mimic the complex awareness of indirect signification that human observers possess. A human observer is immediately aware of complex discursive associations that sometimes latch onto otherwise innocuous content – content which in itself does not motivate suppression, but which can sometimes warrant caution with regard to the interpretation of the content matter and associated material. Could an AI in a similar sense be used to flag certain discursive clusters as warranting caution, and successfully identify potentially disruptive acts of communication of an indirect character? Acts of communication that operate at a higher level of abstraction, through tacit associations, implied symbolic signaling or the like?
The short answer is yes, in theory. In practice, we’re approaching this capacity through, for instance, the recent rollout of the latest version of ChatGPT and similar forms of technology, which enable complex associations through absolutely massive layers of data. Given this background, there seem to be several models or theoretical frameworks that could be used to enable an AI to accurately flag multi-layered discursive clusters.
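One way to approximate that capacity with current tooling would be to embed messages and measure their proximity to discursive clusters that human observers have already identified. A minimal sketch, assuming an off-the-shelf sentence-embedding model; the seed phrases and threshold here are hypothetical:

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

model = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical seed phrases that human moderators have already tied to a
# known discursive cluster; their embedding centroid stands in for the cluster.
SEED_PHRASES = ["coded slogan variant one", "coded slogan variant two"]
centroid = model.encode(SEED_PHRASES).mean(axis=0)

def cluster_proximity(message: str) -> float:
    """Cosine similarity between a message and the cluster centroid."""
    vec = model.encode([message])[0]
    return float(np.dot(vec, centroid) / (np.linalg.norm(vec) * np.linalg.norm(centroid)))

def flag_for_caution(message: str, threshold: float = 0.6) -> bool:
    # High proximity only flags the message as warranting caution for human
    # review; it does not establish that the message is hate speech.
    return cluster_proximity(message) >= threshold
```

Note that whoever curates the seed phrases decides what counts as a cluster – the arbitrariness problem doesn’t go away, it just moves up a level.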
Grice’s model of cooperativity in dialogue is one example. His generic principles of cooperativity in dialogue could be employed in a negative sense, i.e. to identify communication that’s maximally uncooperative and to establish this pattern as a proxy for discursive clusters that are potentially disruptive. Sure. The problem is that you need comprehensive surveillance of the complete environment of communication. You need to collect the relevant set of discourses that populate the context, and you need some form of assessment of the communicative history of all of the agents involved in the communicative situation. This is probably technically feasible at this moment in time, and the set of potential negative consequences is obviously going to underwrite an immense shitshow.
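As a rough illustration of what “maximally uncooperative” might mean operationally, here is a toy scoring function with one crude stand-in per Gricean maxim – every heuristic below is a hypothetical placeholder, and a real version would need exactly the comprehensive context and agent history just described:

```python
from dataclasses import dataclass

@dataclass
class Turn:
    author: str
    text: str

def _overlap(a: str, b: str) -> float:
    """Crude lexical overlap as a stand-in for topical relevance."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

def uncooperativeness(turn: Turn, thread: list[Turn]) -> float:
    """Score in [0, 1]; higher means the turn looks maximally uncooperative.

    Each term is a toy proxy for one Gricean maxim. The maxim of quality
    (truthfulness) is omitted entirely: it can't be scored without the
    surveillance of context and communicative history discussed above.
    """
    context = " ".join(t.text for t in thread[-5:]) or turn.text
    relation = 1.0 - _overlap(turn.text, context)        # maxim of relation
    quantity = min(len(turn.text.split()) / 200.0, 1.0)  # maxim of quantity
    manner = min(turn.text.count("!") / 5.0, 1.0)        # maxim of manner
    return (relation + quantity + manner) / 3.0
```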
You could also go with Searle’s speech act theory, focusing on communicative events that function as complex performative utterances. Searle’s basic model is tripartite: there’s the act of saying something and how that act is framed in a specific context (the locutionary act); there’s what you’re more specifically doing in saying something, such as making a request or expressing gratitude (the illocutionary act); and there’s the whole spectrum of implicit or explicit ways in which the speaker is trying to affect the audience (the perlocutionary dimension). So you in effect combine an analysis of formal content with rhetoric and performativity theory. Identifying discursive clusters using a model such as Searle’s could probably be done with some level of accuracy, sure, but again, you need massive levels of data collection and a thorough surveillance of the agents involved and their respective communicative histories.
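In code, an analysis along Searle’s lines might reduce to a structured record per utterance plus a combination rule – the enum, fields and rule below are all hypothetical simplifications, not Searle’s own formalism:

```python
from dataclasses import dataclass
from enum import Enum

class IllocutionaryForce(Enum):
    ASSERTION = "assertion"
    REQUEST = "request"
    EXPRESSIVE = "expressive"
    THREAT = "threat"

@dataclass
class SpeechActAnalysis:
    """One record per utterance, mirroring the tripartite split above."""
    locution: str                    # what is said, plus its framing context
    illocution: IllocutionaryForce   # what is done in saying it
    perlocutionary_aim: str          # how the speaker tries to affect the audience

def looks_disruptive(act: SpeechActAnalysis, targets_protected_group: bool) -> bool:
    # Toy combination rule. Filling in these fields accurately for real
    # communication is precisely where the massive data collection and
    # surveillance of communicative histories comes in.
    hostile = act.illocution in {IllocutionaryForce.THREAT, IllocutionaryForce.EXPRESSIVE}
    return hostile and targets_protected_group and "demean" in act.perlocutionary_aim.lower()
```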
Notwithstanding these problems, I’m convinced that these or similar structures of narrative control will be established unless the entire digital sphere collapses or something similarly catastrophic.
And I think all of this is utterly dystopian, to be honest.
But one might ask if we couldn’t get out ahead of this problem and nip it in the bud: if we couldn’t scramble and rapidly foster the use of these technological approaches in some other way than outright or indirect censorship – if there are conceivable alternatives to flagging certain complexes of communicative acts as potentially disruptive, with all that this implies in terms of policy, the institutionalization of immense systems of control, and the indirect infringement of established rights.
There’s a group of researchers employing Grice’s theory of cooperativity towards something like this very end. Their big idea was to establish a type of machine learning geared to expound upon the meta-communicative situation by strategically “asking the human user for clarification”. So one way to employ this sort of AI would be to introduce it as a party in digital communications online, sort of as a Twitter or Facebook bot that enters into a group discussion and asks pointed questions of the purveyors of potentially disruptive discourses … So when your team loses the match, for instance, and you log on to your Facebook group to point out that the referees are morons, this AI white knight, uncannily reminiscent of a human user, comes in and points out that: ”X CAN POSSIBLY PERCEIVE THIS STATEMENT AS DEMEANING”
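A sketch of what such a nudge bot could reduce to in practice – the upstream flagging step is assumed to exist elsewhere, and the reply template is modeled on the example above:

```python
def clarification_reply(message: str, flagged_party: str | None) -> str | None:
    """Hypothetical Gricean nudge bot: instead of removing a flagged message,
    post a pointed request for clarification in the thread."""
    if flagged_party is None:  # an upstream classifier found nothing to flag
        return None
    return (
        f"{flagged_party} CAN POSSIBLY PERCEIVE THIS STATEMENT AS DEMEANING. "
        "Could you clarify what you meant?"
    )

# e.g. clarification_reply("the referees are morons", "THE REFEREES")
# -> "THE REFEREES CAN POSSIBLY PERCEIVE THIS STATEMENT AS DEMEANING. ..."
```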
There might be MANY more options I’m not at all seeing here, but this sort of limited nudging approach seems to me possibly worse than mere censorship and surveillance; this is rather the explicit combination of mass surveillance with targeted and automated social engineering.
+++
So there you have the options. We have social engineering or outright censorship to choose between in terms of automated hate speech suppression online – both in concert with supermassive surveillance.
And then we’re right back at the first problem once again. Because however we cut this, we’re going to have to face the question of whose values and worldviews the automated hate speech suppression will reproduce. In any event, this should be made very explicit: we should know exactly which values ABC and worldviews XYZ the algorithms will tend to foster. The entire process is much more likely, however, to be painted as a neutral endeavor supporting vague, malleable and general conceptual constructs such as “human rights”, while in practice, and quite tacitly, it will reproduce the values and worldviews that facilitate the bottom line of the corporate entities involved.
Oh, and please pray for my dog.
Big loving prayers for your dog, Johan...
As regards any regulation or suppression of so-called 'hate speech', forget it. It's the old cookie-jar mentality. If you can't have something, the impulse is to go and get it. Suppression leads to repression and society as a whole can't function with this level of control, and like a fetid pustule it explodes with poison.
The entire idea of 'hate speech' is defined and moderated by the absurdist woke mob that is hell-bent on creating this dystopia. On the other hand, free speech – meaning completely free speech, including what is defined as 'hate' (mostly political and ideological dissent) – leaves us to sort it out among ourselves with our own intelligence and maturity.
But the dumbing down of not just America but the entire world enables this moralistic twaddle to continue unabated. Go on, hate me if you can!