How Forensic Linguistics is Helping to Identify Online Trolls

Internet anonymity seems to bring out the very worst in us. From going off-topic or goading other internet users to making inflammatory comments that offend everyone, trolls irritate a lot of people. But trolling isn’t just an annoyance – bad behaviour online can also include very serious harassment. Some of the worst examples of this behaviour include leaving malicious messages on tribute pages to people who have passed away.

This has a devastating impact on grieving families and leaves everyone with a sour taste in their mouths. Free speech is important but online anonymity can so easily turn very ugly.

The anonymity trolls enjoy seems to be a major contributing factor to their behaviour. Many internet users believe that anything they do online is consequence-free – for them at least. It’s particularly hard to challenge that behaviour when some trolls use software that conceals their IP address or even impersonate other people online. If trolls are using other people’s identities to post offensively online, it’s hard to prove who is really behind the comments.

But maybe linguistics can help track these trolls down. Investigators are starting to consider linguistic fingerprinting methods, which aim to identify individuals from the patterns in the language they use.

This method works on the assumption that we all have our own individual linguistic quirks and unique vocabulary. Some of the things we say may betray who we are, our experiences and background, or at least where we come from.
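To make that assumption concrete, here is a minimal sketch in Python of one very simple kind of comparison: counting how often a writer uses a handful of common function words and measuring how similar two such profiles are. The word list, the function names and the choice of cosine similarity are illustrative assumptions, not the method used by any particular investigator, and a high score only suggests a resemblance in style rather than proving authorship.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny list of function words. Real stylometric
# work would use far more features than this.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in",
                  "that", "it", "but", "i", "you", "not"]


def style_profile(text):
    """Return the relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine_similarity(a, b):
    """Cosine of the angle between two profiles (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_author(anonymous_text, known_samples):
    """Compare an anonymous text against known writing samples.

    known_samples maps an author label to a sample of their writing.
    Returns the label with the highest similarity and that score.
    """
    target = style_profile(anonymous_text)
    scores = {author: cosine_similarity(target, style_profile(sample))
              for author, sample in known_samples.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

With only a dozen words and short samples, a comparison like this would be far too crude to rely on. It is meant only to show the shape of the idea: turn writing habits into numbers, then compare them.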

Forensic linguistics in action

Linguistic recognition techniques proved useful in a recent New Orleans defamation case in which two internet commentators were particularly active and vicious about a local businessman and various judges. Both seemed to have unusually high levels of insight into the case and the workings of the justice department.

Strikingly pompous turns of phrase linked the case to some of the businessman’s previous rivals in an election for public office. Just a small handful of nouns, creatively employed in unique ways, helped link the troll posts to previously published writings by one lawyer in particular.

Confronted with this evidence, one of the anonymous posters eventually confessed and was demoted. In this case, linguistic identifiers played a key role in identifying who was behind the trolling. Both online commentators had also chosen usernames that were connected to them personally – one had even included their birth year in the username.

But it’s hard to definitively prove that trolls are who you suspect them to be – it’s both easy and common for trolls to imitate other people online and falsely implicate them. In the New Orleans case, linguistics simply helped provide strong evidence of who the most likely authors of the posts were.

Linguistic identification is less certain than biological identifiers such as DNA and fingerprint evidence. DNA matches between a suspect and the scene of a crime are far more reliable than turns of phrase; after all, language isn’t unique to any particular individual. But it may be enough to change the direction of the investigation.

These techniques were used to help track down the Unabomber, Ted Kaczynski. Similarities between the language used by the bomber and Kaczynski’s own writing were enough to convince the authorities to obtain a warrant to search Kaczynski’s cabin.

Using machine learning to track down trolls

Trolling is a huge phenomenon and a serious threat both to individuals and to online communities. Twitter’s failure to thrive and attract new users is at least partly attributed to its failure to crack down on noxious commentators and hate speech, particularly when aimed at women expressing any opinions online.

But trolling and harassment are very difficult issues to tackle at scale, because human moderation is just too costly. Many sites use AI moderation tools such as those provided by Tint to try to stop online abuse between users and reject messages deemed offensive. Many of these tools are based on weeding out offensive keywords, but more recent tools have advanced to the point where they can identify abusive messages even if they don’t employ slurs or derogatory language.
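A minimal Python sketch shows why filtering on keywords alone falls short. The blocklist and the function name here are made up for illustration and are not taken from Tint or any other product.

```python
import re

# Hypothetical blocklist, kept mild and tiny for illustration only.
BLOCKLIST = {"idiot", "moron"}


def contains_blocked_keyword(message, blocklist=BLOCKLIST):
    """Return True if any whole word in the message is on the blocklist."""
    words = re.findall(r"[a-z']+", message.lower())
    return any(word in blocklist for word in words)


# An obvious insult is caught...
flagged = contains_blocked_keyword("You are an idiot")

# ...but a hostile message with no listed word slips straight through.
missed = contains_blocked_keyword("Nobody would miss you if you left")
```

Here `flagged` is `True` and `missed` is `False`: the second message is clearly unpleasant, yet contains nothing a keyword list would recognise. That gap is what the newer tools are trying to close.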

The Perspective tool created by a Google subsidiary aims to calculate the degree of toxicity of a comment made online and can be configured to the web editor’s desired settings – enough for a healthy debate without things turning ugly.
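The idea of configuring a tool to a site’s own tolerance can be sketched as a pair of thresholds applied to a toxicity score. This Python example assumes a score between 0 and 1 has already been produced by some scoring tool. The function name, the threshold values and the three outcomes are hypothetical and are not part of Perspective itself.

```python
def moderation_decision(toxicity_score, review_at=0.6, reject_at=0.9):
    """Turn a toxicity score in [0, 1] into a moderation outcome.

    review_at and reject_at are the site's own settings (illustrative
    defaults): lower them for a stricter forum, raise them to allow
    more heated debate.
    """
    if not 0.0 <= toxicity_score <= 1.0:
        raise ValueError("toxicity_score must be between 0 and 1")
    if review_at > reject_at:
        raise ValueError("review_at must not exceed reject_at")

    if toxicity_score >= reject_at:
        return "reject"
    if toxicity_score >= review_at:
        return "review"  # hold the comment for a human moderator
    return "publish"
```

A site that wants robust debate might call `moderation_decision(score, review_at=0.8, reject_at=0.95)`, while a memorial page might set both values much lower.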

Yahoo is working on its own tool to identify hateful speech based not only on the words used but also sentence structure. It’s a sophisticated way to weed out abusive language that works for almost 90% of comments. Whilst this may just mean trolls have to work harder to abuse people online, it’s at least a step towards fighting bad behaviour online.

Advanced AI could now be used to moderate online comment forums and remove the toxicity that mars these discussions. AI’s semantic understanding is coming along in leaps and bounds – in the future, automated moderation could really help crack down on harassment online without needing the painstaking and costly attention of human moderators.

It’s a power that could very easily be abused to suppress freedom of speech, but it could help save platforms such as Twitter that regularly lose users who are sick of the harassment they receive. Machine intelligence could help fight trolling at the source, so that we don’t need to work out who the trolls are using advanced linguistic profiling.