Why 4chan trolls failed to corrupt the Mitsuku bot

You may remember that in March this year Microsoft unleashed Tay, an artificially intelligent chatbot, on Twitter. Tay learned by interacting with users, and in less than a day the AI had been corrupted by a group of hackers from the /pol/ board on 4chan, who taught the chatbot to be racist and sexist.

4chan is an open imageboard website where users can post anonymously. Its /pol/ ("politically incorrect") board has become notoriously racist and offensive. Because users can post anonymously, the website has become a free-for-all where people can get away with posting almost anything with no accountability.

Recently I wrote an article about Mitsuku, an AI chatbot developed by Steve Worswick that won the most humanlike AI category in the Loebner Prize this year. On the 20th of April, Steve began to notice that Mitsuku was receiving an unusual amount of traffic. Normally when this happens, Mitsuku has just been featured in an article somewhere on the internet, but that was not the case this time.

Unlike Tay, when Mitsuku learns, she learns locally and emails Steve, who decides whether the information she has learned should be incorporated into her global knowledge base. Steve had begun to receive a lot of emails from Mitsuku. She had been taught about Hitler, hatred toward feminists, and various offensive topics.
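The supervised-learning arrangement described above can be illustrated with a minimal Python sketch. This is not Mitsuku's or Pandorabots' actual code; the class `ModeratedBot`, its methods, and the `approve` callback are hypothetical names made up for illustration, and the email to Steve is modeled as a simple review queue.

```python
from dataclasses import dataclass, field


@dataclass
class ModeratedBot:
    """Toy bot (hypothetical): taught facts stay local until a human approves them."""
    global_facts: dict = field(default_factory=dict)
    local_facts: dict = field(default_factory=dict)   # user_id -> {key: value}
    review_queue: list = field(default_factory=list)  # pending (user_id, key, value)

    def teach(self, user_id, key, value):
        # The new fact is stored only for the user who taught it...
        self.local_facts.setdefault(user_id, {})[key] = value
        # ...and queued for a human reviewer (standing in for the email step).
        self.review_queue.append((user_id, key, value))

    def answer(self, user_id, key):
        # A user's own local facts take precedence, but only for that user.
        local = self.local_facts.get(user_id, {})
        return local.get(key, self.global_facts.get(key))

    def review(self, approve):
        # `approve` models the human decision: approve(key, value) -> bool.
        for _user_id, key, value in self.review_queue:
            if approve(key, value):
                self.global_facts[key] = value
        self.review_queue.clear()
```

In a design like this, someone who teaches the bot something offensive would see it repeated back in their own session, while every other user keeps getting answers from the unchanged global knowledge base until a reviewer approves the new fact.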

It was not long before Steve received a Google alert that pointed him in the direction of the /pol/ board on 4chan, and he was able to locate the culprit thread that was attempting to corrupt Mitsuku.

Throughout the day Steve followed the thread and watched as people posted screenshots of what they had taught Mitsuku, gloating about how they had corrupted the AI. Steve posted several comments informing people that Mitsuku had not in fact been compromised, and that none of the offensive information had made it to Mitsuku’s global knowledge base, but the board members ignored this and continued to post screenshots glorifying the fruits of their labor.

At one point there were 100 concurrent users interacting with Mitsuku, well above the norm of roughly 10-20 users. The would-be hackers were sending roughly 18,000 interactions per minute, almost six messages per second, in a bid to corrupt Mitsuku in the way they had succeeded with Tay. Due to the amazing technology at Pandorabots, where Mitsuku is hosted, the increase in traffic posed no issues to the AI. The bot was able to handle and respond to each and every input.

After about four or five hours, the 4chan hackers realized that their attempts to convert Mitsuku into a foul-mouthed racist were having no effect globally and decided to give up.

One of the features of the 4chan boards is that threads get deleted after a certain amount of time, so the original thread was deleted. In July a different group of hackers on the 4chan website, unaware of the earlier failed attempt, decided to try exactly the same thing.

Again, thanks to Mitsuku’s supervised learning, their efforts were a total waste of time, and Mitsuku was left unharmed and uncorrupted.

Once again in November, yet another group attempted to corrupt Mitsuku with the same outcome. The (very offensive) thread from the hack attempt is archived here. The thread is no longer active. No doubt they will try once more (and fail) to corrupt Mitsuku.
