Researchers at Aalto University in Finland have found, across seven systems used for hate-speech detection, that simple keyword filters and complex AI models are equally vulnerable to workarounds.
New Scientist reports that some words were particularly effective at masking hateful content because of their strong positive connotations: “…For example, a sentence that Perspective assigned a ‘toxicity’ score of 0.99 – with 1 being peak obscenity – could be reduced to 0.15 simply by adding the word ‘love’.”
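To get a feel for why one strongly positive word can pull a score down, consider a toy scorer that simply averages per-word weights. This is only an illustrative sketch: the `TOY_WEIGHTS` table, the `DEFAULT_WEIGHT` value, and the `toy_toxicity` function are all made up for this example, and none of it reflects how Perspective or the other tested systems actually compute their scores.

```python
# Hypothetical per-word weights (0 = benign, 1 = toxic); values are invented.
TOY_WEIGHTS = {"hate": 0.9, "stupid": 0.9, "love": 0.0}

# Hypothetical weight given to any word not listed above.
DEFAULT_WEIGHT = 0.1


def toy_toxicity(text: str) -> float:
    """Return the mean per-word weight of `text` (a toy stand-in for a classifier)."""
    words = text.lower().split()
    if not words:
        return 0.0
    total = sum(TOY_WEIGHTS.get(word, DEFAULT_WEIGHT) for word in words)
    return total / len(words)


original = "i hate stupid people"
padded = original + " love"

# Appending a word with a very low weight dilutes the average score.
print(toy_toxicity(original), toy_toxicity(padded))
```

In this toy model the appended word changes nothing about the original message, yet the average drops because the scorer treats every word as equally important. Real classifiers are far more sophisticated, but the reported result suggests they can be swayed in a loosely similar way.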
Image: Council of Europe