“We’ve built and are now sharing a data set designed specifically to help AI researchers develop new systems to identify multimodal hate speech. This content combines different modalities, such as text and images, making it difficult for machines to understand,” said the company in a blog post.
The company has created a data set to help build systems that better understand multimodal hate speech by analysing the text and image in a meme together rather than independently.
Facebook says it has released the Hateful Memes data set to the broader research community through a competition with a $100,000 prize pool.
The Hateful Memes data set contains more than 10,000 newly created examples of multimodal content. “The memes were selected in such a way that strictly unimodal classifiers would struggle to classify them correctly. We also designed the data set specifically to overcome common challenges in AI research, such as the lack of examples to help machines learn to avoid false positives. It covers a wide variety of both the types of attacks and the groups and categories targeted,” said Facebook.
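To illustrate why a strictly unimodal classifier can struggle with such content, here is a minimal Python sketch. The four examples are made up for illustration and are not taken from the Hateful Memes data set, and the helper names (`fit_majority`, `accuracy`) are hypothetical. Each text and each image appears once with a hateful label and once with a benign one, so only the combination of the two carries the signal.

```python
from collections import Counter, defaultdict

# Toy, made-up examples (NOT from the Hateful Memes data set).
# Each item is (text, image, label); label 1 = hateful, 0 = benign.
TOY_MEMES = [
    ("text_a", "image_x", 1),
    ("text_a", "image_y", 0),
    ("text_b", "image_x", 0),
    ("text_b", "image_y", 1),
]


def fit_majority(examples, key):
    """Build a lookup table: for each key value, predict its most common label."""
    votes = defaultdict(Counter)
    for example in examples:
        votes[key(example)][example[2]] += 1
    return {k: counts.most_common(1)[0][0] for k, counts in votes.items()}


def accuracy(examples, key):
    """Fit the lookup table on the examples and score it on the same examples."""
    table = fit_majority(examples, key)
    correct = sum(table[key(e)] == e[2] for e in examples)
    return correct / len(examples)


# Unimodal views: the classifier sees only the text, or only the image.
text_only = accuracy(TOY_MEMES, key=lambda e: e[0])
image_only = accuracy(TOY_MEMES, key=lambda e: e[1])

# Multimodal view: the classifier sees the text and image together.
text_and_image = accuracy(TOY_MEMES, key=lambda e: (e[0], e[1]))

print(text_only, image_only, text_and_image)
```

In this toy setup, the text-only and image-only lookups each score 0.5, no better than chance, while the lookup keyed on the text-image pair scores 1.0. A real system would use learned models rather than lookup tables, but the underlying point is the same.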
For now, only researchers can view or use the memes, as the company wants to prevent potential misuse.
Facebook defines hate speech as “a direct or indirect attack on people based on characteristics, including ethnicity, race, nationality, immigration status, religion, caste, sex, gender identity, sexual orientation, and disability or disease. We define attack as violent or dehumanizing (comparing people to non-human things, e.g., animals) speech, statements of inferiority, and calls for exclusion or segregation. Mocking hate crime is also considered hate speech.”