franciellevargas/HausaHate: HausaHate is a benchmark dataset for the Hausa hate speech detection task. It was extracted from West African Facebook pages and comprises 2,000 comments annotated with a binary class (offensive and non-offensive) and, for offensive comments, hate speech targets (race, gender, and none).

HausaHate: A Benchmark Dataset for Hausa Hate Speech Detection

In African countries, hate speech is an especially serious problem because of a history of ethnic conflict. West Africa in particular still lacks research on hate speech in its indigenous languages. Moreover, because most existing hate speech resources were developed for English, hate speech technologies for African indigenous languages remain underdeveloped. To fill this gap, we introduce the first expert-annotated corpus of Facebook comments for Hausa hate speech detection. The corpus, HausaHate, comprises 2,000 comments extracted from West African Facebook pages and manually annotated by three native Hausa speakers who are also NLP experts. The corpus was annotated in two layers. First, each comment was labeled with a binary class: offensive or non-offensive. Then, offensive comments were further labeled with a hate speech target: race, gender, or none. Lastly, we present a baseline model for Hausa hate speech detection based on a fine-tuned LLM, highlighting the challenges of hate speech detection for indigenous African languages as well as directions for future advances. The following tables summarize the HausaHate categories and comment counts:

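Layer 1: offensive vs. non-offensive (all comments)
Offensive	Non-Offensive	Total Comments
678	1,322	2,000

Layer 2: hate speech targets (offensive comments only)
Race	Gender	Non-Target	Total
391	65	222	678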
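To make the two-layer annotation scheme and the class balance concrete, here is a minimal Python sketch. It assumes a hypothetical per-comment record with `text`, `offensive`, and `target` fields; these names are illustrative only and may not match the files in the repository's dataset folder, whose format is not described here. The counts are copied from the tables above, and the percentages are derived from them.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Counts copied from the HausaHate tables (Layer 1 and Layer 2).
OFFENSIVE = 678
NON_OFFENSIVE = 1322
# Layer 2 labels; "none" corresponds to the "Non-Target" column.
TARGET_COUNTS = {"race": 391, "gender": 65, "none": 222}


@dataclass
class Comment:
    """Hypothetical record for one annotated comment (field names are assumptions)."""

    text: str
    offensive: bool               # Layer 1: offensive vs. non-offensive
    target: Optional[str] = None  # Layer 2: only set for offensive comments

    def __post_init__(self) -> None:
        # Offensive comments must carry one of the three target labels.
        if self.offensive and self.target not in TARGET_COUNTS:
            raise ValueError(f"offensive comment needs a target in {sorted(TARGET_COUNTS)}")
        # Non-offensive comments are not annotated in the second layer.
        if not self.offensive and self.target is not None:
            raise ValueError("non-offensive comments must not have a target")


def distribution(counts: Dict[str, int]) -> Dict[str, float]:
    """Return each label's share of the total as a percentage rounded to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Layer 2 counts should sum to the number of offensive comments.
assert sum(TARGET_COUNTS.values()) == OFFENSIVE

layer1 = distribution({"offensive": OFFENSIVE, "non-offensive": NON_OFFENSIVE})
layer2 = distribution(TARGET_COUNTS)
print(layer1)  # {'offensive': 33.9, 'non-offensive': 66.1}
print(layer2)  # {'race': 57.7, 'gender': 9.6, 'none': 32.7}

example = Comment(text="...", offensive=True, target="race")  # a valid record
```

The derived shares show that about a third of the comments (33.9%) are offensive and that gender-targeted comments make up only 9.6% of the offensive class (65 comments), so stratified splits or class-aware evaluation metrics may be worth considering when working with this corpus.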

The following is the list of collaborators and authors of this project:

ETHICS STATEMENT

We followed the data anonymization steps described in Section 4.2.3 of the paper, as is standard for papers working with this kind of data. A public corpus of anonymized Facebook comments is available. However, given the most recent change to Meta's platform terms of service (in 2020), we decided to disclose the comment IDs only upon request. This allows reproducibility while requiring researchers to go through Meta's authorization procedures to access the full data. Note that, to preserve anonymization, the publicly released comments do not include their IDs or links. To request the corpus with the comment IDs and links, please contact [email protected].

CITING

Vargas, F., Guimarães, S., Muhammad, S. H., Alves, D., Ahmad, I. S., Abdulmumin, I., Mohamed, D., Pardo, T.A.S., Benevenuto, F. (2024). HausaHate: An Expert Annotated Corpus for Hausa Hate Speech Detection. Proceedings of the 8th Workshop on Online Abuse and Harms (WOAH @ NAACL 2024), pp. 52-58. Mexico City, Mexico. https://aclanthology.org/2024.woah-1.5.

BIBTEX

@inproceedings{vargas-etal-2024-hausahate,
    title = "{H}ausa{H}ate: An Expert Annotated Corpus for {H}ausa Hate Speech Detection",
    author = "Vargas, Francielle and Guimar{\~a}es, Samuel and Muhammad, Shamsuddeen Hassan and Alves, Diego and Ahmad, Ibrahim Said and Abdulmumin, Idris and Mohamed, Diallo and Pardo, Thiago and Benevenuto, Fabr{\'\i}cio",
    editor = {Chung, Yi-Ling and Talat, Zeerak and Nozza, Debora and Plaza-del-Arco, Flor Miriam and R{\"o}ttger, Paul and Mostafazadeh Davani, Aida and Calabrese, Agostina},
    booktitle = "Proceedings of the 8th Workshop on Online Abuse and Harms (WOAH 2024)",
    month = jun,
    year = "2024",
    address = "Mexico City, Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.woah-1.5",
    pages = "52--58",
}
