YouTube based religious hate speech and extremism detection dataset with machine learning baselines

Abstract

On YouTube, billions of videos are watched online and millions of short messages are posted each day. YouTube, along with other social networking sites, is used by individuals and extremist groups to spread hatred among users. In this paper, we consider religion as the most targeted domain for spreading hate speech among people of different religions. We present a methodology for detecting religion-based hate videos on YouTube. Messages posted on YouTube videos generally express users’ opinions related to that video. We provide a novel dataset for religious hate speech detection on YouTube comments. The proposed methodology applies data mining techniques to comments extracted from religious videos in order to filter religion-oriented messages and detect the videos that are used to spread hate. Three supervised learning algorithms, Support Vector Machine (SVM), Logistic Regression (LR) and k-Nearest Neighbor (k-NN), are used to provide baseline results.

Keywords

Hate speech detection; religious extremism detection; YouTube comment analysis; hate speech dataset

1 Introduction

Social media platforms have helped natural language processing researchers tackle a variety of problems. The data provided by these platforms can be used to study user-related issues online, e.g., sexism [10], fake news [5] and emotions [3]. Among the many topics users discuss, religious discourse is widespread. Social media gives users the freedom to express themselves; however, many users take advantage of this freedom of speech and use the medium for abuse and hate speech.

Hate speech in text can be identified as a deliberate attack, motivated by aspects of group identity or directed towards a specific group of people [14]. Countering the religious form of hate speech is particularly important because these platforms have high reach and popularity and can help extremist organizations plan and mobilize extremist events and protests [30]. These websites are also used by extremist groups to spread their ideologies and agendas among their viewers [1]. Hate speech content can create more impact over time as it gains more viewership and attention [17]. Hence, early detection of religious hate speech is of immense importance.

Social networking websites such as YouTube are sometimes left unchecked because moderating them requires understanding context conveyed through video, text and audio.

Many research articles [13, 41] suggest a trend of extremist content (videos, text, images) being posted by groups. The study [13] gave evidence of many Jihadi videos from Iraq and their links to users on YouTube. The majority of their viewers (under 35 years of age) were located in the United States and outside the Middle East and North Africa (MENA) region. This pattern has been replicated by other extremist organizations over the years. Hence, it is important to find novel techniques to identify religious hate speech content on one of the most consumed content websites today.

In this research, we focus on the comments posted on religious videos on YouTube. We aim to detect the videos and comments on YouTube that promote hatred among individuals and groups. Furthermore, we investigate whether comments are used to incite hatred among people of different religions. First, we collected videos and the comments posted on them; then, data mining techniques [25] were applied to the extracted comments. We collected more than 100 YouTube videos and extracted their comments, most of which were written in English. The extracted comments were then divided into positive and negative classes based on attributes and features that are indicative of hate speech. The paper also discusses the rules defined for assigning comments to the two classes. Finally, we used machine learning classifiers, namely Support Vector Machines, Logistic Regression and k-Nearest Neighbor, to provide baseline results.

The rest of the paper is organized as follows: Section 2 reviews related work on hate speech detection on websites, newspapers and social networking sites. Section 3 describes the methodology, and Section 4 reports the performance of the proposed method. Finally, Section 5 concludes the paper.

2 Related work

Natural language processing researchers have tackled many challenging problems in text with the aid of statistical and deep learning methods such as sentiment analysis [21], emotion detection [3], human behavior detection [6], fake news detection [5], question answering [9], depression and threat detection [4, 28].

Previous research has also targeted the religious aspect of hate speech. A study [2] introduced a religious hate speech dataset for Arabic Twitter. Its purpose was to distinguish religious targeting from profanity, which proved challenging given Arabic morphology and limited resources. The authors used pre-trained word embeddings with a Recurrent Neural Network (RNN) architecture and Gated Recurrent Units (GRU) to provide baseline scores. Similarly, another study [38] gathered data from Yahoo and the American Jewish Congress (AJC) and defined hate speech in their work. The authors used paragraph-level classification and manually annotated the dataset into seven categories. Baseline results were obtained using an SVM with a linear kernel; with tenfold cross-validation, they achieved an overall accuracy of 96%, precision of 59%, recall of 68% and F1-score of 63%. Researchers have also developed a method [22] for detecting hate speech against the black community. Motivated by the work in [38], a Twitter-based dataset was developed, and the severity of arguments was judged by circulating a questionnaire among students of different races. The training set of 24,582 tweets was pre-processed and classified using a Naive Bayes (NB) classifier, which achieved an accuracy of 86%.

In the early days of hate speech detection in text, a study [36] built a generic hate speech identification system to identify hate phrases on websites. Initially, the NB classifier was used to select the most discriminative features; then a Multinomial Updatable Naive Bayes (MNB) classifier was used to produce labelled sentences. Finally, a Decision Table/Naive Bayes hybrid (DTNB) classifier obtained an accuracy of 68% using tenfold cross-validation. Studies identifying hate speech in specific social settings such as Facebook [35], Twitter [32] and YouTube [12, 34] have appeared frequently. In one influential study, the authors presented a framework [12] for identifying extremist videos on YouTube. They extracted syntactic, lexical and content-specific features from user-generated data and used various feature-based classification techniques to classify videos. Another group of researchers [1] designed a crawler-based approach for retrieving YouTube user profiles that promote extremism and hate. However, our approach differs from these studies: they are more generic, whereas our study focuses specifically on the religious form of hate speech.

Over the years, several machine learning methods have been reported for classifying hate speech or antagonistic tweets, focusing on racism, ethnicity and religion. A study [8] of tweets following the murder of drummer Lee Rigby in Woolwich, UK, used a context-free lexical model with the Stanford Lexical Parser to extract typed dependencies in phase one and a Bag of Words (BoW) representation as features in phase two. The authors experimented with machine learning classifiers such as Decision Trees, Random Forest and SVM and achieved a 95% F1-measure using tenfold cross-validation. Another study [42] first categorized opinions in tweets and then determined their sentiment polarity using machine learning classifiers (SVM, NB and k-nearest neighbors (k-NN)); SVM outperformed NB and k-NN. Researchers have also developed multiple individual models [7] for classifying cyberhate targeting characteristics such as race, disability and sexual orientation, using text parsing to extract typed dependencies. Moreover, they developed a model to improve cyberhate classification when multiple characteristics, such as race and sexual orientation, are attacked together.

In recent studies, methodologies have shifted from machine learning and feature-based methods to pre-trained transformer models [15, 33], deep learning [11, 40] and graph-based approaches. Researchers [37] have successfully identified hate in memes using transformer models such as LXMERT, VILLA, ERNIE-ViL, VisualBERT, ViLBERT, VLP and UNITER. Deep learning algorithms such as Bi-LSTM and CNN have been successful in many other NLP applications. An approach [40] for hate speech using CNN and Bi-LSTM achieved micro-F1 scores of 80% and 69%. A GRU- and RNN-based approach was applied to the Greek-Gazzetta-Corpus [39], whereas a graph embedding approach [11] used whole-graph embeddings and achieved an F1-score of 89.16% with Graph2vec.

3 Methodology

As shown in Fig. 1, the proposed methodology consists of two phases: training and testing. Four tasks are performed in the training phase: video and comment collection, feature representation, dataset storage and classifier training. Five tasks are performed in the testing phase: video and comment collection, feature representation, dataset storage, hypothesis prediction and evaluation. Each task is described below.

Fig. 1

Block diagram of proposed methodology.

3.1 Data collection

The YouTube API (footnote 1; last visited: 28-01-2021) was used to collect data from YouTube. We collected videos for five different religions (Islam, Hinduism, Judaism, Christianity and Sikhism), selecting top-ranked YouTube videos for each religion. For example, when we searched for “Islamic scholars”, the most liked scholar for Muslims was Dr. Zakir Naik. We adopted the same method to collect videos for the other religions. From the retrieved videos, we selected a limited subset based on the parameters listed in Table 1. In total, we selected 400 videos for data gathering.
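The collection step can be sketched with the YouTube Data API v3 `commentThreads` endpoint. This is a minimal sketch, not the authors' crawler: it assumes a valid API key, fetches top-level comments only (replies are ignored), and maps response fields onto the Table 3 attributes under our own assumptions, for example that “Rating of Comments” corresponds to the `viewerRating` field. The helper names `build_request_url`, `parse_comment_threads` and `fetch_all_comments` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# YouTube Data API v3 endpoint that lists the top-level comment threads of a video.
API_URL = "https://www.googleapis.com/youtube/v3/commentThreads"


def build_request_url(video_id, api_key, page_token=None):
    """Build the URL for one page (at most 100 threads) of a video's comments."""
    params = {
        "part": "snippet",
        "videoId": video_id,
        "maxResults": 100,
        "textFormat": "plainText",
        "key": api_key,
    }
    if page_token:
        params["pageToken"] = page_token
    return API_URL + "?" + urllib.parse.urlencode(params)


def parse_comment_threads(page):
    """Map one decoded API response page to records holding the Table 3 attributes."""
    records = []
    for item in page.get("items", []):
        thread = item["snippet"]
        comment = thread["topLevelComment"]["snippet"]
        records.append({
            "author_display_name": comment.get("authorDisplayName"),
            "is_public": thread.get("isPublic"),
            "like_count": comment.get("likeCount", 0),
            "total_reply_count": thread.get("totalReplyCount", 0),
            # Assumption: "Rating of Comments" is taken to be viewerRating.
            "rating": comment.get("viewerRating"),
            "publish_date": comment.get("publishedAt"),
            "text": comment.get("textDisplay", ""),
        })
    return records


def fetch_all_comments(video_id, api_key):
    """Follow nextPageToken until all top-level comments are fetched (needs network)."""
    records, token = [], None
    while True:
        with urllib.request.urlopen(build_request_url(video_id, api_key, token)) as resp:
            page = json.load(resp)
        records.extend(parse_comment_threads(page))
        token = page.get("nextPageToken")
        if not token:
            return records
```

Only `fetch_all_comments` touches the network; `parse_comment_threads` can be tested offline on a saved response, which keeps the field mapping easy to audit.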

Table 1
Selection of videos based on the selected parameters

Parameter	Explanation
Topic Relevance	To what degree the video deals with the given topic
Opinion Expression	To what degree it contains subjective (opinionated) information
Video Quality	Estimated quality of the video; high-quality videos are assumed to contain more meaningful opinions

After video selection, the next step was to extract the comments from each video. These comments can be used to track what a video expresses and what users think about it. Furthermore, YouTube comments are a valuable resource for tracking sentiment for the following reasons:

YouTube is a large source of public videos in which opinions can be expressed freely;

It is also used by followers of different religions to spread their views and thoughts through public videos. It is not only a source of entertainment but also a source of learning and ideas that can change a person’s thinking and behavior;

YouTube videos and comments contain characteristics and phraseology of interest [12, 20];

YouTube often contains content used to change people’s minds about religions, because people are free to express negative attitudes towards any religion;

Generally, users join YouTube to participate, so it can be expected that they will reveal personal data to others.

In total, approximately 20,000 comments were collected from the selected YouTube videos; the distribution of comments across religions is shown in Table 2. The dataset is publicly available (footnote 2).

Table 2

YouTube video statistics for each religion

Category-wise distribution	Statistics
Total YouTube videos	120
Total comments	11K
Islam	4K
Christianity	3.2K
Hinduism	1.9K
Judaism	1.2K
Sikhism	0.7K

The following are samples of positive comments from YouTube:

Dont stand up Anti-Islam but stand up Anti-Rape, Anti-Violence, Anti-Murder, Anti-Terrosim?

My mother is catholic and my father is muslim and raised in islam. all i can say to you is i love you! may allah blessed us! specially your family and relatives.

Muslims have made it clear how they feel about the the uk. If they choose sharia over the uk they need to leave. Plain and simple. After all most of them are here because of Islam in the first place. Go figure.

I enjoyed this documentary but left it not really knowing anymore about the Sikh religion than I did before watching it. It shows, at least to me, that even good religious people who start out with good intentions lose sight of those when they gain power and influence over others...so many deaths throughout history in the name of religion.

Subhanallah,, i love Allah, Jesus and Muhammad.

Har har maha Dev I proud to be a Hindu.

The following are samples of negative comments from YouTube:

The only thing Islam is equal to is a steaming pile of shit.

Would you be Anti Christian if you find out that your little boy got molested in church? what is the difference between child marriage and child molestation in church.

white people... terrorising the world for 2000+ years.

The quran says anyone who reads this book is a moron.

In the first place Hinduism is not religion. It is Hindu dharma. It tells how a human being should lead life by virtue of birth.

3.2 Pre-processing

Most of the collected comments contained raw data that needed to be converted into a proper format before data mining techniques could be applied. This mainly involved tokenization, feature weighting and data cleaning (removal of irrelevant features). We removed emoji, punctuation marks, diacritics and stop words, and added spaces after digits. Pre-processing is of key importance in the classification of textual data, since proper pre-processing not only improves the effectiveness of the classifier but also makes it more time-efficient [23]. During pre-processing, we also eliminated comments that were not opinionated.
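A minimal sketch of the cleaning steps listed above, using only the Python standard library. The stop-word list is a tiny hypothetical placeholder, emoji are approximated as Unicode “Symbol, other” (`So`) characters, and the filter for non-opinionated comments is not shown; the original pipeline was run in WEKA and may differ in detail.

```python
import re
import string
import unicodedata

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a full list.
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "or", "of", "to", "in", "i", "you"}


def remove_emoji(text):
    """Drop 'Symbol, other' (So) characters, the category most emoji fall into."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "So")


def strip_diacritics(text):
    """Decompose accented characters and drop the combining marks (é -> e)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def preprocess(comment):
    """Clean one comment and return its lower-cased tokens without stop words."""
    text = remove_emoji(comment)
    text = strip_diacritics(text)
    # Insert a space after every digit run so "2000years" becomes "2000 years".
    text = re.sub(r"(\d+)", r"\1 ", text)
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = text.lower().split()
    return [tok for tok in tokens if tok not in STOP_WORDS]
```

For example, `preprocess("I love café 😀 2000years!!")` returns `['love', 'cafe', '2000', 'years']`.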


3.2.1 Attributes and feature selection

Feature selection is an important stage in classification and data mining. Feature selection methods address the curse of dimensionality, i.e., the high dimensionality of the feature space. We used Information Gain (IG) [23] to score features. The collected comments are divided into two classes, Positive and Negative:

Positive: the user likes the video and has not changed his/her mind towards religions;

Negative: most users are against the video; these are the videos that play a role in spreading hate against each other’s religions.
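Information Gain scores how much knowing a feature reduces uncertainty about the class label: IG(C; F) = H(C) - Σ_v P(F = v) H(C | F = v). The sketch below computes it for a discrete feature. Numeric attributes such as Like Count would first need to be discretized (for example, thresholded); this is an assumption on our part, since the binning used in the paper is not specified.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """IG(C; F) = H(C) - sum over v of P(F = v) * H(C | F = v), for a discrete feature."""
    total = len(labels)
    conditional = 0.0
    for value in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        conditional += len(subset) / total * entropy(subset)
    return entropy(labels) - conditional
```

A thresholded attribute would be scored as, e.g., `information_gain([likes > 10 for likes in like_counts], labels)`, where the threshold 10 and the name `like_counts` are purely illustrative.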

Comments are assigned to the positive and negative classes on the basis of attributes such as Like Count, Total Reply Count and Rating of Comments, which are the attributes with high Information Gain (IG). The attributes saved during data collection are listed in Table 3, and the linguistic features used for comment classification are listed in Table 4. These linguistic features were selected on the basis of per-class TF-IDF scores [19]. On the basis of the selected attributes and linguistic features, we built regular expressions that classify comments as positive or negative.
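The paper does not spell out how the per-class TF-IDF scores were computed. One plausible reading, sketched below as an assumption, computes TF-IDF per comment with idf = log(N / df) over all comments and then averages each term's weight within a class, keeping the top-k terms per class as candidates for a lexicon such as Table 4. The function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """TF-IDF weights for each tokenized document, using idf = log(N / df)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


def top_terms_per_class(docs, labels, k=5):
    """Average each term's TF-IDF over the comments of a class and keep the top k."""
    sums = defaultdict(Counter)
    sizes = Counter(labels)
    for vec, label in zip(tfidf_vectors(docs), labels):
        sums[label].update(vec)  # adds the float weights term by term
    ranked = {}
    for label, totals in sums.items():
        means = {t: w / sizes[label] for t, w in totals.items()}
        # Highest mean weight first; ties broken alphabetically for determinism.
        ranked[label] = sorted(means, key=lambda t: (-means[t], t))[:k]
    return ranked
```

Terms that occur in every comment receive an idf of zero, so generic words drop to the bottom of both class rankings.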

Table 3
Attributes which are saved during data collection

Attributes Names
Author Display Name
Is Comment Public
Like Count
Total Reply Count
Rating of Comments
Publish Date

Table 4

Linguistic features for distribution of comments into positive and negative classes

Positive	Negative
Mashallah, positive, love, good, wonderful, best, great, super, beautiful, peace, graceful, lovely, nice, meaningful, blame, subhanallah, Allah-oo-akbar, humanity, help others, influential, flawless, encourage, harmony, submission, love, stand up, anti(non)-violence.	shut the f**k, rape, murder, fun, destroyed, evil, killed, bloody, cancer, ridiculous religious, attacks, moron, terrorists, barking, idiot, wrong, shit, fanaticism, bastard, speaks from hell, fucker, terrorism, living animals, falsehood, raping, propaganda, violent, hate, extremist, sabotage, outrageous, silly, terror.

Other unimportant attributes, such as Author Google+ Profile link, Video id and Author Profile image, were removed during the pre-processing phase. The following regular-expression rules were defined to assign comments to their respective classes:

Position: the position of a token within a comment (e.g., in the middle or at the end of the comment) has an important effect on sentiment polarity, and checking it helps capture the subjectivity of comments;

Negation: negation is an important concern in opinion mining. For example, “I like Islam” and “I don’t like Islam” differ only by a single negation word (“not”), yet have opposite meanings. We also used negation to assign comments to the positive and negative classes;

Topic-oriented features: comments containing prediction words, such as “Islam will role world”, “Go Hinduism again” and “Jewish will win”, play an important role in class assignment;

Comparative comments: we also identified comparative comments such as “Life in Islam is much better than Hinduism”. In this sentence, the token “Life” serves as a feature, “better” is the comparative relation, and “Islam” and “Hinduism” are the two compared entities.

Finally, we considered syntactic features: a sentence containing an adjective and “!” could indicate the existence of an opinion, and a noun followed by a positive adjective indicates a positive comment, while a noun followed by a negative adjective indicates a negative comment. We also used a set of modifier features (e.g., very, mostly, not), because their presence indicates appraisal.
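A simplified sketch of the lexicon and negation rules: it scores a comment with small excerpts of the Table 4 word lists and flips the polarity of a matched term when a negator appears within a few preceding tokens. The word lists, the negator set, the three-token window and the function name `label_comment` are illustrative assumptions; the position, comparative and adjective/“!” rules described above are not implemented.

```python
import re

# Small excerpts of the Table 4 lexicons; the full lists would be used in practice.
POSITIVE_TERMS = {"love", "peace", "good", "beautiful", "harmony", "great"}
NEGATIVE_TERMS = {"hate", "evil", "terrorists", "idiot", "moron", "terror"}
# Hypothetical negator set (ASCII apostrophes only).
NEGATORS = {"not", "no", "never", "dont", "don't", "doesn't", "isn't"}

TOKEN_RE = re.compile(r"[a-z']+")


def label_comment(comment, window=3):
    """Return 'positive', 'negative' or None (no lexicon hit) for one comment.

    A lexicon hit preceded by a negator within `window` tokens has its
    polarity flipped, e.g. "I do not love ...".
    """
    tokens = TOKEN_RE.findall(comment.lower())
    score = 0
    for i, tok in enumerate(tokens):
        polarity = (tok in POSITIVE_TERMS) - (tok in NEGATIVE_TERMS)  # +1, -1 or 0
        if polarity == 0:
            continue
        if any(t in NEGATORS for t in tokens[max(0, i - window):i]):
            polarity = -polarity
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None
```

Returning `None` for comments with no lexicon hit mirrors the idea of setting aside non-opinionated comments rather than forcing them into a class.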

3.2.2 Hypothesis and evaluation

To evaluate the effectiveness of this representation, SVM, LR and k-NN classifiers are used; the trained classifiers can then be used to detect hate speech in comments. Standard tenfold cross-validation is used to evaluate the performance of the classifiers. Cross-validation estimates how well a model's results generalize to an independent dataset. Precision, recall and F-measure are used to evaluate the classifiers; for the positive class, precision (P), recall (R) and F-measure are defined as follows:

$\text{Precision}\ (P) = \frac{NPP}{TPC},$ (1)

$\text{Recall}\ (R) = \frac{NPP}{NPC},$ (2)

$F_1\text{-measure} = \frac{2 \times P \times R}{P + R}.$ (3)

Here, NPP is the number of correct positive predictions (comments correctly classified as positive), TPC is the total number of comments predicted as positive, and NPC is the number of comments that are actually positive. Precision measures how many of the comments predicted as positive are actually positive, whereas recall measures how many of the actually positive comments the model captures by labeling them as positive.
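The metrics in Eqs. (1) to (3) can be computed directly from true and predicted labels. The sketch reads NPP as correct positive predictions, TPC as all comments predicted positive and NPC as all actually positive comments; the function name `precision_recall_f1` is hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive="positive"):
    """Precision, recall and F1 for one class, following Eqs. (1)-(3)."""
    # NPP: comments predicted positive that really are positive.
    npp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    tpc = sum(1 for p in y_pred if p == positive)  # TPC: all predicted positive
    npc = sum(1 for t in y_true if t == positive)  # NPC: all actually positive
    precision = npp / tpc if tpc else 0.0
    recall = npp / npc if npc else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, with 2 correct positive predictions out of 3 predicted positives and 3 actual positives, P = R = F1 = 2/3.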

4 Experiments and results

For the experiments, we used WEKA [18], a popular machine learning tool that offers techniques for tokenization, stop-word removal, attribute selection, feature weighting, regression, classification, clustering and modelling. The dataset is first transformed into a feature representation, which is used to train and evaluate the SVM, LR and k-NN classifiers. The performance of the three classifiers for the five religions is shown in Tables 5, 6 and 7. With SVM, the proposed method classifies comments with a precision of up to 84.70% (Islam) and a recall of up to 89.60% (Christianity). The highest precision values were obtained for Islam and Christianity, 84.70% and 84.00%, respectively. Islam and Christianity also have the largest numbers of comments compared to the other religions, which may explain their higher precision. For SVM on Christianity, recall was higher than precision; the same pattern appears in the LR and k-NN results for Islam. In such cases, recall is high because the classifier correctly identified most of the positive comments in the dataset, whereas the lower precision indicates that the classifier also assigned some noisy (irrelevant or neutral) comments to the positive class.
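Tenfold cross-validation needs folds in which each class is represented in roughly its overall proportion. The sketch below builds stratified folds by shuffling each class and dealing its examples round-robin across the folds; it is one common way to construct folds and is not claimed to reproduce WEKA's internal fold assignment. `stratified_k_folds` is a hypothetical helper.

```python
import random
from collections import defaultdict


def stratified_k_folds(labels, k=10, seed=0):
    """Split example indices into k folds that keep class ratios roughly equal.

    Returns a list of (train_indices, test_indices) pairs, one per fold.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's examples round-robin so every fold gets its share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    all_indices = set(range(len(labels)))
    return [(sorted(all_indices - set(f)), sorted(f)) for f in folds]
```

Each classifier is then trained on `train_indices` and scored on `test_indices` for every fold, and the ten scores are averaged.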

Table 5
SVM results for comments classification into positive and negative classes

SVM	Precision	Recall	F₁
Islam	84.70	84.50	84.30
Hinduism	81.20	81.10	81.10
Sikhism	73.60	73.10	73.00
Christianity	84.00	89.60	85.60
Judaism	72.00	72.10	72.10

Table 6

LR results for comments classification into positive and negative classes

LR	Precision	Recall	F₁
Islam	82.70	82.90	82.30
Hinduism	84.00	84.60	84.20
Sikhism	72.10	72.00	72.00
Christianity	80.20	80.10	80.10
Judaism	70.00	71.10	71.10

Table 7

k-NN results for comments classification into positive and negative classes

k-NN	Precision	Recall	F₁
Islam	81.70	81.90	81.30
Hinduism	79.00	79.60	79.60
Sikhism	70.00	70.00	70.00
Christianity	78.00	78.00	78.00
Judaism	69.00	69.00	69.00

Overall, SVM performed better for comment classification than LR and k-NN, which may be explained by the following properties: (1) it has a robust built-in mechanism against over-fitting, which enables it to perform well on high-dimensional data [27]; (2) its decision boundary depends on only a small number of examples (the support vectors); and (3) it rests on a sound theoretical basis and can produce accurate results even when the input data are non-monotone and not linearly separable.

To compare the accuracy of the three classifiers, a corrected paired t-test was performed in WEKA. A paired t-test compares two sets of observations in which each observation in one set can be paired with an observation in the other. Its objective is to determine whether there is statistical evidence that the mean difference between the paired observations on a particular outcome is significantly different from zero. More details on the paired t-test can be found in [19]. The comparison of the classifiers for the five religions is shown in Table 8. LR was selected as the baseline classifier; each classifier was run 10 times, and the reported accuracy is the mean, with the standard deviation in brackets, of those 10 runs.
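WEKA's corrected paired t-test is based on the corrected resampled t-test of Nadeau and Bengio, which inflates the variance of the per-fold differences to account for overlapping training sets: t = mean(d) / sqrt((1/k + n_test/n_train) * s^2). The sketch below computes this statistic for two classifiers' accuracies on the same k splits; converting t to a p-value (with k - 1 degrees of freedom) is left to a statistics library, and the function name `corrected_paired_t` is hypothetical.

```python
import math
import statistics


def corrected_paired_t(acc_a, acc_b, test_train_ratio=1 / 9):
    """Corrected resampled t statistic (Nadeau and Bengio) for paired results.

    acc_a, acc_b: accuracies of two classifiers on the same k train/test splits.
    test_train_ratio: n_test / n_train, i.e. 1/9 for tenfold cross-validation.
    Returns (t, degrees_of_freedom).
    """
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    k = len(diffs)
    mean_diff = statistics.mean(diffs)
    var_diff = statistics.variance(diffs)  # sample variance (k - 1 denominator)
    if var_diff == 0:
        raise ValueError("t is undefined when all paired differences are equal")
    # 1/k is the usual paired-t factor; test_train_ratio corrects for the
    # dependence introduced by overlapping training sets.
    t = mean_diff / math.sqrt((1 / k + test_train_ratio) * var_diff)
    return t, k - 1
```

Because the correction term enlarges the denominator, the corrected t is always smaller in magnitude than the standard paired t on the same data, which makes the test more conservative about declaring differences significant.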

Table 8

Accuracy (%) of the three classifiers

Religion	SVM	LR	k-NN
Islam	83.70	82.90	80.30
Hinduism	84.00	82.60	80.60
Sikhism	82.00	84.00	80.10
Christianity	83.50	82.10	79.90
Judaism	79.80	81.10	82.20

For three religions (Islam, Hinduism and Christianity), SVM performed better than LR and k-NN. On average, SVM achieved a classification accuracy of 82.6%, LR 82.5% and k-NN 80.62%. These results indicate some differences in the accuracy of SVM, LR and k-NN; however, the differences are not statistically significant. This research can be instrumental in monitoring YouTube videos and removing hate speech using public sentiment, which is a rich source of information for detecting hate speech. Moreover, video traffic has grown exponentially in recent years [29], so detecting hate speech on YouTube using public sentiment is necessary.
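The average accuracies can be recomputed directly from the per-religion values in Table 8 as an arithmetic check; the numbers below are copied from the table, and the averages are unweighted means over the five religions.

```python
from statistics import mean

# Per-religion accuracies (%) copied from Table 8, in row order:
# Islam, Hinduism, Sikhism, Christianity, Judaism.
TABLE_8 = {
    "SVM": [83.70, 84.00, 82.00, 83.50, 79.80],
    "LR": [82.90, 82.60, 84.00, 82.10, 81.10],
    "k-NN": [80.30, 80.60, 80.10, 79.90, 82.20],
}

# Unweighted mean over the five religions, rounded to two decimals.
averages = {clf: round(mean(accs), 2) for clf, accs in TABLE_8.items()}
print(averages)  # {'SVM': 82.6, 'LR': 82.54, 'k-NN': 80.62}
```

An unweighted mean treats each religion equally; weighting by the number of comments per religion (Table 2) would give Islam and Christianity more influence.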

5 Conclusion

Social networks are widely used to express feelings and thoughts. Individuals of various religions, races and nationalities exercise their right to speak freely about problems on social networking sites. However, many users abuse this freedom for malicious purposes, turning individuals against each other. Religion is the most sensitive issue discussed worldwide on social networks. To curb hate speech, the regulating authorities of these social networks need automated mechanisms to identify and remove such videos from YouTube. In this paper, we selected the most sensitive issue, religious hate, and proposed a methodology that filters religious opinions and then highlights hate speech within those religious views. We used the supervised learning algorithms SVM, LR and k-NN to classify video comments and show the effectiveness of the methodology. For comment classification, SVM performed better than LR and k-NN, although the differences were not statistically significant.

Footnotes

Acknowledgments

The work was done with support from the Mexican Government through the grant A1-S-47854 of the CONACYT, Mexico and grants 20211784, 20211884, and 20211178 of the Secretaría de Investigación y Posgrado of the Instituto Politécnico Nacional, Mexico. The authors utilize the computing resources brought to them by the CONACYT through the Plataforma de Aprendizaje Profundo para Tecnologías del Lenguaje of the Laboratorio de Supercómputo of the INAOE, Mexico. There was no additional external funding received for this study.

References

1. Agarwal and Sureka, A focused crawler for mining hate and extremism promoting videos on youtube. In Proceedings of the 25th ACM Conference on Hypertext and Social Media (2014), pp. 294–296.

2. Albadi, Kurdi and Mishra, Are they our brothers? analysis and detection of religious hate speech in the arabic twittersphere. In 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) (2018), pp. 69–76. IEEE.

3. Ameer, Ashraf, Sidorov and Gómez Adorno, Multi-label emotion classification using content-based features in Twitter, Computación y Sistemas 24(3) (2021), 02.

4. Ashraf, Mustafa, Sidorov and Gelbukh, Individual vs. group violent threats classification in online discussions. In Companion Proceedings of the Web Conference 2020, WWW ’20 (2020), pp. 629–633, New York, NY, USA. Association for Computing Machinery. ISBN 9781450370240.

5. Ashraf, Butt, Sidorov and Gelbukh, CIC at CheckThat! 2021: Fake news detection using machine learning and data augmentation. In CLEF 2021 – Conference and Labs of the Evaluation Forum, Bucharest, Romania, 2021.

6. Bashir, Ashraf, Yaqoob, Rafiq and Ul Mustafa, Human aggressiveness and reactions towards uncertain decisions, International Journal of Advanced and Applied Sciences 6(7) (2019), 112–116.

7. Burnap and Williams M.L., Us and them: identifying cyber hate on twitter across multiple protected characteristics, EPJ Data Science 5 (2016), 1–15.

8. Burnap and Leighton Williams, Hate speech, machine classification and statistical modelling of information flows on twitter: Interpretation and communication for policy decision making. 2014.

9. Butt, Ashraf, Fahim Siddiqui M.H., Sidorov and Gelbukh, Transformer-based extractive social media question answering on TweetQA, Computación y Sistemas 25(1), 2021.

10. Butt, Ashraf, Sidorov and Gelbukh, Sexism identification using BERT and data augmentation - EXIST2021. In International Conference of the Spanish Society for Natural Language Processing SEPLN 2021, IberLEF 2021, Spain, 2021.

11. Cecillon, Labatut, Dufour and Linares, Graph embeddings for abusive language detection, SN Computer Science 2(1) (2021), 1–15.

12. Chen, Denning, Roberts, Larson C.A., Yu and Huang C.N., Chapter 1 - Revealing the hidden world of the dark web: Social media forums and videos, Intelligent Systems for Security Informatics (2013), pp. 1–28.

13. Conway and McInerney, Jihadi video and autoradicalisation: Evidence from an exploratory youtube study. In European Conference on Intelligence and Security Informatics (2008), pp. 108–118. Springer.

14. de Gibert, Perez, García-Pablos and Cuadros, Hate speech dataset from a white supremacy forum. arXiv preprint arXiv:1809.04444, 2018.

15. Devlin, Chang M.-W., Lee and Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (2019), pp. 4171–4186.

16. Djuric, Zhou, Morris, Grbovic, Radosavljevic and Bhamidipati, Hate speech detection with comment embeddings. In Proceedings of the 24th International Conference on World Wide Web, WWW ’15 Companion (2015), pp. 29–30, New York, NY, USA. Association for Computing Machinery. ISBN 9781450334730.

17. Gagliardone, Gal, Alves and Martinez, Countering online hate speech. UNESCO Publishing, 2015.

18. Hall, Frank, Holmes, Pfahringer, Reutemann and Witten I.H., The WEKA data mining software: an update, ACM SIGKDD Explorations Newsletter 11(1) (2009), 10–18.

19. Hsu and Lachenbruch P.A., Paired t test. Wiley StatsRef: Statistics Reference Online, 2014.

20. Kandias, Stavrou, Bozovic, Mitrou and Gritzalis, Can we trust this user? predicting insider’s attitude via youtube usage profiling. In 2013 IEEE 10th International Conference on Ubiquitous Intelligence and Computing and 2013 IEEE 10th International Conference on Autonomic and Trusted Computing (2013), pp. 347–354. IEEE.

21. Khan, Amjad, Ashraf, Chang H.-T. and Gelbukh, Urdu sentiment analysis with deep learning methods, IEEE Access (2021), pp. 1–1. doi: 10.1109/ACCESS.2021.3093078.

22. Kwok and Wang, Locate the hate: Detecting tweets against blacks. In Twenty-Seventh AAAI Conference on Artificial Intelligence, 2013.

23. Ullah Lali, Ul Mustafa, Saleem, Nawaz, Zia and Shahzad, Finding healthcare issues with search engine queries and social network data, International Journal on Semantic Web and Information Systems (IJSWIS) 13(1) (2017), 48–62.

24. Lan, Chen, Goodman, Gimpel, Sharma and Soricut, ALBERT: A lite BERT for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942, 2019.

25. Liu and Zhang, A survey of opinion mining and sentiment analysis. In Mining Text Data (2012), pp. 415–463. Springer.

26. McNamee L.G., Peterson B.L. and Peña, A call to educate, participate, invoke and indict: Understanding the communication of online hate groups, Communication Monographs 77(2) (2010), 257–280.

27. Ul Mustafa, Saqib Nawaz, Ikram Ullah Lali, Zia and Mehmood, Predicting the cricket match outcome using crowd opinions on social networks: A comparative study of machine learning methods, Malaysian Journal of Computer Science 30(1) (2017), 63–76.

28. Ul Mustafa, Ashraf, Shabbir Ahmed, Ferzund, Shahzad and Gelbukh, A multiclass depression detection in social media based on sentiment analysis. In Shahram Latifi, editor, 17th International Conference on Information Technology – New Generations (ITNG 2020) (2020), pp. 659–662, Cham, 2020. Springer International Publishing. ISBN 978-3-030-43020-7.

29. Ul Mustafa, Moura and Esteve Rothenberg, Machine learning approach to estimate video QoE of encrypted DASH traffic in 5G networks. In 2021 IEEE Statistical Signal Processing Workshop (SSP) (2021), pp. 586–589. IEEE.

30. Muthiah, Huang, Arredondo, Mares, Getoor, Katz and Ramakrishnan, Planned protest modeling in news and social media. In Twenty-Seventh IAAI Conference, 2015.

31. Radford, Wu, Child, Luan, Amodei and Sutskever, Language models are unsupervised multitask learners, OpenAI blog 1(8) (2019), 9.

32. Silva, Mondal, Correa, Benevenuto and Weber, Analyzing the targets of hate in online social media. In Tenth International AAAI Conference on Web and Social Media, 2016.

33. Srivastava, Hasan, Yagnik, Walambe and Kotecha, Role of artificial intelligence in detection of hateful speech for Hinglish data on social media. arXiv preprint arXiv:2105.04913, 2021.

34. Sureka, Kumaraguru, Goyal and Chhabra, Mining youtube to discover extremist videos, users and hidden communities. In Asia Information Retrieval Symposium (2010), pp. 13–24. Springer.

35. Ting I-H., Chi H.-M., Wu J.-S. and Wang S.-L., An approach for hate groups detection in facebook. In The 3rd International Workshop on Intelligent Data Analysis and Management (2013), pp. 101–106. Springer.

36. Turney P.D., Thumbs up or thumbs down? semantic orientation applied to unsupervised classification of reviews. arXiv preprint cs/0212032, 2002.

37. Vashistha and Zubiaga, Online multilingual hate speech detection: Experimenting with Hindi and English social media, Information 12(1) (2021), 5.

38. Warner and Hirschberg, Detecting hate speech on the world wide web. In Proceedings of the Second Workshop on Language in Social Media (2012), pp. 19–26.

39. Wulczyn, Thain and Dixon, Ex machina: Personal attacks seen at scale. In Proceedings of the 26th International Conference on World Wide Web, WWW ’17, pp. 1391–1399, Republic and Canton of Geneva, CHE, 2017.

40. Zampieri, Malmasi, Nakov, Rosenthal, Farra and Kumar, Predicting the type and target of offensive posts in social media. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 1415–1420, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics.

41. Zhou, Reid, Qin, Chen and Lai, US domestic extremist groups on the web: link and content analysis, IEEE Intelligent Systems 20(5) (2005), 44–51.

42. Zia, Shehbaz Akram, Saqib Nawaz, Shahzad, Abdullatif A.M., Mustafa R.U. and Ikramullah Lali, Identification of hatred speeches on twitter. In Proceedings of 52nd The IRES International Conference (2016), pp. 27–32.
