Abstract
The internet and social networks produce an ever-increasing amount of data. A recommendation system is needed because exploring such a large collection manually is time-consuming and difficult. In this study, a multi-modal classifier is introduced that combines the outputs of dual deep neural networks: a GRU for text analysis and a Faster R-CNN for image analysis. These two networks reduce overall complexity and computational time while retaining accuracy. More precisely, the GRU network processes movie reviews and the Faster R-CNN recognizes the content of each frame of the movie trailers. The Gated Recurrent Unit (GRU) is a well-known variant of the RNN that processes sequential data through a recurrent structure. Faster R-CNN is an enhanced version of Fast R-CNN that combines rectangular region proposals with features extracted by ResNet-101. Initially, the movie trailer is manually split into frames, which are pre-processed using a fuzzy elliptical filter for image analysis, and the movie reviews are tokenized for text analysis. The pre-processed text is the input to the GRU, which classifies movies as offensive or non-offensive, and the pre-processed images are the input to the Faster R-CNN, which classifies movies as violent or non-violent based on features extracted from the movie trailer. Afterwards, the four classified outputs are passed to a fuzzy decision-making unit that recommends the best movies using a Mamdani fuzzy inference system with Gaussian membership functions. The performance of the dual deep neural networks was evaluated using specificity, precision, recall, accuracy and F1 score. The proposed GRU yields an accuracy of 97.73% for reviews and the FRCNN yields an accuracy of 98.42% for movie trailers.
Keywords
Introduction
In recent years, the proliferation of online information has left individuals overwhelmed [1]. A recommender system is a technology that filters information so that it can direct users to goods or services they are interested in, based on their preferences [2]. As one of many information-sharing systems, the recommender system is vital to enhancing businesses and assisting users [4]. The internet and social networks create an increasing amount of information, and a recommendation system is needed because exploring such a large collection is time-consuming and difficult [7]. Recently, several movie recommendation systems have been developed in this context. They suggest specific movies based on past user data instead of movies of all genres [8]. Currently, traditional recommendation systems make suggestions based on specific characteristics, such as the popularity of a movie, user behavior, or movie content [12].
Automatic video analysis could resolve a number of issues that are currently too expensive or too time-consuming for people to resolve themselves [15]. Several effective ML techniques can classify photos into 1,000 different labels with around 94% accuracy; however, video-based applications have proven more challenging [16]. An ML algorithm of this type cannot handle such a task effectively and efficiently because of the task's high level of complexity. Automatic video analysis is a broad concept that offers a variety of research opportunities, including context analysis, element detection, and action recognition [17, 18]. Recent research uses deep Convolutional Neural Networks (CNNs) for video processing, yielding promising preliminary findings that may pave the way for further investigation of a wide range of applications [20, 21]. CNNs are the most advanced technique for supervised image classification; they use ideas from image analysis to provide some degree of invariance to size, location, and distortion. These ideas are based on the recognition of basic features, such as oriented edges, end-points, and corners, which may be found anywhere in the image [26, 27]. After these visual features are combined, successive layers detect novel, higher-order characteristics, mimicking human learning. Accordingly, this study introduces a dual deep neural network with a Gated Recurrent Unit (GRU) and Faster R-CNN for classifying offensive and non-offensive movies based on reviews and trailers [32]. The main purpose of this study is to present a movie recommendation system based on novel dual deep neural networks with fuzzy decision making. The main contributions of the proposed model are as follows.
Initially, the movie reviews are gathered from the Internet Movie Database (IMDB) and the movie trailers are gathered from MovieLens; the video sequences are then split into frames. In the pre-processing stage, the noise artifacts of each frame are eliminated by a fuzzy elliptical filter, and tokenization is applied for text analysis. The pre-processed text is the input to the GRU, which classifies movies as offensive or non-offensive based on the reviews, and the pre-processed images are the input to the Faster R-CNN, which classifies movies as violent or non-violent based on features extracted from the movie trailer. Afterwards, the four classified outputs are passed to the fuzzy decision-making unit, which recommends the best movies using a Mamdani fuzzy inference system with Gaussian membership functions. The proposed deep neural networks are evaluated on sensitivity, specificity, accuracy and F1 score.
The remainder of this study is organized as follows. Section 2 presents the literature survey, Section 3 describes the proposed dual deep neural network, Section 4 presents the results and discussion, and Section 5 gives the conclusion and future enhancements.
Literature survey
In recent years, researchers have introduced several deep learning and machine learning techniques, mainly to improve the accuracy of movie recommendation. Some of these techniques are reviewed briefly in this section.
In 2022, Chen et al. [3] devised an ensemble framework based on the analytic network process (ANP) and a fuzzy comprehensive evaluation model for assessing the impact of Arctic oil spills. The fuzzy comprehensive evaluation model was used to quantify the various factors and determine the level of risk associated with an oil spill. According to the experimental results, this framework accurately quantified the influence of an oil spill event in the Arctic area and thereby determined the level of potential risk.
In 2021, Widiyaningtyas et al. [4] devised User Profile Correlation-based Similarity (UPCSim), which analyzes genre data and user profile data. Each user's data was used to calculate the similarity weights between the user rating values and the user behavior values. To determine the weights of both similarities, correlation coefficients between the user profile data and the user ratings and behaviors are used. A reduction in MAE and RMSE of 1.4% makes this approach more accurate than the prior algorithm.
In 2021, Sujithra et al. [5] developed a hybrid method for recommending movies to the target user that takes into account both recent purchases and demographic information. The demographic characteristics help with the cold-start issue. The experimental results show that the proposed approach was successful when more transaction data was taken into account along with higher ratings.
In 2021, Behera et al. [6] introduced a hybrid model that combines a restricted Boltzmann machine (RBM) and content-based k-nearest neighbors (k-NN) to recommend movies. To capture both content-based filtering and collaborative filtering based on higher-order models, the researchers developed a weighted hybrid approach based on content k-NN and RBM. The suggested method combines the effects of collaborative and content-based filtering on the MovieLens benchmark datasets to suggest the best films to active users.
In 2021, Lang et al. [9] proposed the field-aware factorization machine (FFM) to address the issue of movie rating prediction and assist users in choosing the right films for academic purposes. A clustering technique is also implemented in the FFM to add additional fields, which further increases the applicability of the technique. To address the sparseness of the input data, the clusters are used as a special field in the FFM approach. The experimental findings show that the suggested work minimizes the RMSE value.
In 2021, Tahmasebi et al. [10] developed a social movie recommendation system using a deep autoencoder network with information from Twitter. This hybrid approach, which can easily be extended to other product-based recommendation systems, relies on deep autoencoder networks together with content-based and collaborative filtering. To attain improved accuracy, MovieTweetings and the Open Movie Database (OMDB) were employed, which reduces sparsity and increases the amount of available data.
In 2021, Choudhury et al. [11] devised a trust matrix measure that combines weighted trust propagation with user similarity. A non-cold user went through several algorithms with a trust filter, and a cold user obtained the best possible score using their own preferences. To suggest the appropriate movie to the user, four different recommendation structures were evaluated: the BPNN model, the SVD model, a DNN, and a DNN with trust. In terms of accuracy, the DNN with trust model came out on top with 83% accuracy and an MSE value of 0.74.
In 2021, Wan et al. [15] developed a fuzzy DEMATEL approach to classify 10 factors that have influenced Northern Sea Route development into four categories. Fuzzy decision-making and the DEMATEL technique were used to determine the cause-effect chain in an uncertain and complex decision-making environment. With this method, the relationship between political factors, sea ice conditions, and climate elements was examined, along with the policy implications that can influence the decisions of shipping corporations.
In 2021, Yang et al. [16] proposed an approach for categorizing spare parts with a deep CNN based on transfer learning theory. Initially, the network was built using the pre-trained VGG-19. A hierarchical inference model of the spare components was then constructed to visualize data and convert images. Under five-fold cross-validation, the model performed exceptionally well, with an average accuracy of 96.36%.
In 2020, Bathla et al. [17] devised the AutoTrustRec model, which integrates a deep learning architecture and increases the recommendation accuracy for large-scale, heterogeneous, complicated data. A common layer is used in the autoencoder, and this strategy feeds ratings, direct trust values, and indirect trust values into neural networks. Three public datasets were used to show that MAE and RMSE improve with the AutoTrustRec model.
In 2020, Gupta et al. [19] developed a movie recommendation system in which collaborative filtering and k-NN models were employed, primarily to improve the accuracy of the output compared with content-based filtering. This method eliminates the shortcomings of content-based filtering by relying on cosine similarity and k-nearest neighbors to implement collaborative filtering.
In 2020, Singh et al. [22] proposed a movie recommendation approach that provides all-encompassing suggestions based on movie popularity and/or genre. It involves several deep learning techniques to implement content-based recommendation systems. Moreover, this work describes the challenges faced by content-based recommendation systems and the solutions proposed to overcome them.
In 2019, Aljunid and Manjaiah [23] introduced a movie recommendation framework with ALS and Apache Spark. It focuses on the selection of the ALS algorithmic parameters that affect the performance of a robust recommender system such as a movie recommendation engine. To evaluate the model, different criteria were used, including execution time, RMSE of rating prediction, and the rank at which the model was trained.
In 2014, Baziuke et al. [24] developed a fuzzy logic approach for merging rank and quantitative data with respect to initial correctness using the Marine Life Information Network. This method combines expert knowledge and fuzzy logic techniques, and it gives marine environmental decision support systems the option of using numerical scales. It can also be used to assess sensitivity explicitly and can be employed for other environmental indicators and assessments.
In 2012, Filippo et al. [25] designed an Artificial Neural Network (ANN) to improve sea level predictions. The ANN was trained using hourly time series of atmospheric pressure, wind, and harmonically generated tides during 1982 as input data, and hourly time series of measured tides as output data. For the Cananéia and Ilha Fiscal time series, the coefficients of correlation between projected and measured water levels were 0.88 and 0.98, respectively.
According to the literature review, existing deep learning and machine learning techniques focus on either reviews or trailers for recommending the best movies. Moreover, the existing techniques take more time because they require exploring a large collection of movies, which is the most challenging task. The proposed dual deep neural network addresses these issues by using both reviews and trailers as input without compromising on accuracy. The pre-processed text is the input to the GRU, which classifies movies as offensive or non-offensive based on the reviews, and the pre-processed images are the input to the Faster R-CNN, which classifies movies as violent or non-violent based on features extracted from the movie trailer. By using these two networks, the overall complexity and computational time are reduced while the detection accuracy is preserved. Afterwards, a Mamdani fuzzy inference system with Gaussian membership functions is used for recommending the best movies.
Proposed movie recommendation system
In this section, we propose dual deep neural networks with a GRU and a Faster R-CNN for recommending the best movies based on a fuzzy decision-making system. The overall workflow of the proposed methodology is depicted in Fig. 1.

The overall workflow of the proposed movie recommendation system.
The movie reviews and movie trailers are gathered from openly available datasets. Initially, the video sequences are split into frames, the noise artifacts of each frame are eliminated by a fuzzy elliptical filter, and tokenization is applied for text analysis. The pre-processed text is the input to the GRU, which classifies movies as offensive or non-offensive, and the pre-processed images are the input to the Faster R-CNN, which classifies movies as violent or non-violent based on features extracted from the movie trailers. A Mamdani fuzzy inference system with Gaussian membership functions is used for recommending the best movies.
Dataset for movie reviews
The IMDB dataset contains 50,000 movie reviews for text analytics. It contains substantially more data than previous benchmark datasets for binary sentiment classification. The reviews are divided into 25,000 for training and 25,000 for testing. Each set contains 50 percent positive reviews and 50 percent negative reviews. The classification algorithms are therefore used to predict whether each review is negative or positive.
Dataset for movie trailers
MovieLens is a social networking site and web-based recommendation system that recommends movies to users based on their movie ratings and movie trailers. A number of studies have been based on the MovieLens datasets. There are approximately 11 million ratings for approximately 8,500 movie trailers. In total, 270,000 users provided 26,000,000 ratings and 750,000 tag applications for 45,000 movies. Furthermore, the dataset includes information for 1,100 tags with 12 million relevance ratings.
Data pre-processing
Pre-processing of movie reviews
Pre-processing is crucial to enhancing the effectiveness of the models. As the first step in text mining, the pre-processing stage transforms irregular text into a proper format and improves the quality of the text dataset. In this study, the movie reviews are pre-processed by tokenization. Tokenization separates a stream of text into tokens such as phrases, words, symbols, or other useful items. The goal of tokenization is to examine the words in a sentence or phrase.
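As an illustration of the tokenization step, the following minimal sketch splits a review into lower-case word tokens. The function name `tokenize` and the regular expression are assumptions made for this example; the study does not state which tokenizer was used.

```python
import re


def tokenize(review: str) -> list:
    """Lower-case a review and split it into word tokens.

    Hypothetical helper for illustration, not the tokenizer used in the study.
    """
    # A token is a run of letters/digits, optionally followed by an
    # apostrophe suffix so that contractions such as "wasn't" stay whole.
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", review.lower())


tokens = tokenize("The plot wasn't great, but the acting was superb!")
print(tokens)
```

Punctuation is discarded here; a production pipeline might instead keep symbols as separate tokens, as the description above allows.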
Pre-processing of movie trailers
Afterwards, the movie trailers are split into frames, and these frames (images) are pre-processed using a fuzzy elliptical filter. Automatic movie trailer classification is one of several tasks that can be accomplished through video analysis. To determine where a shot boundary occurs in a video, each frame must be compared with its neighboring frame. A scene border is detected where the similarity between frames is low. The HSV (Hue, Saturation, Value) colour system is used to calculate frame similarity through histogram intersection. Each histogram has 16 bins: eight bins for hue, four bins for saturation, and four bins for value. Equation (1) describes the calculation of the histogram intersection.
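The frame comparison can be sketched as follows. This is a minimal illustration under stated assumptions: pixels are given as (h, s, v) tuples with h in degrees [0, 360] and s, v in [0, 1], the histogram is normalised so that identical frames score 1.0, and the function names are hypothetical. The exact normalisation of Equation (1) may differ.

```python
def hsv_histogram(pixels):
    """Build a normalised 16-bin histogram: 8 hue, 4 saturation, 4 value bins."""
    if not pixels:
        raise ValueError("frame must contain at least one pixel")
    hist = [0.0] * 16
    for h, s, v in pixels:
        hist[min(int(h / 360.0 * 8), 7)] += 1   # bins 0-7: hue
        hist[8 + min(int(s * 4), 3)] += 1       # bins 8-11: saturation
        hist[12 + min(int(v * 4), 3)] += 1      # bins 12-15: value
    total = 3.0 * len(pixels)                   # each pixel adds three counts
    return [count / total for count in hist]


def histogram_intersection(hist_a, hist_b):
    """Sum of bin-wise minima; 1.0 for identical normalised histograms."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


frame_a = [(10, 0.9, 0.8), (200, 0.2, 0.5)]
frame_b = [(10, 0.9, 0.8), (10, 0.9, 0.8)]
similarity = histogram_intersection(hsv_histogram(frame_a), hsv_histogram(frame_b))
print(similarity)
```

A scene border would then be flagged wherever the similarity of neighboring frames drops below a chosen threshold, which is a tuning parameter not specified in the text.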
Such a method is effective in identifying instantaneous scene changes, but it fails when gradual transitions take place. To overcome this issue, Mr is iteratively smoothed with the fuzzy elliptical filter. The fuzzy elliptical filter measures the degree of membership of the processed data using a logic variable with a multi-dimensional membership function. By determining the ripple factor and linearity of the input data, it determines the linearity and angle of the network. The elliptical filter has a separate adjustment for equalizing ripple behaviour and offers a faster transition between the passband and stopband. The nth-order magnitude frequency response is expressed in terms of Cr(ω), the Chebyshev rational function of ω evaluated from definite ripple characteristics.
To evaluate the denoising efficiency, mean square error (MSE) is used as the measure. An elliptical filter with a sharp cut-off frequency is well suited to attenuating unwanted frequencies, since errors on both sides of the cut-off frequency are minimised. The fuzzy system then selects the optimal combination of processed signals to reduce the MSE of the reconstructed images.
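For reference, the textbook form of the nth-order elliptic (Cauer) magnitude response, which is assumed here to be the form intended above, can be written with the ripple factor ε (an assumed symbol) and the rational function Cr(ω) as follows. The MSE used as the denoising measure is given alongside it, with I the reference image, Î the reconstructed image and M × N the image size (all assumed symbols).

```latex
\[
  \lvert H_n(\omega) \rvert \;=\; \frac{1}{\sqrt{1 + \varepsilon^{2}\, C_r^{2}(\omega)}}
\]
\[
  \mathrm{MSE} \;=\; \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N}
  \bigl( I(i,j) - \hat{I}(i,j) \bigr)^{2}
\]
```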
The proposed dual deep neural networks combine a GRU (Fig. 2) and a Faster R-CNN (Fig. 3) to classify movies based on pre-processed movie reviews and movie trailers. Based on the outputs of both neural networks, non-offensive movies are recommended to users.

Structure of GRU cell.

Structure of Faster R-CNN.
The GRU is a well-known recurrent neural network (RNN) that processes sequential data using its recurrent architecture. The GRU architecture used here consists of five layers: input, embedding, recurrent, output, and a softmax classification layer. This architecture takes word embeddings as inputs and passes them through a GRU layer to gather semantic information before passing them on to the classification layer. Like an LSTM unit, the GRU has a gating system that regulates data flow, but it has no dedicated memory cell. To control the flow of data from the previous activation when calculating a new candidate state, the GRU calculates two gates, known as the update and reset gates. The update gate combines the past activation and the new candidate activation into the new activation. Every hidden state at time step t is computed from the update gate, the reset gate and the candidate state.
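The gate computations described above are commonly written as below. This is the standard GRU formulation rather than a form confirmed by the study; the symbols x_t (input embedding), h_t (hidden state), z_t (update gate), r_t (reset gate), the weight matrices W, U and the biases b are assumed names, and some libraries swap the roles of z_t and 1 − z_t in the last line.

```latex
\begin{align}
  z_t &= \sigma\!\left(W_z x_t + U_z h_{t-1} + b_z\right) \\
  r_t &= \sigma\!\left(W_r x_t + U_r h_{t-1} + b_r\right) \\
  \tilde{h}_t &= \tanh\!\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right) \\
  h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{align}
```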
In GRU networks for text categorization, softmax regression is widely used to distinguish between offensive and non-offensive words, and it gives the results of the computation a probabilistic interpretation. In this model, the lower layer provides a fixed-dimensional input to the classification layer, which refines it, and a softmax activation function is then used to calculate the probability of each of the two categories.
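The softmax step can be sketched as follows for the two review classes. The logit values and the label order are illustrative assumptions, not outputs of the trained network.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["non-offensive", "offensive"]      # assumed label order
probs = softmax([0.3, 2.1])                  # hypothetical classification-layer outputs
predicted = labels[probs.index(max(probs))]
print(probs, predicted)
```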
In the Faster R-CNN, a region proposal network (RPN) is used. Its input is the feature map from the first CNN, and it produces bounding boxes together with the probability that each box contains an object. From the RPN output, the most likely bounding boxes can be selected. Although these bounding boxes are not precise, they can still be analyzed by pooling regions of interest (RoI). After RoI pooling, more accurate bounding box coordinates can be determined for each region.
A small network slides over the convolutional feature map to create the proposals, generating a series of rectangular object proposals, each with an objectness score. The concept of the anchor box is introduced to avoid pyramids of filters or pyramids of images. Each region is then mapped to a reference anchor box, allowing objects to be detected at different scales. An anchor is centered on the sliding window and is associated with a scale and an aspect ratio. At every sliding-window location, several region proposals are predicted simultaneously, where k denotes the maximum number of possible proposals for each location. The regression layer therefore has 4k outputs that represent the coordinates of the k boxes, and the classification layer produces 2k scores that estimate the probability of object or not object for each proposal; the k proposals are parameterized relative to the k reference boxes. In addition, for anchor selection, the anchors are categorized into two classes: foreground and background.
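The relationship between k anchors and the 4k regression and 2k classification outputs can be sketched as follows. The three scales and three aspect ratios are the defaults of the original Faster R-CNN design and are assumed here; the study does not state which values were used, and `make_anchors` is a hypothetical helper.

```python
import math


def make_anchors(cx, cy, scales, ratios):
    """Return (x1, y1, x2, y2) anchor boxes centred on one sliding-window location."""
    anchors = []
    for scale in scales:
        for ratio in ratios:                 # ratio = width / height
            w = scale * math.sqrt(ratio)     # keeps the area equal to scale ** 2
            h = scale / math.sqrt(ratio)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = make_anchors(100, 100, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0))
k = len(anchors)                             # anchors per location
reg_outputs = 4 * k                          # four box coordinates per anchor
cls_outputs = 2 * k                          # object / not-object score per anchor
print(k, reg_outputs, cls_outputs)
```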
As a classification index, an anchor box is chosen according to its intersection-over-union (IoU) with a ground-truth box. The IoU overlap ratio is the area of the intersection of the anchor box and the ground-truth box divided by the area of their union.
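A minimal sketch of the IoU computation and the resulting foreground/background labelling is given below. Boxes are assumed to be (x1, y1, x2, y2) tuples, and the thresholds 0.7 and 0.3 are taken from the original Faster R-CNN design as an assumption; the study does not report its thresholds.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def label_anchor(anchor, ground_truth, pos_thresh=0.7, neg_thresh=0.3):
    """Assign an anchor to foreground, background, or neither (assumed thresholds)."""
    overlap = iou(anchor, ground_truth)
    if overlap >= pos_thresh:
        return "foreground"
    if overlap < neg_thresh:
        return "background"
    return "ignored"                         # ambiguous anchors are not used in training


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```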
The training objective combines a classification loss and a regression loss, where w refers to the predicted class value, x denotes the true class score, y is the true box coordinates and z is the predicted coordinates.
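For reference, the multi-task loss of the original Faster R-CNN design can be written with the symbols w, x, y and z defined above. This is the standard formulation and is assumed, not confirmed, to match the form used in this study; the normalisers N_cls and N_reg, the balancing weight λ and the anchor index i are assumed symbols.

```latex
\begin{align}
  L(w, x, y, z) &= \frac{1}{N_{cls}} \sum_i L_{cls}(w_i, x_i)
                 + \lambda \, \frac{1}{N_{reg}} \sum_i x_i \, L_{reg}(z_i, y_i) \\
  L_{cls}(w_i, x_i) &= -\bigl[\, x_i \log w_i + (1 - x_i) \log (1 - w_i) \,\bigr] \\
  L_{reg}(z_i, y_i) &= \sum_{j \in \{x,\, y,\, w,\, h\}}
      \operatorname{smooth}_{L_1}\!\left(z_{i,j} - y_{i,j}\right),
  \qquad
  \operatorname{smooth}_{L_1}(d) =
  \begin{cases}
    0.5\, d^{2} & \text{if } |d| < 1 \\
    |d| - 0.5   & \text{otherwise}
  \end{cases}
\end{align}
```

In the last line the index j runs over the four box parameters (centre coordinates, width and height), which is a separate use of the letters from the symbols defined in the text.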
Furthermore, this model accepts each feature map as input, which reduces the computational cost significantly and enhances the learning capability of the model.
Based on these classification results, a Mamdani fuzzy system is built with a minimum number of rules for recommending the best movies to users. The fuzzy logic controller translates a linguistic control approach based on expert knowledge into an autonomous control strategy. The MFIS consists of the following five blocks: database, rule base, decision-making unit, fuzzification interface unit and defuzzification interface unit, as shown in Fig. 4.

Structure of Decision-making fuzzy inference system.
The fuzzification module converts the crisp values of the four classification outputs (violence, non-violence, non-offensive and offensive) into fuzzy quantities. The decision-making unit includes the fuzzy rule base, which comprises four rules that recommend the best movies by evaluating the inputs violence, non-violence, non-offensive and offensive. A Mamdani inference system with Gaussian membership functions (MF), ranging from 0 to 1 for the inputs, was used in the fuzzy rule base. The four rules implemented in the proposed system are shown in Table 1. These rules are crucial in the classification of the various groups of subjects.
Rules for fuzzy inference system
The input parameters of fuzzy logic
Fuzzy rules
* If review is offensive and trailer is violence then movie is not recommended
* If review is offensive and trailer is non-violence then movie is recommended
* If review is non-offensive and trailer is violence then movie is not recommended
* If review is non-offensive and trailer is non-violence then movie is recommended
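The four rules above can be sketched as a small Mamdani system with Gaussian membership functions. Everything numeric here is an assumption made for illustration: the inputs are taken to be scores in [0, 1] (0 for non-offensive or non-violence, 1 for offensive or violence), the membership centres are 0 and 1 with a spread of 0.25, the output universe is [0, 1], and the function names are hypothetical.

```python
import math


def gauss(x, mean, sigma=0.25):
    """Gaussian membership function with an assumed spread."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)


# (review term, trailer term, output term), copied from the rule list above.
RULES = [
    ("offensive", "violence", "not_recommended"),
    ("offensive", "non-violence", "recommended"),
    ("non-offensive", "violence", "not_recommended"),
    ("non-offensive", "non-violence", "recommended"),
]

# Assumed centres of the Gaussian membership functions.
CENTRES = {
    "non-offensive": 0.0, "offensive": 1.0,
    "non-violence": 0.0, "violence": 1.0,
    "not_recommended": 0.0, "recommended": 1.0,
}


def recommend_score(review_score, trailer_score, steps=200):
    """Mamdani inference: min for AND, max aggregation, centroid defuzzification."""
    strength = {"recommended": 0.0, "not_recommended": 0.0}
    for review_term, trailer_term, out_term in RULES:
        fire = min(gauss(review_score, CENTRES[review_term]),
                   gauss(trailer_score, CENTRES[trailer_term]))
        strength[out_term] = max(strength[out_term], fire)
    num = den = 0.0
    for i in range(steps + 1):               # sample the output universe [0, 1]
        z = i / steps
        # Clip each output set at its rule strength, then aggregate with max.
        mu = max(min(strength[t], gauss(z, CENTRES[t])) for t in strength)
        num += mu * z
        den += mu
    return num / den if den else 0.5


calm = recommend_score(0.1, 0.05)            # mild review, non-violent trailer
harsh = recommend_score(0.9, 0.95)           # offensive review, violent trailer
print(calm, harsh)
```

A crisp score above 0.5 can be read as "recommended". With the rule base as listed, the outcome follows the trailer term, since both non-violence rules recommend the movie regardless of the review term.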
Following a set of rules, fuzzy inference interprets the values of the inputs and assigns values to the output. A fuzzy inference rule is applied to the output of the dual deep neural networks. The fuzzy system takes two membership-function inputs from the networks and produces an output for recommending the best movies. There are two input elements, each with two linguistic terms, resulting in a total of 2^2 = 4 fuzzy rules.
Here R refers to the rule base and p(i) denotes the probability of the i-th rule. The centre of gravity defuzzification method was used to convert the fuzzy inference output to a crisp output,
where μA(z) denotes the membership function of the aggregated output. Using Equation (14), the crisp values of the fuzzy inference system outputs are calculated for recommending the best movies to users based on the classification results.
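For reference, the centre of gravity (centroid) defuzzification is conventionally written as below, with μA(z) as defined above and z* (an assumed symbol) denoting the crisp output; for a sampled output universe the integrals become sums.

```latex
\[
  z^{*} \;=\; \frac{\displaystyle\int \mu_A(z)\, z \,\mathrm{d}z}
                   {\displaystyle\int \mu_A(z)\,\mathrm{d}z}
  \;\approx\;
  \frac{\sum_{j} \mu_A(z_j)\, z_j}{\sum_{j} \mu_A(z_j)}
\]
```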
In this section, the proposed dual deep neural networks are evaluated on the collected datasets using accuracy, specificity, precision, recall and F1 score. The benchmark reports the overall accuracy rate together with the performance of the proposed dual deep neural networks. Furthermore, a comparison of the proposed dual deep neural networks with classic ML and DL models is provided.
Performance analysis
The efficiency of the proposed dual deep neural networks is measured using the evaluation metrics accuracy, specificity, precision, recall and F1 score.
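These metrics follow from the confusion-matrix counts, as the sketch below shows. The counts passed in are purely illustrative and are not results from this study; the function name is hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the five metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (both classes occur and are predicted).
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                  # also called sensitivity
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "specificity": tn / (tn + fp),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Illustrative counts only, not taken from the paper's experiments.
metrics = classification_metrics(tp=90, fp=10, tn=85, fn=15)
print(metrics)
```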
Performance evaluation of the proposed network

Performance analysis of the Proposed Dual deep neural networks.
The performance of the proposed dual deep neural networks on the two datasets, i.e., IMDB for reviews and MovieLens for movie trailers, is shown in Table 3.
Of the two datasets, the proposed FRCNN model achieves the higher accuracy on the MovieLens dataset, as shown in Fig. 5. The performance is measured by specificity, precision, recall, F1 score and accuracy, and the FRCNN obtains an overall accuracy of 98.42%. Likewise, the overall accuracy of the proposed GRU is 97.73% based on movie reviews from the IMDB dataset.
The ROC curves generated for the two datasets are depicted in Figs. 6 and 7. The MovieLens dataset has the higher AUC: the proposed FRCNN model achieved an AUC of 0.98 for the MovieLens dataset, measured via the TPR and FPR parameters.
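The AUC can be computed without plotting the curve, because the area under the ROC curve (TPR against FPR) equals the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative one. The sketch below uses that equivalence; the labels and scores are illustrative values, not outputs of the proposed networks.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking (ties count as half)."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative labels (1 = positive class) and classifier scores.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
print(auc)
```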

ROC of the proposed GRU based on the IMDB dataset.

ROC of the proposed FRCNN based on the MovieLens dataset.
Figure 8 displays the accuracy curve, with epochs on the x-axis and accuracy on the y-axis; the accuracy of the method improves as the number of epochs increases.

Testing and Training accuracy of GRU.
Figure 9 shows the loss curve against epochs, which illustrates that the loss of the model diminishes as the number of epochs increases. The GRU model achieves high accuracy on the IMDB dataset.

Training and Testing loss curve of GRU.
Figure 10 displays the accuracy curve, with epochs on the x-axis and accuracy on the y-axis; the accuracy of the method improves as the number of epochs increases. Figure 11 shows the loss curve against epochs, which illustrates that the loss of the model diminishes as the number of epochs increases. The predicted outcomes of the proposed FRCNN framework are therefore highly reliable for the MovieLens dataset. The study started by determining the number of training epochs necessary to attain the maximum level of testing accuracy. According to the results, the GRU and FRCNN reached their classification accuracy at 5 and 50 training epochs, with testing accuracies of 97.73% and 98.42% respectively.

Training and Testing accuracy curve of FRCNN.

Training and Testing loss curve of FRCNN.
The efficiency of each ML classifier was estimated to validate that the proposed dual deep neural networks achieve high accuracy. The proposed GRU and FRCNN were compared with four machine learning classifiers: SVM, MLP, ELM and RF. The performance evaluation used specificity, precision, recall, F1 score and accuracy for each ML technique, and the accuracy attained by the proposed classifier is 99.07%, which is higher than that of the traditional ML methods.
Table 4 compares the four ML techniques on the performance metrics for recommending the best movies. The proposed GRU achieves an accuracy of 97.73% for reviews and the proposed FRCNN achieves an accuracy of 98.42% for movie trailers, whereas the traditional techniques did not perform as well as the proposed classifier. The GRU model improves the overall accuracy by 9.44%, 8.62%, 0.13% and 2.38%, and the FRCNN improves the overall accuracy by 10.07%, 9.26%, 0.83% and 3.06%, over RF, SVM, MLP and ELM respectively.
Comparison between ML classifiers
Table 5 compares different DL networks on the performance metrics, reporting the percentage of classification accuracy for each. The traditional networks did not perform as well as the proposed networks. The GRU improves the overall accuracy by 6.68%, 8.62%, 5.35% and 4.22%, and the FRCNN improves the overall accuracy by 7.33%, 9.06%, 6.01% and 4.87%, over AlexNet, LeNet, ResNet and LSTM respectively.
Comparison between Deep neural networks
The proposed GRU and FRCNN outperform the other networks in this comparison and obtain higher accuracies of 97.73% and 98.42%. The networks were trained on two different datasets for different classification tasks. The result in Figure 12 shows that the proposed model does not require a complex architecture to recommend the best movies with higher accuracy than the traditional networks.

Performance comparison between the traditional DL networks.
Table 6 shows the accuracy of the proposed model on different datasets: IMDB, OMDB, MovieLens, MovieTweeting and Flixter. Because the MovieLens dataset includes both reviews and trailers, the proposed network achieves lower accuracy on it. The accuracy is 97.25% for the MovieLens dataset, which is 1.18% lower than the result attained for movie trailers using the same dataset. Moreover, the proposed model yields better accuracy for text analysis using IMDB than using MovieLens. Compared with the MovieTweeting and Flixter datasets, the proposed model obtains better results on the OMDB dataset. This comparison shows that the proposed network achieves a high level of accuracy on the IMDB dataset but does not perform as well on the other datasets.
Accuracy comparison of Proposed network with different datasets for text analysis
Table 7 presents the performance comparison of the proposed dual deep neural network with existing methods in terms of accuracy. The proposed network attains an average accuracy of 98.07% for recommending the best movies using the IMDB and MovieLens datasets. According to this comparison, CNN-LSTM [28] achieves 9.91% lower accuracy than the proposed GRU on the IMDB dataset. Moreover, prior models such as VGG-16 [29], DNNRec [30] and LSTM-RNN [31] attain lower accuracy than the proposed FRCNN on the MovieLens dataset. The FRCNN model improves the overall accuracy by 5.47%, 6.39% and 1.83% over [29], [30] and [31] respectively. The proposed dual deep neural network yields higher accuracy than the existing models on both movie recommendation datasets.
Accuracy comparison amid the proposed and existing models
In this study, a multi-modal classifier is introduced that combines the outputs of dual deep neural networks, with a GRU for text analysis and a Faster R-CNN for image analysis. Initially, the movie trailer is manually split into frames, which are pre-processed using a fuzzy elliptical filter for image analysis, and the movie reviews are tokenized for text analysis. The pre-processed text is the input to the GRU, which classifies movies as offensive or non-offensive, and the pre-processed images are the input to the Faster R-CNN, which classifies movies as violent or non-violent based on features extracted from the movie trailer. Afterwards, the four classified outputs are passed to the fuzzy decision-making unit, which recommends the best movies using a Mamdani fuzzy inference system with Gaussian membership functions.
The performance of the dual deep neural networks was evaluated using specificity, precision, recall, accuracy and F1 score. The proposed GRU yields an accuracy of 97.73% for reviews and the FRCNN yields an accuracy of 98.42% for movie trailers. The proposed dual deep neural networks attain an average accuracy of 98.07% for recommending the best movies using the IMDB and MovieLens datasets. The proposed model improves the overall accuracy by 9.91%, 5.47%, 6.39% and 1.83% over CNN-LSTM, VGG-16, DNNRec and LSTM-RNN respectively. The experimental results show that the proposed dual deep neural networks produce good results in both text and image analysis. However, the accuracy of the proposed model in text analysis leaves room for improvement. The classification results could be improved by using word embeddings in conjunction with advanced deep learning networks on the MovieLens dataset. In future work, we will develop a novel 3D deep learning architecture that captures the relationship among frames within a trailer in order to recommend the best movies.
