Abstract
Web service recommender systems play a fundamental role in the selection, composition and substitution of services. They are used in several application areas such as Web APIs and Cloud Computing. Deep Learning techniques, for their part, have brought clear advantages and solutions to the challenges faced by recommendation in many areas. However, the field of Web services has not yet fully benefited from these deep methods, and the works applying them to the Web services domain are very recent compared with those in other fields. The objective of this paper is therefore to study and analyze state-of-the-art work on Web services recommender systems based on Deep Learning techniques. This analysis will help readers wishing to work in this field, and allows us to direct our future work on Web services recommendation by exploiting the advantages of Deep Learning techniques.
Introduction
The main advantages of Service-Oriented Architectures (SOA) are reuse, interoperability, loose coupling, simplicity, fast access to functionalities and, in particular, the rapid construction of new functionalities [1]. In recent years, Service-Oriented Computing (SOC) and the Internet of Services (IoS) have grown in popularity; as a result, a growing number of developers benefit from the reuse of Web services, generally in the form of Web APIs [2]. Software as a service is built by integrating several software components from different vendors into a useful software composition such as a Mashup [2]. However, selecting the desired services from a massive service repository is becoming an increasingly difficult task for service consumers [3]. Research on Web services recommendation and selection is therefore a very active topic.
The approaches carried out in this field are mainly based on one of two recommendation strategies, namely Collaborative Filtering [4] and Content-Based filtering [5], or on their combination [6]. Collaborative Filtering (CF) uses Quality of Service (QoS) ratings by users to recommend a service to an active user. In contrast, Content-Based (CB) filtering uses Web service descriptions to recommend the service whose features match the query. In addition, several researchers have used side information relating to context, location, time, etc. [7, 8]. Others have used knowledge graphs to take advantage of the social relations of users and services [9–12]. The authors of [13] proposed a QoS value prediction approach combining user-based and item-based CF methods. The approaches proposed in [14–16] rely on location-aware QoS, combining model-based and memory-based CF algorithms. Other approaches based on personalized CF [17, 18] take into account the personalized influence of services or of users’ experiences when computing the similarity between users and services. The authors of [19, 20] presented CF approaches to predict unknown QoS values. The surveys [10–12] contain more information on CF-based and CB-based Web service recommendation approaches.
With the rapid increase in the number of services on the Internet, the recommendation task has become more important and more difficult [21]. As a result, Web service recommender systems still suffer from the classic recommender problems relating to CF and CB strategies such as cold-start, accuracy, scalability and sparseness, in addition to problems specific to Web services regarding their functional and non-functional properties such as complex service interactions.
To remedy these problems, several authors have very recently used Deep Learning (DL) techniques in Web services recommendation. Deep Learning has already been used successfully in recommender systems in various fields. It significantly improves their performance because it simplifies feature engineering, thanks to its ability to extract user interactions, learn latent factors and find suitable conceptualizations of characteristics [22].
The first use of Deep Learning techniques for recommender systems dates back to 2007 with the work of Salakhutdinov et al. [23, 24]. The authors proposed a recommendation system based on Restricted Boltzmann Machines (RBM) for Collaborative Filtering. DL was first used to recommend Web services in 2017 with the work of Bai et al. [2]. They used a Stacked Denoising Auto-Encoder (SDAE) to build their framework for recommending long-tail Web services.
This literature review shows that, since 2017, the majority of works relating to Web services recommendation systems use DL techniques. It also shows that recommender systems in the Web services field have not yet benefited from all the advantages of Deep Learning techniques, compared with recommender systems in other fields (movies, music, products, etc.). Yet recommendation and Deep Learning techniques can provide many advantages in the discovery, selection, composition and substitution of Web services. Properly exploiting these operations allows developers to derive maximum benefit from the reuse of Web services and from their collaboration in the form of composite services called Web APIs or Mashups.
Therefore, a good state-of-the-art analysis of DL-based Web services recommendation is necessary to help researchers, especially since the Web services area has its own specificities and differs in many ways from other recommendation domains. From our point of view, the three existing surveys on Web services recommendation are too restricted and do not include works based on DL. Indeed, the two surveys by Sebastian et al. [10] and Puri et al. [11] study only some recommendation systems based on Collaborative Filtering and QoS, while a third, more recent survey by Pandharbale et al. [12] focuses on Web service recommender systems based on CF and CB strategies.
In this context, this paper presents a comprehensive survey and analysis of studies on DL-based Web service recommendation. The SLR (Systematic Literature Review) method is adopted, following the steps described in [25, 26]. Our contributions can be summarized as follows:
Study several works on Web service recommender systems based on Deep Learning, using the SLR method.
Present the main challenges faced by recommender systems for Web services, then the solutions brought to these challenges using Deep Learning techniques.
Classify the studied works according to several criteria, namely: recommendation strategies, Web services properties taken into account, DL techniques used, data sets and performance evaluation metrics.
Present a quantitative analysis of these works and give guidelines that will help researchers in this field by considering research shortcomings.
Provide a reliable background for our future contributions in this area.
The remainder of the paper is structured as follows: Section 2 is devoted to basic concepts definition. In Section 3 we present our review methodology. Section 4 is dedicated to the survey results with an analysis of different approaches and future research directions in this domain. Finally, in Section 5, we end this paper with a conclusion and some perspectives.
Background
In this section, we present the main paradigms and concepts relating to the studied state-of-the-art approaches, namely Web services, Web service recommendation systems, and Deep Learning (DL). This section ends with the advantages of using DL in recommendation.
Web services
A Web service is a software component publicly exposed via an interface which describes its functionalities and characteristics using XML (eXtensible Markup Language). Thanks to its interface, the service can be discovered by other services or software. The Web service exchanges XML messages with its environment using an Internet protocol stack. It is identified by a URI (Uniform Resource Identifier) [27].
Web service technology is based on the exchange of messages using standards, which allows applications to communicate via the Web independently of their platforms and languages [28]. This technology therefore enables application collaboration and interoperability.
Using Web services involves three actors communicating together: the provider, who owns the service, publishes it and offers it publicly; the customer, who discovers the service and requests its use; and the service directory, which allows providers to publish services and customers to discover them.
Before Web services exchange messages, they are first discovered and selected thanks to their descriptions. A Web service is described by its functional properties, which can be used to select it, in addition to non-functional properties, which differentiate services with the same functionality. Quality of Service (QoS) attributes are non-functional performance properties.
Web service selection is based particularly on QoS, which determines the importance and usefulness of a service. QoS attributes used for service selection include availability, throughput, response time, and success capability [12].
Web services coming from different platforms are heterogeneous and autonomous; despite this, they can collaborate and interoperate, creating composite services with greater granularity. Indeed, composition is one of the main advantages of Web services, and it also makes it possible to substitute services with equivalent ones [29, 30].
A Web service implemented for specific functionality, and published by its provider, is called an API. For implementing complex functionalities, several APIs are combined in a composite service called Mashup [31].
Web services recommendation systems
Recommendation starts with predicting the rating (preference) that a user would give to an item; the system then recommends to the user the items with the highest predictions. For Web services, recommendation methods use two categories of properties: functional attributes, such as service descriptions and tags, and non-functional attributes, such as Quality of Service (QoS) [31]. A service recommendation system first identifies the services corresponding to the user’s functional requirements. Then, a recommendation is made based on non-functional or performance requirements. The latter are measured by Quality of Service (QoS) attributes such as execution time, training time and accuracy [22].
Quality of Service (QoS) is one of the most widely used criteria for recommending services. The QoS prediction is an important task in the Web services selection and recommendation [21].
Service recommendation systems are based on content-based or collaborative filtering strategies, or on their hybridization.
Content-Based (CB) Web service recommendation algorithms compare the features found in service descriptions with user needs in order to recommend appropriate services [31]. They use a keyword-based approach (keyword similarities) or semantic approaches (ontologies or latent semantics). These methods are limited by textual or semantic similarities and by the high cost of manual annotation [11, 31].
Collaborative Filtering (CF) is a process based on computing similarities between users on the one hand, and between Web services on the other hand, in order to recommend to a given user the services chosen by similar users [11]. It is mainly based on the explicit and implicit history of interactions between users and services. Through these interactions and transactions, recommender systems can infer missing values. The recommender system often takes the service invocation matrix as input, from which it determines the similarities between users or between services to generate recommendations for the active user [31].
Collaborative Filtering algorithms are classified into two types:
Model-based algorithms first train a model taking a QoS matrix as input; the trained model is then used to generate service predictions. There are several model-based CF approaches; in particular, Matrix Factorization (MF) is the most important and the most popular [11].
Memory-based algorithms, as their name suggests, use data stored in memory to generate predictions. These data relate to users, services and Quality of Service. These algorithms can be classified into two subcategories, namely Nearest Neighbor (NN) algorithms and Top-N recommendation algorithms. Nearest Neighbor (NN) algorithms are the most widely used [11]. They rely on similarity to discover the closest users or services, and are in turn classified into user-based and item-based algorithms [32].
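The user-based Nearest Neighbor idea described above can be illustrated with a minimal sketch. It assumes Pearson correlation as the similarity measure and a mean-centred weighted average as the prediction rule, which is one common choice rather than the method of any particular surveyed paper; the toy QoS values and the names `qos`, `pearson` and `predict` are hypothetical.

```python
from math import sqrt

# Hypothetical toy QoS data: response times (seconds) observed by users for services.
# A missing observation is simply absent from the inner dictionary.
qos = {
    "u1": {"s1": 0.5, "s2": 1.2, "s3": 0.9},
    "u2": {"s1": 0.6, "s2": 1.1, "s3": 1.0, "s4": 2.0},
    "u3": {"s1": 1.5, "s2": 0.4, "s4": 0.7},
}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def pearson(a, b):
    """Pearson correlation computed over the services both users have invoked."""
    common = set(a) & set(b)
    if len(common) < 2:
        return 0.0
    mean_a = mean(a[s] for s in common)
    mean_b = mean(b[s] for s in common)
    num = sum((a[s] - mean_a) * (b[s] - mean_b) for s in common)
    den = sqrt(sum((a[s] - mean_a) ** 2 for s in common)) * \
          sqrt(sum((b[s] - mean_b) ** 2 for s in common))
    return num / den if den else 0.0

def predict(user, service, data):
    """Predict a missing QoS value from positively correlated neighbours."""
    base = mean(data[user].values())
    num = den = 0.0
    for other, observed in data.items():
        if other == user or service not in observed:
            continue
        sim = pearson(data[user], observed)
        if sim <= 0:
            continue  # ignore dissimilar users
        num += sim * (observed[service] - mean(observed.values()))
        den += sim
    return base + num / den if den else base

prediction = predict("u1", "s4", qos)
```

In this toy data only `u2` is positively correlated with `u1`, so the prediction for `s4` is the mean of `u1` shifted by the deviation of `u2` on `s4` from its own mean.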
In terms of accuracy, the Matrix Factorization (MF) technique, with latent factor models, outperforms the Nearest Neighbor technique for QoS prediction. MF first learns two dense latent models, one for users and another for services. The QoS prediction is then obtained as the scalar product of the latent vectors of the user and the service [21].
To cope with the CF limitations that persist as the number of services increases, the Factorization Machine (FM) was proposed by Rendle et al. [33], integrating the advantages of traditional factorization models and Support Vector Machines (SVM). FM efficiently processes sparse data and therefore improves service recommendation accuracy. It also reduces recommendation latency caused by various factors. FM has better scalability since it can be easily combined with contextual information from users and services [34]. There are many Factorization Machine models: basicFM, DeepFM (Factorization-Machine based Neural Network) [35], AFM (Attentional Factorization Machine) [36, 37], NAFM (Neural and Attentional Factorization Machine) [38] and xDeepFM (eXtreme Deep Factorization Machine) [39].
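As a minimal sketch of this idea, the following code learns user and service latent vectors by stochastic gradient descent on the observed QoS entries and predicts a value as their scalar product. The hyper-parameters, the L2 regularization term and the toy observations are assumptions made for illustration; they are not taken from [21].

```python
import random

def train_mf(observed, n_users, n_services, k=2, lr=0.05, reg=0.01, epochs=500, seed=0):
    """Learn latent factors P (users) and Q (services) from observed QoS values."""
    rng = random.Random(seed)
    P = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n_services)]
    for _ in range(epochs):
        for (u, s), r in observed.items():
            pred = sum(P[u][f] * Q[s][f] for f in range(k))
            err = r - pred
            for f in range(k):
                pu, qs = P[u][f], Q[s][f]
                # Gradient step on the squared error with L2 regularization.
                P[u][f] += lr * (err * qs - reg * pu)
                Q[s][f] += lr * (err * pu - reg * qs)
    return P, Q

def predict(P, Q, u, s):
    """QoS prediction as the scalar product of the two latent vectors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[s]))

# Hypothetical observations: (user index, service index) -> QoS value.
observed = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 2.0, (1, 1): 4.0, (2, 0): 3.0}
P, Q = train_mf(observed, n_users=3, n_services=2)
missing = predict(P, Q, 2, 1)  # estimate for an entry that was never observed
```

Only observed entries contribute to the updates, which is what allows the factor model to be trained on a sparse QoS matrix.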
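The basic second-order FM model can be sketched as follows: the prediction is a global bias, plus a linear term, plus pairwise feature interactions whose weights are inner products of per-feature latent vectors. The code compares the direct pairwise sum with the standard linear-time reformulation of the same term. The feature layout (one-hot user, one-hot service, one contextual value) and all parameter values are hypothetical.

```python
def fm_predict_naive(x, w0, w, V):
    """Second-order FM: explicit sum over all feature pairs (i < j)."""
    n = len(x)
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(vi * vj for vi, vj in zip(V[i], V[j]))
            y += dot * x[i] * x[j]
    return y

def fm_predict(x, w0, w, V):
    """Same model, using the identity
    sum_{i<j} <v_i, v_j> x_i x_j = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2],
    which costs O(k * n) instead of O(k * n^2)."""
    n, k = len(x), len(V[0])
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        y += 0.5 * (s * s - s_sq)
    return y

# Hypothetical input: one-hot user (2 dims), one-hot service (2 dims), one context value.
x = [1.0, 0.0, 0.0, 1.0, 0.5]
w0 = 0.1
w = [0.2, -0.1, 0.3, 0.0, 0.4]
V = [[0.1, 0.2], [0.0, -0.3], [0.5, 0.1], [-0.2, 0.4], [0.3, 0.3]]

fast = fm_predict(x, w0, w, V)
slow = fm_predict_naive(x, w0, w, V)
```

Because most entries of `x` are zero in one-hot encodings, only the active features contribute, which is one reason FM copes well with sparse inputs.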
In recent years, several hybrid approaches were successful by combining multiple techniques as well as different types of information to propose adequate recommendations [40]. Most of the approaches studied in this paper use hybridization by integrating additional feature information, or by using content information and usage history.
Deep learning
Deep Learning (DL) is a subfield of Machine Learning which mainly uses Neural Networks. It is based on learning multiple layers of data abstractions. A Deep Learning model has a hierarchy of several layers of neurons, in which the lower-level concepts help to define the higher-level concepts [24].
According to Zhang et al. [41], a neural architecture is considered a Deep Learning structure if it optimizes a differentiable objective function using a variant of Stochastic Gradient Descent (SGD).
Currently, Deep Learning approaches produce state-of-the-art solutions to many problems, including computer vision, Natural Language Processing (NLP), speech recognition and Collaborative Filtering. Whether in the context of supervised or unsupervised learning, these approaches have brought remarkable success in all areas.
Big data and computing power are the main factors that promote Deep Learning as an advanced Machine Learning technique. Indeed, training a Deep Learning model on big data allows it to learn representations efficiently and to obtain better results. The use of Graphics Processing Units (GPUs) provides Deep Learning models with the processing power needed for complex calculations [24].
The Deep Learning models that have been used in Web service recommendation tasks are briefly introduced in the remainder of this subsection.
Auto-Encoder (AE) is a feed-forward neural network consisting of at least three layers. The input layer and the output layer have the same number of neurons; the middle layer, called the hidden layer, encodes the input data as a more compact representation of its salient features. The AE uses this representation to reconstruct a copy of the input at the output layer [42]. The learning process of this unsupervised model includes two successive transformation stages, called encoding and decoding. The encoder, formed by the input layer and the hidden layer, encodes the high-dimensional input data into a lower-dimensional representation in the hidden layer. The decoder, formed by the hidden layer and the output layer, reconstructs the input data from the hidden representation. Depending on their structure, AEs have several variants such as Sparse Auto-Encoder, Variational Auto-Encoder (VAE), Contractive Auto-Encoder, Denoising Auto-Encoder (DAE), Marginalized Denoising Auto-Encoder, and Stacked Denoising Auto-Encoder (SDAE), which is formed by several DAEs [24, 41]. SDAEs have demonstrated their power in reducing the dimension of the input data while reconstructing the most faithful copy of this data at the output layer [30].
Deep Neural Network (DNN) is an Artificial Neural Network (ANN) that has multiple layers in addition to the input and output layers. Most DNNs are feed-forward models whose main advantage is their power in modeling complex nonlinear relationships [43]. MLP, RNN and CNN are three different types of DNN.
MultiLayer Perceptron (MLP), as its name suggests, is a Neural Network composed of several layers, with at least one hidden layer between the input layer and the output layer. It is a feed-forward model in which the output of each layer feeds the next layer. An MLP is not necessarily a binary classifier, and any activation function can be chosen for the perceptron. MLPs can be thought of as stacked layers of nonlinear mappings between inputs and outputs, learning relationships in linear and non-linear data. MLPs are mostly used as universal approximators [41].
Recurrent Neural Networks (RNNs) do not follow the strictly forward data flow of feed-forward models; they contain loops and memories to retain previous computations. RNNs are neural networks that allow sequential data modeling [41]. They also allow the non-linear representation of the relationship between the latent characteristics of users and items and their co-evolution over time. Unlike the usual approaches based on Nearest Neighbor and Matrix Factorization, RNNs are very efficient for short-term predictions and recommendation coverage. Moreover, RNNs are chosen especially for recommender systems that integrate users’ implicit behaviors into their preferences, and for session-based systems [24]. The main RNN variants are the Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU). They are commonly used because of their ability to mitigate the vanishing gradient problem [41].
Convolutional Neural Network (CNN) is a structure that applies, in at least one of its layers, a convolutional filtering operation instead of general matrix multiplication. It is considered a special supervised feed-forward model. A CNN consists of four types of layers, which are stacked to build CNN architectures that map an input volume to an output volume: the convolution layer, the pooling layer, the ReLU correction layer and the fully connected layer [24]. CNNs are often chosen in several fields because they can significantly improve the efficiency and accuracy of systems thanks to their power to capture global and local features [41]. The application areas where CNNs have demonstrated great success are image and object recognition, audio processing and self-driving cars [24], as well as recommender systems.
Graph Neural Network (GNN) is a set of neural layers capable of learning graph representations [44]. The GNN principle consists, during the propagation process, in aggregating feature information from neighbors and then integrating this aggregated information with the current representation of the central node. The network is a stack of propagation layers with aggregation and update operations [45]. Different GNN variants have been developed by combining CNN and Graph Representation Learning (GRL); they allow distilling structural information and learning high-level representations [46]. Graph Convolutional Network (GCN) and Graph ATtention network (GAT) are the most used models. GNNs are mainly used for social services recommendation.
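The aggregation and update operations of a GNN propagation layer can be sketched as below. The sketch assumes mean aggregation over neighbours, separate weight matrices for the central node and for the aggregated neighbourhood, and a ReLU update; these are illustrative choices, not the definition of GCN or GAT. The small user-service graph and the names `gnn_layer`, `W_self` and `W_neigh` are hypothetical.

```python
def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def relu(v):
    return [max(0.0, x) for x in v]

def gnn_layer(features, neighbours, W_self, W_neigh):
    """One propagation layer: aggregate neighbour features, then update each node."""
    updated = {}
    for node, h in features.items():
        nbrs = neighbours.get(node, [])
        if nbrs:
            # Aggregation: element-wise mean of the neighbours' current representations.
            agg = [sum(features[n][d] for n in nbrs) / len(nbrs) for d in range(len(h))]
        else:
            agg = [0.0] * len(h)
        # Update: combine the node's own representation with the aggregated one.
        combined = [a + b for a, b in zip(matvec(W_self, h), matvec(W_neigh, agg))]
        updated[node] = relu(combined)
    return updated

# Hypothetical graph: user "u" has invoked services "s1" and "s2".
features = {"u": [1.0, 0.0], "s1": [0.0, 1.0], "s2": [0.0, 3.0]}
neighbours = {"u": ["s1", "s2"], "s1": ["u"], "s2": ["u"]}
identity = [[1.0, 0.0], [0.0, 1.0]]

out = gnn_layer(features, neighbours, identity, identity)
```

Stacking several such layers lets information from nodes two or more hops away reach the central node, which is the stack of propagation layers mentioned above.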
Advantages of DL techniques in recommendation
The advantages of using Deep Learning techniques for recommender systems are multiple:
Deep Neural Networks increase the expressiveness of systems because they can model the non-linearity of data with non-linear activation functions such as Sigmoid and ReLU. They can also approximate extremely complex functions. This makes it possible to capture complex user-item interaction patterns and to accurately reflect user preferences [32, 49].
Deep Neural Networks are effective in learning optimal representations of features and latent factors from input data by stacking multiple layers and merging features at different levels [32]. This reduces feature design efforts and frees users from feature engineering. They also allow recommendation models to include diverse content information such as text, images, audio, and video [22, 47].
Deep Neural Networks are suitable for sequential modelling tasks such as machine translation and speech recognition, as well as for sequential pattern extraction [47].
Flexibility is a strong point of Deep Learning techniques, thanks especially to modular frameworks such as TensorFlow, PyTorch and Keras. This makes it possible to compose several models, to benefit from their hybridization and to obtain powerful recommender systems capable of describing both the characteristics and the factors [47].
Review methodology
Need for a review and research questions
Although Deep Learning techniques have been widely used in recommender systems in various fields, such as movies, music and e-commerce, they have only been introduced for Web services very recently. Since the number of Web services increases exponentially, these deep methods can help solve the problems related to these services by improving their recommendation for selection, composition and substitution. Hence the need to conduct a survey presenting a complete review of published studies, mainly using the SLR (Systematic Literature Review) method.
The research questions (RQ) that our review will answer are the following:
RQ1: What are the problems faced by recommender systems in the Web services field?
RQ2: What are the Deep Learning techniques used to build service recommendation systems and to solve these problems?
RQ3: What are the metrics used to evaluate the performance of Web service recommendation systems based on Deep Learning?
RQ4: What are the future research directions for Web service recommender systems based on Deep Learning?
Selected papers
To find publications relevant to our SLR, we used popular and rich digital libraries, including IEEE Xplore, the Association for Computing Machinery (ACM) Digital Library, Hindawi, ScienceDirect, and Springer.
As already mentioned, Deep Learning techniques were only introduced in Web service recommendation systems in 2017. Journal papers and conference proceedings published since 2017 were therefore searched manually. The 37 papers selected for this study cover the period from 2017 to 2023. They come from popular journals or specialized conferences, and are all written in English (see Table 1). The literature study indicates that, since 2017, more than 50% of works relating to Web services recommendation use DL techniques.
Table 1. Selected papers.
The chosen papers deal with Web services recommendation including, necessarily, one or more Deep Learning techniques and at least one recommendation strategy.
Some papers were chosen and then discarded after thorough reading. Indeed, their titles and/or their keywords are related to the considered theme, but their content is far from it.
The distribution of these studies according to their type is illustrated in Fig. 1. It shows that 73% are journal articles, and only 27% are conference proceedings.

Fig. 1. Distribution of studies according to their publication type.
As indicated by Fig. 2, the number of publications per year increased between 2017 and 2022, with a maximum of 8 publications in 2022. This shows that this is a recent research field that is not yet well explored.

Fig. 2. Number of papers per year.
This section answers the research questions stated in Section 3.1. Subsection 4.1, which focuses on the challenges faced by recommender systems (RS), answers RQ1. Subsections 4.2 and 4.3, which describe several studies and conclude with a discussion and analysis, answer RQ2, RQ3 and RQ4.
Challenges of web services recommender systems
The main challenges faced by Web services Recommender Systems (RS) are:
Classic RS problems such as accuracy, scarcity of historical data, cold start and over-fitting of latent factors.
The poor quality of Web service description content [2].
The interactions between Web applications and their component services are complex and difficult to capture with linear models, which affects RS performance.
QoS scores are missing in collections (scarcity), and they vary greatly over time. This adds complexity to the construction of recommender systems [21].
QoS depends on contextual information from service providers and consumers, such as geographic location and network environments [3]. Therefore, accurately predicting unknown QoS values is a challenge.
In service-oriented application scenarios, monitoring the QoS of Web services invoked at different times is a time-consuming and unrealistic task for service providers, which raises the problem of temporal prediction of unknown QoS values [3].
Performance criteria are weakly used to evaluate a service. Most often, only the Quality of Service metric is considered [22], although the system can obtain better results by using several criteria together.
In an edge computing environment, existing methods cannot learn the deep functionality of users or services [57].
Computational complexity limits the Factorization Machine (FM), so high-dimensional features are not fully utilized [34].
Using social information to improve time-aware recommendation is a challenge [69].
To recommend services that can collaborate in the creation of Mashups, the challenge is to consider the constraints between candidate services so as to choose only services with complementary functionalities [71].
As the social environment of services changes, their QoS and functionalities evolve [71].
Recommendation performance is significantly affected by semantic gaps between service and Mashup representations [71].
Deep learning for web services recommendation
To solve the above problems, researchers have used DL techniques to build recommender systems for Web services. In this subsection, several papers on Web service recommender systems based on Deep Learning will be summarized. For each study, we present the issues raised, the proposed approach, the dataset and the evaluation metrics used. The approaches are classified according to the DL models used, namely, Auto-Encoders and Deep Neural Networks including MLP, RNN, CNN and GNN (section 2.3), in addition to their hybridization (see Fig. 3).

Fig. 3. Classification of DL-based Web services recommendation systems.
According to B. Bai et al. [2], many developers compose non-popular Web services in Mashups, but very few studies address the long-tail Web service recommendation problem. The authors employed a Stacked Denoising Auto-Encoder (SDAE) to build a long-tail Web service recommendation framework. It extracts features to compensate for the insufficient quality of description content. The model achieves a better conceptualization of composite services and queries. To address the scarcity problem, they learn preference models from developers’ Mashups. To mitigate over-fitting and cold-start problems, they use matrix factorization based on suitable features and other predefined elements such as update time and the usage counts of long-tail services. Bai et al. conduct their experiments on real-world datasets from ProgrammableWeb and use Recall@5, Recall@50 and Recall@250 as evaluation metrics [2].
Matrix Factorization presents some problems, such as sparseness and over-fitting. To mitigate these issues, M. Smahi et al. [21] improved MF techniques by proposing an Auto-Encoder based approach. First, to get the best accuracy scores, they used the country ID and provider ID to divide the QoS score dataset into clusters, thereby reducing data sparseness. Thereafter, they learn the latent factors of each reduced matrix (or cluster) with an AE. To alleviate the over-fitting problem, they perform cross-validation during training to infer the best size of the hidden layer (the number of latent factors). Finally, the missing QoS scores are produced by the AE and added to the initial data matrix. Smahi et al. performed several experiments on the WS-DREAM dataset. They varied the data sizes as well as the sparsity levels for a robust evaluation of their approach. They used MAE and RMSE as evaluation metrics [21].
The work of Y. Yin et al. [52] is based on the hybridization of several techniques. They propose a hybrid model and two individual models; together, the three form an ensemble model integrating two recommendation strategies, namely model-based Collaborative Filtering and neighborhood-based Collaborative Filtering. They address the problems of cold start, sparse data and capturing the complex structure of QoS by using an Auto-Encoder that performs a prior computation to estimate the missing QoS values and efficiently obtain the hidden characteristics. Furthermore, to improve prediction accuracy, they use the Euclidean distance and propose a method to avoid the overestimation error problem. For their experiments, Yin et al. used the WS-DREAM dataset; to measure prediction accuracy, they used two evaluation metrics, namely MAE and NMAE.
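The Recall@K metric mentioned above can be computed as in the following minimal sketch, which assumes the usual definition: the fraction of the relevant services that appear among the top K recommended ones. The service identifiers are hypothetical.

```python
def recall_at_k(ranked, relevant, k):
    """Fraction of relevant items found in the first k positions of the ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)

# Hypothetical ranking produced by a recommender, and the services actually used.
ranked = ["s3", "s1", "s7", "s2", "s9"]
relevant = {"s1", "s2", "s4"}

r2 = recall_at_k(ranked, relevant, 2)  # only s1 is in the top 2
r5 = recall_at_k(ranked, relevant, 5)  # s1 and s2 are in the top 5
```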
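The general principle of completing a sparse QoS matrix with an Auto-Encoder can be sketched as follows. This is not the implementation of [21]: clustering and cross-validation are omitted, and the architecture (one sigmoid hidden layer, linear output), the learning rate and the toy matrix are assumptions chosen to keep the example short. The reconstruction error is computed on observed entries only, and missing entries are then filled with the reconstructed values.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyAutoEncoder:
    """One hidden layer: sigmoid encoder, linear decoder."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_in)]
        self.b2 = [0.0] * n_in

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        y = [sum(w * hj for w, hj in zip(row, h)) + b
             for row, b in zip(self.W2, self.b2)]
        return h, y

    def train_step(self, row, lr):
        """One SGD step on a user's row; missing entries (None) carry no error."""
        x = [0.0 if v is None else v for v in row]
        h, y = self.forward(x)
        err = [0.0 if v is None else yi - v for v, yi in zip(row, y)]
        # Back-propagate through the decoder before its weights are changed.
        dh = [sum(err[i] * self.W2[i][j] for i in range(len(err))) * h[j] * (1.0 - h[j])
              for j in range(len(h))]
        for i, e in enumerate(err):
            for j, hj in enumerate(h):
                self.W2[i][j] -= lr * e * hj
            self.b2[i] -= lr * e
        for j, d in enumerate(dh):
            for i, xi in enumerate(x):
                self.W1[j][i] -= lr * d * xi
            self.b1[j] -= lr * d

def observed_loss(model, matrix):
    """Sum of squared reconstruction errors over observed entries."""
    total = 0.0
    for row in matrix:
        _, y = model.forward([0.0 if v is None else v for v in row])
        total += sum((yi - v) ** 2 for v, yi in zip(row, y) if v is not None)
    return total

def complete(model, matrix):
    """Keep observed values and fill None entries with the reconstruction."""
    filled = []
    for row in matrix:
        _, y = model.forward([0.0 if v is None else v for v in row])
        filled.append([yi if v is None else v for v, yi in zip(row, y)])
    return filled

# Hypothetical normalised QoS matrix (rows: users, columns: services).
qos = [
    [0.2, 0.8, None, 0.4],
    [0.3, None, 0.5, 0.4],
    [None, 0.9, 0.6, 0.5],
    [0.2, 0.7, 0.5, None],
]

model = TinyAutoEncoder(n_in=4, n_hidden=2)
initial_loss = observed_loss(model, qos)
for _ in range(300):
    for row in qos:
        model.train_step(row, lr=0.1)
final_loss = observed_loss(model, qos)
completed = complete(model, qos)
```

The size of the hidden layer plays the role of the number of latent factors; in a realistic setting it would be chosen on held-out data rather than fixed in advance.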
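The error metrics used by these studies can be computed as in the sketch below. MAE and RMSE follow their standard definitions; for NMAE the sketch assumes one common normalization, namely MAE divided by the mean of the actual QoS values, which may differ from the exact definition used in a given paper. The sample values are hypothetical.

```python
from math import sqrt

def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error; penalizes large errors more than MAE."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def nmae(actual, predicted):
    """MAE normalized by the mean actual value (assumed normalization)."""
    return mae(actual, predicted) / (sum(actual) / len(actual))

# Hypothetical held-out QoS values and the corresponding predictions.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.0, 4.5]

scores = (mae(actual, predicted), rmse(actual, predicted), nmae(actual, predicted))
```

For all three metrics, lower values indicate more accurate QoS predictions.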
For accurate and fast QoS prediction, G. White et al. [55] used the Dropout technique on a Stacked Auto-Encoder which allowed them not only to counter the overfitting problem but also to reduce the training and query time. They showed that their approach is significantly better than traditional Matrix Factorization algorithms. For their experiments, White et al. use WS-DREAM dataset and the MAE and RMSE evaluation metrics.
In [61] M. Smahi et al. propose an approach based on Deep Learning for QoS prediction. They combine a Deep Auto-Encoder (DAE) based Matrix Factorization model with a geographic feature-based clustering technique to improve prediction efficiency. The authors conducted experiments on Web service QoS repository, WS-DREAM. They use MAE and RMSE metrics.
F. Z. Merabet and D. Benmerzoug [65] address challenges faced by recommender systems, including data scarcity, prediction accuracy and over-fitting. To this end, they divide the initial matrix into several denser matrices using previously formed clusters of similar neighbours, which reduces data sparsity. The proposed framework consists of an Auto-Encoder combined with the neighbour-based Collaborative Filtering recommendation strategy. This neural network greatly reduces the over-fitting problem because it can determine a suitable number of latent factors by learning deep features. For their experiments, the authors used a QoS dataset from WS-DREAM with different levels of data sparseness. They chose MAE and RMSE as performance evaluation metrics.
According to H. Chen et al. [66], bad recommendations are caused by the inexperience of developers, who do not formulate their Mashup creation requests well. The authors propose a model based on an Auto-Encoder neural network with reinforcement learning. They use a Steiner tree search approach with a service-keyword correlation graph. For their experiments, they use a dataset from the ProgrammableWeb.com platform and several evaluation metrics: Cardinality, Efficiency, Compatibility, Freshness, Diversity, and Optimality.
Multi-layer perceptron based approaches
According to B. Alghofaily and C. Ding [22], the main limitations that hinder Machine Learning (ML) recommenders’ effectiveness are: the high cost of complex calculations and the non-retention of information concerning previous experiments’ executions such as their data sets, parameters and results. In this context, Alghofaily and Ding [22] proposed a model based on the multilayer perceptron (MLP) for ML service recommendation. A service is recommended based on its expected performance on the input data set. Since service features and input data affect the QoS attributes of ML services, the authors focused on adding these two types of secondary information into the recommendation process by using service and dataset integration layers. For their experiments they use various classification services running on several datasets from OpenML. They employ different evaluation metrics namely NRMSE, Precision, Recall and F1 [22].
To improve the accuracy of Web API QoS predictions, L. Shen et al. [60] hybridize Factorization Machines (FM), a MultiLayer Perceptron (MLP) and contextual information to obtain a framework that learns complex nonlinear high-order interactions and produces adequate recommendations. For their experiments, they use the WS-DREAM dataset with the MAE and RMSE metrics.
To obtain more accurate QoS predictions, J. Xu et al. [32] proposed a Collaborative Filtering model based on a Multi-Layer Perceptron (MLP) and Matrix Factorization. Their approach allows the modeling of nonlinear and complex user-service interactions by adding different secondary information related to the context. To obtain more correct and reliable predictions, they use contextual bias and multi-task learning. For their experiments, they used the WS-DREAM dataset and the MAE and RMSE evaluation metrics.
To address the challenges of data scarcity and performance variation, Q. Wang et al. [34] combined a MultiLayer Perceptron (MLP) with the Factorization Machine technique by integrating location information. The resulting model augments the QoS dataset with user location vectors. It extracts low-dimensional and high-dimensional features simultaneously, using FM for the former and the deep layers of the neural network for the latter. The model uses the weighting and entropy of features to reduce their bias and increase positive outcomes. The authors conducted their experiments on the WS-DREAM dataset with different data density levels, and they use MAE and RMSE as evaluation metrics.
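To make the Factorization Machine component concrete, the sketch below computes the standard second-order FM score and checks the usual linear-time reformulation of the pairwise term against a naive double loop. It is a generic FM, not the model of any of the cited works; the feature layout (one-hot user, one-hot service, one location feature) and all numeric values are hypothetical.

```python
def fm_predict(x, w0, w, v):
    """Second-order Factorization Machine score for feature vector x.

    w0: global bias, w: per-feature weights, v: per-feature latent vectors.
    The pairwise term uses the linear-time identity
    sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2).
    """
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(len(v[0])):
        s = sum(v[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_predict_naive(x, w0, w, v):
    """Same score computed with an explicit loop over feature pairs."""
    score = w0 + sum(wi * xi for wi, xi in zip(w, x))
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dot = sum(a * b for a, b in zip(v[i], v[j]))
            score += dot * x[i] * x[j]
    return score


# Hypothetical input: user 0 of 2, service 1 of 2, plus one location feature.
x = [1.0, 0.0, 0.0, 1.0, 0.5]
w0 = 0.1
w = [0.2, -0.1, 0.05, 0.3, 0.4]
v = [[0.1, 0.2], [0.0, -0.3], [0.5, 0.1], [-0.2, 0.4], [0.3, 0.3]]

fast = fm_predict(x, w0, w, v)
slow = fm_predict_naive(x, w0, w, v)
```

In the hybrid models surveyed here, a score of this kind is typically combined with the output of an MLP that consumes the same embedded features, so that low-order and high-order interactions are learned together.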
In [47], Y. Zhang et al. used the Collaborative Filtering technique through a location-aware Deep Neural Network to improve the prediction accuracy. The proposed model is able to solve the data scarcity problem. Indeed, the MultiLayer-Perceptron (MLP) can capture high-dimensional data and nonlinear user-service interactions by using dense vectors to store location features. To improve QoS values prediction, they integrated a similarity corrector in the output layer of the network. They performed several experiments on the WS-DREAM dataset with MAE and RMSE measurements as performance evaluation metrics of their system.
Deep neural network based approaches
X. Zhang et al. [35] combine the service clustering technique and QoS to propose a Web API recommendation approach. To extract the multidimensional attributes of services and to exploit their complex relations, the authors used a Deep Factorization Machine model. The proposed framework integrates a Deep Neural Network (DNN) component and a Factorization Machine (FM) component. The authors conduct their experiments on the ProgrammableWeb dataset and use the following evaluation metrics: Precision, Recall, Purity, Entropy, DCG@K (Discounted Cumulative Gain) and HMD (Higher Hamming Distance).
To improve the QoS prediction accuracy, H. Wu et al. [48] presented a Deep Neural network Model (DNM) incorporating contextual information to predict multiple QoS attributes. Their approach is to build a network with an interaction layer and perception layers to capture the semantics of features as well as their complex interactions by sharing a latent space and stacking multiple layers of neurons. For their experiments, they used the WS-DREAM dataset and three evaluation metrics: MAE, RMSE and Normalized MAE (NMAE).
To remedy the linearity problem of Matrix Factorization (MF) and its weakness in capturing the complex characteristics of users and services, G. Zou et al. [58] proposed a model based on Collaborative Filtering and a Deep Neural Network (DNN). The proposed approach combines the Matrix Factorization technique with a selected neighborhood and merges auxiliary information concerning the user location and the invocation QoS. According to the MAE and RMSE values obtained in experiments on the WS-DREAM dataset, the model significantly improves prediction performance.
G. Kang et al. [38] combined a Deep Neural Network (DNN) with an attention mechanism to improve the Factorization Machine (FM). This allows them to build a service recommendation framework capable of capturing the non-linear interactions between features as well as their differing importance. They integrated multidimensional data from a service repository into their model. To validate the proposed approach, the authors conducted experiments on real data from ProgrammableWeb and used two evaluation metrics: Logloss (cross entropy) and AUC (Area Under ROC Curve).
To solve the user cold-start problem, Y. Ma et al. [40] proposed a recommendation approach integrating in a Deep Neural Network (DNN) different service-Mashup interactions which can be of implicit, explicit and content types. With this approach the model is able to efficiently learn the latent factors and therefore obtain accurate predictions of recommended service scores for Mashups. They conducted experiments on a real-world data set from ProgrammableWeb and used common evaluation metrics namely NDCG (Normalized Discounted Cumulative Gains), MAP, Precision, Recall, F1.
To remedy the problems of data scarcity, insufficient latent information and the failure to consider feature weights, L. Ding et al. [62] propose a QoS prediction approach for Web services that combines several techniques, namely deep methods, Factorization Machine, Collaborative Filtering and multi-component graph convolution. To extract the latent factors, they decompose the edges of the bipartite service-user graph and obtain latent components with node-level attention. The model uses a Deep Neural Network (DNN) to learn and optimize the weights of the feature components. It then combines them to obtain the final embeddings of the user and the service, which are fed into the DeepFM model to obtain the QoS prediction. For their experiments, the authors used a dataset from WS-DREAM with the evaluation metrics MAE and RMSE.
Dang et al. [31] proposed a Web service recommendation approach that integrates several methods. To capture knowledge relationships between Mashups and Web services, the authors introduce side information, such as a knowledge graph, into the recommendation model, which improves accuracy. They take advantage of an attention mechanism to model the relationships between candidate services and requests. A Deep Neural Network (DNN) is used to capture the complex and nonlinear Mashup-service interactions in a very sparse dataset. For their experiments, they used the ProgrammableWeb dataset and Precision@N, Recall@N and F1@N as evaluation metrics.
W. Liang et al. [64] used the Collaborative Filtering technique with Content-Based similarity calculation and proposed a Web services recommendation algorithm. It includes two modules. The first is based on collaborative security filtering, whose role is to represent the complex Mashup-service interaction relationships. The second, a content similarity module, uses word embedding technology to extract semantic similarity features between the Mashup and Web services. A Deep Neural Network (DNN) takes as input the results of both modules to predict Mashup evaluation values and produce a recommendation list. For their experiments, they used a dataset from ProgrammableWeb, and they measure the performance of their model using four metrics: MAP, NDCG, accuracy, and recall.
S.G. Kumar et al. [72] built a Deep Neural Network (DNN) model with an adaptive learning algorithm to analyse the performance of the Mishmash technique. They reduce the time of the next requests by composing the relevant Web services dynamically.
Recurrent neural network based approaches
D. Chen et al. [49] proposed a Deep Learning based approach to improve the accuracy of QoS predictions. They created a new Recurrent Neural Network (RNN) composed of several stacked LSTM (Long Short-Term Memory) layers. The authors analyzed data using classification and regression. They used several regularization techniques such as Dropout. Their experiments were conducted on two subsets of WS-DREAM data with several evaluation metrics, namely Precision, Recall, F1 and MSE.
Since MF ignores dynamic user-service dependencies, J. Zhou et al. [59] proposed a new model named Recurrent Factorization Machine (RFM), in which the authors integrate a Gated Recurrent Unit with Self-Attention (SAGRU) and a Projected Factorization Machine (PFM), providing users with personalized services that use sequential historical records. They use Self-Attention to extract a user-service-time matrix and suppress noise using a neck structure. They conduct their experiments on the WS-DREAM dataset with MAE and RMSE as evaluation metrics.
Most research on temporal QoS prediction does not fully exploit the chronological relations and the invocation information of user-service interactions, and omits implicit feature representation, which causes poor QoS prediction accuracy [3]. To address these challenges, G. Zou et al. [3] proposed a model that performs the time-sensitive service QoS prediction task. They used Gated Recurrent Units (GRUs) to learn and extract temporal characteristics among users and services. They train their model by parameter optimization and apply it to extract temporal aggregated features across multiple time slices, which can more effectively capture the implicit nonlinear relationship between users and services. For their experiments, they use a WS-DREAM temporal QoS dataset. Their results show that their approach outperforms existing ones in terms of the evaluation metrics (MAE and RMSE).
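The recurrent update at the heart of such models can be sketched with a single GRU cell stepped over a sequence of time slices. This is a generic GRU, not the architecture of any cited work; the convention h' = (1 - z) * h + z * candidate is one of the two commonly used forms, and the hidden size, random weights and QoS observations below are assumptions for the example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def gate(x, h, w, u, b, activation):
    """Compute activation(W x + U h + b) element-wise."""
    return [activation(a + c + d)
            for a, c, d in zip(matvec(w, x), matvec(u, h), b)]


def gru_step(x, h, p):
    """One GRU update of hidden state h given input x and parameters p."""
    z = gate(x, h, p["Wz"], p["Uz"], p["bz"], sigmoid)  # update gate
    r = gate(x, h, p["Wr"], p["Ur"], p["br"], sigmoid)  # reset gate
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    cand = gate(x, reset_h, p["Wh"], p["Uh"], p["bh"], math.tanh)
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def init_params(n_in, n_hidden, rng):
    """Random, untrained parameters (illustration only)."""
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    p = {}
    for g in "zrh":
        p["W" + g] = mat(n_hidden, n_in)
        p["U" + g] = mat(n_hidden, n_hidden)
        p["b" + g] = [0.0] * n_hidden
    return p


rng = random.Random(0)
params = init_params(n_in=1, n_hidden=2, rng=rng)

# Hypothetical response times of one user-service pair over four time slices.
time_slices = [[0.31], [0.35], [0.90], [0.40]]
h = [0.0, 0.0]
for x in time_slices:
    h = gru_step(x, h, params)
# h now summarizes the sequence; a prediction layer (not shown) would map it
# to the QoS value expected in the next time slice.
```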
It should be noted that, according to the work of S. -F. Lin et al. [73], the RNN model is more efficient for QoS prediction than its extension models, GRU and LSTM.
Convolutional neural network and graph neural network based approaches
According to Y. Yin et al. [57], existing service recommendation methods have shortcomings that prevent them from being adopted directly in an edge computing environment. Indeed, these methods cannot learn the deep features of users or services. In order to fully utilize these hidden features, the authors [57] proposed a new QoS prediction model for service recommendation in an edge computing environment. This model is based on Matrix Factorization (MF) with deep feature learning, using a Convolutional Neural Network (CNN). At the same time, to improve the accuracy of Neighbour selection, the model uses a new similarity calculation method. The CNN learns the characteristics of the Neighbours, both user side and service side, forms a feature matrix and infers the characteristics of the target user or the target service. Experimental results on the WS-DREAM dataset show that the proposed approach achieves higher QoS prediction accuracy in terms of the MAE and RMSE metrics [57].
According to Y. Xia et al. [67], extracting and learning deep user or service characteristics from different information sources to enhance the accuracy of QoS predictions is still a challenge. For this, the authors proposed a Deep Neural Network based on the Convolutional Neural Network (CNN). It is able to extract features from various sources and learn the interactions between these features. By integrating Matrix Factorization and neural networks, this framework captures the implicit features of the QoS matrix and combines them with explicit features from other sources, extracted from the documents describing the services and from semantic data. This framework can learn complex user-service interactions of local and global characteristics. It therefore provides QoS predictions through mixed characteristics. For their experiments, they used the WS-DREAM dataset with the MAE and RMSE evaluation metrics.
To remedy the problems of cold start, data scarcity and therefore accuracy, X. Li et al. [44] introduced auxiliary information using a knowledge graph and a sampling method to obtain meta-path instances. They propose a Web service recommendation model based on a Graph Neural Network (GNN). To avoid information loss, they use the attention mechanism. This allows their model to learn the information propagation weights between neighbors in the meta-path. This model performs several guided aggregations that result in a final node embedding, which greatly improves accuracy and interpretability. The authors conduct several experiments using real data from the ProgrammableWeb website with three evaluation metrics: NDCG@K, Recall@K and Precision@K.
In order to improve the recommendation and its interpretability, C. Wei et al. [68] propose a Graph Neural Network (GNN) for social services recommendation. Their framework combines graph convolution techniques and attention mechanisms to integrate higher-order social relations. It is made up of two main components that encode a user’s preferences. The first component is built from a stack of high-order social embedding propagation layers that represent the general user preference. The second component captures the user’s specific preference for a service by using an attention mechanism at the neighbor level. Using these preference representations, scores are assigned to the candidate services in order to rank them. The authors conducted experiments on a dataset from the Steam platform, which offers gaming services similar to Web services. They use two evaluation metrics, Normalized Discounted Cumulative Gain (NDCG) and Hit Ratio (HR).
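The role of attention in weighting information propagated from neighbors can be illustrated with a minimal dot-product attention aggregation. This is a generic sketch under our own assumptions (dot-product scoring, softmax normalization, two-dimensional embeddings), not the meta-path model of X. Li et al.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_aggregate(target, neighbors):
    """Weight each neighbor embedding by its similarity to the target node
    and return (attention weights, weighted sum of neighbor embeddings)."""
    scores = [sum(t * n for t, n in zip(target, nb)) for nb in neighbors]
    weights = softmax(scores)
    dim = len(target)
    aggregated = [sum(w * nb[d] for w, nb in zip(weights, neighbors))
                  for d in range(dim)]
    return weights, aggregated


# Hypothetical embeddings: one target service and three graph neighbors.
target = [1.0, 0.0]
neighbors = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
weights, aggregated = attention_aggregate(target, neighbors)
```

Neighbors whose embeddings align with the target receive larger weights, so the aggregated vector is dominated by the most relevant neighbors rather than being a plain average.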
Hybrid approaches
H. Labbaci et al. [30] are interested in predicting future service interactions and propose a deep hybrid model. They used a Stacked Auto-Encoder to obtain a reduced data representation while preserving the initial attributes, and to learn the latent features of Web services. Using the learned service features as well as composition and substitution history, they train Deep Neural Networks (Multi-Layer Perceptron). To validate their approach, they performed experiments on a real dataset from the site ProgrammableWeb.com. The results obtained show a considerable improvement in prediction accuracy.
Xiong et al. [50] raise the problem that Matrix Factorization, being a linear latent factor model, does not capture well the complex interactions between Web applications and their component services in a low-density interaction matrix, resulting in inadequate service recommendations and poor performance. In this context, the authors [50] take advantage of the hybridization of several neural networks and recommendation strategies. The framework consists of three parts: the first is a Deep Neural Network based on Collaborative Filtering, while the second is a Content-Based Deep Neural Network. These two networks are combined by concatenating their last hidden layers, which are fed into a third Feed-Forward Neural Network. The three networks are trained together. To evaluate the proposed approach, Xiong et al. conducted a series of experiments using real-world Web services crawled from ProgrammableWeb [50].
To have accurate QoS predictions, W. Ma et al. [51] combined several Deep Neural Networks (DNN) using Matrix Factorization techniques with Dropout regularization. The Framework obtained improves service recommendation performance. Indeed, the experiments carried out on the WS-DREAM dataset show that the model offers predictions with high accuracy in terms of MAE and NMAE metrics.
In [53], Y. Jin et al. proposed a QoS prediction approach that integrates two Deep Learning techniques namely Multi-Layer Perceptron (MLP) and Convolutional Neural Network (CNN) to capture user-service relationships that are non-linear and complex. They combined information of user neighbors and service neighbors in this hybrid and Deep Learning model, which further improves the QoS prediction accuracy. In order to evaluate their approach they used WS-DREAM data set with MAE and RMSE metrics.
To address the issues of data scarcity and neglected disparities in feature weights, M. Shi et al. [54] proposed a text extension and a deep architecture based on LSTM and MLP to automatically extract features for service recommendation. They introduced two types of attention mechanisms: a functional attention mechanism that considers tags as functional before exploiting function-related features of services and Mashups, and a contextual attention mechanism that takes Mashup requests as an application scenario in order to select the most suitable service. These functional and contextual attention mechanisms significantly improve accuracy. The authors used the WS-DREAM dataset with the Recall@N, Precision@N and F-measure@N evaluation metrics. They also propose a Diversity@N metric.
To solve the data scarcity problem of service-Mashup invocation matrix, J. Ke et al. [56] proposed a hybrid model integrating Collaborative Filtering, Convolutional Neural Network (CNN) and attention mechanism in Deep Neural Networks (DNN). The resulting Web Services Recommendation Framework enables the modeling of non-linear and complex service-Mashup relationships. For the validation of their models they used a dataset from ProgrammableWeb with several performance evaluation metrics namely MAP, Precision@N, Recall@N, F1@N and NDCG@N.
To predict QoS values in service recommendation, P. Sahu et al. [63] integrated two Collaborative Filtering strategies, namely model-based and Neighborhood-based methods, with Deep Neural Networks. They used one MLP to learn user correlation and another to learn service correlation. The resulting feature vectors of the two networks are concatenated to form the input of a third MLP. They conducted their experiments on the WS-DREAM dataset, using the MAE and RMSE metrics to evaluate the performance of their system.
To improve the accuracy of time-aware service recommendation, C. Wei et al. [69] performed hierarchical modeling of service-level differences and friend-level differences along with the Mashup history. For this, they proposed a hybrid framework composed of a Recurrent Neural Network (RNN) to model the target user behavior, an Attentional Encoder to capture the services a user is interested in, and a friend-level Graph Attention Network to model user preferences. To evaluate their model, the authors conducted experiments on real data from two datasets, namely Gowalla and Yelp, with the metrics NDCG@K and HR@K.
To improve the accuracy and efficiency of Web services recommendation, B. Cao et al. [70] proposed a hybrid model integrating two main techniques. They used a Bilinear Graph Attention Network (BGAT) for service representation and classification, in addition to eXtreme Deep Factorization Machine (xDeepFM) to model and learn the interactions between quality attributes and therefore produce accurate QoS predictions. For their experiments, they used real data from ProgrammableWeb with several evaluation metrics, namely Accuracy, Precision, Recall, Macro-F1, AUC_ROC and Logloss.
To solve the problem of long-tail services and take into account the constraints between recommended services, M. Liu et al. [71] proposed a service bundle recommendation model based on a Dynamic Graph Neural Network (DGNN), capable of learning the evolving representations of services and reducing the semantic gap. Apart from Mashup descriptions, their model does not need to keep any other information, which allows it to overcome the cold start problem. To improve the expressive capacity of Mashups, the authors use a MultiLayer Perceptron (MLP), and to model the evolution of services, they use a Recurrent Neural Network (RNN). They conduct several experiments on a real dataset from ProgrammableWeb using several evaluation metrics, namely Precision@k, DCG@k, Propensity-Scored Precision@k and Propensity-Scored DCG@k.
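The pattern of two feature networks whose outputs are concatenated and passed to a third network can be sketched as follows. This is only a structural illustration with single-layer towers and untrained random weights; the layer sizes, input features and the names `relu_layer`, `random_layer` and `predict_qos` are hypothetical and do not reproduce the model of Sahu et al.

```python
import random


def relu_layer(x, weights, biases):
    """Fully connected layer with ReLU activation."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def random_layer(n_in, n_out, rng):
    """Untrained random weights and zero biases (illustration only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def predict_qos(user_features, service_features, towers):
    """Run the user and service towers, concatenate, then apply a joint MLP."""
    user_vec = relu_layer(user_features, *towers["user"])
    service_vec = relu_layer(service_features, *towers["service"])
    joint = user_vec + service_vec  # list concatenation of the two vectors
    hidden = relu_layer(joint, *towers["joint"])
    out_w, out_b = towers["output"]
    return sum(w * h for w, h in zip(out_w[0], hidden)) + out_b[0]


rng = random.Random(0)
towers = {
    "user": random_layer(2, 4, rng),     # 2 hypothetical user features
    "service": random_layer(3, 4, rng),  # 3 hypothetical service features
    "joint": random_layer(8, 4, rng),    # 4 + 4 concatenated units
    "output": random_layer(4, 1, rng),   # single predicted QoS value
}
prediction = predict_qos([1.0, 0.5], [0.2, 0.1, 0.9], towers)
```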
Discussion and analysis
The studied papers are classified in Table 2 according to several chosen criteria:
Classification of studied DL based Web services recommendation systems
The recommendation strategy, which consists of two techniques: Collaborative Filtering (CF) with its two algorithms (Matrix Factorization (MF)/Factorization Machine (FM) and Nearest Neighbor (NN)), in addition to Content-Based filtering (CB) (see subsection 2.2).
The properties of Web services taken into account, functional (F) and non-functional (NF) (see subsection 2.1).
The adopted Deep Learning technique (see subsection 2.3).
The used evaluation metrics and dataset (see subsections 4.3.1 and 4.3.2).
Table 2 shows that almost all works are based on Collaborative Filtering, exploiting the non-functional properties of Web services, including QoS and auxiliary information. The latter can be, for example, contextual, temporal, social or geographical. Many of these works have used, in addition to CF, Content-Based filtering, exploiting the functional properties of services, which mainly consist of textual descriptions and tags.
This subsection discusses RQ3. Indeed, the evaluation of recommender systems performance is essential; it is carried out using several evaluation metrics. In the papers studied in this survey, the evaluation metrics used are (see Table 2):
Rating prediction metrics, which help determine the recommendation accuracy in terms of error. They calculate the difference between predicted and actual scores, so lower values indicate greater accuracy [25]. The metrics of this type used in the studied approaches are Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE).
Classification accuracy metrics, which assess how well the system classifies items based on user interest. The evaluation measures used by the studied approaches are accuracy, recall, precision, F-measure (F1), Log loss (cross-entropy), and AUC (Area Under Curve).
Ranking metrics, which include Normalized Discounted Cumulative Gain (NDCG), which considers that highly relevant items ranked near the top give more satisfaction than poorly ranked ones, and Mean Average Precision (MAP), which takes into account the precision over the first K recommended items [25].
In addition, HMD (Higher Hamming Distance) is used to evaluate the recommendation diversity.
As shown in Fig. 4, the two most used metrics are MAE and RMSE:
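To illustrate the classification and ranking metrics listed above, the following sketch computes Precision@K, Recall@K and NDCG@K for one recommendation list, assuming binary relevance (a service is either relevant or not). The service names are hypothetical.

```python
import math


def precision_recall_at_k(recommended, relevant, k):
    """Precision@K and Recall@K for a ranked list and a set of relevant items."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def ndcg_at_k(recommended, relevant, k):
    """NDCG@K with binary relevance: DCG of the list divided by the DCG of
    an ideal list that ranks all relevant items first."""
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, item in enumerate(recommended[:k]) if item in relevant)
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


recommended = ["maps", "sms", "payment", "mail"]  # ranked output of a recommender
relevant = {"maps", "payment"}                    # services the Mashup actually uses

precision, recall = precision_recall_at_k(recommended, relevant, k=3)
ndcg = ndcg_at_k(recommended, relevant, k=3)
```

Here two of the top three recommendations are relevant, so Precision@3 is 2/3 and Recall@3 is 1. NDCG@3 is below 1 because the second relevant service appears at rank 3 rather than rank 2.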

Frequency of using evaluation metrics.
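MAE is the average of the absolute differences between the values predicted by the system and the actual values. It is given by Equation (1):
\[ \mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| \hat{r}_i - r_i \right| \qquad (1) \]
where N is the number of ratings used in the test, \hat{r}_i is the value predicted for the i-th rating and r_i is its actual value.
RMSE is similar to MAE, but it penalizes large errors more heavily because each difference is squared before it is summed. It is given by Equation (2):
\[ \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \hat{r}_i - r_i \right)^2} \qquad (2) \]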
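The two error metrics can be computed in a few lines. The sketch below uses hypothetical predicted and actual QoS values, chosen so that a single large error shows how RMSE reacts more strongly than MAE.

```python
import math


def mae(predicted, actual):
    """Mean Absolute Error over paired predicted/actual values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root Mean Squared Error over paired predicted/actual values."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# Hypothetical response times (seconds): three exact predictions, one off by 4.
predicted = [1.0, 2.0, 3.0, 8.0]
actual = [1.0, 2.0, 3.0, 4.0]

print(mae(predicted, actual))   # 1.0
print(rmse(predicted, actual))  # 2.0
```

With the same total absolute error spread evenly (each prediction off by 1), MAE would still be 1.0 but RMSE would drop to 1.0, which is why RMSE is preferred when large individual errors are especially undesirable.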
As shown by Table 2 and Fig. 5, the most used datasets in Web services recommendation are those from WS-DREAM with a rate of 54%, followed by ProgrammableWeb, which is used mainly for Mashup creation, with a rate of 34%, and finally other datasets (OpenML, Steam, Gowalla and Yelp) with a relatively low rate of 12%. OpenML is used for ML services recommendation. Steam, Gowalla and Yelp are used in Web service recommendation for social networks:

Distribution of datasets.
WS-DREAM [75]: a Distributed REliability Assessment Mechanism for Web Services.
ProgrammableWeb (PW) [76]: certainly the largest online Web service and Mashup registry.
OpenML [77]: its datasets provide training data for machine learning models. OpenML datasets are uniformly formatted and come with rich meta-data to allow automated processing.
Steam [78]: a platform providing gaming services similar to Web services, software, as well as many social networking features.
Gowalla [69]: a well-known website where users can record and share offline services they consume by checking in. In addition to its service recording function, Gowalla also allows social networks to connect and coordinate users with people or events that match their interests.
As shown by Table 2 and Fig. 6, the Deep Learning models used for Web services recommendation are Auto-Encoders (SDAE, DAE) with a rate of 16% and mainly DNNs (DNN, MLP, RNN, CNN and GNN) with a total rate of 84%. MLP and DNN are the most used with respective rates of 25% and 24%. Other models have not yet been used for this domain. There are some studies using hybridization of DL techniques (last ten studies in Table 2).

Frequency of using DL models.
The review and analysis of the DL based Web service recommendation studies and their results show that these techniques improve system performance, whether for the selection, the composition or the substitution of services. Indeed, DL techniques are used to effectively address several problems such as accuracy, scarcity, cold start and over-fitting.
The works studied have used several DL methods with various objectives. Stacked Denoising Auto-Encoders (SDAE) are used for extracting features to solve the problem of the unsatisfactory quality of Web service content descriptions. With the SDAE, a better representation of services and composite requests is obtained. This allows finding which service is functionally relevant for the creation request of the Web API [2]. Auto-Encoders (AE) improve Matrix Factorization (MF) techniques by building latent models of users and services. This permits the prediction of unknown QoS scores of Web services based on their histories [21, 61]. In addition, AE reduce training and query time [55] and also reduce over-fitting [52, 65].
MultiLayer Perceptron (MLP) and Deep Neural Network (DNN) models are used to capture complex interactions between Web applications and their component services, and to predict service performance based on historical QoS [22]. They are used to characterize the complex relationships between Mashups and services [31]. DNNs are also combined with Factorization Machines to improve on plain Matrix Factorization [34, 38].
Recurrent Neural Networks (RNN, GRU, LSTM) make it possible to capture and extract the functional, contextual or temporal characteristics of services to predict QoS [3, 49]. The Recurrent Factorization Machine provides users with personalized services that use sequential historical records [59].
The Convolutional Neural Network (CNN) and Graph Neural Network (GNN) learn the neighbor’s characteristics, both user side and service side, form a feature matrix and infer the characteristics of the target user or the target service [44, 71]. CNNs and GNNs are used to learn high-order local and global feature interactions [67].
The main strengths of the Deep Learning techniques used for recommending Web services, as drawn from this study, are:
Extracting features from service descriptions in order to generate representations of services and composite queries using the Content-Based filtering strategy.
Extracting latent factors from the non-functional properties of services to provide more accurate predictions using the Collaborative Filtering strategy.
Improving the Matrix Factorization technique thanks to their non-linearity, especially since the interactions between services are complex.
Extracting non-functional auxiliary information, temporal or geographical for example, and using it in the hybrid recommendation process.
Considerably increasing the performance of Web services recommendation systems through the hybridization of DL techniques [30, 63].
This subsection responds to RQ4, which aims to identify potential future research opportunities for Deep Learning-based Web service recommender systems. The future directions of research that we identified in the reviewed studies are as follows:
Exploring DL techniques that have not yet been used in DL based Web services recommendation studies, such as Restricted Boltzmann Machines (RBM) and Adversarial Networks (AN).
Exploring more intensively the hybridization of DL techniques, which has been less investigated than the hybridization of recommendation strategies and of auxiliary information.
Incorporating auxiliary information, such as non-functional properties of the service or the Mashup, which has greatly improved recommender system performance and remains an interesting field of research.
Using temporal and local characteristics to promote QoS prediction in a dynamic environment.
Using attention mechanisms to allow efficient extraction of functional, non-functional and contextual features.
Using online interactive recommendation during a session to generate follow-up recommendations based on user feedback. This will improve Mashup creation [79].
Integrating images in so-called multimodal knowledge graphs.
Alleviating the noise problem in GNNs.
Conclusion and perspectives
In recent years, Deep Learning and recommender systems have become cutting-edge research topics that are constantly evolving. At the same time, the development of service-based systems (such as Mashups) has become increasingly popular. Very recently, several researchers have begun to integrate Deep Learning techniques into Web service recommender systems. Indeed, these approaches improve the selection, composition and substitution of services, and provide solutions to the challenges faced by recommender systems in this area.
In this context, this survey provides a comprehensive review of existing studies on Deep Learning-based Web service recommender systems. The existing publications are classified according to several criteria, namely recommendation strategies, functional and non-functional properties of Web services, DL techniques, used dataset and performance metrics. The various problems encountered by recommender systems in Web services field are inventoried as well as the solutions provided by Deep Learning techniques to these problems.
The shortcomings in the use of deep methods in the Web services field are discussed to guide future work and help researchers wishing to explore this field. Indeed, not all Deep Learning techniques have been applied to Web services recommendation, and their hybridization remains largely unexplored. These are interesting research topics for addressing the challenges that still persist, such as accuracy, scalability, and user privacy and security.
Footnotes
Acknowledgment
The authors would like to thank the DGRSDT (General Directorate of Scientific Research and Technological Development) - MESRS (Ministry of Higher Education and Scientific Research), ALGERIA, for the financial support of LISCO Laboratory and LABGED Laboratory.
Declarations
Ethical approval
Not applicable.
Competing interests
The authors have no competing interests to declare that are relevant to the content of this article.
Authors’ contributions
Karima MECHERI had the idea for the article, performed the literature research and data analysis and wrote the manuscript. Sihem KLAI and Labiba SOUICI-MESLATI critically revised the work.
Funding
The authors did not receive support from any organization for the submitted work.
Availability of data and materials
The public datasets, used by the studies analyzed in this review article, are cited in the reference list and their URLs are mentioned in the manuscript.
