Data-driven product ranking: A hybrid ranking approach

Abstract

The sudden COVID-19 epidemic has caused consumers to gradually switch to online shopping, and the increasing number of online consumer reviews (OCR) on Web 2.0 sites has made it difficult for consumers and merchants to make decisions by analyzing OCR. Much of the current literature on ranking products based on OCR ignores neutral reviews, evaluates products mostly on pre-given criteria while ignoring consumers’ own purchasing preferences, or ranks products on star ratings alone. This study proposes a new decision support framework for the evaluation and selection of alternative products based on OCR. The framework has three main parts: 1) Data preprocessing: using Python to capture online consumer comments, cleaning and preprocessing the data, and extracting key features as evaluation criteria; 2) Sentiment analysis: using Naive Bayes to analyze the sentiment of OCR, and using intuitionistic fuzzy sets to describe the sentiment scores; 3) Benchmark analysis: a new IFMBWM-DEA model that considers the preferences of decision makers is proposed to calculate the efficiency scores of the alternatives and rank them accordingly. The OCR of 15 laptops crawled from the JD.com platform are then used to demonstrate the usefulness and applicability of the proposed framework in two respects: a comparison of results with and without the preferences of decision makers, and a comparison with existing ranking methods. The comparison also shows that the proposed method is more realistic, that its recommendations are better grounded, and that the complexity of the decision is reduced.

Keywords

Intuitionistic fuzzy multiplicative best-worst method (IFMBWM); data envelopment analysis (DEA); online consumer reviews (OCR); product ranking; sentiment analysis

1 Introduction

Since the COVID-19 epidemic, lockdown and isolation measures have driven consumers into the internet market and accelerated the digital transformation of e-commerce. The share of e-commerce in global retail trade increased from 14% in 2019 to about 17% in 2020. People’s shopping destinations have also gradually shifted from offline physical stores to online consumption platforms. However, due to the virtual nature of the network, it is difficult to guarantee the quality of goods. With the support of Web 2.0 technology, a large number of OCRs have emerged on e-commerce platforms [1]. Consumers are more likely to rely on OCR than on product information provided by the merchant when making the purchase decision that best suits their preferences [2, 3]. OCR can reduce the uncertainty of online shopping and therefore have high reference value for consumers’ purchase decisions. Enterprises also receive genuine feedback from consumers via OCR, utilize various types of available data to better understand consumer preferences and demands [4], and formulate corresponding measures to improve brand value [5, 6].

Although many websites now give quantitative criteria for product assessment, understanding products through OCR is still recommended because of the subjectivity and unpredictability of customer evaluations. According to the 48th statistical report on China’s Internet Development released by the China Internet Network Information Center (CNNIC) in Beijing in August 2021, China’s online retail sales reached 6113.3 billion yuan in the first half of 2021 alone. The explosive growth of big data has also brought unprecedented challenges to the data mining of OCR. Because of the vast number of OCRs updated every day, customers spend a lot of time reading them, which makes it difficult to use OCR effectively when making purchase decisions [7]. How quickly and effectively information can be extracted from OCR plays an important role in the decision making of consumers and merchants [8]. As a result, it is vital to develop a method that assists consumers in making purchasing decisions using OCR [9]. A review of the literature shows that research on ranking products by OCR generally includes two parts: 1) analyzing and studying OCR, and 2) product ranking methods.

The first part is the process of analyzing and studying OCR, namely Sentiment Analysis (SA). As OCR is unstructured textual data, it is difficult to use it directly for decision analysis, so it is important to transform OCR into data that can be used for analysis [10]. The SA of OCR text is a process of analyzing, processing, summarizing and reasoning about subjective text with emotional color, aiming at extracting structured opinions from unstructured text [11]. SA methods are mainly divided into sentiment dictionary-based methods [8, 11–14] and machine learning-based methods [15–17]. However, sentiment dictionary-based SA methods cannot consider sentiment words in a context-specific setting and require constant updating of the sentiment dictionary to improve accuracy [18]. Therefore, in this paper, we use the Naive Bayes (NB) method, a machine learning-based SA method, for OCR sentiment analysis; it is simple, fast, accurate and reliable [19]. The most important task of SA is to judge the polarity of emotions, that is, to judge whether the views in the text are positive, negative or neutral [20]. However, in existing studies, neutral emotions are often ignored [16, 21–27], which leads to a loss of valuable information in the decision-making process. In fact, consumers’ neutral comments indicate that they are hesitant and uncertain about the product, and this information cannot be ignored [28]. The results of SA are usually expressed in the form of real numbers, triangular fuzzy numbers, intuitionistic fuzzy numbers and interval intuitionistic fuzzy numbers. To consider the positive, neutral and negative emotions in OCR together, intuitionistic fuzzy set (IFS) theory is particularly appropriate, because IFS can describe and characterize the fuzzy nature of the objective world more precisely [29].

The second part deals with product ranking methods. Commonly used ranking methods are TOPSIS, VIKOR, TODIM, PROMETHEE II and other integrated product ranking methods [9]. Table 1 shows the main recent studies that have used OCR to rank products. Despite the significant impact of these studies, the following research gaps remain: some of the literature ranks products on star ratings alone [30, 31]; evaluation criteria are given by experts based on previous experience; most studies rank products based on previous OCR and then make recommendations to consumers, ignoring consumers’ own purchase preferences; and the decision-making processes that do partially take consumer preferences into account are too complex and not easy to operate.

Table 1
References for ranking products based on OCR

Authors	Year	Rank method	Criteria extracted from OCR	DM’s preference	Sentiment polarity: Pos	Sentiment polarity: Neu	Sentiment polarity: Neg	Case study
Alrababah et al. [27]	2017	TOPSIS VIKOR	√	√	√		√	Electronic products
Liu et al. [41]	2017	IFWA PROMETHEE II		√	√	√	√	SUV
G. Kumar et al. [24]	2018	WSM TOPSIS	√		√		√	Mobile phones
N. Gobi et al. [42]	2018	Fuzzy TOPSIS			√	√	√	Mobile phone and camera
Zhang et al. [43]	2018	IF-TODIM		√	√	√	√	Mobile phone
Liu et al. [44]	2019	PL-TODIM		√	√	√	√	SUV
Sedef Çalı et al. [9]	2019	IF-ELECTRE VIKOR			√	√	√	Hotel
H. Sharma et al. [45]	2019	TOPSIS			√	√	√	Hotel
Zhang et al. [43]	2020	IF-TODIM			√	√	√	Mobile phone
G. Kumar et al. [21]	2020	AHP TOPSIS	√		√		√	Smartphones
Zhang et al. [22]	2020	VIKOR		√			√	Automobiles
Heidary Dahooie J et al. [46]	2021	MCDM	√		√	√	√	Smartphones
A. Ahani et al. [10]	2021	MADM		√	√	√	√	Mobile phone
Zhang et al. [47]	2022	IF-TODIM	√	√	√	√	√	Smartphones
Qin et al. [48]	2022	IHF-TOPSIS		√	√	√	√	Tourist Attractions
Bi et al. [49]	2022	MCDM		√	√	√	√	Hotel
Tao et al. [50]	2022	MCDM		√	√	√	√	Hotel
A. Darko et al. [51]	2022	PL-ELECTRE I		√	√	√	√	Mobile payment
This article	2022	IFMBWM-DEA	√	√	√	√	√	Laptops

Data envelopment analysis (DEA) is a non-parametric method for measuring the efficiency of decision-making units (DMUs). DEA is a relatively common method for ranking DMUs or alternatives and has become a popular decision-making technique [32, 33]. There are also studies combining DEA and PROMETHEE to rank DMUs [34] and combining DEA and VIKOR to rank suppliers [35]. However, these studies were conducted in a deterministic setting, and because of the complexity of decision making, the DEA approach needs to be extended to uncertain settings. For example, Muhammad Akram et al. extended DEA to Fermatean fuzzy sets to solve multi-objective transport problems [36]. Traditional DEA models cannot take into account the uncertainty of input and output data and place no restrictions on the weights of inputs and outputs, which makes it impossible to distinguish and compare efficient DMUs [37]. Researchers have proposed four approaches to this problem: common weights, weight restrictions, assurance regions and cone ratios [38]. Weight restriction is a powerful method for incorporating decision maker preferences into DEA, and Hu et al. propose a method for ranking units by combining AHP and DEA in a fuzzy setting [39]. The best-worst method (BWM) requires fewer pairwise comparisons than AHP and yields more consistent and reliable final weights [40]. Moreover, few studies have applied DEA to ranking products based on OCR.

In view of the above problems, this paper develops a decision support framework in an intuitionistic fuzzy environment and proposes a new method for ranking alternative products based on OCR that takes into account the preferences of decision makers. The method is easy to operate and provides reasonable ranking results to maximize the reference value of OCR. The main contributions of this study are as follows:

Extracting key attributes from OCR as evaluation indicators is more objective and in line with the actual situation;

Considering positive, neutral and negative emotional orientations, IFS are used to express the results of SA;

A new IFMBWM-DEA model is proposed, which takes into account the preferences of decision makers and reduces the flexibility of weight;

Through numerical experiments, the effectiveness and applicability of the proposed model are verified by using 101405 online reviews of 15 laptops captured from JD.com.

The rest of this article is organized as follows. Section 2 presents the preliminaries. Section 3 describes the decision support framework proposed in this paper. Section 4 presents a numerical experiment, a case study ranking 15 laptops. Section 5 concludes the paper, emphasizing its main contributions, the limitations of the research and future work.

2 Preliminaries

2.1 Concepts of IFS and IFMPR

Definition 2.1. [52] Let X be a nonempty set. We call A = {〈x, μ_A (x), ν_A (x)〉 | x ∈ X} an intuitionistic fuzzy set (IFS), where μ_A (x) and ν_A (x) are the membership and non-membership functions of the IFS A on X, respectively: $\begin{matrix} μ_{A} : X \to [0, 1], \quad x \in X \mapsto μ_{A}(x) \in [0, 1] \\ ν_{A} : X \to [0, 1], \quad x \in X \mapsto ν_{A}(x) \in [0, 1] \end{matrix}$

These functions satisfy 0 ⩽ μ_A (x) + ν_A (x) ⩽ 1 for all x ∈ X. The quantity π_A (x) = 1 - μ_A (x) - ν_A (x), x ∈ X, denotes the hesitancy or uncertainty of the element x ∈ X with respect to the set A. The ordered pair (μ_A (x), ν_A (x)) is called an intuitionistic fuzzy value (IFV).

Definition 2.2. [53] For an IFN α = (μ_α, ν_α), we call s (α) = μ_α - ν_α the score function of α, and h (α) = μ_α + ν_α the accuracy function of α. To compare two IFNs α₁ = (μ_α₁, ν_α₁) and α₂ = (μ_α₂, ν_α₂), the following laws can be given:

$\begin{array}{l} \text{(1) If } s(α_{1}) > s(α_{2}), \text{ then } α_{1} > α_{2}; \\ \text{(2) If } s(α_{1}) = s(α_{2}), \text{ then} \\ \quad \text{(a) if } h(α_{1}) > h(α_{2}), \text{ then } α_{1} > α_{2}; \\ \quad \text{(b) if } h(α_{1}) = h(α_{2}), \text{ then } α_{1} = α_{2}. \end{array}$
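The comparison laws of Definition 2.2 can be illustrated with a minimal Python sketch. The names `IFN`, `score`, `accuracy` and `compare`, as well as the tolerance `tol` used for floating-point equality, are assumptions made for illustration and do not come from the paper; the "less than" cases are obtained by symmetry.

```python
from typing import NamedTuple


class IFN(NamedTuple):
    """Intuitionistic fuzzy number (mu, nu) with 0 <= mu + nu <= 1."""
    mu: float
    nu: float


def score(a: IFN) -> float:
    """Score function s(a) = mu - nu."""
    return a.mu - a.nu


def accuracy(a: IFN) -> float:
    """Accuracy function h(a) = mu + nu."""
    return a.mu + a.nu


def compare(a: IFN, b: IFN, tol: float = 1e-12) -> int:
    """Return 1 if a > b, -1 if a < b and 0 if a = b (Definition 2.2)."""
    ds = score(a) - score(b)
    if abs(ds) > tol:          # law (1): the scores differ
        return 1 if ds > 0 else -1
    dh = accuracy(a) - accuracy(b)
    if abs(dh) > tol:          # law (2a): equal scores, compare accuracy
        return 1 if dh > 0 else -1
    return 0                   # law (2b): equal score and accuracy
```

Under this sketch, two IFNs with the same score are separated by the accuracy function, so the one with less hesitancy ranks higher.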

Definition 2.3. [54] Let X be a fixed set. The intuitionistic fuzzy multiplicative set (IMS) is defined as: $D = \{ \langle x, ρ_{D}(x), σ_{D}(x) \rangle \mid x \in X \}$

Here (ρ_D (x), σ_D (x)) is an intuitionistic multiplicative number (IMN), and ρ_D (x) and σ_D (x) are the membership and non-membership degrees of each element x, respectively. They satisfy 1/9 ⩽ ρ_D (x), σ_D (x) ⩽ 9 and ρ_D (x) σ_D (x) ⩽ 1 for all x ∈ X, and are expressed on Saaty’s scale (see Table 2).

Definition 2.4. [55] For an IFMN α = (ρ_α, σ_α), we call s (α) = ρ_α/σ_α the score function of α and h (α) = ρ_ασ_α the accuracy function of α. To compare two IFMNs α₁ = (ρ_α₁, σ_α₁) and α₂ = (ρ_α₂, σ_α₂), the comparison rules for intuitionistic fuzzy numbers given in Definition 2.2 can be used.

The following operation rules are also met:

$\begin{matrix} α_{1} \oplus α_{2} = (ρ_{α_{1}}, σ_{α_{1}}) \oplus (ρ_{α_{2}}, σ_{α_{2}}) \\ = (\frac{(1 + 2 ρ_{α_{1}}) (1 + 2 ρ_{α_{2}}) - 1}{2}, \\ \frac{2 σ_{α_{1}} σ_{α_{2}}}{(2 + 2 σ_{α_{1}}) (2 + 2 σ_{α_{2}}) - σ_{α_{1}} σ_{α_{2}}}) \\ α_{1} \otimes α_{2} = (ρ_{α_{1}}, σ_{α_{1}}) \otimes (ρ_{α_{2}}, σ_{α_{2}}) \\ = (\frac{2 ρ_{α_{1}} ρ_{α_{2}}}{(2 + 2 ρ_{α_{1}}) (2 + 2 ρ_{α_{2}}) - ρ_{α_{1}} ρ_{α_{2}}}, \\ \frac{(1 + 2 σ_{α_{1}}) (1 + 2 σ_{α_{2}}) - 1}{2}) \end{matrix}$

Definition 2.5. [55] If X = {x₁, x₂, ⋯ , x_n} is a set of alternatives, an intuitionistic fuzzy multiplicative preference relation (IFMPR) on X can be expressed as a matrix A = (α_ij) _n×n ⊂ X × X, where each α_ij = (ρ_{α_ij}, σ_{α_ij}) (i, j = 1, 2, ⋯ , n) is an IFMN, ρ_{α_ij} represents the degree to which alternative x_i is preferred to x_j, and σ_{α_ij} indicates the degree to which alternative x_i is not preferred to x_j. ρ_{α_ij} and σ_{α_ij} satisfy ρ_{α_ij} = σ_{α_ji}, σ_{α_ij} = ρ_{α_ji}, ρ_{α_ii} = σ_{α_ii} = 1, for all i, j = 1, 2, ⋯ , n. For convenience, (ρ_{α_ij}, σ_{α_ij}) will be written as (ρ_ij, σ_ij) in the rest of this article.

2.2 Summary of IFMBWM [56]

The steps of the intuitionistic fuzzy multiplicative best-worst method (IFMBWM) are described as follows:

Step 1: Determine a set of decision criteria C = {C₁, C₂, . . . , C_n}. The decision maker compares the criteria using Saaty’s scale (Table 2) to obtain the intuitionistic fuzzy multiplicative preference relation (IFMPR) matrix A = (α_ij) _n×n.

Table 2
Saaty’s scale 1–9

1–9 scale	Meanings
1/9	Extremely not preferred
1/7	Very strongly not preferred
1/5	Strongly not preferred
1/3	Moderately not preferred
1	Equally preferred
3	Moderately preferred
5	Strongly preferred
7	Very strongly preferred
9	Extremely preferred
Other values between 1/9 and 9	Intermediate values used to present compromise

$A = {(\begin{matrix} (ρ_{11}, σ_{11}) & (ρ_{12}, σ_{12}) & \dots & (ρ_{1 n}, σ_{1 n}) \\ (ρ_{21}, σ_{21}) & (ρ_{22}, σ_{22}) & \dots & (ρ_{2 n}, σ_{2 n}) \\ ⋮ & ⋮ & ⋱ & ⋮ \\ (ρ_{n 1}, σ_{n 1}) & (ρ_{n 2}, σ_{n 2}) & \dots & (ρ_{nn}, σ_{nn}) \end{matrix})}_{n \times n}$

Step 2: Sort the criteria according to the IFMPR to determine the best criterion C_B and the worst criterion C_W, so that C_B = C₁ ≻ C₂ ≻ ⋯ ≻ C_n = C_W.

Step 3: Suppose that the optimal weight vector is $W^{*} = (w_{1}^{*}, w_{2}^{*}, \dots, w_{n}^{*})^{T}$, where each $w_{j}^{*} = (τ_{j}^{*}, υ_{j}^{*})$ (j ∈ N) is an IFN. $τ_{j}^{*}$ can be obtained by minimizing the maximum of the absolute differences |τ_best/τ_j - ρ_best,j| and |τ_j/τ_worst - ρ_j,worst|, and $υ_{j}^{*}$ can be obtained by minimizing the maximum of the absolute differences |υ_best/υ_j - σ_best,j| and |υ_j/υ_worst - σ_j,worst|. The following mathematical models can be established:

$\begin{array}{ll} \min & ξ \\ \text{s.t.} & | τ_{best} / τ_{j} - ρ_{best, j} | ⩽ ξ, \\ & | τ_{j} / τ_{worst} - ρ_{j, worst} | ⩽ ξ, \\ & \sum_{j = 1}^{n} τ_{j} = 1, \\ & τ_{1} ⩾ τ_{2} ⩾ \dots ⩾ τ_{n}, \\ & ξ ⩾ 0, \; τ_{j} ⩾ 0, \text{ for all } j \in N. \end{array}$ (2.1)

$\begin{array}{ll} \min & ζ \\ \text{s.t.} & | υ_{best} / υ_{j} - σ_{best, j} | ⩽ ζ, \\ & | υ_{j} / υ_{worst} - σ_{j, worst} | ⩽ ζ, \\ & \sum_{j = 1}^{n} υ_{j} = 1, \\ & υ_{1} ⩽ υ_{2} ⩽ \dots ⩽ υ_{n}, \\ & ζ ⩾ 0, \; υ_{j} ⩾ 0, \text{ for all } j \in N. \end{array}$ (2.2)

The intuitionistic fuzzy multiplicative optimal weight vector of the criteria is obtained by solving models (2.1) and (2.2): $W^{*} = (w_{1}^{*}, \dots, w_{n}^{*})^{T} = ((τ_{1}^{*}, υ_{1}^{*}), \dots, (τ_{n}^{*}, υ_{n}^{*}))^{T}$

2.3 Concepts related to DEA

DEA is a typical nonparametric performance evaluation model based on linear programming. It extends the engineering concept of single-input, single-output efficiency to the evaluation of comparable DMUs with multiple inputs and outputs, which greatly enriches production function theory and its application in microeconomics. At the same time, it avoids subjective factors, simplifies computation and reduces errors. Judging whether a DMU is DEA efficient is essentially judging whether it lies on the production frontier of the production possibility set. The production frontier is the Pareto surface of a linear multi-objective program and is composed of the efficient part of the data envelopment surface. Generally, DEA models are divided into output-oriented and input-oriented models. The output-oriented DEA model maximizes the output for a given level of inputs, and the input-oriented DEA model minimizes the inputs required for a given output level. Suppose there are n DMUs to be evaluated, each with m inputs and s outputs. The input-oriented DEA-CCR model is as follows [57]:

$\begin{array}{ll} θ_{o}^{CCR} = \max & \sum_{i = m + 1}^{m + s} w_{i} y_{io} \\ \text{s.t.} & \sum_{i = m + 1}^{m + s} w_{i} y_{ij} - \sum_{i = 1}^{m} w_{i} x_{ij} ⩽ 0, \quad j = 1, \dots, n, \\ & \sum_{i = 1}^{m} w_{i} x_{io} = 1, \\ & w_{i} ⩾ 0, \quad i = 1, \dots, m + s. \end{array}$ (2.3)

Here x_ij (i = 1, ⋯ , m) and y_ij (i = m+1, ⋯ , m+s) represent the i^th input and the i^th output of the j^th DMU, respectively, the subscript o marks the DMU under evaluation, and w₁, ⋯ , w_m, w_m+1, ⋯ , w_m+s are the weights of the inputs and outputs. Moreover, x_ij and y_ij are known parameters and w_i are unknown variables. The efficiency score of the DMU under evaluation is $θ_{o}^{CCR}$: when the efficiency value equals 1, the DMU is efficient; when it is less than 1, the DMU is inefficient.

3 Decision-support framework

This study introduces a comprehensive decision-support framework in an intuitionistic fuzzy environment that ranks alternative products through online consumer reviews, so that decision makers can make better purchasing decisions.

The decision-support framework proposed in this paper is divided into three main components, namely data pre-processing, sentiment analysis (SA) and benchmark analysis, which are shown in Fig. 1. The process and role of each component are described in detail below.

Fig. 1

Decision-support framework for merchandises evaluation.

1) Data pre-processing:

Online consumer reviews of products were crawled from the JD.com platform using Python; data cleaning and pre-processing were performed on the crawled data, and key features were then extracted as evaluation indicators for ranking the alternative products.

2) Sentiment analysis (SA):

The cleaned online review sentences were clustered into the extracted key attribute groups, and then sentiment analysis of online reviews was conducted based on Naive Bayesian (NB), using intuitionistic fuzzy sets to describe the positive, neutral and negative sentiment scores of online consumer reviews, and constructing an intuitionistic fuzzy decision matrix based on the key features.

3) Benchmark analysis:

The proposed IFMBWM-DEA model is used to calculate the efficiency scores of the alternative solutions and then rank the alternative products according to their efficiency scores.

3.1 Data pre-processing

3.1.1 Crawl online reviews

Although many online shopping websites also supply quantitative indicators for products, such as the positive-rating share and star ratings (Fig. 2 shows an example of a product review crawled from the JD.com platform), it is still advisable to learn the specifics of a product from online consumer reviews, because such quantitative ratings are subjective and variable and tend to produce polarized opinions. This paper uses Python to crawl online consumer review data from JD.COM (www.jd.com) for 15 laptops of different brands (Dell, Acer, Huawei, HP and Lenovo) and different price points (below 5000, 5000 to 10000), denoted A = {A₁, A₂, . . . , A_n}. For each laptop, up to 100 pages each of good, medium and bad reviews are collected (JD.com displays at most the first 100 pages; when fewer pages are available, all of them are collected). The crawled review data are stored in xls format for easy import by computer programs.

Fig. 2

Example of a product review crawled from JD.COM.

3.1.2 Pre-processing of crawled data

Data pre-processing is the first task of sentiment analysis and an essential step in text mining; it directly affects the accuracy of the subsequent feature extraction and sentiment analysis. The steps of data pre-processing are as follows.

Data cleaning: this is essentially a massive re-examination and verification of data, removing duplicate information and correcting incorrect data to ensure consistency and improve data availability.

Word segmentation: Using the jieba Chinese word segmentation tool (a statistically based approach to word segmentation that supports manually adding domain-specific words to improve the lexicon and the quality of segmentation), the sentences in the text are segmented in exact mode.

Stop-word removal: Stop words are words that carry little meaning and are useless for text mining; they commonly include conjunctions, prepositions, pronouns, punctuation, logical characters and special characters. In this paper, we refer to the HIT stop-word list to remove such words from the text and reduce the noise generated during word segmentation.

Part-of-speech tagging: The part of speech of each word in the online reviews is determined and annotated in context, classifying words as adjectives, nouns, verbs, etc. For example, the Chinese sentence “the laptop runs very smoothly” is segmented and labelled as “laptop/n, runs/v, very/d, smooth/a”, where “n”, “v”, “d” and “a” stand for “noun”, “verb”, “adverb” and “adjective”. Correct part-of-speech tagging greatly simplifies the subsequent work: the nouns indicate the product features that consumers care about, and the adjectives indicate the user’s emotional inclination towards the product.
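The cleaning, stop-word removal and part-of-speech filtering steps can be sketched in a few lines of Python. This is only an illustrative sketch under several assumptions: the reviews are taken to be already segmented and tagged as (word, tag) pairs (for example by a segmentation tool such as jieba), the tiny `STOP_WORDS` set stands in for the HIT stop-word list, and the function name `preprocess` is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical stand-in for the HIT stop-word list used in the paper.
STOP_WORDS = {"the", "very", ",", "."}

TaggedReview = Tuple[Tuple[str, str], ...]  # ((word, part-of-speech tag), ...)


def preprocess(reviews: Iterable[TaggedReview]) -> List[Dict[str, List[str]]]:
    """Drop duplicate reviews, remove stop words, keep nouns and adjectives."""
    seen = set()
    result = []
    for review in reviews:
        if review in seen:  # data cleaning: skip verbatim duplicates
            continue
        seen.add(review)
        kept = [(w, t) for w, t in review if w not in STOP_WORDS]
        result.append({
            # nouns hint at product features, adjectives at sentiment
            "nouns": [w for w, t in kept if t.startswith("n")],
            "adjectives": [w for w, t in kept if t.startswith("a")],
        })
    return result


tagged = (("the", "r"), ("laptop", "n"), ("runs", "v"),
          ("very", "d"), ("smooth", "a"))
cleaned = preprocess([tagged, tagged])  # the second copy is a duplicate
```

Keeping only nouns and adjectives mirrors the observation in the text that nouns point to product features and adjectives to the consumer's emotional inclination.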

3.1.3 Extraction of key features

The purpose of this step is to identify key features of the alternative products based on online consumer reviews (each OCR reflects a different consumer preference). Term Frequency-Inverse Document Frequency (TF-IDF) provides the statistical information needed to assess the importance of a word from its frequency within a document and across documents, so candidate keywords in online reviews are selected by calculating the TF-IDF values of the words in each review. The word vectors were trained using word2vec, after which the 200 most frequent words were obtained by segmenting the text and removing words that did not meet the criteria. The word vectors of these words were then clustered using the K-means clustering algorithm (Algorithm 1) to extract the key features and create normalized labels.
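As a rough illustration of how TF-IDF scores candidate keywords, the following Python sketch uses one common variant (term frequency normalized by document length, inverse document frequency log(N/df)). The paper does not state which variant it uses, so this choice, the function name `tf_idf` and the toy reviews are all assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: tf-idf} dict per document."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Toy segmented reviews (hypothetical data).
reviews = [
    ["battery", "good", "battery", "laptop"],
    ["screen", "good", "laptop"],
    ["screen", "clear", "laptop"],
]
weights = tf_idf(reviews)
```

In this toy example a word that occurs in every review, such as "laptop", receives a weight of zero, while a word concentrated in one review, such as "battery", is ranked as a stronger keyword candidate.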

3.2 Sentiment analysis (SA)

As Web 2.0 continues to evolve, the number of OCR has increased dramatically. The impact of OCRs on consumers’ purchase decisions is also increasing, and researchers are increasingly using SA to support the decision-making process. SA is the fine-grained study of people’s opinions, sentiments, emotions and attitudes; it uses context mining of text to identify and extract subjective information from textual data. The aim of this section is to calculate a sentiment score for each feature of each product; note that here the sentiment score is expressed as an IFV. Most current research on online reviews considers only positive and negative reviews, and the lack of research on neutral reviews can lead to a loss of information for decision making. Neutral OCRs should not be discarded: consumers may post them to avoid retaliation from sellers, so they are not necessarily objective, yet they still carry information. Therefore, this paper considers positive, neutral and negative emotions in OCR together, enabling consumers to make faster and more accurate purchase decisions and helping merchants to further upgrade their products.

Algorithm 1: K-means algorithm to extract key features
Input: Sample set of crawled word vectors D = {x₁, x₂, . . . , x_n}; number of clusters k.
Output: Cluster division (key features) C = {C₁, C₂, . . . , C_k}.
Select k samples randomly from D as the initial mean vectors {μ₁, μ₂, . . . , μ_k};
repeat
  Let C_j = ∅ (j = 1, 2, . . . , k);
  for i = 1, 2, . . . , n do
    Calculate the distance between x_i and each μ_j (j = 1, 2, . . . , k): $d_{ij} = {∥ x_{i} - μ_{j} ∥}_{2}^{2}$ ;
    Label x_i with the closest mean vector: $λ_{i} = \arg\min_{j \in \{1, 2, \dots, k\}} d_{ij}$ ;
    Assign sample x_i to the corresponding cluster: C_{λ_i} = C_{λ_i} ∪ {x_i};
  end for
  for j = 1, 2, . . . , k do
    Calculate the new mean vector: $μ_{j}^{'} = \frac{1}{| C_{j} |} \sum_{x \in C_{j}} x$ ;
    if $μ_{j}^{'} \neq μ_{j}$ then
      Update the current mean vector μ_j to $μ_{j}^{'}$ ;
    else
      Keep the current mean vector unchanged;
    end if
  end for
until none of the current mean vectors have been updated;
return C = {C₁, C₂, . . . , C_k};
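Algorithm 1 can be sketched in plain Python as follows. This is an illustrative implementation rather than the authors' code: the fixed `seed`, the `max_iter` safeguard and the rule of keeping the old mean when a cluster becomes empty are assumptions added to make the sketch reproducible and robust.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, seed=0, max_iter=100):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    rng = random.Random(seed)
    means = [list(p) for p in rng.sample(points, k)]  # initial mean vectors
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            # label each sample with its closest mean vector
            nearest = min(range(k), key=lambda j: squared_distance(p, means[j]))
            clusters[nearest].append(p)
        new_means = []
        for j in range(k):
            if clusters[j]:
                size = len(clusters[j])
                dim = len(clusters[j][0])
                new_means.append(
                    [sum(p[d] for p in clusters[j]) / size for d in range(dim)]
                )
            else:
                new_means.append(means[j])  # assumption: keep old mean if empty
        if new_means == means:  # stop when no mean vector has been updated
            break
        means = new_means
    return clusters, means


# Toy two-dimensional "word vectors" forming two well-separated groups.
vectors = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clusters, means = k_means(vectors, k=2)
```

In the framework described above, the inputs would be the word2vec vectors of the candidate keywords, and each resulting cluster would be read as one key feature.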

This section contains three parts: 1) clustering of OCR based on key attributes; 2) sentiment analysis of OCR using Naive Bayes; 3) building an IF decision matrix based on key attributes.

3.2.1 Clustering of OCR based on key attributes

The pattern matching function “match” in the R programming language (https://stat.ethz.ch/R-manual/R-devel/library/base/html/match.html) was applied to the review sentences, and the OCR of each product were clustered by key feature using Algorithm 2.

Algorithm 2: Clustering of online reviews by product
Index, Set and Parameter:
i: Index of sentences in comments (i = 1, 2, . . . , I)
j: Product index (j = 1, 2, . . . , J)
k: Key feature index (k = 1, 2, . . . , K)
x_i: i^th word vector extracted
C_k: k^th key feature
S_ij: Sentence about the j^th product in the i^th review
F_kj: Set of sentences on the k^th key feature of the j^th product
Method:
for j = 1, 2, . . . , J do {
  for k = 1, 2, . . . , K do {
    for i = 1, 2, . . . , I do {
      Apply the “match” function to detect whether C_k occurs in S_ij;
      If C_k matches one of the words in S_ij, classify S_ij into F_kj;
    }
  }
}
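The grouping performed by Algorithm 2 can be sketched in Python, using set intersection in place of R's `match`. The function name, the dictionary layout and the toy keywords below are hypothetical and serve only to illustrate the idea that a sentence is filed under every key feature whose keywords it mentions.

```python
def group_sentences_by_feature(sentences_by_product, feature_keywords):
    """Return {product: {feature: [sentences mentioning that feature]}}.

    sentences_by_product: {product: [token list, ...]}
    feature_keywords:     {feature: set of words belonging to that feature}
    """
    grouped = {}
    for product, sentences in sentences_by_product.items():
        grouped[product] = {feature: [] for feature in feature_keywords}
        for tokens in sentences:
            token_set = set(tokens)
            for feature, keywords in feature_keywords.items():
                if token_set & keywords:  # at least one keyword matches
                    grouped[product][feature].append(tokens)
    return grouped


# Hypothetical toy data: two products and two key features.
features = {"battery": {"battery", "charge"}, "screen": {"screen", "display"}}
sentences = {
    "A1": [["battery", "lasts", "long"], ["screen", "clear"]],
    "A2": [["display", "dim", "battery", "weak"], ["fast", "delivery"]],
}
grouped = group_sentences_by_feature(sentences, features)
```

A sentence that mentions several features is assigned to each of them, and a sentence that mentions none is left out of the feature groups.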

3.2.2 Sentiment analysis of OCR using Naive Bayes

Naive Bayes (NB) is a supervised machine learning method. Weighted Naive Bayes is an extension of NB in which attributes have different weights. The process of sentiment analysis consists of first vectorizing the text, then dividing the dataset into a training set and a test set in an 80%:20% ratio to guard against overfitting, training the classifier, and finally performing predictive classification. In this paper, the pysenti library (which utilizes weighted NB) is used to assign weights to the sentiment polarity of each sentiment word in conjunction with the sentence structure, and a weighted sum then gives the sentiment polarity score of the text.

The NB probabilistic classification technique, based on Bayes’ theorem, assumes that the attributes are independent of each other and do not interfere with each other. The prior probability of each category is calculated from the labeled training text, and the probability that a test text belongs to a particular category is then found from Bayes’ theorem: $P(C_{i} | X) = \frac{P(X | C_{i}) P(C_{i})}{P(X)}$ (3.1)

Here C_i (i = 1, 2, . . . , m) denotes the categories into which the data are classified, which in this paper are positive, neutral and negative; X denotes the set of attributes, X = {x₁, x₂, . . . , x_n}, with a total of n attributes corresponding to the number of feature words in the text data. If P (C_i|X) = max_j P (C_j|X), the unknown sample is assigned to category C_i. P (C_i) represents the probability of category C_i in the training set, which is obtained as the proportion of the number N_i of samples in category C_i to the total sample size N, P (C_i) = N_i/N. P (X|C_i) can be calculated from the probability of occurrence of each attribute under a category in the training text; since different attributes are assumed to be independent of each other in Naive Bayes, $P(X | C_{i}) = \prod_{k = 1}^{n} P(x_{k} | C_{i})$. The denominator P (X) is the same for all categories, $P(X) = \prod_{k = 1}^{n} P(x_{k})$.

Since the denominator $P(X) = \prod_{k = 1}^{n} P(x_{k})$ is the same for every category, P (X) can be regarded as a normalization factor and dropped, which yields a simplified Naive Bayes classifier. Let V_nb (C) be the most probable category under Naive Bayes classification, defined as follows: $V_{nb}(C) = \arg\max_{C_{i}} P(C_{i}) \prod_{k = 1}^{n} P(x_{k} | C_{i})$ (3.2)

Since the assumption of conditional independence rarely holds in reality, an extension of Naive Bayes is needed to relax it. One way of doing this is to weight the attributes differently; the resulting model is called weighted Naive Bayes (WNB) and is defined as follows: $V_{wnb}(C) = \arg\max_{C_{i}} P(C_{i}) \prod_{k = 1}^{n} P(x_{k} | C_{i})^{w_{k}}$ (3.3) where V_wnb (C) is the most probable category under weighted NB classification and w_k is the weight of the attribute x_k.
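The weighted Naive Bayes decision rule can be illustrated with a small Python sketch that works in log space, so that the weights become multipliers of the log-likelihoods. This is not the pysenti implementation: the function names, the add-one (Laplace) smoothing and the toy training data are assumptions made only for illustration.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """samples: list of (tokens, label). Returns priors and a likelihood function."""
    class_counts = Counter(label for _, label in samples)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in samples:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    priors = {c: n / len(samples) for c, n in class_counts.items()}  # P(C_i) = N_i / N

    def likelihood(word, label):
        # P(x_k | C_i) with add-one smoothing (an assumption, not from the paper)
        total = sum(word_counts[label].values())
        return (word_counts[label][word] + 1) / (total + len(vocab))

    return priors, likelihood


def classify(tokens, priors, likelihood, weights=None):
    """Return the category maximizing log P(C_i) + sum_k w_k * log P(x_k | C_i)."""
    weights = weights or {}  # attributes without a weight default to w_k = 1
    best_label, best_score = None, -math.inf
    for label, prior in priors.items():
        log_score = math.log(prior)
        for word in tokens:
            log_score += weights.get(word, 1.0) * math.log(likelihood(word, label))
        if log_score > best_score:
            best_label, best_score = label, log_score
    return best_label


# Hypothetical labeled reviews.
training = [
    (["good", "fast"], "positive"),
    (["good", "smooth"], "positive"),
    (["bad", "slow"], "negative"),
    (["okay", "average"], "neutral"),
]
priors, likelihood = train(training)
```

With all weights equal to 1 the rule reduces to the plain Naive Bayes classifier; raising the weight of a strongly polar word lets that word dominate the decision.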

3.2.3 Building an IF decision matrix based on key attributes

Let C = {C₁, C₂, . . . , C_n} be the set of extracted key attributes and A = {A₁, A₂, . . . , A_m} be the set of alternative merchandises. IFSs can represent the uncertainty of the emotions expressed (satisfaction, dissatisfaction and hesitation). The data from section 3.2.2, after classification by Weighted Naive Bayes, are therefore used to represent the performance score of each attribute of each alternative merchandise as the IFS α_ij = (μ_ij, ν_ij), where μ_ij and ν_ij denote the membership and non-membership degrees of the i^th merchandise in the set A with respect to the j^th attribute in the set C, and π_ij represents the hesitancy, π_ij = 1 - μ_ij - ν_ij. These indicators are calculated as follows [9]. $μ_{α_{ij}} = \frac{T_{ij}^{pos}}{T_{ij}^{pos} + T_{ij}^{neg} + T_{ij}^{neu}}$ (3.4) $ν_{α_{ij}} = \frac{T_{ij}^{neg}}{T_{ij}^{pos} + T_{ij}^{neg} + T_{ij}^{neu}}$ (3.5) $π_{α_{ij}} = \frac{T_{ij}^{neu}}{T_{ij}^{pos} + T_{ij}^{neg} + T_{ij}^{neu}}$ (3.6)

According to the properties of intuitionistic fuzzy sets, μ_ij, ν_ij, π_ij ∈ [0, 1] and μ_ij + ν_ij + π_ij = 1. Here $T_{ij}^{pos}, T_{ij}^{neg}, T_{ij}^{neu}$ denote the numbers of online consumer reviews with positive, negative and neutral sentiment orientation for the j^th attribute of the i^th alternative merchandise in the OCR, respectively. We take the performance score α_ij obtained for the key attributes as the output ${\tilde{y}}_{ij}$ of the DEA model. At the same time, because the DEA model requires every decision-making unit (DMU) to have input variables, all DMUs are given the same virtual input variable ${\tilde{x}}_{ij} = (0.5, 0.5)$ .
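As a small worked example of Eqs. (3.4) to (3.6), the sketch below converts polarity counts into an intuitionistic fuzzy value. The function name `to_ifn` is hypothetical; the counts are those reported in Table 4 for alternative A₁ on attribute C₁ (3218 positive, 66 neutral, 413 negative), and the result agrees with the corresponding entry (0.87, 0.11) in Table 5 after rounding.

```python
def to_ifn(pos, neg, neu):
    """Map review counts to (membership, non-membership, hesitancy) per Eqs. (3.4)-(3.6)."""
    total = pos + neg + neu
    if total == 0:
        raise ValueError("no reviews for this product-attribute pair")
    mu = pos / total   # share of positive reviews
    nu = neg / total   # share of negative reviews
    pi = neu / total   # share of neutral reviews (hesitancy)
    return mu, nu, pi

# Counts for A1 on C1, taken from Table 4.
mu, nu, pi = to_ifn(pos=3218, neg=413, neu=66)
print(round(mu, 2), round(nu, 2), round(pi, 2))
```

Because the three shares use the same denominator, μ + ν + π = 1 holds by construction.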

3.3 Benchmarking and product ranking

In real-life decision making, it is very important to consider the preferences of decision makers, which are usually hesitant and uncertain. Therefore, in order to reduce the flexibility that DMUs have in choosing weights in the traditional DEA model under the intuitionistic fuzzy environment, this paper uses weight restrictions to incorporate decision makers’ preferences into the DEA model. A new integrated IFMBWM-DEA model is proposed, which comprehensively considers the positive, neutral and negative comments of online consumers, making the ranking of alternatives more objective in the intuitionistic fuzzy environment.

Since the IFMBWM model (2.1) and model (2.2) are nonlinear minimization models, which may produce multiple optimal solutions [58], we consider the following corresponding linear BWM models (3.7) and (3.8). $\begin{array}{ll} \min & ξ \\ \text{s.t.} & | τ_{best} - τ_{j} ρ_{best, j} | ⩽ ξ, \\ & | τ_{j} - τ_{worst} ρ_{j, worst} | ⩽ ξ, \\ & \sum_{j = 1}^{n} τ_{j} = 1, \\ & τ_{1} ⩾ τ_{2} ⩾ \dots ⩾ τ_{n}, \\ & ξ ⩾ 0, \ τ_{j} ⩾ 0, \ \text{for all } j \in N \end{array}$ (3.7) $\begin{array}{ll} \min & ζ \\ \text{s.t.} & | υ_{best} - υ_{j} σ_{best, j} | ⩽ ζ, \\ & | υ_{j} - υ_{worst} σ_{j, worst} | ⩽ ζ, \\ & \sum_{j = 1}^{n} υ_{j} = 1, \\ & υ_{1} ⩽ υ_{2} ⩽ \dots ⩽ υ_{n}, \\ & ζ ⩾ 0, \ υ_{j} ⩾ 0, \ \text{for all } j \in N \end{array}$ (3.8)
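Models (3.7) and (3.8) are linear because each absolute-value constraint is equivalent to a pair of linear inequalities. As a brief worked step for the first constraint of model (3.7): $| τ_{best} - τ_{j} ρ_{best, j} | ⩽ ξ \iff \begin{cases} τ_{best} - τ_{j} ρ_{best, j} ⩽ ξ, \\ τ_{j} ρ_{best, j} - τ_{best} ⩽ ξ. \end{cases}$ The remaining absolute-value constraints in (3.7) and (3.8) split in the same way, so a standard linear programming solver can be applied.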

The equality constraints $\sum_{j = 1}^{n} τ_{j} = 1$ and $\sum_{j = 1}^{n} υ_{j} = 1$ in IFMBWM models (3.7) and (3.8) differ from the constraint $\sum_{i = 1}^{m} ν_{i} x_{io} = 1$ in the DEA-CCR model (2.3), so the following DEA model is considered: $\begin{array}{ll} \max & \sum_{i = m + 1}^{m + s} w_{i} y_{io} - θ_{o}^{CCR} \sum_{i = 1}^{m} w_{i} x_{io} \\ \text{s.t.} & \sum_{i = m + 1}^{m + s} w_{i} y_{ij} - θ_{j}^{CCR} \sum_{i = 1}^{m} w_{i} x_{ij} ⩽ 0, \ j = 1, \dots, n, \\ & \sum_{i = 1}^{m + s} w_{i} = 1, \\ & w_{i} ⩾ 0, \ i = 1, \dots, m + s . \end{array}$ (3.9)

Where $θ_{j}^{CCR}$ is the estimated efficiency score of the j^th DMU in model (3.9). M. Zohrehbandian et al. have proved that the optimal solution is unchanged when the constraint $\sum_{i = 1}^{m} w_{i} x_{io} = 1$ in model (2.3) is replaced by $\sum_{i = 1}^{m + s} w_{i} = 1$ , which is similar to the weight normalization in IFMBWM [59]. Assume that the elements $w_{i}^{*} = (τ_{i}^{*}, υ_{i}^{*}) (i \in N)$ of the optimal weight vector $W^{*} = (w_{1}^{*}, w_{2}^{*}, \dots w_{n}^{*})^{T}$ are IFMNs. Taking the membership degree as an example, the proposed IFMBWM-DEA model is as follows: $\begin{array}{ll} \max & f_{1} = \sum_{i = m + 1}^{m + s} τ_{i} y_{io} - θ_{o}^{CCR} \sum_{i = 1}^{m} τ_{i} x_{io} \\ \max & f_{2} = - ξ \\ \text{s.t.} & \sum_{i = m + 1}^{m + s} τ_{i} y_{ij} - θ_{j}^{CCR} \sum_{i = 1}^{m} τ_{i} x_{ij} ⩽ 0, \ j = 1, \dots, n, \\ & \sum_{i = 1}^{m + s} τ_{i} = 1, \\ & | τ_{best} - τ_{i} ρ_{best, i} | ⩽ ξ, \ i = 1, \dots, m + s, \\ & | τ_{i} - τ_{worst} ρ_{i, worst} | ⩽ ξ, \ i = 1, \dots, m + s, \\ & τ_{1} ⩾ τ_{2} ⩾ \dots ⩾ τ_{n}, \\ & ξ ⩾ 0, \ τ_{i} ⩾ 0, \ i = 1, \dots, m + s . \end{array}$ (3.10)

The first objective function and the first constraint belong to the DEA model; the second objective function and the third, fourth and fifth constraints belong to the BWM and apply to both the input and the output weights; the equality constraint $\sum_{i = 1}^{m + s} τ_{i} = 1$ is shared by DEA and BWM. Since model (3.10) is a multi-objective programming model, it can be solved by methods such as the parametric method, the ɛ-constraint method and the min max method [60]. In this paper, the min max method is used to solve the model, which can be easily converted into a linear model, as shown below, where $f_{1}^{*}$ and $f_{2}^{*}$ denote the ideal values of the two objectives: $\begin{array}{ll} \min & α \\ \text{s.t.} & f_{1}^{*} - (\sum_{i = m + 1}^{m + s} τ_{i} y_{io} - θ_{o}^{CCR} \sum_{i = 1}^{m} τ_{i} x_{io}) ⩽ α, \\ & f_{2}^{*} - (- ξ) ⩽ α, \\ & \sum_{i = m + 1}^{m + s} τ_{i} y_{ij} - θ_{j}^{CCR} \sum_{i = 1}^{m} τ_{i} x_{ij} ⩽ 0, \ j = 1, \dots, n, \\ & \sum_{i = 1}^{m + s} τ_{i} = 1, \\ & | τ_{best} - τ_{i} ρ_{best, i} | ⩽ ξ, \ i = 1, \dots, m + s, \\ & | τ_{i} - τ_{worst} ρ_{i, worst} | ⩽ ξ, \ i = 1, \dots, m + s, \\ & τ_{1} ⩾ τ_{2} ⩾ \dots ⩾ τ_{n}, \\ & ξ, α ⩾ 0, \ τ_{i} ⩾ 0, \ i = 1, \dots, m + s . \end{array}$ (3.11)

By solving model (3.11), we obtain the first part of the optimal intuitionistic fuzzy multiplicative weights, $(τ_{1}^{*}, \dots, τ_{m}^{*}, τ_{m + 1}^{*}, \dots, τ_{m + s}^{*})^{T}$ , that is, the membership degrees of the input and output weights. In the same way, we can solve for the second part, $(υ_{1}^{*}, \dots, υ_{m}^{*}, υ_{m + 1}^{*}, \dots, υ_{m + s}^{*})^{T}$ , the non-membership degrees of the input and output weights. Together they give the optimal weight vector $W^{*} = (w_{1}^{*}, w_{2}^{*}, \dots w_{n}^{*})^{T} = ((τ_{1}^{*}, υ_{1}^{*}), (τ_{2}^{*}, υ_{2}^{*}), \dots, (τ_{n}^{*}, υ_{n}^{*}))^{T}$ . Then, the efficiency score of the j^th DMU is calculated as follows: $θ_{j}^{IFMBWM - DEA} = \frac{\sum_{i = m + 1}^{m + s} w_{i}^{*} \otimes {\tilde{y}}_{ij}}{\sum_{i = 1}^{m} w_{i}^{*} \otimes {\tilde{x}}_{ij}}$ (3.12)

The efficiency obtained from the solution is also an intuitionistic fuzzy number. The efficiencies are compared according to Definition 2.4 and the products are ranked accordingly.

4 Case studies and results

We apply the proposed decision support framework to a case study. Using Python, we captured from JD.com (www.jd.com) the online consumer review data of 15 notebook computers: Dell Lingyue 5000, Dell’s magazine 5515, Dell’s magazine G15, Acer shadow knight, Acer Feifan S3, Acer predator, Huawei mate book 14s 2021, Huawei mate Book D15, Huawei mate Book x Pro 2021, HP shadow wizard, HP star 15, HP war 99, Lenovo Xiaoxin air 14 2021, Lenovo Savior y9000k 2021 and Lenovo Savior y9000p. The set of alternative products is denoted A = {A₁, A₂, . . . , A₁₅}, and Table 3 shows the 101405 crawled online consumer comments that remain after data cleaning.

Table 3
The number of OCRs clustered into key attributes

DMU	Number of sentences in OCR
	C ₁	C ₂	C ₃	C ₄	C ₅	C ₆
A ₁	3697	4872	636	452	5413	660
A ₂	954	3783	797	130	1747	177
A ₃	160	1148	136	53	315	16
A ₄	1951	17440	5552	1508	6829	278
A ₅	141	10170	399	1155	5925	174
A ₆	65	3211	867	200	1178	29
A ₇	2673	20073	1371	413	14852	927
A ₈	2969	19355	1925	519	14678	1657
A ₉	1127	11417	1417	1222	6989	384
A ₁₀	2234	19397	4851	1522	7160	242
A ₁₁	2806	14303	1214	1348	9132	1042
A ₁₂	739	14980	2093	1134	8533	224
A ₁₃	2966	14040	1343	1575	7094	1261
A ₁₄	2517	10808	3362	587	3751	104
A ₁₅	1880	10768	3790	608	4128	363

4.1 Data preprocessing

The cleaned online consumer comment data are then preprocessed. After Chinese word segmentation, stop-word removal and part-of-speech tagging, the TF-IDF algorithm is used to extract candidate keywords from the online consumer comments, and the top 200 words with the highest frequency are retained. After removing unqualified words, the K-means clustering algorithm is used to assign each word to its best category based on the similarity between points and to create normalized labels. The K-means clustering diagram is shown in Fig. 3 below. It can be seen from Fig. 3 that the clustering effect is better when k = 6. Therefore, words with similar meanings are classified into the same group, and a total of six key attributes are obtained, namely after-sales service, quality, logistics, price, appearance and gifts. The key attributes are shown in table.

Fig. 3

K-means clustering diagram when k = 6.
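The keyword-extraction step described in Section 4.1 can be sketched as follows. This is a minimal illustration that assumes the comments have already been segmented into word lists and stripped of stop words; the function name `tfidf_keywords`, the toy comments and the unsmoothed IDF variant are assumptions made for illustration and are not the implementation used in the study. The subsequent K-means clustering step is not shown.

```python
import math
from collections import Counter

def tfidf_keywords(docs, top_n=5):
    """Rank words by their TF-IDF score summed over all documents."""
    n_docs = len(docs)
    # Document frequency: in how many comments each word appears.
    df = Counter(w for d in docs for w in set(d))
    scores = Counter()
    for d in docs:
        if not d:
            continue  # skip empty comments
        tf = Counter(d)
        for w, cnt in tf.items():
            idf = math.log(n_docs / df[w])  # unsmoothed IDF, one common variant
            scores[w] += (cnt / len(d)) * idf
    return [w for w, _ in scores.most_common(top_n)]

# Toy segmented comments (hypothetical, not from the crawled data).
comments = [
    ["laptop", "screen", "clear"],
    ["laptop", "delivery", "fast"],
    ["laptop", "screen", "bright"],
]
keywords = tfidf_keywords(comments, top_n=5)
print(keywords)
```

A word that occurs in every comment, such as "laptop" here, receives an IDF of zero and therefore drops out of the candidate list, which is why TF-IDF favors attribute-specific vocabulary.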

4.2 Sentiment analysis

The purpose of this section is to calculate the emotion score of the key attributes of the alternative products, where the emotion score is expressed by IFVs. This paper considers both positive and negative online consumer comments, so that potential consumers can make purchase decisions more quickly and accurately; it is also very helpful for merchants to further upgrade their products and plan publicity. First, the “match” function in R language is used to cluster the online consumer comments of alternative products according to the key attributes. The numbers of OCRs clustered into the key attributes are shown in Table 3.

Then the preprocessed word vectors are transformed and the classifier is trained. Weighted Naive Bayes is then used to weight the emotional polarity of each emotional word in combination with the sentence structure, and the weighted sum gives the emotional polarity score of each online consumer comment. The polarity results of the emotional analysis are shown in Table 4, where Pos, Neu and Neg respectively represent the number of online consumer comments with positive, neutral and negative emotional polarity.

Table 4
The results of emotional polarity

Criteria	Polarity	Alternative products
		A ₁	A ₂	A ₃	A ₄	A ₅	A ₆	A ₇	A ₈
C ₁	Pos	3218	730	25	1514	81	54	2246	2663
	Neu	66	51	3	67	3	4	62	42
	Neg	413	173	132	353	57	7	365	264
C ₂	Pos	22	3634	909	16250	9599	2712	19168	18671
	Neu	202	8	10	126	45	32	30	84
	Neg	546	141	229	1064	526	467	875	600
C ₃	Pos	8	736	118	5301	345	746	1234	1783
	Neu	82	10	0	52	3	12	26	16
	Neg	430	51	18	199	51	109	111	126
C ₄	Pos	0	126	46	1435	1104	172	378	505
	Neu	22	0	0	10	6	6	0	4
	Neg	5284	4	7	63	45	22	35	10
C ₅	Pos	10	1732	302	6689	5827	1080	14690	14490
	Neu	119	4	1	25	24	10	19	22
	Neg	530	11	12	115	74	88	143	160
C ₆	Pos	3	120	10	214	147	22	819	1450
	Neu	127	16	0	4	3	0	0	27
	Neg	22	41	6	60	24	7	108	180
Criteria	Polarity	Alternative products
		A ₉	A ₁₀	A ₁₁	A ₁₂	A ₁₃	A ₁₄	A ₁₅
C ₁	Pos	352	899	1880	424	1907	2173	1458
	Neu	0	35	67	7	41	92	60
	Neg	775	1300	859	308	1081	252	362
C ₂	Pos	9413	16631	12476	13755	12414	10304	9587
	Neu	244	188	149	35	61	40	172
	Neg	1760	2578	1678	1190	1565	464	1009
C ₃	Pos	1098	4289	1025	1848	1064	3115	2975
	Neu	0	44	24	35	17	56	67
	Neg	319	518	165	210	262	191	748
C ₄	Pos	865	1281	1191	1029	1379	533	519
	Neu	18	45	10	0	8	8	0
	Neg	339	196	147	105	188	46	89
C ₅	Pos	6540	6814	8840	8309	6914	3705	3986
	Neu	63	49	35	63	24	12	24
	Neg	386	297	257	161	156	34	118
C ₆	Pos	197	152	757	98	841	72	278
	Neu	18	2	31	0	26	4	2
	Neg	169	88	254	126	394	28	83

Finally, the performance scores of the key attributes are expressed as intuitionistic fuzzy numbers. First, the data in Table 4 are converted into intuitionistic fuzzy numbers using Eqs. (3.4), (3.5) and (3.6) in section 3.2.3, and the resulting performance scores are taken as the outputs of the DEA model. At the same time, because the DEA model requires that decision-making units have input variables, all decision-making units are given the same virtual input variable ${\tilde{x}}_{ij} = (0.5, 0.5)$ . The outputs of the IFMBWM-DEA model are shown in Table 5.

Table 5

Output of IFMBWM-DEA model

DMU	Output
	C ₁	C ₂	C ₃	C ₄	C ₅	C ₆
A ₁	(0.87,0.11)	(0.95,0.04)	(0.86,0.13)	(0.95,0.04)	(0.97,0.02)	(0.80,0.19)
A ₂	(0.76,0.18)	(0.96,0.03)	(0.92,0.06)	(0.96,0.03)	(0.99,0.00)	(0.67,0.23)
A ₃	(0.15,0.82)	(0.79,0.19)	(0.86,0.13)	(0.86,0.13)	(0.95,0.03)	(0.62,0.37)
A ₄	(0.78,0.18)	(0.93,0.06)	(0.95,0.03)	(0.95,0.04)	(0.97,0.01)	(0.76,0.21)
A ₅	(0.57,0.40)	(0.94,0.05)	(0.86,0.12)	(0.95,0.03)	(0.98,0.01)	(0.84,0.13)
A ₆	(0.83,0.10)	(0.84,0.14)	(0.86,0.12)	(0.86,0.11)	(0.91,0.07)	(0.75,0.24)
A ₇	(0.84,0.13)	(0.95,0.04)	(0.90,0.08)	(0.91,0.08)	(0.98,0.00)	(0.88,0.11)
A ₈	(0.89,0.08)	(0.96,0.03)	(0.92,0.06)	(0.97,0.01)	(0.98,0.01)	(0.87,0.10)
A ₉	(0.31,0.68)	(0.82,0.15)	(0.77,0.22)	(0.70,0.27)	(0.93,0.05)	(0.51,0.44)
A ₁₀	(0.40,0.58)	(0.85,0.13)	(0.88,0.10)	(0.84,0.12)	(0.95,0.04)	(0.62,0.36)
A ₁₁	(0.66,0.30)	(0.87,0.11)	(0.84,0.13)	(0.88,0.10)	(0.96,0.02)	(0.72,0.24)
A ₁₂	(0.57,0.41)	(0.91,0.07)	(0.88,0.10)	(0.90,0.09)	(0.97,0.01)	(0.43,0.56)
A ₁₃	(0.64,0.34)	(0.88,0.11)	(0.79,0.19)	(0.87,0.11)	(0.97,0.02)	(0.66,0.31)
A ₁₄	(0.86,0.10)	(0.95,0.04)	(0.92,0.05)	(0.90,0.07)	(0.98,0.00)	(0.69,0.26)
A ₁₅	(0.77,0.19)	(0.89,0.09)	(0.78,0.19)	(0.85,0.14)	(0.96,0.02)	(0.76,0.22)

4.3 Results of benchmarking and product ranking

First, we invite experts to use Saaty’s scale to compare the key attributes C = {C₁, C₂, . . . , C₆} extracted from online consumer reviews, and obtain the consistent intuitionistic fuzzy multiplicative preference relation matrix A. The intuitionistic fuzzy multiplicative preference for the virtual input is (1, 1). $A = \begin{pmatrix} (1, 1) & (\frac{1}{5}, 4) & (\frac{6}{5}, \frac{1}{3}) & (\frac{2}{5}, \frac{4}{3}) & (\frac{4}{9}, \frac{7}{4}) & (\frac{8}{7}, \frac{3}{4}) \\ (4, \frac{1}{5}) & (1, 1) & (3, \frac{1}{4}) & (\frac{8}{3}, \frac{1}{4}) & (\frac{9}{4}, \frac{5}{8}) & (7, \frac{1}{9}) \\ (\frac{1}{3}, \frac{6}{5}) & (\frac{1}{4}, 3) & (1, 1) & (\frac{1}{9}, 5) & (\frac{1}{7}, 3) & (\frac{7}{4}, \frac{2}{7}) \\ (\frac{4}{3}, \frac{2}{5}) & (\frac{1}{4}, \frac{8}{3}) & (5, \frac{1}{9}) & (1, 1) & (\frac{6}{5}, \frac{4}{7}) & (3, \frac{1}{7}) \\ (\frac{7}{4}, \frac{4}{9}) & (\frac{5}{8}, \frac{9}{4}) & (3, \frac{1}{7}) & (\frac{4}{7}, \frac{6}{5}) & (1, 1) & (\frac{5}{4}, \frac{2}{7}) \\ (\frac{3}{4}, \frac{8}{7}) & (\frac{1}{9}, 7) & (\frac{2}{7}, \frac{7}{4}) & (\frac{1}{7}, 3) & (\frac{2}{7}, \frac{5}{4}) & (1, 1) \end{pmatrix}_{6 \times 6}$

According to the algorithm for ordering criteria proposed in [56], the out-degrees of all criterion nodes can be calculated: $D_{1}^{out} = 2$ , $D_{2}^{out} = 5$ , $D_{3}^{out} = 1$ , $D_{4}^{out} = 4$ , $D_{5}^{out} = 3$ , $D_{6}^{out} = 0$ . Thus, the best criterion C_B = C₂ and the worst criterion C_W = C₆ can be determined. The criterion weights need to satisfy τ₂ ⩾ τ₄ ⩾ τ₅ ⩾ τ₁ ⩾ τ₃ ⩾ τ₆ and υ₂ ⩽ υ₄ ⩽ υ₅ ⩽ υ₁ ⩽ υ₃ ⩽ υ₆. By substituting the data into model (3.11) and its non-membership counterpart, the optimal intuitionistic fuzzy multiplicative weights are obtained, as shown in Table 6.
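The out-degrees above can be reproduced from the matrix A with a short sketch. It assumes that criterion C_i is counted as preferred to C_j whenever the membership part of entry (i, j) exceeds its non-membership part; this reading is an assumption made here, not a statement of the algorithm in [56], although it does reproduce the reported values. The variable names are hypothetical.

```python
from fractions import Fraction

# Rows of the preference matrix A; each cell is "membership,non-membership".
rows = [
    "1,1 1/5,4 6/5,1/3 2/5,4/3 4/9,7/4 8/7,3/4",
    "4,1/5 1,1 3,1/4 8/3,1/4 9/4,5/8 7,1/9",
    "1/3,6/5 1/4,3 1,1 1/9,5 1/7,3 7/4,2/7",
    "4/3,2/5 1/4,8/3 5,1/9 1,1 6/5,4/7 3,1/7",
    "7/4,4/9 5/8,9/4 3,1/7 4/7,6/5 1,1 5/4,2/7",
    "3/4,8/7 1/9,7 2/7,7/4 1/7,3 2/7,5/4 1,1",
]
A = [[tuple(Fraction(x) for x in cell.split(",")) for cell in row.split()]
     for row in rows]

# Assumed rule: C_i beats C_j when membership > non-membership in entry (i, j).
out_degree = [
    sum(1 for j, (mu, nu) in enumerate(row) if j != i and mu > nu)
    for i, row in enumerate(A)
]

# Order the criteria from highest to lowest out-degree.
order = sorted(range(len(A)), key=lambda i: -out_degree[i])
ranking = [f"C{i + 1}" for i in order]
print(out_degree, ranking)
```

The first and last elements of `ranking` correspond to the best and worst criteria, and the full order gives the chain of inequalities imposed on the membership weights.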

Table 6
The optimal weights of IFMBWM-DEA

DMU	w ₁	w ₂	w ₃	w ₄	w ₅	w ₆	w ₇
A ₁	(0.01,0.57)	(0.26,0.70)	(0.01,0.53)	(0.04,0.66)	(0.03,0.61)	(0.01,0.48)	(0.66,0.01)
A ₂	(0.04,0.57)	(0.12,0.70)	(0.03,0.53)	(0.09,0.66)	(0.05,0.61)	(0.01,0.49)	(0.66,0.01)
A ₃	(0,0.57)	(0.11,0.70)	(0,0.52)	(0.11,0.66)	(0.11,0.61)	(0.00,0.49)	(0.66,0.00)
A ₄	(0.03,0.57)	(0.12,0.70)	(0.03,0.53)	(0.09,0.66)	(0.06,0.62)	(0.00,0.49)	(0.66,0.00)
A ₅	(0.03,0.57)	(0.12,0.69)	(0.03,0.52)	(0.09,0.66)	(0.06,0.62)	(0.00,0.49)	(0.66,0.00)
A ₆	(0.07,0.57)	(0.07,0.69)	(0.06,0.53)	(0.10,0.66)	(0.06,0.62)	(0.00,0.49)	(0.65,0.00)
A ₇	(0.00,0.57)	(0.32,0.70)	(0.00,0.53)	(0.01,0.66)	(0.01,0.62)	(0.01,0.49)	(0.66,0.00)
A ₈	(0.05,0.57)	(0.12,0.70)	(0.03,0.53)	(0.09,0.66)	(0.06,0.62)	(0.01,0.49)	(0.66,0.00)
A ₉	(0.00,0.57)	(0.34,0.70)	(0.00,0.53)	(0.00,0.66)	(0.00,0.62)	(0.00,0.49)	(0.66,0.00)
A ₁₀	(0.00,0.57)	(0.12,0.70)	(0.01,0.53)	(0.11,0.66)	(0.10,0.62)	(0.00,0.49)	(0.66,0.00)
A ₁₁	(0.00,0.57)	(0.11,0.70)	(0.00,0.53)	(0.11,0.66)	(0.11,0.61)	(0.00,0.49)	(0.66,0.00)
A ₁₂	(0.00,0.57)	(0.12,0.70)	(0.00,0.53)	(0.11,0.66)	(0.11,0.62)	(0.00,0.48)	(0.66,0.00)
A ₁₃	(0.01,0.57)	(0.12,0.70)	(0.01,0.53)	(0.10,0.66)	(0.10,0.62)	(0.01,0.49)	(0.66,0.00)
A ₁₄	(0.00,0.57)	(0.34,0.69)	(0.00,0.53)	(0,0.65)	(0,0.61)	(0,0.48)	(0.66,0.01)
A ₁₅	(0.05,0.57)	(0.10,0.69)	(0.03,0.52)	(0.07,0.66)	(0.06,0.61)	(0.03,0.48)	(0.65,0.01)

The efficiency scores and ranking results of the alternative products are obtained by substituting the optimal weights into Eq. (3.12), as shown in Table 7 below.

Table 7

Summary of information obtained from DEA and IFMBWM-DEA models

DMU	DEA		IFMBWM-DEA
	Score	Rank	Score	Rank
A ₁	(0.9900,0.5910)	7	(0.0899,0.5072)	1
A ₂	(1.0000,0.4390)	3	(0.1612,9.3030)	3
A ₃	(0.9600,1.0000)	13	(0.1540,18295)	14
A ₄	(1.0000,0.4510)	4	(0.1610,152.61)	8
A ₅	(0.9980,0.5740)	6	(0.1591,8.1337)	2
A ₆	(0.9340,1.0000)	15	(0.1592,224974)	15
A ₇	(1.0000,0.3640)	2	(0.1444,29.978)	5
A ₈	(1.0000,0.2730)	1	(0.1626,13.886)	4
A ₉	(0.9390,1.0000)	14	(0.1356,3912.8)	11
A ₁₀	(0.9600,0.8390)	9	(0.1548,2464.1)	10
A ₁₁	(0.9730,0.6770)	8	(0.1561,7203.6)	13
A ₁₂	(0.9800,1.0000)	12	(0.1573,61.099)	7
A ₁₃	(0.9800,0.8640)	10	(0.1566,5392.9)	12
A ₁₄	(0.9999,0.4950)	5	(0.1413,34.890)	6
A ₁₅	(0.9750,0.8640)	11	(0.1585,2332.8)	9

5 Comparative analysis

In order to prove the effectiveness of the proposed method, two aspects are compared in this section, one is the comparison between considering the preference of decision makers and not considering the preference of decision makers, and the other is the comparison between the proposed method and other existing ranking methods.

5.1 Comparison of whether or not to consider the preferences of decision makers

In order to verify whether considering the preference of decision makers affects the ranking of alternatives, we compare the DEA method without decision-maker preferences against the proposed IFMBWM-DEA method, which includes them. In order to save space, only the final results are displayed. The results are shown in Table 7.

It can be seen from Table 7 that the ranking results of these two methods are clearly different. This is mainly because, when the preference of decision makers is considered, the attribute weights change the dominance of the alternative products and hence their final ranking. In short, considering the preference of decision makers affects the final ranking of alternative products through the attribute weights.

5.2 Comparison between the proposed method and other existing methods

The proposed IFMBWM-DEA is compared with IF-VIKOR [61] and IF-TOPSIS [62] methods.

(1) Comparison with the IF-TOPSIS method

The IF-TOPSIS product ranking method focuses on normalizing the original data matrix, determining the weight coefficients of the attributes based on expert recommendations, then calculating the weighted Euclidean distance and proximity between the alternative product and the best or worst solution, and ranking the alternative products based on proximity. The attribute weights are w = [0.15, 0.25, 0.1, 0.2, 0.15, 0.05, 0.1], and the steps of the IF-TOPSIS method are as follows.

Step 1: Identify positive and negative ideal solutions.

The laptops’ positive ideal solution (PIS) A⁺ and negative ideal solution (NIS) A⁻ are defined as follows. $A^{+} = \{ a_{j +} = \max_{i = 1}^{m} \langle a_{ij} \rangle ; \ j = 1, 2, \dots, n \}$ $A^{-} = \{ a_{j -} = \min_{i = 1}^{m} \langle a_{ij} \rangle ; \ j = 1, 2, \dots, n \}$

Step 2: Calculate the Euclidean distance between each alternative product and the positive and negative ideal solution. $D_{i}^{+} = \sum_{j = 1}^{n} w_{j} d (a_{ij}, a_{j +})$ $D_{i}^{-} = \sum_{j = 1}^{n} w_{j} d (a_{ij}, a_{j -})$

Step 3: Calculate proximity. ${CI}_{i} = \frac{D_{i}^{-}}{D_{i}^{+} + D_{i}^{-}}, i = 1, 2, \dots, m$

The distances to the positive and negative ideal solutions and the resulting proximity of the 15 laptops, as calculated by the IF-TOPSIS method, are shown in Table 8.

Table 8
The ranking result calculated by the IF-TOPSIS method

DMUs	$D_{i}^{+}$	$D_{i}^{-}$	CI _i	Rank
A ₁	0.0719	0.9021	0.9262	2
A ₂	0.1047	0.8631	0.8918	5
A ₃	0.6846	0.5757	0.4568	15
A ₄	0.1055	0.8555	0.8902	6
A ₅	0.2873	0.7572	0.72493	11
A ₆	0.1424	0.8745	0.86	7
A ₇	0.0770	0.8807	0.9196	3
A ₈	0.0147	0.9406	0.9846	1
A ₉	0.4445	0.5427	0.5497	14
A ₁₀	0.4728	0.6197	0.5673	13
A ₁₁	0.2273	0.7605	0.7699	10
A ₁₂	0.2822	0.6981	0.7121	12
A ₁₃	0.2104	0.7318	0.7767	9
A ₁₄	0.086	0.8920	0.9121	4
A ₁₅	0.1546	0.8141	0.8404	8
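Step 3 of the IF-TOPSIS procedure can be checked against Table 8 with a short sketch. It takes the reported distances D⁺ and D⁻ as given (the distance computation itself is not repeated), applies the proximity formula CI_i = D_i⁻ / (D_i⁺ + D_i⁻), and ranks the alternatives in descending order of CI_i. The variable names are hypothetical, and small differences in the last decimal place relative to Table 8 are to be expected because the inputs are rounded.

```python
# Distances to the positive and negative ideal solutions, copied from Table 8 (A1..A15).
d_plus = [0.0719, 0.1047, 0.6846, 0.1055, 0.2873, 0.1424, 0.0770, 0.0147,
          0.4445, 0.4728, 0.2273, 0.2822, 0.2104, 0.086, 0.1546]
d_minus = [0.9021, 0.8631, 0.5757, 0.8555, 0.7572, 0.8745, 0.8807, 0.9406,
           0.5427, 0.6197, 0.7605, 0.6981, 0.7318, 0.8920, 0.8141]

# Proximity (closeness coefficient) of each alternative.
ci = [dm / (dp + dm) for dp, dm in zip(d_plus, d_minus)]

# Rank 1 goes to the alternative with the largest proximity.
order = sorted(range(len(ci)), key=lambda i: -ci[i])
rank = [0] * len(ci)
for position, i in enumerate(order, start=1):
    rank[i] = position

for i, (c, r) in enumerate(zip(ci, rank), start=1):
    print(f"A{i}: CI = {c:.4f}, rank = {r}")
```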

(2) Comparison with the IF-VIKOR method

IF-VIKOR is an MCDM method based on ideal-point trade-off ranking, characterized by the ability to combine maximizing group utility and minimizing individual regret to reach a preferred choice among a finite set of decision options. Based on the data in Table 5, the group utility and individual regret values are determined, and the trade-off value is then computed; the smaller the trade-off value, the better the alternative. The steps of the IF-VIKOR method are as follows.

Step 1: Determine the positive and negative intuitionistic fuzzy ideal solutions of the intuitionistic fuzzy synthesis decision matrix. $r_{j}^{+} = \langle u_{j}^{+}, v_{j}^{+} \rangle = \langle \max_{i} u_{ij}, \min_{i} v_{ij} \rangle, (j = 1, 2, \dots, n)$ $r_{j}^{-} = \langle u_{j}^{-}, v_{j}^{-} \rangle = \langle \min_{i} u_{ij}, \max_{i} v_{ij} \rangle, (j = 1, 2, \dots, n)$

Step 2: Calculate the distance of each attribute value from the positive and negative intuitionistic fuzzy ideal solutions. $d_{ij}^{+} = \frac{1}{2} (| u_{ij} - u_{j}^{+} | + | v_{ij} - v_{j}^{+} | + | π_{ij} - π_{j}^{+} |)$ $d_{ij}^{-} = \frac{1}{2} (| u_{ij} - u_{j}^{-} | + | v_{ij} - v_{j}^{-} | + | π_{ij} - π_{j}^{-} |)$ $π_{j}^{+} = 1 - u_{j}^{+} - v_{j}^{+}, π_{j}^{-} = 1 - u_{j}^{-} - v_{j}^{-}$

Step 3: Calculate the group utility value U_i, individual regret value K_i and trade-off value Z_i for each option. $U_{i} = \sum_{j = 1}^{n} (w_{j} \frac{d (r_{j}^{+}, r_{ij})}{d (r_{j}^{+}, r_{j}^{-})}), (i = 1, 2, \dots, m)$ $K_{i} = \max_{j} (w_{j} \frac{d (r_{j}^{+}, r_{ij})}{d (r_{j}^{+}, r_{j}^{-})}), (i = 1, 2, \dots, m)$ $Z_{i} = λ \frac{U_{i} - {\min}_{i} U_{i}}{{\max}_{i} U_{i} - {\min}_{i} U_{i}} + (1 - λ) \frac{K_{i} - {\min}_{i} K_{i}}{{\max}_{i} K_{i} - {\min}_{i} K_{i}}, (i = 1, 2, \dots, m)$

In the above equations, w_j are the weights of the attributes of the integrated intuitionistic fuzzy decision matrix, and λ (0 ⩽ λ ⩽ 1) is the compromise coefficient. In this article, λ = 0.5, so group utility and individual regret are balanced equally.

Step 4: Rank the alternatives according to the compromise value, the larger the Z_i, the worse the plan; the smaller the Z_i, the better the plan.

Based on the above steps of the IF-VIKOR method, the group utility value, individual regret value and compromise value of the alternative products were calculated, and the resulting ranking of the 15 laptops is shown in Table 9.

Table 9

The ranking result calculated by the IF-VIKOR method

DMUs	U _i	K _i	Z _i	Rank
A ₁	0.0647	0.0238	0.1485	6
A ₂	0.0389	0.0126	0.061	2
A ₃	0.2986	0.1161	0.8995	14
A ₄	0.0583	0.0221	0.1317	3
A ₅	0.0848	0.0309	0.2092	7
A ₆	0.2334	0.0806	0.6453	13
A ₇	0.0551	0.0241	0.1365	5
A ₈	0.0127	0.0073	0	1
A ₉	0.3705	0.0978	0.9159	15
A ₁₀	0.238	0.072	0.6122	12
A ₁₁	0.1755	0.0621	0.4793	11
A ₁₂	0.1489	0.0325	0.3061	8
A ₁₃	0.1831	0.0541	0.4532	10
A ₁₄	0.0554	0.024	0.1364	4
A ₁₅	0.182	0.0499	0.4324	9
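The trade-off values in Table 9 can be recomputed from the reported U_i and K_i with a short sketch of the standard VIKOR compromise formula, Z_i = λ (U_i − min U)/(max U − min U) + (1 − λ)(K_i − min K)/(max K − min K), with λ = 0.5 as stated in the text. The variable and function names are hypothetical, and agreement with Table 9 is up to rounding of the tabulated inputs.

```python
# Group utility and individual regret values copied from Table 9 (A1..A15).
U = [0.0647, 0.0389, 0.2986, 0.0583, 0.0848, 0.2334, 0.0551, 0.0127,
     0.3705, 0.238, 0.1755, 0.1489, 0.1831, 0.0554, 0.182]
K = [0.0238, 0.0126, 0.1161, 0.0221, 0.0309, 0.0806, 0.0241, 0.0073,
     0.0978, 0.072, 0.0621, 0.0325, 0.0541, 0.024, 0.0499]
lam = 0.5  # compromise coefficient from the text

def min_max_scale(values):
    """Scale values linearly so that the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Trade-off value: weighted mix of scaled group utility and scaled individual regret.
Z = [lam * u + (1 - lam) * k for u, k in zip(min_max_scale(U), min_max_scale(K))]

# Smaller Z is better, so rank 1 goes to the smallest trade-off value.
order = sorted(range(len(Z)), key=lambda i: Z[i])
rank = [0] * len(Z)
for position, i in enumerate(order, start=1):
    rank[i] = position

for i, (z, r) in enumerate(zip(Z, rank), start=1):
    print(f"A{i}: Z = {z:.4f}, rank = {r}")
```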

5.3 Discussion

Comparing the results of ranking products using the DEA method in section 5.1, we can see that whether decision preferences are considered has a significant impact on consumers making purchase decisions. The results of ranking products by IF-TOPSIS and IF-VIKOR in section 5.2 are plotted in Fig. 4; the ranking produced by the IFMBWM-DEA method is not identical to those of the IF-TOPSIS and IF-VIKOR methods. We therefore checked the relationship between the rankings obtained from the proposed method and the other MCDM methods using the Spearman correlation coefficient, and the calculated results are shown in Table 10. It can be seen from Table 10 that, although the ranking results are not identical, the IFMBWM-DEA ranking is significantly correlated with each of the others, with coefficients of 0.721 (DEA), 0.782 (IF-VIKOR) and 0.582 (IF-TOPSIS).

Fig. 4

Ranking results.

Table 10

Spearman coefficients for IFMBWM-DEA and other ranking methods results

Method	Mean	SD	DEA	IF-VIKOR	IF-TOPSIS	IFMBWM-DEA
DEA	8.000	4.472	1
IF-VIKOR	8.000	4.472	0.896**	1
IF-TOPSIS	8.000	4.472	0.718**	0.800**	1
IFMBWM-DEA	8.000	4.472	0.721**	0.782**	0.582*	1

*p < 0.05; **p < 0.01.
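The coefficients in Table 10 can be reproduced from the rank columns of Tables 7, 8 and 9. The sketch below uses the closed-form Spearman formula for rankings without ties, ρ = 1 − 6 Σ d² / (n (n² − 1)), where d is the difference between the two ranks of an alternative and n = 15. The function and variable names are hypothetical, and significance levels are not computed here.

```python
from itertools import combinations

# Ranks of A1..A15 copied from Table 7 (DEA, IFMBWM-DEA), Table 8 and Table 9.
ranks = {
    "DEA":        [7, 3, 13, 4, 6, 15, 2, 1, 14, 9, 8, 12, 10, 5, 11],
    "IF-VIKOR":   [6, 2, 14, 3, 7, 13, 5, 1, 15, 12, 11, 8, 10, 4, 9],
    "IF-TOPSIS":  [2, 5, 15, 6, 11, 7, 3, 1, 14, 13, 10, 12, 9, 4, 8],
    "IFMBWM-DEA": [1, 3, 14, 8, 2, 15, 5, 4, 11, 10, 13, 7, 12, 6, 9],
}

def spearman(a, b):
    """Spearman rank correlation for two tie-free rankings of the same items."""
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1 - 6 * d2 / (n * (n * n - 1))

rho = {(p, q): spearman(ranks[p], ranks[q]) for p, q in combinations(ranks, 2)}
for (p, q), value in rho.items():
    print(f"{p} vs {q}: {value:.3f}")
```

Since every method ranks the same 15 alternatives from 1 to 15, each rank vector has the same mean and standard deviation, which is why the Mean and SD columns of Table 10 are identical across rows.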

The ranking results of these methods are not fully consistent because the proposed method considers consumers’ own purchase preferences. The IF-TOPSIS and IF-VIKOR methods assume that consumers are perfectly rational when purchasing laptops, but consumers are not perfectly rational in the actual purchase decision process [63]. Therefore, the proposed OCR-based method of product ranking is more reasonable.

Although many websites now provide quantitative benchmarks for ranking products, consumers are still advised to make their purchase decisions through OCR. The method proposed in this paper supports this by extracting key attributes from OCR as evaluation indicators, integrating positive, neutral and negative sentiment orientations, and considering the decision maker’s own preferences, which makes the ranking more objective and realistic and simplifies the decision-making process.

6 Conclusion

With the explosive growth of massive amounts of data, the number of online consumer reviews is also increasing, and more and more attention is being paid to how these reviews can be used to aid decision-making. This study proposes an IFMBWM-DEA method that considers decision maker preferences and solves the proposed bi-objective model using a minimum-maximum approach. The method makes full use of IFS theory, SA, IFMBWM and DEA to deal with the ranking of substitutable products and has the following advantages in practical application.

(1) Most of the existing studies that rank alternative products based on OCR assess them on the basis of a few established criteria based on past experience. This may lead to insufficient consideration of substitutable products and increase the risk of decision making. Therefore, this paper constructs evaluation indicators by extracting the attributes that consumers care about from OCR, which is more in line with objective reality.

(2) The massive amount of OCR text information is transformed into data that can be used for decision analysis. The neutral sentiment in OCR has been neglected in previous studies, which may lead to the loss of decision information. In this paper, we use IFSs to integrate positive, neutral and negative emotions in OCR.

(3) The combination of the consumer’s own preferences and OCR together determines the optimal weighting of key attributes. In this paper, we use BWM to consider decision maker preferences and combine it with DEA to rank alternative products, making the final ranking results more reasonable.

(4) Compared to previous methods of ranking alternative products based on OCR, the decision-making process is greatly simplified, making it easier for consumers to make purchase decisions through OCR.

Currently, OCR has been used in different scenarios in real life. The method proposed in this paper is not only applicable to ranking alternative products based on OCR in E-commerce to assist consumers in making purchase decisions, but also provides a low-cost and time-efficient source of information for merchants to make management decisions, and can be applied to other scenarios with similar processes. In the tourism industry, it can be used to explore the desires of travelers, explore and recommend the image of tourist destinations, improve and recommend hotel services, and evaluate and recommend restaurant satisfaction. In the medical industry, the current practice of medical management based on the Internet has improved the unbalanced allocation of medical service resources, allowing for hospital recommendations, doctor service quality evaluation and recommendations. In the film and television industry, in the context of social networking, it can be used for movie recommendations and box office forecasting.

It should be noted that this study has some limitations. Firstly, only text-based OCRs are recognized in the SA process; the method could be improved to recognize other forms of OCR, such as emojis and videos. Secondly, the method proposed in this paper only serves individual potential consumers and does not consider group decision making; the model could be extended to group decision making problems to deal with more real-world problems. In future research, we can further investigate and extend our proposed model by drawing on the novel approach to solving fractional DEA proposed by Akram et al. in their study of fractional transportation problems under interval-valued Fermatean fuzzy sets [64]. In this paper, we use IFs to describe uncertainty; robust optimization is also a very effective tool for dealing with uncertainty [65–67]. In the future, we can also extend robust optimization methods to this DEA problem driven by online reviews and use them to assist decision makers in making effective decisions.

Acknowledgments

The work is supported by research grants from the National Natural Science Foundation of China (No. 72171123, No. 72171149) and the National Social Science Foundation of China (No. 21ZDA105).

Footnotes

UNCTAD, COVID-19 and e-commerce: a global review.

References

1. Yang, Liu, Liang, Tang, Exploiting user experience from online customer reviews for product design, Int J Inf Manage 46 (2019), 173.

2. Imtiaz M.N., Ben Islam M.K., Identifying Significance of Product Features on Customer Satisfaction Recognizing Public Sentiment Polarity: Analysis of Smart Phone Industry Using Machine-Learning Approaches, Appl Artif Intell (Taylor & Francis) 34 (2020), 832.

3. Sampaio C.H., Ladeira W.J., Santini F.D.O., Apps for mobile banking and customer satisfaction: a cross-cultural study, Int J Bank Mark (Emerald Publishing Limited) 35 (2017), 1133.

4. Cha, Borchgrevink C.P., Customers’ perceptions in value and food safety on customer satisfaction and loyalty in restaurant environments: moderating roles of gender and restaurant types, J Qual Assur Hosp Tour (Routledge) 20 (2019), 143.

5. …, Shu, Yao, Optimal pricing and service level in supply chain considering misreport behavior and fairness concern, Comput Ind Eng 174 (2022), 108759.

6. …, Ma, The impact of carbon policy on carbon emissions in various industrial sectors based on a hybrid approach, Environ Dev Sustain (2022).

7. Liu, Bi J.-W., Fan Z.-P., A Method for Ranking Products Through Online Reviews Based on Sentiment Classification and Interval-Valued Intuitionistic Fuzzy TOPSIS, Int J Inf Technol Decis Mak 16 (2017), 1497.

8. Xue, Li, Han, Evaluation and Emotional Analysis of Mobile Phone Sales of JD E-commerce Platform Based on LDA Model, J Phys Conf Ser 1861 (2021), 12076.

9. Calı and Ş Balaman, Improved decisions for marketing, supply and purchasing: Mining big data through an integration of sentiment analysis and intuitionistic fuzzy multi criteria assessment, Comput Ind Eng 129 (2019), 315.

10. Ahani, Nilashi, Yadegaridehkordi, Sanzogni, Tarik A.R., Knox, Samad, Ibrahim, Revealing customers’ satisfaction and preferences through online review analysis: The case of Canary Islands hotels, J Retail Consum Serv 51 (2019), 331.

11. Cambria, Affective Computing and Sentiment Analysis, IEEE Intell Syst 31 (2016), 102.

12. Jia, Li, Chinese micro-blog sentiment classification based on emotion dictionary and semantic rules, in Proc – 2020 Int Conf Comput Inf Big Data Appl CIBDA 2020 (2020).

13. Zhao, Zhang, Liu, Wang, Zhang, Research on Domain Emotion Dictionary Construction Method based on Improved SO-PMI Algorithm, in ACM Int Conf Proceeding Ser (2021).

14. Sun, Chu, Du, Sentiment analysis of hotel reviews based on deep leaning, in Proc – 2020 Int Conf Robot Intell Syst ICRIS 2020 (2020).

15. …, Xu, Mangla S.K., Chan F.T.S., Zhu, Arisian, Matchmaking in reward-based crowdfunding platforms: a hybrid machine learning approach, Int J Prod Res (Taylor & Francis) 60 (2022), 7551.

16. Abdul Aziz, Starkey, Predicting Supervise Machine Learning Performances for Sentiment Analysis Using Contextual-Based Approaches, in IEEE Access 8 (2020).

17. Sattar, Fatima, Sentiment Analysis Based on Reviews Using Machine Learning Techniques, in Pakistan J Eng Technol 4 (2021).

18. Fan Z.-P., Li G.-M., Liu, Processes and methods of information fusion for ranking products based on online reviews: An overview, Inf Fusion 60 (2020), 87.

19. Jiang, Li, Wang, Zhang, Deep feature weighting for naive Bayes and its application to text classification, in Eng Appl Artif Intell 52 (2016).

20. Pang, Lee, Opinion Mining and Sentiment Analysis, Found Trends Inf Retr 2 (2008), 1.

21. Kumar, Parimala, An Integration of Sentiment Analysis and MCDM Approach for Smartphone Recommendation, Int J Inf Technol Decis Mak 19 (2020).

22. Zhang, Tian, Fan, Li, Customized ranking for products through online reviews: a method incorporating prospect theory with an improved VIKOR, Appl Intell 50 (2020), 1725.

23. Vyas, Uma, Ravi, Aspect-based approach to measure performance of financial services using voice of customer, J King Saud Univ – Comput Inf Sci 34 (2022), 2262.

24. Kumar, A Multi-Criteria Decision Making Approach for Recommending a Product Using Sentiment Analysis, in 2018 12th Int Conf Res Challenges Inf Sci (2018).

25. Abirami A.M., Askarunisa, Sentiment analysis model to emphasize the impact of online reviews in healthcare industry, Online Inf Rev (Emerald Publishing Limited) 41 (2017), 471.

26. Ravi, Ravi, Ranking of branded products using aspect-oriented sentiment analysis and ensembled multiple criteria decision-making, Int J Knowl Manag Tour Hosp (Inderscience Publishers) 1 (2017), 317.

27. Alrababah S.A.A., Gan K.H., Tan, Comparative Analysis of MCDM Methods for Product Aspect Ranking: TOPSIS and VIKOR, in 2017 8th Int Conf Inf Commun Syst (2017).

28. Liang, Liu, Wang, Hotel selection utilizing online reviews: A novel decision support model based on sentiment analysis and dl-VIKOR method, Technol Econ Dev Econ 25 (2019), 1139.

29. …, Xia, Multiplicative consistency of hesitant fuzzy preference relation and its application on group decision making, Int J Inf Technol Decis Mak 13 (2014).

30. Fan Z.-P., Xi, Li, Supporting the purchase decisions of consumers: A comprehensive method for selecting desirable online products, Kybernetes (Emerald Publishing Limited) 47 (2018), 689.

31. Song, Li, A purchase decision support model considering consumer personalization about aspirations and risk attitudes, J Retail Consum Serv 63 (2021), 102728.

32. …, Feng, Jiang, Wei, Xu, Data-Driven Robust DEA Models for Measuring Operational Efficiency of Endowment Insurance System of Different Provinces in China, in Sustainability 14 (2022).

33. …, Xu, Ji, Feng, Wei, Jiang, Data-Driven Robust Data Envelopment Analysis for Evaluating the Carbon Emissions Efficiency of Provinces in China, in Sustainability 14 (2022).

34. Bagherikahvarin, A DEA-PROMETHEE approach for complete ranking of units, Int J Oper Res 35 (2016).

35. Karami, Ghasemy Yaghin, Mousazadegan, Supplier selection and evaluation in the garment supply chain: an integrated DEA–PCA–VIKOR approach, J Text Inst (Taylor & Francis) 112 (2021), 578.

36. Akram, Shah S.M.U., Al-Shamiri M.M.A., Edalatpanah S.A., Extended DEA method for solving multi-objective transportation problem with Fermatean fuzzy sets, AIMS Math 8 (2023), 924.

37. Mahmoudi, Emrouznejad, Shetab-Boushehri S.-N., Hejazi S.R., The origins, development and future directions of data envelopment analysis approach in transportation systems, Socioecon Plann Sci 69 (2020), 100672.

38. Omrani, Common weights data envelopment analysis with uncertain data: A robust optimization approach, Comput Ind Eng 66 (2013), 1163.

39. … C.-K., Liu F.-B., Hu C.-F., A Hybrid Fuzzy DEA/AHP Methodology for Ranking Units in a Fuzzy Environment, in Symmetry (Basel) 9 (2017).

40. …, Tang, Liao, Shen, Lev, The state-of-the-art survey on integrations and applications of the best worst method in decision making: Why, what, what for and what’s next? Omega 87 (2019), 205.

41. Liu, Bi J.-W., Fan Z.-P., Ranking products through online reviews: A method based on sentiment analysis technique and intuitionistic fuzzy set theory, Inf Fusion 36 (2017), 149.

42. Gobi, Rathinavelu, Analyzing cloud based reviews for product ranking using feature based clustering algorithm, Cluster Comput 22 (2019), 6977.

43. Zhang, Li, Wu, An extended TODIM method to rank products with online reviews under intuitionistic fuzzy environment, J Oper Res Soc (Taylor & Francis) 71 (2020), 322.

44. Liu, Teng, Probabilistic linguistic TODIM method for selecting products through online product reviews, Inf Sci (Ny) 485 (2019), 441.

45. Sharma, Tandon, Kapur P.K., Aggarwal A.G., Ranking hotels using aspect ratings based sentiment classification and interval-valued neutrosophic TOPSIS, Int J Syst Assur Eng Manag 10 (2019), 973.

46. Heidary Dahooie, Raafat, Qorbani A.R., Daim, An intuitionistic fuzzy data-driven product ranking model using sentiment analysis and multi-criteria decision-making, Technol Forecast Soc Change 173 (2021), 121158.

47. Zhang, Guo, Zhang, Zhou, Wang, Product selection based on sentiment analysis of online reviews: an intuitionistic fuzzy TODIM method, Complex Intell Syst 8 (2022), 3349.

48. Qin, Wang, Xu, Ranking Tourist Attractions through Online Reviews: A Novel Method with Intuitionistic and Hesitant Fuzzy Information Based on Sentiment Analysis, Int J Fuzzy Syst 24 (2022), 755.

49. … J.-W., Han T.-Y., Yao, Li, Ranking hotels through multi-dimensional hotel information: a method considering travelers’ preferences and expectations, Inf Technol Tour 24 (2022), 127.

50. Tao L.-L., You T.-H., A multi-criteria decision-making model for hotel selection by online reviews: Considering the traveller types and the interdependencies among criteria, Appl Intell 52 (2022), 12436.

51. Darko A.P., Liang, Xu, Agbodah, Obiora, A novel multi-attribute decision-making for ranking mobile payment services using online consumer reviews, Expert Syst Appl 213 (2023), 119262.

52. Atanassov K.T., Intuitionistic fuzzy sets, Fuzzy Sets Syst 20 (1986), 87.

53. Intuitionistic preference relations and their application in group decision making, Inf Sci (Ny) 177 (2007), 2363.

54. Xia, Xu, Group decision making based on intuitionistic multiplicative aggregation operators, Appl Math Model (Elsevier Inc.) 37 (2013), 5120.

55. Xia, Xu, Liao, Preference relations based on intuitionistic multiplicative information, IEEE Trans Fuzzy Syst (IEEE) 21 (2013), 113.

56. Mou, Xu, Liao, An intuitionistic fuzzy multiplicative best-worst method for multi-criteria group decision making, Inf Sci (Ny) (Elsevier Inc.) 374 (2016), 224.

57. Charnes, Cooper W.W., Rhodes, Measuring the efficiency of decision making units, Eur J Oper Res 2 (1978), 429.

58. Rezaei, Best-worst multi-criteria decision-making method: Some properties and a linear model, Omega 64 (2016), 126.

59. Zohrehbandian, Makui, Alinezhad, A compromise solution approach for finding common weights in DEA: an improvement to Kao and Hung’s approach, J Oper Res Soc 61 (2010), 604.

60. …, Jiang, Feng, A new robust insurance model considering the time of accident, J Intell Fuzzy Syst (IOS Press) 43 (2022), 5515.

61. Zeng, Chen S.-M., Kuo L.-W., Multiattribute decision making based on novel score function of intuitionistic fuzzy values and modified VIKOR method, Inf Sci (Ny) 488 (2019), 76.

62. Bilgili, Zarali, Ilgün M.F., Dumrul, The evaluation of renewable energy alternatives for sustainable development in Turkey using intuitionistic fuzzy-TOPSIS method, Renew Energy 189 (2022), 1443.

63. Wei, Qu, Jiang, Feng, Xu, Zhao, Robust minimum cost consensus models with aggregation operators under individual opinion uncertainty, J Intell Fuzzy Syst (IOS Press) 42 (2022), 2435.

64. Akram, Shah S.M.U., Shamiri, Edalatpanah S.A., Fractional transportation problem under interval-valued Fermatean fuzzy sets, AIMS Math 7 (2022), 17327.

65. Wei, Qu, Wang, Luan, Zhao, Data-Driven Robust Maximum Expert Mixed Integer Consensus Models Under Multirole’s Opinions Uncertainty by Considering Noncooperators, IEEE Trans Comput Soc Syst (2022), 1.

66. …, Wei, Wang, Li, Jin, Chaib, Robust minimum cost consensus models with various individual preference scenarios under unit adjustment cost uncertainty, Inf Fusion 89 (2023), 510.

67. …, Li, Zhang, Risk-Averse Two-Stage Stochastic Minimum Cost Consensus Models with Asymmetric Adjustment Cost, Gr Decis Negot 31 (2022), 261.