Internet public information text data mining and intelligence influence analysis for user intent understanding

Abstract

We propose models based on SVM, Naïve Bayes and deep learning to solve the consumption intention classification problem, and apply consumption intention mining to prediction tasks in social media. This paper discusses consumption intention towards a certain kind of product, i.e. movies, and uses movie consumption intention as an important feature in box office prediction. We combine consumption intention with the traditional features used in box office prediction and outperform previous work on this problem. We build a system based on linear regression which automatically predicts a movie’s total box office and opening weekend box office one day prior to its release date.

Keywords

Text intention mining, SVM, deep learning

1 Introduction

The rapid growth of natural language sentence queries brings great challenges to traditional search engines. Most traditional search engines rely mainly on keyword matching to search quickly and return results to the user. For a natural language sentence query, a search engine that does not understand the semantic intent of the query may return only documents that match the query keywords, rather than the information the user really wants to find. However, when the search engine knows the semantic intent of the query (e.g. restaurant search) and the specific role of each query component (e.g. “French restaurant” is the search core word and “best” is a constraint), it can search according to a specific pattern and return the most relevant results to the user, rather than results that merely match some of the words in the query. Query intent understanding is therefore one of the necessary factors for the further development and improvement of search engine systems. In recent years, the problem of understanding the semantic intent of queries has attracted more and more attention. The work of this paper focuses on the semantic intent understanding of natural language queries in the field of map search.

There is a large body of research on online business intent recognition (OCI). In 2006, the concept of online commercial advertising was first proposed by Honghua Kai [10–13] and others. For business intent recognition on queries, the method used in that work takes the query itself, the search engine result page and other content as the data source, and then translates the problem into a business intent recognition problem on web pages. Later, Derek Hao Hu [1–3] and others conducted further research on the identification of commercial intent in queries. Considering that Honghua Kai’s work analyzes only the query itself without considering the user’s individual needs, they proposed the POINT algorithm (personalized online business intent detection), which combines the user’s query with the user’s profile (including the user’s search history, etc.) based on a conditional random field. In addition, Ashcan and Clarke [4–6] and others use the user’s click-through behavior, and Guo and Agichtein [7–9] use the user’s mouse clicks and scrolling to perform online business intent recognition. However, the above work is based on search engine data and is essentially different from consumer intent recognition on Weibo.

Kröll and Strohmaier [14–16] first defined the concept of intent analysis. In their 2009 article, they argued that intent analysis is somewhat similar to sentiment analysis and treated it as a multi-class classification problem. Later, in a 2013 article, Hollerit and Kröll [17–19] and others studied consumption intentions on Weibo. Under their definition, a consumption intention must include at least one consumer intent keyword, such as auction, buy or cheap, which is closest to the category of “explicit consumption intentions containing consumer intent triggers” in our classification system. They then classified consumption intentions with methods such as SVM and Naïve Bayes and achieved an accuracy of about 57%. In 2013, Zhiyuan Chen [20–23] and others proposed the concept of “intention text mining in online forums”. They mined intent from online forum posts such as “I want to buy a camera.” Their definition of intent is similar to the explicit consumption intention with a consumer intent trigger defined in this article. They observed that intentions are expressed differently in different domains and, based on this idea, proposed an intent mining algorithm based on transfer learning. In this paper, their algorithm is implemented and compared with the deep learning-based consumer intent mining algorithm. In addition, Jinpeng Wang [24–26] and others proposed the problem of mining trend-related products from Weibo. They define “trends” as topics that are heatedly discussed by users on Weibo. For example, if someone says on Weibo that “the air in Beijing is very bad recently”, they hope to discover related products such as air purifiers and masks from the trend of poor air. Their work, which discovers the products related to a certain trend and the consumption intention for those products, can therefore be considered a special case of the consumption intention research in this paper.

2 Conditional random field model

The conditional random field (CRF) model was first proposed by Lafferty et al. [27–28]. It is an undirected graphical model: a conditional probability distribution model of a set of output random variables given a set of input random variables, which assumes that the output random variables constitute a Markov random field. It has been applied to named entity recognition, Chinese word segmentation, annotation and other natural language processing tasks, with good performance.

This section mainly introduces the theory of conditional random fields, including the definition of the conditional random field and its various representations.

2.1 Definition of conditional random field model

Let X and Y be random variables, and let P(Y|X) be the conditional probability distribution of Y given X. If the random variable Y constitutes a Markov random field represented by the undirected graph G = (V, E), that is, if $P (Y_{v} | X, Y_{w}, w \neq v) = P (Y_{v} | X, Y_{w}, w \sim v)$ (1)

holds for every node v, then the conditional probability distribution P(Y|X) is called a conditional random field. Here w ∼ v denotes all nodes w connected to node v by an edge in the graph G = (V, E), w ≠ v denotes all nodes other than v, and Y_v and Y_w are the random variables corresponding to nodes v and w.

In practice, it is generally assumed that X and Y have the same graph structure, and the linear chain shown in the following two figures is used.

Then, assuming that P(Y|X) is a linear-chain conditional random field, the conditional probability that the random variable Y takes the value y, given that the random variable X takes the value x, has the following form:

$P (y | x) = \frac{1}{Z (x)} \exp \left( \sum_{i, k} λ_{k} t_{k} (y_{i - 1}, y_{i}, x, i) + \sum_{i, l} u_{l} s_{l} (y_{i}, x, i) \right)$ (2)

$Z (x) = \sum_{y} \exp \left( \sum_{i, k} λ_{k} t_{k} (y_{i - 1}, y_{i}, x, i) + \sum_{i, l} u_{l} s_{l} (y_{i}, x, i) \right)$ (3)

Here t_k is a feature function defined on an edge, called a transition feature, which depends on the current and previous positions, and s_l is a feature function defined on a node, called a state feature, which depends on the current position. Both t_k and s_l depend on the position and are local feature functions. Generally, t_k and s_l take the value 1 or 0: the value is 1 when the feature condition is satisfied and 0 otherwise. Z(x) is the normalization factor, whose sum runs over all possible output sequences y. The conditional random field is completely determined by the feature functions t_k, s_l and their weights λ_k, u_l.
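To make the linear-chain form in (2) and (3) concrete, the following minimal sketch evaluates P(y|x) by brute-force enumeration over all label sequences. The label set, the single transition feature, the single state feature and the weights are hypothetical values chosen only for illustration; they are not taken from the experiments in this paper.

```python
import itertools
import math

LABELS = ["B", "I"]   # hypothetical label set
LAMBDA = [1.0]        # weight of the transition feature t_1 (assumed value)
U = [0.5]             # weight of the state feature s_1 (assumed value)


def transition_feature(y_prev, y_cur, x, i):
    # t_1: fires (value 1) when label "I" directly follows label "B"
    return 1.0 if (y_prev == "B" and y_cur == "I") else 0.0


def state_feature(y_cur, x, i):
    # s_1: fires (value 1) when a capitalised token is labelled "B"
    return 1.0 if (x[i][0].isupper() and y_cur == "B") else 0.0


def score(y, x):
    # Exponent of Eq. (2): weighted sum of transition and state features
    total = 0.0
    for i in range(len(x)):
        if i > 0:
            total += LAMBDA[0] * transition_feature(y[i - 1], y[i], x, i)
        total += U[0] * state_feature(y[i], x, i)
    return total


def partition(x):
    # Z(x) of Eq. (3): sum over every possible label sequence
    return sum(math.exp(score(y, x))
               for y in itertools.product(LABELS, repeat=len(x)))


def prob(y, x):
    # P(y | x) of Eq. (2)
    return math.exp(score(y, x)) / partition(x)
```

Enumeration is exponential in the sequence length, so this is only suitable for checking the definition on tiny inputs; practical implementations compute Z(x) with the forward algorithm.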

2.2 Conditional random field learning algorithm

This section discusses the problem of estimating the parameters of the conditional random field model from a given training data set, i.e. the conditional random field learning problem. The conditional random field model is in fact a log-linear model defined on sequential data, and its learning methods include maximum likelihood estimation and regularized maximum likelihood estimation. Specific optimization algorithms include improved iterative scaling (IIS), gradient descent and quasi-Newton methods. We choose the BFGS algorithm, a quasi-Newton method, for a brief introduction.

Conditional random field learning can use Newton’s method or a quasi-Newton method. For the conditional random field model

$P_{ω} (y | x) = \frac{\exp \left( \sum_{i = 1}^{n} ω_{i} f_{i} (x, y) \right)}{\sum_{y} \exp \left( \sum_{i = 1}^{n} ω_{i} f_{i} (x, y) \right)}$ (4)

the objective function of learning is:

$\min_{ω \in R^{n}} f (ω) = \sum_{x} \tilde{P} (x) \log \sum_{y} \exp \left( \sum_{i = 1}^{n} ω_{i} f_{i} (x, y) \right) - \sum_{x, y} \tilde{P} (x, y) \sum_{i = 1}^{n} ω_{i} f_{i} (x, y)$ (5)

The gradient function is:

$g (ω) = \sum_{x, y} \tilde{P} (x) P_{ω} (y | x) f (x, y) - E_{\tilde{P}} (f)$ (6)
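As a check on (6), the gradient can be obtained by differentiating the objective (5) with respect to a single weight ω_i. The short derivation below assumes that P_ω(y|x) is the model in (4) and that $E_{\tilde{P}} (f_{i})$ denotes the empirical expectation of the feature f_i:

$\frac{\partial f}{\partial ω_{i}} = \sum_{x} \tilde{P} (x) \frac{\sum_{y} \exp \left( \sum_{j = 1}^{n} ω_{j} f_{j} (x, y) \right) f_{i} (x, y)}{\sum_{y'} \exp \left( \sum_{j = 1}^{n} ω_{j} f_{j} (x, y') \right)} - \sum_{x, y} \tilde{P} (x, y) f_{i} (x, y)$

$= \sum_{x, y} \tilde{P} (x) P_{ω} (y | x) f_{i} (x, y) - E_{\tilde{P}} (f_{i}), \quad E_{\tilde{P}} (f_{i}) = \sum_{x, y} \tilde{P} (x, y) f_{i} (x, y)$

The gradient vanishes exactly when the model expectation of every feature equals its empirical expectation.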

The BFGS quasi-Newton algorithm is as follows:

Input: feature functions f₁, f₂, …, f_n; empirical distribution $\tilde{P} (x, y)$

Output: optimal parameter value $\bar{ω}$; optimal model $P_{\bar{ω}} (y | x)$

(1) Select an initial point ω^(0), take B₀ to be a positive definite symmetric matrix, and set k = 0;

(2) Compute g_k = g(ω^(k)). If g_k = 0, stop; otherwise go to (3);

(3) Obtain p_k from B_k p_k = −g_k;

(4) One-dimensional search: find λ_k such that $f (ω^{(k)} + λ_{k} p_{k}) = \min_{λ ⩾ 0} f (ω^{(k)} + λ p_{k})$;

(5) Set ω^(k+1) = ω^(k) + λ_k p_k;

(6) Compute g_{k+1} = g(ω^(k+1)). If g_{k+1} = 0, stop; otherwise obtain B_{k+1} by: $B_{k + 1} = B_{k} + \frac{y_{k} y_{k}^{T}}{y_{k}^{T} δ_{k}} - \frac{B_{k} δ_{k} δ_{k}^{T} B_{k}}{δ_{k}^{T} B_{k} δ_{k}}$

where y_k = g_{k+1} − g_k and δ_k = ω^(k+1) − ω^(k);

(7) Set k = k + 1 and go to (3).
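The steps above can be sketched in a few lines of NumPy. The example below is an illustrative sketch only: the toy empirical distribution, the two indicator features and the tolerance are assumptions made for the example, and the exact one-dimensional search of step (4) is replaced by a simple backtracking (Armijo) search.

```python
import numpy as np

XS, YS = [0, 1], [0, 1]
# Hypothetical empirical joint distribution P~(x, y)
P_JOINT = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.2}


def features(x, y):
    # Two hypothetical indicator features f_1, f_2
    return np.array([float(x == 0 and y == 1), float(x == 1 and y == 1)])


def p_x(x):
    # Empirical marginal P~(x)
    return sum(P_JOINT[(x, y)] for y in YS)


def objective(w):
    # Eq. (5): negative log-likelihood of the log-linear model
    value = 0.0
    for x in XS:
        value += p_x(x) * np.log(sum(np.exp(w @ features(x, y)) for y in YS))
    for (x, y), p in P_JOINT.items():
        value -= p * (w @ features(x, y))
    return value


def gradient(w):
    # Eq. (6): model feature expectation minus empirical feature expectation
    g = np.zeros_like(w)
    for x in XS:
        scores = np.array([w @ features(x, y) for y in YS])
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        for y, p_model in zip(YS, probs):
            g += p_x(x) * p_model * features(x, y)
    for (x, y), p in P_JOINT.items():
        g -= p * features(x, y)
    return g


def bfgs(f, grad, w0, tol=1e-6, max_iter=200):
    w = np.asarray(w0, dtype=float)
    B = np.eye(len(w))                      # step (1): B_0 positive definite
    g = grad(w)                             # step (2)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        p = np.linalg.solve(B, -g)          # step (3): B_k p_k = -g_k
        lam = 1.0                           # step (4): backtracking search
        while lam > 1e-12 and f(w + lam * p) > f(w) + 1e-4 * lam * (g @ p):
            lam *= 0.5
        w_new = w + lam * p                 # step (5)
        g_new = grad(w_new)                 # step (6)
        y, delta = g_new - g, w_new - w
        if y @ delta > 1e-12:               # skip update if curvature is not positive
            Bd = B @ delta
            B = B + np.outer(y, y) / (y @ delta) - np.outer(Bd, Bd) / (delta @ Bd)
        w, g = w_new, g_new                 # step (7)
    return w
```

For this toy distribution the maximum likelihood solution can be derived by hand (the model must reproduce the empirical conditionals P(y=1|x=0) = 0.75 and P(y=1|x=1) = 1/3), which gives a convenient way to check the optimizer.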

3 Structured SVM-based query intent understanding method

This section describes the representation of semantic intent for natural language sentence queries. First, the semantic intent representation of natural language sentence queries is introduced. Then, the grammar that generates semantic intents is introduced. Finally, the analysis of natural language sentence query intent is transformed into a structured prediction problem, and the corresponding learning algorithm is given. In general, the task defined in this chapter is to map natural language sentence queries to their corresponding semantic intent representations. Note that our grammar is defined for a specific search context. By redefining the grammar to suit the desired situation, the method proposed in this paper can be generalized to other search scenarios.

3.1 Structured SVM-based learning algorithm

In general, the correctness of a predicted parse tree is measured by the F1 value (for example, in Johnson’s work [29]). Specifically, the harmonic mean of precision and recall is calculated based on the nodes that overlap between the trees. We use this type of loss function and also introduce the standard 0-1 loss function as a baseline measure. Suppose z and z_i are two parse trees, and |z| and |z_i| are the numbers of brackets in z and z_i, respectively. Let n be the number of brackets common to both trees. Then the loss functions for trees z and z_i can be calculated as follows:

$F\text{-}loss (z_{i}, z) = 1 - \frac{2 \times n}{| z | + | z_{i} |}$ (7)

$zero\text{-}one (z_{i}, z) = \begin{cases} 1 & \text{if } z_{i} \neq z \\ 0 & \text{otherwise} \end{cases}$ (8)
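The two loss functions can be illustrated with a small sketch. It assumes that a parse tree is represented as a set of labelled brackets (label, start, end), that no bracket occurs twice in a tree, and that at least one of the two trees is non-empty; the labels in the usage example are hypothetical.

```python
def f_loss(z_i, z):
    """Eq. (7): 1 - 2n / (|z| + |z_i|), where n counts the shared brackets."""
    n = len(z_i & z)
    return 1.0 - 2.0 * n / (len(z) + len(z_i))


def zero_one_loss(z_i, z):
    """Eq. (8): 1 if the two trees differ, 0 otherwise."""
    return 0 if z_i == z else 1


# Hypothetical parses of a three-word query such as "best French restaurant"
gold = {("Query", 0, 3), ("Constraint", 0, 1), ("Core", 1, 3)}
pred = {("Query", 0, 3), ("Constraint", 0, 2), ("Core", 2, 3)}
```

Here the two trees share only the top-level bracket, so n = 1 and the F-loss is 1 − 2/6, while the 0-1 loss treats the prediction as simply wrong. The F-loss therefore gives partial credit that the 0-1 loss does not.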

3.2 Maximization algorithm

Note that evaluating the learned function requires a maximization algorithm that finds the structure y ∈ Y for which F (x ; y ; ω) =〈 ω, δω_i (y) 〉 is maximal. To do this, we use the CKY parser developed by Mark Johnson and integrate it into our algorithm.
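The maximization step can be illustrated with a minimal weighted CKY (Viterbi) search. This sketch is not Mark Johnson’s parser: it assumes a grammar in Chomsky normal form whose rules carry additive weights, so that the score of a tree is the sum of the weights of the rules it uses. The grammar, labels and weights below are hypothetical.

```python
from collections import defaultdict

# Hypothetical weighted grammar in Chomsky normal form
BINARY = {("Query", "Constraint", "Core"): 1.0,
          ("Core", "Modifier", "Head"): 0.5}
LEXICAL = {("Constraint", "best"): 1.0,
           ("Modifier", "french"): 0.8,
           ("Head", "restaurant"): 1.2,
           ("Core", "restaurant"): 0.1}


def cky_best(words, root="Query"):
    """Return (score, tree) of the highest-scoring parse, or None if none exists."""
    n = len(words)
    best = defaultdict(dict)   # best[(i, j)][label] = (score, tree)
    # Width-1 spans: apply lexical rules
    for i, w in enumerate(words):
        for (label, word), weight in LEXICAL.items():
            if word == w:
                best[(i, i + 1)][label] = (weight, (label, w))
    # Wider spans: combine two adjacent sub-spans with a binary rule
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (parent, left, right), weight in BINARY.items():
                    if left in best[(i, k)] and right in best[(k, j)]:
                        l_score, l_tree = best[(i, k)][left]
                        r_score, r_tree = best[(k, j)][right]
                        total = weight + l_score + r_score
                        current = best[(i, j)].get(parent)
                        if current is None or total > current[0]:
                            best[(i, j)][parent] = (total, (parent, l_tree, r_tree))
    return best[(0, n)].get(root)
```

When the grammar cannot derive the input, the function returns None, which corresponds to a query for which no structure can be predicted.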

3.3 Results and analysis

This section evaluates the validity of the proposed learning method for the semantic intent representation of natural language sentence queries. We conducted two sets of comparative experiments. The first set was used to demonstrate the performance of the proposed learning method on three evaluation indicators: precision, recall and F1 value. The second set was used to explore the effects of related kernel functions on the learning results. The cross-test results on the MSItent dataset are shown in Table 1.

Table 1
Results using cross-test on the MSItent dataset (all values in %)

Parameters	Test recall	Test precision	Test F1
PCFG	79.10	89.83	84.12
0/1-loss ($SVM_{2}^{Δ m}$)	83.43	88.05	85.78
0/1-loss ($SVM_{2}^{Δ s}$)	83.26	88.57	85.47
F-loss ($SVM_{2}^{Δ m}$)	83.42	88.72	85.97
F-loss ($SVM_{2}^{Δ s}$)	83.01	88.39	85.60

In addition, the structured SVM may output “NULL” for some test queries, probably because the grammar used with the structured SVM cannot derive the sentence. In general, however, the proposed method achieves a higher recall rate than the PCFG baseline.
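For reference, the F1 value is the harmonic mean of precision and recall. The sketch below reproduces the F1 value of the PCFG row of Table 1, taking 89.83 as the precision and 79.10 as the recall; this reading of the columns is an assumption, supported by the fact that it yields the reported F1.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


# PCFG row of Table 1, values in percent
pcfg_f1 = round(f1_score(89.83, 79.10), 2)
```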

This chapter explores the semantics of natural language sentence queries from a new perspective: a natural language sentence query is parsed into the corresponding semantic intent representation. First, we introduce a hierarchical structure to represent the semantic intent of natural language sentence queries. Then, an automatic learning method for the semantic intent representation of natural language sentence queries is proposed, and a corpus of natural language sentence queries and their corresponding semantic intent representations is constructed manually. The experimental results on the annotated corpus show that our method achieves good performance, with F1 values of about 85–86 (Table 1). We therefore infer that the structured support vector machine is well suited to the semantic intent learning problem for natural language sentence queries. We also use the conditional random field model to obtain semantic annotation results with high accuracy, which benefits the preprocessing for query semantic intent representation learning.

The main drawback of the proposed method is the restriction it places on the word order of natural language sentence queries. Although this restriction suits query semantic intent modeling for this task, relaxing it may be more beneficial. This issue will be one of the directions of our future work. Another interesting and very important issue for future research is to extend fully supervised SVM learning to semi-supervised SVM learning. In this way, the semantic intent of natural language queries can be learned from both annotated and unlabeled data, which greatly saves manpower and material resources.

4 Intent mining based on deep learning

In the previous chapter, we implemented the SVM-based consumer intent classifier and further improved its performance by introducing an external corpus for transfer learning. However, by observing the classification results, we found that the SVM classifier does not capture semantic-level information, which leads to poor classification of microblogs that lack explicit consumption intention trigger words. To solve this problem, we introduce a deep learning-based classification model of consumption intentions and combine it with distributed word representations to achieve a deeper understanding of Weibo text.

4.1 Distributed word representation model

In the SVM-based consumer intent classification model of Chapter 2, we used the bag-of-words model to represent words. This model represents each word as a vector whose length equals the size of the dictionary, in which exactly one dimension is 1 and all other dimensions are 0; we call this a one-hot representation. The advantage of the one-hot representation is that it is very concise; its disadvantage is that it carries no semantic features, so that words are completely isolated from one another. To address these problems, Hinton et al. [30] proposed the distributed word representation (word embedding) model in 1986. It represents each word as a real-valued vector of the same dimension (such as 100 dimensions), so it can represent much more information than the one-hot representation and can express some semantic-level characteristics.
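The difference between the two representations can be seen in a small sketch. The three-word vocabulary and the three-dimensional embedding values below are hand-picked, hypothetical numbers used only for illustration; real embeddings are learned from a corpus.

```python
import numpy as np

VOCAB = ["buy", "purchase", "movie"]   # hypothetical vocabulary


def one_hot(word):
    # Vector of dictionary length with a single 1 at the word's index
    vec = np.zeros(len(VOCAB))
    vec[VOCAB.index(word)] = 1.0
    return vec


# Hypothetical dense embeddings (hand-picked, not learned)
EMBEDDINGS = {
    "buy": np.array([0.9, 0.1, 0.0]),
    "purchase": np.array([0.8, 0.2, 0.1]),
    "movie": np.array([0.0, 0.1, 0.9]),
}


def cosine(a, b):
    # Cosine similarity between two vectors
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Any two distinct one-hot vectors are orthogonal, so “buy” and “purchase” look as unrelated as “buy” and “movie”. With dense vectors, the cosine similarity can reflect that the first pair is semantically closer than the second.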

4.2 Introduction to deep learning

Deep learning is a new direction in the field of machine learning. Its main idea is to learn deep representations and abstractions from text, image, speech and other data. Commonly used deep learning models include the Autoencoder, Denoising Autoencoder (DAE) [31–34], Stacked Denoising Autoencoder (SDAE) [35–37], Recursive Autoencoder, Deep Belief Network, Deep Boltzmann Machine, Recurrent Neural Network and so on. At present, deep learning has achieved very good results in a large number of natural language processing tasks. For example, in language modeling, part-of-speech tagging, word segmentation, named entity recognition, sentiment analysis and other tasks, deep learning-based models reach or approach the state-of-the-art level [38, 39].

The consumption intention classification task in this paper has certain similarities with sentiment analysis. In the field of sentiment analysis, most research work is based on the bag-of-words model and uses a large number of manually constructed resources (such as sentiment lexicons). However, in recent years, some research work using distributed word representations and deep learning models has emerged and achieved good results.

4.3 Classification of consumption intentions based on distributed word representation and SDAE

The Denoising Autoencoder (DAE) was proposed by Bengio et al. [11] and is an improvement of the Autoencoder model. The DAE corrupts the input vector to force the hidden layer of the Autoencoder to learn more robust features. Usually, the input vector is corrupted by randomly setting some of its components to 0. The DAE encodes the corrupted input and then attempts to recover the original, uncorrupted input vector from it. In this way, the DAE can learn the implicit associations between multiple samples.
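The corruption and reconstruction steps can be sketched as follows. This is a minimal illustration under stated assumptions: masking noise with a fixed drop probability, sigmoid units, tied encoder and decoder weights, and a squared-error reconstruction loss. None of these choices is specified in the text above.

```python
import numpy as np

rng = np.random.default_rng(0)


def corrupt(x, drop_prob):
    """Masking noise: set each input component to 0 with probability drop_prob."""
    mask = rng.random(x.shape) >= drop_prob
    return x * mask


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def dae_reconstruct(x, W, b_hidden, b_visible, drop_prob=0.3):
    """One forward pass of a DAE with tied weights (an assumption of this sketch)."""
    x_tilde = corrupt(x, drop_prob)               # corrupted input
    hidden = sigmoid(x_tilde @ W + b_hidden)      # encode the corrupted input
    recon = sigmoid(hidden @ W.T + b_visible)     # decode back to input space
    loss = float(np.mean((recon - x) ** 2))       # compare with the CLEAN input
    return x_tilde, recon, loss
```

The key point is the last step: the reconstruction is compared with the clean input rather than the corrupted one, so the hidden layer cannot simply copy its input.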

The Stacked Denoising Autoencoder (SDAE) is a stack of multiple DAEs. The training of the SDAE is divided into two steps:

(1) Pre-training: train the DAEs layer by layer starting from the first layer, using the output of the k-th layer as the input of the (k+1)-th layer. This step is unsupervised.

(2) Fine-tuning: further adjust the parameters in each layer. We use the output of the last layer as input to a logistic regression layer and add supervisory information (obtained by previous manual annotation) to the logistic regression layer. After that, we train the entire SDAE and adjust the parameters in all hidden layers.
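The two training steps can be sketched in NumPy as follows. This is an illustrative sketch rather than the implementation used in the experiments: the tied weights, sigmoid units, squared-error reconstruction loss, full-batch gradient descent, learning rates and epoch counts are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def pretrain_dae(X, n_hidden, drop_prob=0.3, lr=0.5, epochs=200):
    """Train one tied-weight DAE on X and return its encoder parameters (W, b)."""
    n_in = X.shape[1]
    W = rng.normal(0.0, 0.1, size=(n_in, n_hidden))
    b, c = np.zeros(n_hidden), np.zeros(n_in)
    for _ in range(epochs):
        X_tilde = X * (rng.random(X.shape) >= drop_prob)   # masking noise
        H = sigmoid(X_tilde @ W + b)
        R = sigmoid(H @ W.T + c)
        d_rec = (R - X) * R * (1 - R) / len(X)     # gradient at decoder pre-activation
        d_hid = (d_rec @ W) * H * (1 - H)          # gradient at encoder pre-activation
        W -= lr * (X_tilde.T @ d_hid + d_rec.T @ H)
        b -= lr * d_hid.sum(axis=0)
        c -= lr * d_rec.sum(axis=0)
    return W, b


def pretrain_stack(X, layer_sizes):
    """Step (1): greedy layer-wise pre-training (unsupervised)."""
    params, layer_input = [], X
    for n_hidden in layer_sizes:
        W, b = pretrain_dae(layer_input, n_hidden)
        params.append((W, b))
        layer_input = sigmoid(layer_input @ W + b)  # output of layer k feeds layer k+1
    return params


def fine_tune(X, y, params, lr=0.5, epochs=500):
    """Step (2): add a logistic regression layer and train all layers with labels."""
    params = [(W.copy(), b.copy()) for W, b in params]
    v, v0 = np.zeros(params[-1][0].shape[1]), 0.0
    for _ in range(epochs):
        acts = [X]
        for W, b in params:
            acts.append(sigmoid(acts[-1] @ W + b))
        p = sigmoid(acts[-1] @ v + v0)
        d_out = (p - y) / len(X)                   # cross-entropy gradient at the logit
        d_hidden = np.outer(d_out, v)              # gradient at the top hidden activation
        v -= lr * acts[-1].T @ d_out
        v0 -= lr * d_out.sum()
        for k in range(len(params) - 1, -1, -1):   # backpropagate through the stack
            W, b = params[k]
            d_pre = d_hidden * acts[k + 1] * (1 - acts[k + 1])
            d_hidden = d_pre @ W.T
            params[k] = (W - lr * acts[k].T @ d_pre, b - lr * d_pre.sum(axis=0))
    return params, v, v0


def predict(X, params, v, v0):
    h = X
    for W, b in params:
        h = sigmoid(h @ W + b)
    return (sigmoid(h @ v + v0) >= 0.5).astype(int)
```

A call such as `pretrain_stack(X, [6, 4])` followed by `fine_tune(X, y, stack)` mirrors the two steps: the first uses only the inputs, the second uses the manually annotated labels.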

The SDAE model framework used in this chapter is shown in Fig. 1:

Fig. 1
SDAE schematic.

We first train the first autoencoder (left in the figure), which receives the original input vector and yields an encoding function $f_{θ}^{1}$. After that, we use the output of the first encoder as the input of the second autoencoder to obtain the encoding function $f_{θ}^{2}$. Finally, as shown on the right, this process is repeated, and a cascaded denoising autoencoder network, the SDAE, is obtained.

4.4 Experimental results

The experimental corpus in this section is still the microblog corpus of Section 2.6. We compared the following models: Word Embedding+SDAE+Logistic Regression (WE+SDAE+LR), SDAE+Logistic Regression (SDAE+LR), Word Embedding+Logistic Regression (WE+LR) and Word Embedding+SVM (WE+SVM), together with the three models of Chapter 2. The experimental results are shown in Table 2:

Table 2
Consumer intent classification experiment results

Model	Precision	Recall	F-measure
SVM	0.74	0.70	0.72
Naïve Bayes	0.75	0.75	0.75
Naïve Bayes+Co-Class	0.73	0.80	0.76
WE+SVM	0.77	0.70	0.73
WE+LR	0.76	0.75	0.75
SDAE+LR	0.82	0.85	0.83
WE+SDAE+LR	0.83	0.85	0.84

As can be seen from the above table, introducing Word Embedding improves the F-measure slightly (from 0.72 to 0.73 with SVM and from 0.83 to 0.84 with SDAE+LR), and the SDAE models are clearly better than SVM and Naïve Bayes (F-measure 0.83–0.84 versus 0.72–0.76). The combination WE+SDAE+LR achieved the highest F-measure.

This chapter first points out the problems of the bag-of-words model used by the SVM and Naïve Bayes models of Chapter 2, and then proposes a solution to these problems using Word Embedding and the Stacked Denoising Autoencoder. The experimental results show that both Word Embedding and the Stacked Denoising Autoencoder can improve the classification of consumption intentions, and that the combination Word Embedding+Stacked Denoising Autoencoder+Logistic Regression achieves the best classification of consumption intentions.

5 Conclusions

Opinion mining gives consumers and businesses a more comprehensive view of consumer opinions through the collection and processing of online product reviews. The opinions in reviews influence consumers’ intention to consume, help them avoid blind consumption and reduce the risk in consumption. At the same time, opinion mining facilitates real-time tracking of customer opinions by businesses, which can then address product quality and service issues to improve customer satisfaction. However, most current opinion mining work is directed at English text; on Chinese text, it often has lower precision and recall. This paper mainly studies the mining of Chinese online reviews. Based on hotel reviews, book reviews and computer reviews, text mining techniques are used to extract the words of interest in reviews. On the basis of research on existing methods, the methods for feature word extraction, opinion word extraction and synonym merging in Chinese product reviews are improved. This paper introduces the current state of product review opinion mining and common feature extraction and synonym merging methods, analyzes the advantages and disadvantages of these methods, and clarifies the research significance and practical application value of product review text mining. The natural language processing of Chinese review texts is completed on a natural language processing platform, and according to this analysis the review texts are divided into four categories. Because existing product text mining performs poorly on feature extraction from Chinese product reviews, the research focuses on the product text feature extraction method, and on this basis an improved algorithm for extracting feature words and opinion words from Chinese product reviews is proposed. By combining conjunctions with the newly defined feature words and opinion words, feature words and opinion words are extracted comprehensively, and the validity of the extraction method is verified through experiments on product reviews. On the basis of the analysis and implementation of existing synonym merging methods, and in order to meet the requirement of conciseness of the extracted information, the feature words and opinion words of product reviews are merged based on an extended synonym forest. The experiments show that the method can effectively improve the conciseness of the information.

Acknowledgment

This work was supported by the Natural Science Foundation of Hubei Province of China (Grant No. 2018CFB681).

References

1. Hanyu and Yu, Fast deconvolution for motion blur along the blurring paths, Canadian Journal of Electrical and Computer Engineering-Revue Canadienne de Genie Electrique et Informatique 40(4) (2018), 266–274.

2. Zhenghua, Yaozong and Qian, Spatially adaptive denoising for X-ray cardiovascular angiogram images, Biomedical Signal Processing and Control 40 (2018), 131–139.

3. Yi-Bin, Ya-Jun and Han-Xin, Research on improved edge extraction algorithm of rectangular piece, International Journal of Modern Physics C 29(1) (2018), 77–78.

4. Xuesong, Zhixin and Qinghua, Intelligent inversion method for pre-stack seismic big data based on MapReduce, Computers & Geosciences 110 (2018), 81–89.

5. Xuesong, Wenyin and Qinghua, Contaminant source identification of water distribution networks using cultural algorithm, Concurrency and Computation-Practice & Experience 29(4) (2017), Document number: UNSP e4230.

6. Qinghua, Liping and Zhixin, Research of pre-stack AVO elastic parameter inversion problem based on hybrid genetic algorithm, Cluster Computing-The Journal of Networks Software Tools and Applications 20(4) (2017), 3173–3183.

7. Qinghua, Zhixin and Xuesong, Research on the parameter inversion problem of prestack seismic data based on improved differential evolution algorithm, Cluster Computing-The Journal of Networks Software Tools and Applications 20(4) (2017), 2881–2890.

8. Zhenghong, Huabing and Cuina, Fast non-rigid image feature matching for agricultural UAV via probabilistic inference with regularization techniques, Computers and Electronics in Agriculture 143 (2017), 79–89.

9. Xiaogang and Q.A. Wang, 2D numerical study of polar active liquid crystal flows in a cavity, Computers & Fluids 155(SI) (2017), 33–49.

10. Yu, Jie and Xia, Poissonian image deblurring method by non-local total variation and framelet regularization constraint, Computers & Electrical Engineering 62 (2017), 319–329.

11. Xuesong, Tao and Qinghua, An improved cultural algorithm and its application in image matching, Multimedia Tools and Applications 76(13) (2017), 14951–14968.

12. Fan and Qi-Ling, Systemic estimation of dam overtopping probability: Bayesian networks approach, Journal of Infrastructure Systems 23(2) (2017), Document number: 04016037.

13. Rong, Chunling and Duo, All-optical control of weak-light transport and Fano-like resonance using control-probe technique in a quantum-dot-pillar microcavity system, Journal of Applied Physics 121(14) (2017), Document number: 144303.

14. Yue, Changya and Jianzhong, Tool path generation algorithm based on covariant field theory and cost functional optimization and its applications in blade machining, International Journal of Advanced Manufacturing Technology 90(1-4) (2017), 927–943.

15. Mamta, Review on comparision of relational database model and other database models, International Journal of Innovative Research in Computer and Communication Engineering (IJIRCCE) 6(8) (2018).

16. Deng, Yan-Duo and Wei, Efficient vulnerability detection based on an optimized rule-checking static analysis technique, Frontiers of Information Technology & Electronic Engineering 18(3) (2017), 332–345.

17. Yong, Jiahao and Huihui, Robust image feature matching via progressive sparse spatial consensus, IEEE Access 5 (2017), 24568–24579.

18. Tao, Zixiang and Yanduo, Robust face super-resolution via locality-constrained low-rank representation, IEEE Access 5 (2017), 13103–13117.

19. Shuiping, Xin and Chengyi, Fast implementation for the singular value and eigenvalue decomposition based on FPGA, Chinese Journal of Electronics 26(1) (2017), 132–136.

20. Qinghua, Hanmin and Xuesong, Multi-label classification algorithm research based on swarm intelligence, Cluster Computing-The Journal of Networks Software Tools and Applications 19(4) (2016), 2075–2085.

21. Haihui, Zhihong and Shuangyu, A novel real-time method for moving vehicle detection, Journal of Internet Technology 17(7) (2016), 1501–1509.

22. Hanyu, Xia and Xiuhua, Multi-frame real image restoration based on double loops with alternative maximum likelihood estimation, Signal Image and Video Processing 10(8) (2016), 1489–1495.

23. Pingjun, Haptics for product design and manufacturing simulation, IEEE Transactions on Haptics 9(3) (2016), 358–375.

24. Xuesong, Qinghua and V.S. Sheng, A double weighted naive bayes with niching cultural algorithm for multi-label classification, International Journal of Pattern Recognition and Artificial Intelligence 30(6) (2016), Document number: 1650013.

25. Kalal and Tiwari, A proposed method to personalize and extract the hidden knowledge through web usage mining and pattern discovery, International Journal of Innovative Research in Computer and Communication Engineering (IJIRCCE) 6(9) (2018).

26. Hua and Liu, A novel fast algorithm for the pseudo Winger-Ville distribution, Journal of Communications Technology and Electronics 60(11) (2015), 1238–1247.

27. Ying, Yan and W.D.S. Hill, A time series model coefficients monitoring approach for controlled processes, Chemical Engineering Research & Design 100 (2015), 228–236.

28. Yuntao, Xiaobing and S.H. Cheung, Utilizing principal singular vectors for 2D DOA estimation in single snapshot case with uniform rectangular array, International Journal of Antennas and Propagation 23 (2015), Document number: 681251.

29. Shi-Hong, Fa-Ting and Li, Research on improving adaptive variable step length MPPT algorithm, International Journal of Sensor Networks 17(3) (2015), 139–145.

30. Maithri and Chandramouli, Clustering algorithms for high dimensional data – a survey, International Journal of Innovative Research in Computer and Communication Engineering (IJIRCCE) 6(10) (2018).

31. Wu, A traffic motion object extraction algorithm, International Journal of Bifurcation and Chaos 25(14) (2015), Article number: 1540039.

32. Wu, Wang and Zou, Research on internet information mining based on agent algorithm, Future Generation Computer Systems 86 (2018), 598–602.

33. Ke, Wu, Wang and Zou, Evaluation of developer efficiency based on improved DEA model, Wireless Personal Communications 12(4) (2018), 3843–3849.

34. Wu, Wang and Yuntao, Sewage information monitoring system based on wireless sensor, Desalination and Water Treatment 12 (2018), 73–83.

35. Wu, Wang and Zou, Bidirectional cognitive computing method supported by cloud technology, Cognitive Systems Research 52 (2018), 615–621.

36. Wu, Wang, Jin, et al., Uniform L1 stability of the inelastic Boltzmann equation with large external force for hard potentials, Discrete and Continuous Dynamical Systems Series S 12(4-5) (2019), 1005–1013.

37. Xuesong, Hanmin and Zhixin, Hybrid genetic algorithm for engineering design problems, Cluster Computing-The Journal of Networks Software Tools and Applications 20(1) (2017), 263–275.

38. Haibin, Shejie and Dianhua, Tied factor analysis for unconstrained face pose classification, OPTIK 127(23) (2016), 11553–11566.

39. Yuntao, Leshem and J.J. Rindom, Joint pitch and DOA estimation using the ESPRIT method, IEEE-ACM Transactions on Audio Speech and Language Processing 23(1) (2015), 32–45.
