Price forecast of Yunnan Pu’er tea based on the MLP neural network

Abstract

Pu’er tea is the landmark product of Yunnan province and plays a decisive role in the national tea industry and in the economic development of Yunnan province. At present, neither qualitative nor quantitative methods can accurately predict the price of Pu’er tea. To address this problem, this paper uses an MLP neural network to forecast the price of Pu’er tea. First, the factors influencing the price of Pu’er tea are determined. Then the MLP neural network model is established, the activation function is selected, and the number of neurons and the number of hidden layers are designed. Finally, the MLP method is compared with the random forest method to verify its feasibility and effectiveness.

Keywords

MLP neural network, random forest, price prediction

1. Introduction

According to the China and Yunnan Statistical Yearbooks and the China Tea Circulation Association market analysis report, national tea production reached 2.61 million tons in 2018, and the number of tea drinkers was nearly 500 million. Production of Pu’er tea was 170 thousand tons, accounting for 6.69 percent of national tea production. There are more than 8 million tea farmers and more than 11 million people in this industry, accounting for about 1/4 of the total population of Yunnan province. As the landmark product of Yunnan province, Pu’er tea therefore plays a decisive role both in the national tea industry and in the economic development of Yunnan. During its 16 years of development, the Pu’er tea industry has experienced two major changes: a crash in 2007 and a tumble in 2014. By 2018, the production of Pu’er tea had increased to 170 thousand tons, and the price had also recovered to a certain extent. However, unexpected fluctuations in the price of Pu’er tea directly harm the interests of tea workers and the healthy development of the industry.

Price forecasting methods are divided into qualitative and quantitative methods. Qualitative methods are highly subjective and easily affected by human factors, so their results are uncertain and unstable. Conversely, the results of quantitative methods are more certain and stable. Most quantitative methods adopt intelligent algorithms, mainly artificial neural network algorithms, gray forecast algorithms, genetic algorithms, wavelet analysis and so on. The artificial neural network algorithm is a mathematical model that simulates the way the biological neural system processes information. Predictive models built with it are more intelligent and have higher forecast accuracy. In particular, improved versions of the traditional artificial neural network algorithm avoid its disadvantages and further improve forecast accuracy. At present, Pu’er tea prices are mainly forecast with qualitative methods based on experience: for example, drought reduces Pu’er tea production and thus raises the price, and economic conditions affect consumers’ purchasing power, which causes price fluctuations. Although such qualitative analysis has a certain rationality, it is highly subjective, so it cannot predict the price accurately. The quantitative analysis in this field is mainly based on relevant data. For example, Xu [1] used a field investigation to study the price of Pu’er tea. However, due to limitations in manpower and technology, only three locations could be selected, so the investigation did not fully reflect the situation of the tea industry. Based on an analysis of raw material and labor costs, Dou [2] concluded that the price of spring tea in 2013 would rise, but within a reasonable range of 10% to 15%.

In recent years, with the rapid development of artificial intelligence, more and more scholars have used machine learning methods to predict prices in various industries. Zahid et al. [3] used an enhanced convolutional neural network and enhanced support vector regression to predict electricity prices and loads. Berradi and Lazaar [4] combined principal component analysis with recurrent neural networks to predict stock prices on the Casablanca Stock Exchange. Peng et al. [5] used a recurrent neural network to predict stock prices and increased the accuracy by about 30%. Wang and Xue [6] predicted stock prices with a recurrent neural network. Li and He [7] proposed a license plate location algorithm based on a convolutional neural network. Bento et al. [8] proposed a short-term price forecasting method that combines a neural network and the wavelet transform, optimized by the bat algorithm. Loureiro et al. [9] used deep neural networks to predict fashion sales. Chen et al. [10] used BP neural networks to predict building energy consumption. Wu et al. [11] combined the mean influence value (MIV) method, a genetic algorithm (GA) and a BP neural network to forecast cotton prices and concluded that the method has good fitting precision and high forecast accuracy, with the fitted model reflecting 99% of the sample characteristics. Mostafa and El-Masry [12] used gene expression programming and artificial neural networks to predict oil prices.

A review of the domestic and international literature shows that scholars mainly use neural network methods to predict the prices of oil, stocks and real estate. However, no scholars have yet used neural network methods to forecast prices in the Pu’er tea field. Applying the neural network method to price forecasting is therefore an exploratory attempt in this field, which can provide suggestions for the government in formulating relevant policies and safeguarding the healthy development of the tea industry.

2. Analysis of factors influencing Pu’er tea price

2.1 Influence factor index system

The price of Pu’er tea is affected by many factors. By analyzing these factors and referring to the relevant literature, this paper takes the price of Pu’er tea as the explained variable and the production of Pu’er tea, national tea production, the sales volume of Pu’er tea and the number of tea drinkers in China as the explanatory variables. According to the theory of supply and demand, an increase in Pu’er tea production increases supply and thus affects the price. An increase in national tea production reflects growth in the tea-consuming population or in per capita consumption. According to the China and Yunnan Statistical Yearbooks and the China Tea Circulation Association market analysis report, Pu’er tea accounts for about 5.5% of national tea production, which indicates its market share and market influence. The sales volume of Pu’er tea directly reflects the market’s recognition and acceptance of Pu’er tea. According to the statistical data, the sales volume of Pu’er tea increases year by year, but at different growth rates. When sales increase to a certain extent, supply exceeds demand in the market and the price drops. The number of tea drinkers nationwide is taken as the number of tea consumers; per capita consumption of tea is 3.3 kilograms per year, while the sales volume of Pu’er tea accounts for about 4.5% of the national tea sales volume. An increase in tea consumption expands the demand for tea and thus affects the price of Pu’er tea. The price of Pu’er tea is denoted Y (RMB/kg), the production of Pu’er tea X1 (tons), national tea production X2 (tons), sales of Pu’er tea X3 (tons), and the number of tea drinkers X4 (100 million). The statistical data are shown in Table 1.

Table 1
Price and influencing factors of Pu’er tea from 2003 to 2018

Year	X1	X2	X3	X4	Y
2003	1.5	76.81	0.04	2.87	3.00
2004	2	83.52	0.15	2.96	7.00
2005	5.20	93.49	0.35	3.02	14.80
2006	8	102.81	0.87	3.35	53.30
2007	9.9	101.02	1.67	3.51	20.00
2008	5.28	125.48	2.66	3.89	13.08
2009	4.5	135.06	3.19	4.13	16.35
2010	5.08	146.25	3.64	4.20	19.62
2011	5.56	160.76	4.15	4.43	21.59
2012	8.13	176.15	4.70	4.49	24.46
2013	9.69	188.72	5.38	4.58	24.95
2014	11.4	204.93	6.17	4.65	27.04
2015	12.9	227.66	6.84	4.71	23.61
2016	13	231.33	7.41	4.78	16.51
2017	14	246.04	7.82	4.96	20.70
2018	14.42	261.60	8.74	5	25.35

Source: China Statistical Yearbook, Yunnan Statistical Yearbook.

2.2 Analysis of the abnormal point

As Table 1 shows, the price rose sharply to 53.30 RMB/kg in 2006 and fell back to 20.00 RMB/kg in 2007. The reasons for this abnormal fluctuation are as follows. Pu’er tea attracted public attention because its fragrance and value increase with age. From mainstream media to portal websites, Pu’er tea was described as a financial product suitable for long-term investment. A large number of transactions at major auction houses further increased enthusiasm for collecting it. A serious drought in Yunnan forced many merchants to raise tea prices. However, the industry was not yet mature, and large-scale releases of stock by Dayi and Xiaguan caused the entire Pu’er tea collection market to collapse. People had blindly hyped Pu’er tea as a product that would bring huge returns from investment and storage, but after the speculation the tea market gradually cooled. Large stocks of Pu’er tea caused a serious mismatch between supply and demand, which, coupled with drought, earthquakes and other events, dealt a fatal blow to the Pu’er tea market in 2007, and tea prices fell back quickly. The large abnormal fluctuation in 2006 may disturb the prediction model. Therefore, an MLP neural network model is established to handle the nonlinear relationship and achieve a better fit.

3. Neural network price prediction model

The perceptron is a concept in artificial neural networks and is divided into single-layer and multi-layer perceptrons. The single-layer perceptron contains only an input layer and an output layer, with the input layer directly connected to the output layer. This paper selects the multi-layer perceptron (MLP), which can deal with nonlinear problems. It is a multi-layer feed-forward artificial neural network model that, in addition to the input and output layers, contains one or more hidden layers. The structure is shown in Fig. 1. The input layer neurons receive the input signals, the hidden and output layers apply nonlinear processing to them, and the output layer finally outputs the result.

Figure 1.

Multilayer feed-forward artificial neural network.

When the neural network is used to construct the price forecasting model of Pu’er tea, this paper takes the four explanatory variables as features. The input layer of the prediction model therefore contains 4 neurons. Let the number of neurons in the hidden layer be $q$, the threshold of the $h$-th hidden neuron be $\gamma_{h}$, the weight from the $i$-th input neuron to the $h$-th hidden neuron be $w_{ih}$, and $f(x)$ be the activation function. The input value $\alpha_{h}$ and output value $b_{h}$ of the $h$-th hidden neuron are:

$\displaystyle\alpha_{h}=\sum\limits_{i=1}^{4}{w_{ih}x_{i}}$ (1)

$\displaystyle b_{h}=f(\alpha_{h}-\gamma_{h})$ (2)

Since price prediction is a regression problem, the output layer contains only one neuron. Let the threshold of the output neuron be $\theta$ and the weight from the $h$-th hidden neuron to the output neuron be $w_{h}$. The input value $\beta$ and output value $\hat{y}$ of the output neuron are then given by Eqs (3) and (4), respectively:

$\displaystyle\beta=\sum\limits_{h=1}^{q}{w_{h}b_{h}}$ (3)

$\displaystyle\hat{y}=f(\beta-\theta)$ (4)
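To make the notation concrete, Eqs (1) to (4) can be traced by hand for a single hidden layer. The sketch below is illustrative only: the function names, the choice of Relu for $f$, and all weight and threshold values are assumptions picked so that the arithmetic is easy to follow, not parameters from the trained model.

```python
def relu(z):
    """Relu activation, used here as one possible choice for f."""
    return max(z, 0.0)


def forward(x, w_in, gamma, w_out, theta, f=relu):
    """Forward pass of a one-hidden-layer MLP following Eqs (1)-(4).

    x     : list of input features x_i
    w_in  : w_in[i][h] is the weight w_ih from input i to hidden neuron h
    gamma : list of hidden-layer thresholds gamma_h (length q)
    w_out : list of weights w_h from hidden neuron h to the output
    theta : threshold of the single output neuron
    """
    q = len(gamma)
    # Eq. (1): alpha_h = sum_i w_ih * x_i
    alpha = [sum(w_in[i][h] * x[i] for i in range(len(x))) for h in range(q)]
    # Eq. (2): b_h = f(alpha_h - gamma_h)
    b = [f(alpha[h] - gamma[h]) for h in range(q)]
    # Eq. (3): beta = sum_h w_h * b_h
    beta = sum(w_out[h] * b[h] for h in range(q))
    # Eq. (4): y_hat = f(beta - theta)
    return f(beta - theta)


# Hypothetical example with q = 2 hidden neurons and hand-picked values.
x = [1.0, 2.0, 3.0, 4.0]
w_in = [[0.5, 0.25] for _ in range(4)]   # same weights for every input
gamma = [1.0, 3.0]
w_out = [0.5, 1.0]
theta = 0.5

# alpha = [5.0, 2.5], b = [4.0, 0.0], beta = 2.0, so y_hat = 1.5
y_hat = forward(x, w_in, gamma, w_out, theta)
print(y_hat)
```

With more than one hidden layer, Eqs (1) and (2) are simply applied again, taking the previous layer's outputs $b_{h}$ as the new inputs.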

4. Experiment and analysis

The experiment in this paper consists of two parts. The first is the parameter analysis of the MLP neural network. The second is a comparative analysis of the neural network model and the random forest method.

4.1 Introduction to the experimental environment

The experiments were run on an Intel(R) Pentium(R) CPU N3540 @ 2.16 GHz processor with 4 GB of memory and a 64-bit operating system, using Python 3.6.3 and the scikit-learn package. The MLP regression model was trained with a maximum of 2000 training iterations as the termination condition.
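A setup of this kind might look as follows in scikit-learn. This is a minimal sketch rather than the exact configuration used in the experiments: it assumes that the "MLP regression model" is scikit-learn's `MLPRegressor`, the helper name `build_mlp` and the fixed random seed are hypothetical, and all other hyperparameters are left at the library defaults because the paper does not report them.

```python
from sklearn.neural_network import MLPRegressor


def build_mlp(n_layers, n_neurons, activation="relu", seed=0):
    """Return an untrained MLP regressor with n_layers hidden layers
    of n_neurons neurons each (hypothetical helper)."""
    return MLPRegressor(
        hidden_layer_sizes=(n_neurons,) * n_layers,
        activation=activation,   # "relu" or "tanh" in the experiments
        max_iter=2000,           # maximum number of training iterations
        random_state=seed,       # assumption: fixed seed for repeatability
    )


# Example: 2 hidden layers of 20 neurons each.
model = build_mlp(n_layers=2, n_neurons=20)
```

Note that `MLPRegressor` applies the chosen activation to the hidden layers only and uses an identity function on the output neuron, which differs slightly from Eq. (4) as written.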

4.2 Parameter analysis of MLP neural network

4.2.1 Selection of activation function

An activation function enables the MLP model to deal with nonlinear problems, so its selection has a crucial impact on prediction accuracy. In this paper, the Relu and Tanh functions are compared, with the number of layers in the network set to 2, 3, 4 and 5 in turn, to predict the prices in 2017 and 2018. The activation function is then chosen by comparing the experimental results.

Table 2
Comparison of relative error results of activation functions

Activation function	2 layers	3 layers	4 layers	5 layers
Relu	8.86%	9.62%	8.42%	5.35%
Tanh	73.03%	59.30%	52.58%	38.47%

The experimental results show that the relative error of Relu is smaller than that of Tanh for every number of layers, so Relu is selected as the activation function in this paper (see Table 2).
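The comparison could be reproduced along the following lines. This is a sketch under stated assumptions: it uses scikit-learn's `MLPRegressor` with 20 neurons per layer, takes the relative error to be the mean of |predicted - real| / real over 2017 and 2018, feeds the Table 1 values in unscaled, and fixes a random seed. The function names are hypothetical, and the resulting numbers depend on the seed and library version, so they will not necessarily match Table 2.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Table 1 rows: year, X1, X2, X3, X4, Y
TABLE_1 = np.array([
    [2003, 1.5, 76.81, 0.04, 2.87, 3.00],
    [2004, 2.0, 83.52, 0.15, 2.96, 7.00],
    [2005, 5.20, 93.49, 0.35, 3.02, 14.80],
    [2006, 8.0, 102.81, 0.87, 3.35, 53.30],
    [2007, 9.9, 101.02, 1.67, 3.51, 20.00],
    [2008, 5.28, 125.48, 2.66, 3.89, 13.08],
    [2009, 4.5, 135.06, 3.19, 4.13, 16.35],
    [2010, 5.08, 146.25, 3.64, 4.20, 19.62],
    [2011, 5.56, 160.76, 4.15, 4.43, 21.59],
    [2012, 8.13, 176.15, 4.70, 4.49, 24.46],
    [2013, 9.69, 188.72, 5.38, 4.58, 24.95],
    [2014, 11.4, 204.93, 6.17, 4.65, 27.04],
    [2015, 12.9, 227.66, 6.84, 4.71, 23.61],
    [2016, 13.0, 231.33, 7.41, 4.78, 16.51],
    [2017, 14.0, 246.04, 7.82, 4.96, 20.70],
    [2018, 14.42, 261.60, 8.74, 5.0, 25.35],
])


def mean_relative_error(y_true, y_pred):
    """Average of |predicted - real| / real over the verification years."""
    return float(np.mean(np.abs(y_pred - y_true) / y_true))


def compare_activations(layer_counts=(2, 3, 4, 5), n_neurons=20,
                        max_iter=2000, seed=0):
    """Return {(activation, n_layers): mean relative error on 2017-2018}."""
    years = TABLE_1[:, 0]
    train, valid = TABLE_1[years <= 2016], TABLE_1[years >= 2017]
    results = {}
    for activation in ("relu", "tanh"):
        for n_layers in layer_counts:
            model = MLPRegressor(
                hidden_layer_sizes=(n_neurons,) * n_layers,
                activation=activation,
                max_iter=max_iter,
                random_state=seed,
            )
            model.fit(train[:, 1:5], train[:, 5])      # X1..X4 -> Y
            pred = model.predict(valid[:, 1:5])
            results[(activation, n_layers)] = mean_relative_error(
                valid[:, 5], pred)
    return results
```

Calling `compare_activations()` returns one error per activation function and layer count, which can be laid out in the same shape as Table 2.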

4.2.2 Design of the number of neurons and layers

After the activation function is determined, the effect of the number of hidden-layer neurons and the number of hidden layers on the prediction results is analyzed. The number of neurons is set to 10, 20, 30, 40, 50 and 60, and the number of layers to 2, 3, 4 and 5, with the Relu activation function. The data from 2003 to 2016 are used as the training set, and the data from 2017 and 2018 as the verification set. The experimental results are shown in Table 3 and Fig. 2.

Table 3
Relative errors for different numbers of neurons and layers

Number of layers \ Number of neurons	10	20	30	40	50	60
2	27.17%	8.86%	14.22%	91.21%	9.78%	10.34%
3	15.24%	9.62%	14.72%	13.21%	15.70%	88.62%
4	7.94%	8.42%	12.90%	8.67%	9.16%	13.73%
5	7.24%	5.35%	9.35%	11.42%	14.16%	9.34%

Figure 2.

Relative errors for different numbers of neurons and layers.
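A grid of this kind can be evaluated with two nested loops. The sketch below again assumes scikit-learn's `MLPRegressor`, unscaled Table 1 inputs, a fixed seed, and the mean relative error over 2017 and 2018 as the score; `grid_search` is a hypothetical helper name. Because the training set is tiny, the errors are sensitive to the seed and need not reproduce Table 3.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Table 1 rows: year, X1, X2, X3, X4, Y
TABLE_1 = np.array([
    [2003, 1.5, 76.81, 0.04, 2.87, 3.00],
    [2004, 2.0, 83.52, 0.15, 2.96, 7.00],
    [2005, 5.20, 93.49, 0.35, 3.02, 14.80],
    [2006, 8.0, 102.81, 0.87, 3.35, 53.30],
    [2007, 9.9, 101.02, 1.67, 3.51, 20.00],
    [2008, 5.28, 125.48, 2.66, 3.89, 13.08],
    [2009, 4.5, 135.06, 3.19, 4.13, 16.35],
    [2010, 5.08, 146.25, 3.64, 4.20, 19.62],
    [2011, 5.56, 160.76, 4.15, 4.43, 21.59],
    [2012, 8.13, 176.15, 4.70, 4.49, 24.46],
    [2013, 9.69, 188.72, 5.38, 4.58, 24.95],
    [2014, 11.4, 204.93, 6.17, 4.65, 27.04],
    [2015, 12.9, 227.66, 6.84, 4.71, 23.61],
    [2016, 13.0, 231.33, 7.41, 4.78, 16.51],
    [2017, 14.0, 246.04, 7.82, 4.96, 20.70],
    [2018, 14.42, 261.60, 8.74, 5.0, 25.35],
])


def grid_search(neuron_counts=(10, 20, 30, 40, 50, 60),
                layer_counts=(2, 3, 4, 5), max_iter=2000, seed=0):
    """Return (errors, best), where errors[r, c] is the mean relative
    error for layer_counts[r] layers of neuron_counts[c] neurons and
    best is the (n_layers, n_neurons) pair with the smallest error."""
    years = TABLE_1[:, 0]
    train, valid = TABLE_1[years <= 2016], TABLE_1[years >= 2017]
    errors = np.empty((len(layer_counts), len(neuron_counts)))
    for r, n_layers in enumerate(layer_counts):
        for c, n_neurons in enumerate(neuron_counts):
            model = MLPRegressor(
                hidden_layer_sizes=(n_neurons,) * n_layers,
                activation="relu",
                max_iter=max_iter,
                random_state=seed,
            )
            model.fit(train[:, 1:5], train[:, 5])
            pred = model.predict(valid[:, 1:5])
            errors[r, c] = np.mean(np.abs(pred - valid[:, 5]) / valid[:, 5])
    r, c = np.unravel_index(np.argmin(errors), errors.shape)
    return errors, (layer_counts[r], neuron_counts[c])
```

The returned array has one row per layer count and one column per neuron count, mirroring the layout of Table 3.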

4.2.3 Analysis of training results

According to the experimental results, the average relative error of the 2017 and 2018 price predictions is 5.35% when the number of neurons is 20 and the number of layers is 5, so the predicted values are close to the real values. However, whether this is the optimal result needs further verification. Therefore, the data from 2003 to 2017 are used as the training set and the data from 2018 as the verification set. The verification results are shown in Fig. 3.

According to the experimental results, three configurations give predictions close to the real 2018 value: 20 neurons with 2 layers, 20 neurons with 5 layers, and 40 neurons with 4 layers. Taking into account the relative errors from the first experiment, the configuration with 20 neurons and 5 layers gives the best prediction. These results show that with too few hidden-layer neurons the model fits insufficiently, while continuing to increase the number of neurons makes the fit worse. The number of neurons is therefore set to 20 and the number of layers to 5.

4.2.4 Experimental results of abnormal points

Section 2.2 analyzed the abnormal sample point, but whether this point affects price prediction needs further analysis. The sample set contains data from 2003 to 2018, and the 2006 data point is the abnormal sample. To verify whether this abnormal sample affects network training and thus the training accuracy of the network, this paper conducted 30 independent repeated experiments after eliminating the 2006 sample. In this analysis, the activation function was Relu, the number of neurons was 20, and the number of hidden layers was 5. The results are shown in Table 4.

Table 4
Experimental results of abnormal points

Forecast analysis	Anomaly point retained	Anomaly point eliminated	Real value
2018 forecast	24.52	22.78	25.35
Relative error	3.3%	10.14%	
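The relative errors in Table 4 follow directly from the forecasts and the real 2018 value. As a short worked check, assuming the relative error is defined as |predicted - real| / real (the function name is illustrative):

```python
def relative_error(real, predicted):
    """Relative error of a single forecast: |predicted - real| / real."""
    return abs(predicted - real) / real


real_2018 = 25.35                                   # real 2018 price (Table 1)
retained = relative_error(real_2018, 24.52)         # anomaly point retained
eliminated = relative_error(real_2018, 22.78)       # anomaly point eliminated

print(f"{retained:.2%}, {eliminated:.2%}")          # prints "3.27%, 10.14%"
```

The first value rounds to the 3.3% reported in Table 4.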

Figure 3.

Training result.

According to the above experimental results, eliminating the abnormal sample point does not have a significant impact on training. Therefore, under the current parameter settings, abnormal samples in the sample set do not significantly affect the output, which shows strong robustness. The main reason is that the maximum number of iterations, rather than the training error, was used as the termination condition during the experiment. This not only effectively prevents overfitting but also effectively reduces the impact of data disturbance.
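The repeated experiment could be organized as in the sketch below. Several details are assumptions, since the paper does not state them: the 30 runs are taken to differ only in the random seed, their 2018 forecasts are averaged, the training set is 2003 to 2016 (optionally without 2006), and scikit-learn's `MLPRegressor` is used with unscaled inputs. The helper name `forecast_2018` is hypothetical.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Table 1 rows: year, X1, X2, X3, X4, Y
TABLE_1 = np.array([
    [2003, 1.5, 76.81, 0.04, 2.87, 3.00],
    [2004, 2.0, 83.52, 0.15, 2.96, 7.00],
    [2005, 5.20, 93.49, 0.35, 3.02, 14.80],
    [2006, 8.0, 102.81, 0.87, 3.35, 53.30],
    [2007, 9.9, 101.02, 1.67, 3.51, 20.00],
    [2008, 5.28, 125.48, 2.66, 3.89, 13.08],
    [2009, 4.5, 135.06, 3.19, 4.13, 16.35],
    [2010, 5.08, 146.25, 3.64, 4.20, 19.62],
    [2011, 5.56, 160.76, 4.15, 4.43, 21.59],
    [2012, 8.13, 176.15, 4.70, 4.49, 24.46],
    [2013, 9.69, 188.72, 5.38, 4.58, 24.95],
    [2014, 11.4, 204.93, 6.17, 4.65, 27.04],
    [2015, 12.9, 227.66, 6.84, 4.71, 23.61],
    [2016, 13.0, 231.33, 7.41, 4.78, 16.51],
    [2017, 14.0, 246.04, 7.82, 4.96, 20.70],
    [2018, 14.42, 261.60, 8.74, 5.0, 25.35],
])


def forecast_2018(drop_years=(), n_repeats=30, max_iter=2000):
    """Average 2018 forecast over n_repeats independently seeded runs,
    training on 2003-2016 with the years in drop_years removed."""
    years = TABLE_1[:, 0]
    keep = np.array([y <= 2016 and y not in drop_years for y in years])
    train = TABLE_1[keep]
    x_2018 = TABLE_1[years == 2018][:, 1:5]
    preds = []
    for seed in range(n_repeats):
        model = MLPRegressor(
            hidden_layer_sizes=(20,) * 5,   # 5 hidden layers, 20 neurons each
            activation="relu",
            max_iter=max_iter,
            random_state=seed,              # one seed per repetition
        )
        model.fit(train[:, 1:5], train[:, 5])
        preds.append(model.predict(x_2018)[0])
    return float(np.mean(preds))
```

Comparing `forecast_2018()` with `forecast_2018(drop_years=(2006,))` then gives the two forecasts that Table 4 sets side by side.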

4.2.5 Influence of different training sets on prediction results

In addition, because the number of training samples affects the experimental results, this paper uses three training sets to predict the tea price in 2017 and 2018: 5 samples from 2012 to 2016, 10 samples from 2007 to 2016, and 14 samples from 2003 to 2016. The predicted results are shown in Table 5.

Table 5
Prediction results of different training sets

Training set	Relative error (%)	2017 Predicted value (RMB)	2018 predicted value (RMB)
2012–2016	22.35	17.60	34.83
2007–2016	21.84	22.46	16.45
2003–2016	5.35	22.23	24.52

The experimental results show that when the 2012–2016 and 2007–2016 samples are used as the training set, the average relative error of the predicted tea prices for 2017 and 2018 is above 20%, far larger than with the 2003–2016 samples. Therefore, to ensure the accuracy of the experimental results, the 2003–2016 samples are chosen as the training set.

4.3 Comparison and analysis with the random forest method

To further verify the effectiveness of the MLP algorithm, the random forest algorithm is used for a comparative analysis of the price prediction results. Random forest is a common nonlinear prediction method that integrates multiple trees through the idea of ensemble learning. Its basic unit is the decision tree, and it belongs to ensemble learning, a branch of machine learning. In this paper, the values of X1 from 2003 to 2016 are used in a regression to obtain the estimated values of X1 in 2017 and 2018. In the same way, the values of X2, X3 and X4 from 2003 to 2016 are regressed in turn to obtain their estimated values. The results are shown in Table 6.

Table 6
Regression results of independent variables

	X1	X2	X3	X4
Estimates for 2017	12.73	240.53	7.88	5.19
Estimates for 2018	13.79	254.50	8.46	5.29

The estimated values of X1, X2, X3 and X4 for 2017 and 2018 are then used to forecast the tea price in those two years; the forecasts are 21.04 and 19.84, respectively (see Table 7).

Table 7
Comparison results of MLP and Random Forest

Model	2017 real value	2017 predicted value	2018 real value	2018 predicted value	RE
Random forest	20.70	21.04	25.35	19.84	10.87%
MLP	20.70	22.22	25.35	24.52	5.35%

Table 7 shows that the average relative error of the random forest method is 10.87%, whereas the MLP method predicts 22.22 RMB for 2017 and 24.52 RMB for 2018, with an average relative error of 5.35%. This verifies the feasibility and effectiveness of the MLP method, which has a certain reference value for the Pu’er tea industry.
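The two-stage random forest procedure described in this section could be sketched as follows. The paper does not specify how each explanatory variable is regressed, so the sketch assumes a simple linear trend on the year (scikit-learn's `LinearRegression`) for the first stage and `RandomForestRegressor` with default settings for the second. The function name and the seed are hypothetical, and the outputs are not expected to match Tables 6 and 7 exactly.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Table 1 rows: year, X1, X2, X3, X4, Y
TABLE_1 = np.array([
    [2003, 1.5, 76.81, 0.04, 2.87, 3.00],
    [2004, 2.0, 83.52, 0.15, 2.96, 7.00],
    [2005, 5.20, 93.49, 0.35, 3.02, 14.80],
    [2006, 8.0, 102.81, 0.87, 3.35, 53.30],
    [2007, 9.9, 101.02, 1.67, 3.51, 20.00],
    [2008, 5.28, 125.48, 2.66, 3.89, 13.08],
    [2009, 4.5, 135.06, 3.19, 4.13, 16.35],
    [2010, 5.08, 146.25, 3.64, 4.20, 19.62],
    [2011, 5.56, 160.76, 4.15, 4.43, 21.59],
    [2012, 8.13, 176.15, 4.70, 4.49, 24.46],
    [2013, 9.69, 188.72, 5.38, 4.58, 24.95],
    [2014, 11.4, 204.93, 6.17, 4.65, 27.04],
    [2015, 12.9, 227.66, 6.84, 4.71, 23.61],
    [2016, 13.0, 231.33, 7.41, 4.78, 16.51],
    [2017, 14.0, 246.04, 7.82, 4.96, 20.70],
    [2018, 14.42, 261.60, 8.74, 5.0, 25.35],
])


def two_stage_forecast(n_estimators=100, seed=0):
    """Return (x_est, y_pred) for 2017 and 2018.

    Stage 1: extrapolate X1..X4 with a linear trend on the year
             (assumption; the regression form is not given in the paper).
    Stage 2: predict the price from the extrapolated X1..X4 with a
             random forest trained on the 2003-2016 rows.
    """
    years = TABLE_1[:, 0]
    train = TABLE_1[years <= 2016]
    future_years = np.array([[2017.0], [2018.0]])

    # Stage 1: one linear regression per explanatory variable.
    x_est = np.column_stack([
        LinearRegression().fit(train[:, [0]], train[:, j]).predict(future_years)
        for j in range(1, 5)
    ])

    # Stage 2: random forest from X1..X4 to the price Y.
    forest = RandomForestRegressor(n_estimators=n_estimators,
                                   random_state=seed)
    forest.fit(train[:, 1:5], train[:, 5])
    return x_est, forest.predict(x_est)
```

One property worth keeping in mind is that a random forest regressor averages training targets, so its forecasts cannot fall outside the range of prices seen in the training years, whereas the MLP can extrapolate.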

5. Conclusion

Based on an analysis of the factors influencing tea prices, this paper first selected the activation function, then established the MLP neural network model and analyzed the number of neurons and the number of hidden layers, next analyzed and verified the abnormal point, and finally compared the MLP method with random forest to verify its feasibility and effectiveness. With the sample data from 2003 to 2016 as the training set and the sample data from 2017 and 2018 as the verification set, the average relative error of the MLP was 5.35%. This method can provide a reference for the government and has certain theoretical and practical significance.

However, because the sample data for the Yunnan Pu’er tea industry are limited, this paper may not have considered the influencing factors comprehensively, which reduces the accuracy of the MLP neural network to a certain extent. This needs to be further analyzed and improved in future research.

Acknowledgments

National Natural Science Foundation of China (61762088): Intelligent method of urban logistics coordination under urban logistics service platform.

References

[1] Xu, Investigation on farmers of Pu’er tea in Yunnan Tea, Journal of Tea 39(1) (2012), 24–37.

[2] Dou, Pu’er tea has gone up all the way this year, Farm Staff and Observe Industry, 40.

[3] Zahid, Ahmed, Javaid and Abbasi R.A., Electricity price and load forecasting using enhanced convolutional neural network and enhanced support vector regression in smart grids, Electronics 8 (2019), 122.

[4] Berradi and Lazaar, Integration of principal component analysis and recurrent neural network to forecast the stock price of Casablanca stock exchange, Procedia Computer Science (148) (2019), 55–61.

[5] Peng, Liu and Zhang, Modeling and analysis of stock price prediction based on MLSTM, Computer Engineering and Application (2019).

[6] Wang and Xue, Research on stock index price prediction based on circular neural network, Journal of Zhejiang University of Technology (4) (2019), 186–191.

[7] Li and He, A new algorithm of vehicle license plate location based on convolutional neural network, Journal of Computational Methods in Sciences and Engineering (18) (2018), 1021–1033.

[8] Bento P.M.R., Pombo J.A.N., Calado M.R.A. and Mariano S.J.P.S., A bat optimized neural network and wavelet transform approach for short-term price forecasting, Applied Energy (210) (2018), 88–97.

[9] Loureiro A.L.D., Miguéis V.L. and da Silva L.F.M., Exploring the use of deep neural networks for sales forecasting in fashion retail, Decision Support Systems (114) (2018), 81–93.

[10] Chen, Ren T.-T. and Wu Z.-C., Research on neural network optimization algorithm for building energy consumption prediction, Journal of Computational Methods in Sciences and Engineering (18) (2018), 695–707.

[11] Wu, Liu and Fang, Research on cotton price prediction in China based on MIV-GA-BP neural network, Cotton Textile Technology (7) (2018), 77–80.

[12] Mostafa M.M. and El-Masry A.A., Oil price forecasting using gene expression programming and artificial neural networks, Economic Modelling (54) (2016), 40–53.

[13] Gao, Study on natural quality evaluation of cultivated land based on BP neural network, Wuhan: Central China Normal University, 2012.

[14] Research on price prediction of second-hand housing in Beijing based on BP neural network, Capital University of Economics and Business, 2018.

[15] Cui and Dou, China carbon market price forecast based on EMD-GA-BP and EMD-PSO-ISSVM, Operation and Management (7) (2018), 133–143.

[16] and Zhao, BP neural network coal demand forecasting model based on Matlab, China Management Science (10) (2008), 521–525.

[17] Jiang, Prediction of landslide displacement time series based on regression neural network, Huazhong University of Science and Technology, 2017.

[18] Zhang, Research on China’s corn futures price prediction method based on machine learning, Northeast Agricultural University, 2018.

[19] Qian and Sun, Exploration and practice of flower price prediction platform design, Northern Horticulture (20) (2018), 191–198.

[20] Liu and Yao, Real estate price prediction model based on the influence of multiple factors, New Theory, 2017.

[21] Ren, Analysis on forecasting methods of agricultural product market prices, Chinese Agricultural Science Bulletin 148(26) (2011), 209–212.