Sportswear retailing forecast model based on the combination of multi-layer perceptron and convolutional neural network

Abstract

Apparel sales forecasting plays an important role in production planning, distribution decisions, and inventory management of enterprises. In particular, the sportswear market has shown rapid growth and is characterized by long-term sales. This paper proposes a sales forecasting model for sportswear based on the multi-layer perceptron (MLP) and the convolutional neural network (CNN). A novel loss function is also proposed to improve the prediction accuracy. The proposed model is trained and validated on time-series retailing data collected from three offline local sports stores in China. The influencing factors of retailing forecasting, such as time-series sales data, product features, distribution strategy, shop size, and other parameters, are also defined. Experimental results show that the proposed forecasting model outperforms the compared statistical methods by a large margin. Specifically, the proposed model provided 65% prediction accuracy, while the compared methods provided 16% prediction accuracy. The results show that the proposed model could potentially be used in sportswear sales forecasting, especially for offline clothing retail and other long-lifecycle clothing fields.

Keywords

Sportswear, sales forecast, convolutional neural network, long-term sales, multi-layer perceptron

The rapid growth of the textile and apparel industries has attracted many scholars’ attention in clothing retailing forecasting research. Sales forecasting, also known as demand forecasting, contributes to implementing an efficient production management plan and plays a decisive role in the operational decision support system. A fast and robust estimation is essential for store merchandisers and brand producers in the product shipment planning.¹ It can also help retailers avoid product losses and enable flexible business planning.² It is crucial to understand the clothing products and impact factors that influence sales forecasting in order to choose proper forecasting methods and make reasonable sales forecasts.³ The early sales amount of a specific retail product is significant for inferring the overall sales life span, and is often considered a good indicator for retailers.⁴ Despite the significance, the overall product sales prediction using only early sales data has not been studied. On the other hand, various forecasting models have been studied in many applications.⁵,⁶

Accurate retail forecasting is challenging due to the diversity of clothing types and the significant impact of marketing strategies on the choice of forecasting model. Figure 1 illustrates three major impact factors for clothing sales forecasts: (a) diverse features of clothing, (b) retailing strategy factors influencing the forecasting result,⁷ and (c) forecasting methods and dataset. As shown in Figure 1(a), sports clothing, especially team-wear clothing with a relatively long lifecycle (more than 1 year), is selected as the target category. Sportswear differs from fast fashion, one-shot items, and classic clothing such as white shirts and jeans (items sold all year round). In the sports fashion industry, regional distributors are a main global retailing channel for sports products.

Figure 1.

Selected research object and method of this work and two aspects influencing accuracy of sales forecasting.

The data source plays an important role in influencing the prediction results.⁸ Time-series data have been commonly used in apparel sales forecasting. However, the prediction accuracy is limited when a single time-series is the only input. Caglayan et al.⁹ showed that traditional forecasting methods based on time-series only consider product cycles, sales data, and sales trends, which makes them biased and inaccurate. In practical applications in offline clothing retail stores, more data are required to obtain higher prediction accuracy. Jain and Kumar¹⁰ proposed a hybrid prediction model combining the traditional time-series method with the artificial neural network (ANN). A multi-channel data-based forecast model was constructed to capture more complex time-series data, improving the sales forecast of a specific clothing type.

This paper focuses on the early sales of time-series sports apparel. A new machine learning-based forecasting model is proposed to estimate the retailing number of the selected sport styles. The proposed hybrid model is based on the combination of multi-layer perceptron (MLP) and convolutional neural network (CNN). Instead of just using historical sales data of similar products, the model utilizes influencing factors including promotion strategy, product attributes, early sales, promotional activities, and shop size. The proposed model is the first attempt to combine time-series and categorical data for sports apparel, especially for offline sales forecasting. A 2-year sports product retailing dataset was used to compare different techniques and evaluate the forecast accuracy. The proposed model is applicable to forecasting sales of sports/team-wear styles for merchandisers and small- and medium-sized industry with a small dataset.

Related works

Influencing factors on clothing sales

Frank et al.⁷ analyzed clothing sales forecasts involving both internal and external factors, such as product attributes, climate data, and media effects. Then, they proposed a mathematical sales forecast model for women’s clothing, in which fuzzy logic was used to synthesize the parameters and remove unnecessary factors. The ANN showed better results than the other two methods: the seasonal single exponential smoothing model and the Winters’ three-parameter model. However, it was shown that considering only specific factors limits the prediction accuracy. Aksoy et al.¹¹ established a clothing demand prediction model based on the adaptive network-based fuzzy inference system, which employs more factors, such as fashion trends, distribution models, and competitive environments.

Several studies were conducted on investigating a specific influencing factor on the clothing retail forecasting. Guven and Simsir¹² investigated the effect of color parameters on clothing retail forecasting using ANN and support vector machines (SVM). The results showed that the ANN provided better accuracy than SVM with color parameters. Zhou et al.¹³ considered the influence of consumers’ purchasing preferences and seasons on the forecast. The traditional Bass model was improved by combining similar clothing feature data. The previous retailing forecasting research works involved two aspects: (a) defining exogenous influencing factors on clothing sales and (b) developing a suitable model for a specific clothing target market.

Thomassey¹⁴ divided exogenous factors related to clothing item sales into two categories: uncontrolled and controlled factors. Uncontrolled factors include macro-economic data, calendar data, competition, and weather data, while controlled factors include item features, fashion trends, retailing strategy, and marketing strategy. Macro-economic data involve the national and global economic situation. According to a study of the changes in the Australian apparel retail industry,¹⁵ the economic depression in the post-war period and the general decline in consumer demand caused inventory backlogs. After struggling with severe challenges for a while, the apparel manufacturing and sales industries have recently continued developing with economic recovery, technological innovation, and the expansion of import and export trade. Badorf and Kai¹⁶ maintained that consumers’ buying behavior would be affected by the uncontrollable factor: weather. The weather may cause consumers to postpone or abandon the purchase of products; this is an unpredictable and uncontrolled factor affecting retail volume and strategy. Ma and Fildes¹⁷ showed that the product sales volume of offline retail stores fluctuates significantly under the influence of seasonal cycles and holidays. Promotional activities during this period could result in a significant increase in sales. It is unrealistic to forecast a single piece of clothing retail from a macro level using uncontrollable factors.

Several studies were conducted on controllable factors of garment features. Consumers usually purchase clothing based on fashion trends and pay more attention to the style, color, design, and other clothing product features. Mo et al.¹⁸ found that appearance affects consumers’ purchases more than product quality. As many people purchase their clothes online these days, media channels (such as Twitter¹⁹ and Amazon²⁰) have also had a considerable impact on retailing strategy. Although external factors also have a certain impact on clothing retail, the extent of the effect is difficult to measure.

Clothing retail forecast methods

Clothing retail forecast methods are divided into three main categories: statistical methods, deep learning methods, and hybrid methods. Statistical models and neural network models were first explored for clothing retail forecasting.²¹ In recent years, deep learning methods have shown outstanding performance in many forecasting fields.

Statistical methods

The statistical forecast model was first proposed in 2002²² as an efficient prediction model that can process thousands of historical datasets in a few seconds. Since then, various statistical models have been studied for clothing forecasting: linear regression,²³ time-series regression,²⁴ moving average (MA), exponential smoothing,²⁵ autoregressive integrated moving average (ARIMA),²⁶–²⁸ and the driver moderator method.²⁹ A recent case study³⁰ showed that a statistical model was successfully applied to increase the sales of a Mexican fashion company and to anticipate product behavior from the early weeks. In this case, the size of the shop, retailing number, and historical price of each item were used to define the shop ‘LEVEL.’ Then, the amount and price of the sales for new products were forecasted by utilizing the prior knowledge of the shop ‘LEVEL.’ A statistical model typically involves the following steps: analysis of the data structure and model selection by a statistical analysis expert. Statistical models are fast, processing a huge amount of data in a few seconds to obtain a final forecasting result. However, they require prior understanding of the sales data structure, and determining a proper statistical model is time-consuming.

Deep learning methods

Recently, deep learning methods, especially ANNs, have shown their ability to solve complex problems in many applications. Other successful machine learning methods in fashion applications include the extreme learning machine,³¹ decision tree,³² and genetic algorithm.³³,³⁴ Neural networks have also been applied in fashion color estimation, fashion forecasting,¹¹ and stock inventory. As a large-scale adaptive non-linear dynamic system, an ANN provides robustness and error tolerance with strong collective computing and learning capabilities. However, a deep neural network requires a huge amount of data for training. Also, training with gradient descent-based optimization is often relatively slow, as it requires many iterative learning steps to obtain good performance. Despite the slow training speed, the ANN model has the advantage of learning complex features and patterns from the input data without requiring a deep understanding of the dataset or an additional labeling process.

Sales forecasting is very important in e-commerce for making efficient business decisions, and is influenced by many factors, such as promotional activities, product competition, and user preferences. A deep sales forecasting framework (DSF) was proposed for e-commerce sales forecasting to reduce the influence of interference items and capture more effective information.³⁵ A sales residual network was introduced on top of the decoder to establish a learning sequence, aiming to model the sales impact caused by product competition. The DSF improved the prediction accuracy for e-commerce sales prediction, showing robustness to complex input data. In Zhao and Wang,³⁶ a CNN was used for sales forecasting in e-commerce. The employed CNN realized more accurate sales forecasting through its effective feature learning ability. Given the successful use of CNNs in e-commerce sales forecasting, it can be expected that neural networks can also be used in sales forecasts with a small dataset.²¹

Hybrid methods

Clothing sales forecasting is complicated because it is affected by many factors, such as weather, marketing, psychological considerations, and fashion trend, which have a certain degree of volatility. Thus, it is difficult to achieve accurate prediction with a single mathematical model. To overcome this limitation, hybrid approaches have been studied, aiming to obtain a better forecasting process.

Similar to the single-model case, there are two main categories: the statistical-based hybrid method and the deep learning-based hybrid method. Different types of hybrid models have been explored to address various problems in fashion forecasting applications, resulting in various levels of accuracy, speed, and performance. Fan et al. proposed a hybrid method that combines the Bass model and sentiment analysis.³⁷ A hybrid method also showed outstanding performance for early sales forecasting with little historical sales data.³⁸ In Huang and Lin,³⁹ a hybrid approach was applied to stock-out estimation for a Canadian fashion company. Grey system theory and fuzzy time-series⁴⁰ were used to forecast the growth of green electronic materials, taking advantage of their robustness to a limited number of industrial samples. However, these two models required an understanding of theoretical assumptions and specific mathematical knowledge. Thomassey⁴¹ evaluated different forecasting models, including fuzzy logic, neural networks, and data mining, on real fashion data. Also, based on real data, the sourcing and forecasting processes were simulated and analyzed. In summary, the hybrid method combines the advantages of the statistical method (robustness) and deep learning (fast speed). In light of the above, different models applied to the same dataset can achieve very different prediction success rates. Note that a hybrid method also shows better performance when prior knowledge of the specific dataset is provided.

The proposed forecasting model

In the proposed method, the weekly sales units are modeled as time-series. Although sequence data have mostly been modeled with recurrent neural networks (RNN) in previous studies,⁴² some recent studies showed that CNN provides better performance than RNN on specific tasks such as language translation. For instance, a simple convolutional architecture outperformed state-of-the-art recurrent networks such as long short-term memory in a diverse range of applications and datasets.⁴³ That study showed that CNN is more suitable than RNN for domains where a long history is required, owing to the following three characteristics of CNN for time-series input data:

The convolutions are causal, indicating that only past information is used to predict future information.

The architecture can take any length sequence as an input and output any length sequence.

The architecture can build a very long history size. The receptive field of the network is directly linked to the network depth, indicating that a deeper network can use more historical data in predicting future information.

Another advantage of CNN over RNN is that CNN does not depend on any previous predictions. It uses only the current window (receptive field) for prediction, enabling the model to understand the sales data patterns occurring in that specific window.

The proposed method consists of a CNN and an MLP, which model the time-series data and process the time-invariant categorical metadata, respectively. The flowchart of the proposed method is depicted in Figure 2. Although it was originally designed for two-dimensional image data, CNN can be used to model time-series forecasting problems. The adopted CNN architecture in the proposed model is inspired by the WaveNet architecture,⁴⁴ proposed by Google DeepMind for raw audio time-series. The key idea of the architecture is to adopt dilated causal convolutions to extract causal features only, so that the network cannot see future time-series data. A sequence of dilated convolutions with increasing dilation rates enlarges the receptive field of the network without compromising its performance. No pooling layer is required to sample the output feature vectors into a lower dimension. In the following sections, the subnetworks are described in more detail.

Figure 2.

The flowchart of the proposed method.
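To make the dilated causal convolution concrete, the following minimal Python sketch applies a single-channel causal convolution to a short sequence. It is an illustration under stated assumptions rather than the implementation used in this work: the function name `causal_dilated_conv1d`, the all-ones kernel, and the toy input are hypothetical, whereas the actual network uses learned multi-channel filters.

```python
def causal_dilated_conv1d(x, kernel, dilation):
    """Single-channel causal convolution with left zero-padding.

    The output y[t] depends only on x[t], x[t - d], x[t - 2d], ... and never
    on future samples; the output has the same length as the input.
    """
    k = len(kernel)
    y = []
    for t in range(len(x)):
        acc = 0.0
        for j in range(k):
            idx = t - (k - 1 - j) * dilation  # look back in steps of `dilation`
            if idx >= 0:  # positions before the sequence start act as zero padding
                acc += kernel[j] * x[idx]
        y.append(acc)
    return y


# Toy example (hypothetical values): nine weekly samples, kernel size 3.
weeks = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
out = causal_dilated_conv1d(weeks, kernel=[1.0, 1.0, 1.0], dilation=2)
print(out)
```

Changing the last input sample leaves all earlier outputs untouched, which is the causality property described above.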

MLP: feature extraction of the time-invariant metadata

For all styles in our database, we use additional time-invariant information to improve the sales forecasting, which is defined as follows:

Style_id: indicates the styles of clothing. Our dataset is composed of three styles (StyleA: 0, StyleB: 1, StyleC: 2).

Shop_size: indicates the size of the shop, usually related to the total amount of the sold products in the shop. The data are divided into three groups (SMALL: 1, MEDIUM: 2, BIG: 3).

Shop_id: differentiates each shop in our database.

Clothing_size: indicates the clothing size. Four different clothing sizes exist in our present dataset (S: 0, M: 1, L: 2, XL: 3).

Clothing_color: indicates the color of the current clothing. The clothes were classified into 12 different colors (Black, White, Red, Orange, Yellow, Blue, Green, Gray, Neon Yellow, Brown, Violet, Stone gray). There is no hard constraint on the color set, and more colors can be dynamically added to the proposed model.

Note that the categorical information can vary from one application to another. Clothing color is defined as a sequence of 12 bits (each individual color is either present or not). Each garment is labeled with at most two main colors, meaning that the 12-bit color-encoding sequence can have at most two bits with value 1, and all other bits are set to 0. For example, the bit sequence “110000000000” indicates that the garment involves two main colors: black and white.
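The color encoding can be sketched in a few lines of Python. This is only an illustration: the helper name `encode_colors` is hypothetical, and the bit order is assumed to follow the order in which the 12 colors are listed above, which is consistent with the black-and-white example.

```python
# Assumed bit order: the order in which the 12 colors are listed in the text.
COLORS = [
    "Black", "White", "Red", "Orange", "Yellow", "Blue",
    "Green", "Gray", "Neon Yellow", "Brown", "Violet", "Stone gray",
]


def encode_colors(main_colors):
    """Return the 12-bit color string for a garment with at most two main colors."""
    chosen = set(main_colors)
    if len(chosen) > 2:
        raise ValueError("a garment is labeled with at most two main colors")
    unknown = chosen - set(COLORS)
    if unknown:
        raise ValueError(f"unknown colors: {sorted(unknown)}")
    return "".join("1" if color in chosen else "0" for color in COLORS)


print(encode_colors(["Black", "White"]))  # 110000000000
```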

The categorical data are processed with the MLP model that consists of three hidden fully connected layers (FCL). The input is a vector of 16 variables ($X_{i}, i \in \{0, 1, \dots, 15\}$) that are normalized to the range $[0, 1]$ for better stability and faster convergence, computed as follows:

$$\bar{X_{i}} = \frac{X_{i} - X_{i}^{\min}}{X_{i}^{\max} - X_{i}^{\min}}$$

where $X_{i}^{\min}$ and $X_{i}^{\max}$ denote the min and max values of the $X_{i}$ variable. Note that the clothing color variable is already normalized as it can be 0 or 1.
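As a hedged illustration of this normalization, the sketch below builds the 16-value input vector from the four categorical codes and the 12 color bits. The helper names, the order of the variables in the vector, and the shop_id coding (1 to 3) are assumptions made for the example; the other ranges follow the codes defined above.

```python
# Value ranges of the categorical codes; the shop_id coding (1..3) is an assumption.
RANGES = {
    "style_id": (0, 2),
    "shop_size": (1, 3),
    "shop_id": (1, 3),
    "clothing_size": (0, 3),
}


def min_max(value, lo, hi):
    """Scale `value` from [lo, hi] to [0, 1]."""
    if hi <= lo:
        raise ValueError("max must be greater than min")
    return (value - lo) / (hi - lo)


def build_metadata_vector(style_id, shop_size, shop_id, clothing_size, color_bits):
    """Return 4 normalized codes followed by the 12 color bits (16 values)."""
    if len(color_bits) != 12 or set(color_bits) - {"0", "1"}:
        raise ValueError("color_bits must be a string of 12 characters, each 0 or 1")
    raw = {
        "style_id": style_id,
        "shop_size": shop_size,
        "shop_id": shop_id,
        "clothing_size": clothing_size,
    }
    scaled = [min_max(raw[name], *RANGES[name]) for name in RANGES]
    # Color bits are already in {0, 1}, so they are passed through unchanged.
    return scaled + [float(bit) for bit in color_bits]


# Style A, medium shop, shop 2, size L, black and white.
vec = build_metadata_vector(0, 2, 2, 2, "110000000000")
print(len(vec), vec[:4])
```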

The detailed structure of the MLP network is described in Table 1. Each hidden layer comprises an FCL followed by a rectified linear unit (ReLU) activation function. The ReLU activation is widely used due to its simplicity and its robustness against the vanishing-gradient problem that hinders deeper networks from converging. The proposed MLP expands the 16-value input vector to 24 neurons in each of the first two hidden layers to extract interconnected features from the categorical data. The number of neurons, 24, was selected empirically after testing different numbers of neurons.

Table 1.

The detailed architecture of the proposed MLP

Layer type	Number of neurons	Notes
Input	16	All values are normalized in $[0, 1]$
Hidden layer 1	24	With ReLU activation
Hidden layer 2	24	With ReLU activation
Hidden layer 3	16	With ReLU activation
Output	16
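The layer sizes in Table 1 can be traced with a small forward pass in plain Python. This is a sketch under assumptions, not the trained network: the weights are random placeholders, each layer is assumed to have a bias term, and the output row of Table 1 is read as the activations of hidden layer 3.

```python
import random


def dense_relu(x, weights, biases):
    """Fully connected layer followed by ReLU."""
    return [
        max(0.0, sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    """Random placeholder weights (not trained values) and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def mlp_forward(x, layers):
    for weights, biases in layers:
        x = dense_relu(x, weights, biases)
    return x


rng = random.Random(0)
sizes = [16, 24, 24, 16]  # input, hidden 1, hidden 2, hidden 3 (Table 1)
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

features = mlp_forward([0.5] * 16, layers)
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in layers)
print(len(features), n_params)
```

Under these assumptions the MLP has 1408 trainable parameters, which is small enough to train on a limited dataset.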

CNN: feature extraction of the time-series data

A 1D CNN is employed to process the time-series retailing data. The network is composed of four dilated causal 1D convolutions to extract diverse features from the time-series data. Each convolution doubles the dilation rate of the previous one (1, 2, 4, 8), which enlarges the receptive field.

The FCL interprets the features extracted by the convolutional part of the model. The flatten layer converts the output of the convolutional layers into a single one-dimensional vector to fit the FCL. Table 2 shows the detailed structure of the proposed CNN. As shown in Table 2, four 1D convolution layers are used (denoted 1DConv). All 1D convolution layers share common parameters: a kernel size of 3 and a stride of 1 with causal padding.
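The effect of the dilation rates on the receptive field can be checked with a short calculation. The sketch below uses the standard formula for stacked stride-1 dilated convolutions, in which each layer adds (kernel size − 1) × dilation time steps; the function name is hypothetical.

```python
def receptive_field(kernel_size, dilation_rates):
    """Receptive field (in time steps) of stacked stride-1 dilated convolutions."""
    field = 1
    for rate in dilation_rates:
        field += (kernel_size - 1) * rate
    return field


rates = [1, 2, 4, 8]
# Receptive field after each of the four layers.
per_layer = [receptive_field(3, rates[: i + 1]) for i in range(len(rates))]
print(per_layer)  # [3, 7, 15, 31]
```

With a kernel size of 3, the four layers reach 31 time steps, so the last layer can cover an input window shorter than that in full.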

Table 2.

The detailed structure of the employed CNN

Layer Type	Features Resolution (In to Out)	Notes
Input	9 × 4	All values are normalized in $[0, 1]$
1DConv 1	9 × 4 to 9 × 32	dilation rate of 1, with ReLU activation
1DConv 2	9 × 32 to 9 × 32	dilation rate of 2, with ReLU activation
1DConv 3	9 × 32 to 9 × 64	dilation rate of 4, with ReLU activation
1DConv 4	9 × 64 to 9 × 64	dilation rate of 8, with ReLU activation
Flatten	9 × 64 = 576
Hidden layer 1	576 to 128	With ReLU activation
Dropout 1	128	The dropout rate of 20%
Hidden layer 2	128 to 64	With ReLU activation
Output	64
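As a rough size check of the architecture in Table 2, the number of trainable parameters can be estimated from the listed layer shapes. The counts below are not reported in this work; they are a sketch that assumes every convolution and fully connected layer has a bias term and that the kernel size is 3 throughout.

```python
def conv1d_params(kernel_size, in_channels, out_channels):
    """Weights plus one bias per output channel (bias is an assumption)."""
    return kernel_size * in_channels * out_channels + out_channels


def dense_params(n_in, n_out):
    """Weights plus one bias per output neuron (bias is an assumption)."""
    return n_in * n_out + n_out


# (input channels, output channels) for the four 1DConv layers of Table 2.
CONV_LAYERS = [(4, 32), (32, 32), (32, 64), (64, 64)]
# (inputs, outputs) for the two hidden FCLs after the flatten layer.
DENSE_LAYERS = [(9 * 64, 128), (128, 64)]

conv_total = sum(conv1d_params(3, c_in, c_out) for c_in, c_out in CONV_LAYERS)
dense_total = sum(dense_params(n_in, n_out) for n_in, n_out in DENSE_LAYERS)
print(conv_total, dense_total, conv_total + dense_total)
```

Under these assumptions most parameters sit in the first fully connected layer after the flatten step (576 to 128).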

The input data of the CNN model is a subset of four time-series. The retailing data are sampled weekly over a period of 2 years, from 2018 to 2019. For each week of the year, the number of units sold of a specific clothing item is given along with its price and whether the week falls in a promotion or holiday period, as follows:

Sale_count: indicates the number of the sold specific clothing during the current week. Note that we assume the weekly sales range in $[0, 50]$ .

Promotion_period: is a binary indicator, where 1 means the current week is the promotion period, and 0 means otherwise.

Clothing_price: indicates the price of the clothing during the current week. The prices range in $[0.99, 199.99]$ .

Holidays_period: is a binary indicator, where 1 means the current week is the holiday period (school or public), and 0 means otherwise.

These four time-series data are normalized with the corresponding min-max range and then fed into the CNN as input data.
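A possible way to prepare the CNN input is sketched below. The min-max ranges are those stated above; the window length of 9 weeks is an assumption read from the 9 × 4 input resolution in Table 2, and the helper names and sample values are hypothetical.

```python
SALE_RANGE = (0.0, 50.0)      # assumed weekly sales range stated in the text
PRICE_RANGE = (0.99, 199.99)  # price range stated in the text


def scale(value, lo, hi):
    return (value - lo) / (hi - lo)


def normalize_week(sale_count, promotion, price, holiday):
    """Return the four normalized features of one week."""
    return [
        scale(sale_count, *SALE_RANGE),
        float(promotion),  # binary flags are already in {0, 1}
        scale(price, *PRICE_RANGE),
        float(holiday),
    ]


def make_windows(weeks, length=9):
    """Slide a window of `length` consecutive weeks over the series."""
    rows = [normalize_week(*week) for week in weeks]
    return [rows[i:i + length] for i in range(len(rows) - length + 1)]


# Hypothetical 12-week series: (sale_count, promotion, price, holiday).
series = [(week % 5, int(week == 6), 19.99, int(week == 6)) for week in range(12)]
windows = make_windows(series)
print(len(windows), len(windows[0]), len(windows[0][0]))
```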

Feature fusion

The output vectors from the two subnetworks, MLP and CNN, are merged via a concatenation layer. Then, the merged vector is processed with the final FCL to obtain the sales prediction score. The output of the CNN (modeling the time-series data) is a 64-feature 1D vector, while the output of the MLP (modeling the categorical data) is a 16-feature 1D vector. Accordingly, the combined 1D vector has 80 features.
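The fusion step amounts to a concatenation followed by one fully connected layer. The sketch below is illustrative only: the function name, the placeholder weights, and the choice of a linear output without activation are assumptions, since the activation of the final FCL is not specified here.

```python
def fuse_and_predict(cnn_features, mlp_features, weights, bias):
    """Concatenate both feature vectors and apply one linear output neuron."""
    merged = list(cnn_features) + list(mlp_features)
    if len(merged) != len(weights):
        raise ValueError("weight vector must match the merged feature length")
    return sum(w * v for w, v in zip(weights, merged)) + bias


# Placeholder inputs (not trained values): 64 CNN features and 16 MLP features.
cnn_out = [1.0] * 64
mlp_out = [2.0] * 16
score = fuse_and_predict(cnn_out, mlp_out, weights=[0.5] * 80, bias=1.0)
print(score)  # 49.0
```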

Promotion-aware loss function

In this paper, a promotion-aware loss function is proposed based on the root mean square error (RMSE) that drives the model to be more performant on sales forecasting during the sports clothing promotion period. The promotion-aware loss is defined as follows:

$$l_{promo} = \sqrt{\frac{1}{\sum_{t \in S} w_{t}} \sum_{t \in S} w_{t} (\hat{y_{t}} - y_{t})^{2}}$$

where $w_{t} = (1 + PP_{t})$ represents the weight of the current prediction timestamp $t$, and $PP_{t} \in \{0, 1\}$ indicates whether the current prediction timestamp $t$ falls in a promotion period. Thus, $w_{t}$ doubles the quadratic error term $(\hat{y_{t}} - y_{t})^{2}$ for all promotion period samples.
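The promotion-aware loss can be written directly from its definition. The following plain-Python sketch is for illustration (a training framework would use its own tensor operations); the function name and the sample values are hypothetical.

```python
import math


def promo_loss(y_pred, y_true, promo_flags):
    """Weighted RMSE in which promotion-period samples get weight 2."""
    weights = [1 + flag for flag in promo_flags]  # w_t = 1 + PP_t
    weighted_sq = sum(
        w * (pred - true) ** 2
        for w, pred, true in zip(weights, y_pred, y_true)
    )
    return math.sqrt(weighted_sq / sum(weights))


# Two weeks; the second one is a promotion week with the larger error.
print(promo_loss([2.0, 4.0], [1.0, 2.0], [0, 1]))  # sqrt((1*1 + 2*4) / 3)
```

When no sample falls in a promotion period, all weights equal 1 and the loss reduces to the ordinary RMSE.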

The experimental results

Experimental dataset

We have collected sales data of three main-line styles: T-shirt, Polo, and Pants (Style A, Style B, and Style C) from a specific sports brand in three different shops located in China. These three shops (labeled shop1, shop2, and shop3) are of different sizes to model the full spectrum of shop sales. The labels shop1, shop2, and shop3 refer to a small, medium, and big shop, respectively. The collected retailing data are summarized in Table 3 and are composed of:

Sale-units per week.

Sold-prices per week.

Metadata of the sportswear clothes (size, colors, etc.).

Table 3.

An overview of the dataset: three selected sportswear products

Style	Year	Color	Size	Retailing shop
T-shirt	2018	black, white, blue	S,M,L,XL	shop1, shop2, shop3
	2019	black, white, dark gray	S,M,L,XL	shop1, shop2, shop3
Polo	2018	black, white/orange, brown/green	S,M,L,XL	shop1, shop2, shop3
	2019	black/white, white/black, gray/violet	S,M,L,XL	shop1, shop2, shop3
Pants	2018	black, white/black, gray/orange	S,M,L,XL	shop1, shop2, shop3
	2019	black, white/blue, gray/white	S,M,L,XL	shop1, shop2, shop3

Examples of the retailing data are illustrated to visualize customer behavior in terms of size, promotion, and style. As shown in Figure 3, T-shirts were sold at a similar price from 2018 to 2019, except during promotion periods. The promotions of this brand were closely linked with public holidays, commercial holidays, and school holidays. During promotion periods, the number of units sold increased for most sizes. Therefore, it is difficult to distinguish whether the sales behavior is due to holidays or to the brand’s promotion. In addition, consumers’ choice of size does not depend on promotions. Figure 4 illustrates the retailing curves of different clothing styles as a function of promotion weeks. As shown in Figure 4, tops (T-shirts and Polos) sold more than pants, which matches general retailing experience. It is known that consumers, especially male customers, tend to buy tops rather than pants. Most of the retailing peaks were related to holiday promotions. Figure 5 shows that the retailing amount of each shop increased rapidly during holidays and promotion periods. The curve for the T-shirt shows that it sold well seasonally in the summers of both 2018 and 2019. Figure 6 depicts the retailing curve of the L-size T-shirt in three different colors in terms of promotions and holidays for shop2. As shown in Figure 6, for the same style in different colors, sales peak sharply in some promotion weeks. However, in other promotion weeks, the sales volume did not improve. The amount of retailed Style A (T-shirt, size L, all colors) is summarized in Figure 7. It is hard to directly identify a repeated monthly retailing pattern.

Figure 3.

Retailing data for Style A (T-shirt), Black, all sizes (S, M, L, XL) in shop2.

Figure 4.

Retailing data of all styles A, B, C (T-shirt, Polo, Pants), black, and size L in shop 2.

Figure 5.

Retailing data of Style A (T-shirt), Black, Size L in shop1, shop2, and shop3.

Figure 6.

Retailing data of Style A (T-shirt) in different colors (Black, White, and Dark blue (2018)/Dark Gray (2019)), Size L in shop2.

Figure 7.

Summed retailing data of Style A (T-shirt), all colors (Black, White, and Dark blue (2018)/Dark Gray (2019)), Size L in shop2.

The above sales data, together with clothing marketing experience, show that promotions/holidays, retailing price, clothing color, shop size/location, and type of clothing are important indicators of purchasing behavior. However, it is difficult to quantify and summarize how each indicator affects the retailing number and its prediction.

Experimental dataset split

The performance and robustness of the proposed method for sales forecasting are evaluated through qualitative evaluation and comparisons. The compared methods are the autoregressive (AR), MA, and ARIMA methods, implemented using the statsmodels library. For a fair comparison, the parameters of each model are tuned to give the lowest forecasting error on our full test sets. To achieve this, we use a brute-force grid search over the parameters. An iterative rolling forecast mechanism is used, in which each statistical model is recreated with the full data history of the current clothing item after each new observation (forecast). Each test set is chosen in such a way that at least 6 months of data are available for training. For each test set, and for each single-week prediction, all three statistical models are regenerated using all available previous sales data in an iterative rolling forecast manner. This gives the statistical methods a small advantage over the proposed method, whose training sets contain no leaked test data. In this way, a direct comparison of forecast quality between the statistical methods and the neural network method is possible.
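The iterative rolling forecast can be sketched as a simple loop. To keep the example self-contained, a plain moving-average forecaster stands in for the AR, MA, and ARIMA models; it is not the statsmodels implementation used for the comparison, and all names and values are hypothetical.

```python
def moving_average_forecast(history, window=3):
    """Stand-in one-step forecaster: mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def rolling_forecast(train, test, forecaster):
    """Forecast one week at a time, rebuilding the model on the full history."""
    history = list(train)
    predictions = []
    for observed in test:
        # The model is recreated from all sales data available so far.
        predictions.append(forecaster(history))
        # The true value becomes part of the history only after the forecast.
        history.append(observed)
    return predictions


print(rolling_forecast([1.0, 2.0, 3.0], [4.0, 5.0], moving_average_forecast))
```

Each forecast therefore sees the true sales of all preceding test weeks, which is the small advantage granted to the statistical methods.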

The dataset is divided into three subsets:

Training data (70%): used for training models.

Validation data (10%): used to validate the model for each epoch (excluded from training). The validation error is used to adjust the learning rate during the training process.

Test data (20%): used to compare the models.

As the lifecycle of our sport/team-wear clothing ranges from 1 to 2 years, the dataset was randomly split to ensure that at least 4 months of sale units of a specific clothing item belong to the training set. Random split selection is a good way to evaluate methods when dealing with big data. All results presented in this paper were validated with 10 differently split datasets, leading to the same forecasting effectiveness. This ensures a fair comparison with the statistical methods, which require a history of sale units to make predictions. The split of the dataset is visualized in Figure 8, in which red denotes the testing set used for the specific clothing item.

Figure 8.

Dataset split of the black T-shirt (Size L) in shop2.

Quantitative comparison with state-of-the-art statistical approaches

The proposed method is compared with three statistical models for the test sets. Figure 8 shows the complete sales data of black L-size T-shirt from shop2; the test, validation, and training sets are denoted as red, green, and blue lines, respectively.

Figure 9 compares four methods, AR, MA, ARIMA, and the proposed method, on the test set given in Figure 8. Note that the ground-truth curve of Figure 9 is exactly the same as the red curve in Figure 8 (period 07.2018–12.2018). As shown in Figure 9, the results of the proposed method align more closely with the ground truth than those of the compared statistical methods.

Figure 9.

Comparison of different sales forecasting methods on the test data.

To quantitatively evaluate the proposed method, the following statistical error metrics are computed for each method:

Min and Max error

Mean and Std error

RMSE

Weighted Absolute Percent Error (WAPE)

The RMSE is defined as:

$$RMSE = \sqrt{\frac{1}{|S|} \sum_{t \in S} (\hat{y_{t}} - y_{t})^{2}},$$

where $S$ refers to the set of all evaluated times, $|S|$ is the total number of samples, and $\hat{y_{t}}$ and $y_{t}$ indicate the current prediction value and the associated ground-truth value at time $t$. The WAPE is defined as:

$$WAPE = \frac{\sum_{t \in S} |\hat{y_{t}} - y_{t}|}{\sum_{t \in S} y_{t}}.$$
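Both metrics follow directly from their definitions. The sketch below is a plain-Python illustration with hypothetical function names and sample values; the WAPE is undefined when the ground-truth sales sum to zero, which the code reports as an error.

```python
import math


def rmse(y_pred, y_true):
    """Root mean square error over all evaluated time steps."""
    squared = [(pred - true) ** 2 for pred, true in zip(y_pred, y_true)]
    return math.sqrt(sum(squared) / len(squared))


def wape(y_pred, y_true):
    """Sum of absolute errors divided by the sum of ground-truth values."""
    total = sum(y_true)
    if total == 0:
        raise ValueError("WAPE is undefined when the ground truth sums to zero")
    return sum(abs(pred - true) for pred, true in zip(y_pred, y_true)) / total


predicted = [3.0, 5.0, 2.0]
actual = [2.0, 5.0, 4.0]
print(rmse(predicted, actual), wape(predicted, actual))
```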

A quantitative comparison between the proposed method and the three statistical-based forecasting methods is summarized in Table 4. In the remainder of this paper, all following quantitative comparisons adopt the same way in reporting error statistics.

Table 4.

Quantitative comparisons for the test data of black L-size T-shirt in shop2. Bold indicates the lowest error for each error metric

	AR	MA	ARIMA	OUR
Min	0.32	0.60	0.30	0.00
Max	8.97	10.71	8.22	3.00
Std	2.11	2.35	2.40	0.82
Mean	2.88	3.03	2.81	1.04
RMSE	3.57	3.83	3.69	1.31
WAPE	0.80	0.84	0.78	0.29

Table 5 shows the complete comparisons for the full test dataset. As shown in Table 5, the proposed method outperforms the statistical methods for sales forecasting in terms of all the error metrics. This confirms that a hybrid solution that merges time-series data with categorical data can provide better prediction accuracy, mainly owing to the robust and meaningful extracted features. From Table 5, the WAPE of the proposed method is at most 35% for the complete dataset. Thus, taking the accuracy as 1 − WAPE, the overall accuracy of the proposed method is at least 65%.

Table 5.

Quantitative comparison of sales forecasting errors for three styles (all sizes and colors) in three shops. Bold indicates the lowest error for each statistic category

Shop	AR	MA	ARIMA	OUR
Shop1
Min	0.00	0.00	0.00	0.00
Max	7.44	8.46	7.68	3.05
Std	2.23	2.35	2.39	0.62
Mean	2.35	2.65	2.20	0.98
RMSE	3.07	3.12	2.64	1.12
WAPE	0.84	0.88	0.87	0.35
Shop2
Min	0.00	0.00	0.00	0.00
Max	10.25	11.68	9.53	6.15
Std	2.89	2.92	2.98	1.05
Mean	2.88	3.03	2.81	1.35
RMSE	3.99	4.25	4.01	1.69
WAPE	0.85	0.89	0.85	0.31
Shop3
Min	0.00	0.00	0.00	0.00
Max	12.32	13.05	11.23	8.06
Std	3.44	3.48	3.52	1.44
Mean	3.33	3.66	3.34	2.03
RMSE	5.69	5.79	5.74	2.24
WAPE	0.84	0.86	0.84	0.32
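
The accuracy figures quoted in the text follow from the WAPE rows of Table 5 if accuracy is taken as 1 − WAPE. The short sketch below reproduces that arithmetic. The dictionary layout and the helper name `accuracy_percent` are hypothetical, and the Shop3 value for the proposed method is taken from Table 6, which reports the same model.

```python
# WAPE values transcribed from Table 5, one row per shop.
# The Shop3 entry for OUR is the value reported for the same model in Table 6.
wape = {
    "Shop1": {"AR": 0.84, "MA": 0.88, "ARIMA": 0.87, "OUR": 0.35},
    "Shop2": {"AR": 0.85, "MA": 0.89, "ARIMA": 0.85, "OUR": 0.31},
    "Shop3": {"AR": 0.84, "MA": 0.86, "ARIMA": 0.84, "OUR": 0.32},
}


def accuracy_percent(wape_value):
    """Accuracy in percent, defined here as (1 - WAPE) * 100."""
    return round((1.0 - wape_value) * 100.0, 1)


accuracy = {
    shop: {method: accuracy_percent(value) for method, value in row.items()}
    for shop, row in wape.items()
}

# Worst case for the proposed method and best case for the baselines.
worst_ours = min(row["OUR"] for row in accuracy.values())
best_baseline = max(
    value
    for row in accuracy.values()
    for method, value in row.items()
    if method != "OUR"
)

for shop, row in accuracy.items():
    print(shop, row)
print("worst accuracy of the proposed method:", worst_ours)
print("best accuracy of a statistical baseline:", best_baseline)
```

Under this definition, the lowest accuracy of the proposed method across the three shops and the highest accuracy of any statistical baseline correspond to the 65% and 16% figures quoted in the paper.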

Evaluation of the importance of the metadata

In this section, the proposed model is compared against a standard CNN-based sales prediction model to demonstrate the advantage of the proposed hybrid solution. The standard CNN-based prediction model was trained using the same data and settings as the proposed hybrid model, including epochs, training parameters, and data split. As in the previous section, six error metrics were used to compare the prediction accuracy of both networks (Table 6). As shown in Table 6, the proposed model outperforms the standard CNN-based model, which supports the advantage of the hybrid approach.

Table 6.

Comparison of the proposed hybrid model and the standard CNN-based model for all styles and sizes in three shops. Bold indicates the lowest error for each error metric

	Shop1		Shop2		Shop3
	CNN	OUR	CNN	OUR	CNN	OUR
Min	0.00	0.00	0.00	0.00	0.00	0.00
Max	4.61	3.05	7.22	6.15	9.55	8.06
Std	1.42	0.62	1.46	1.05	1.84	1.44
Mean	1.77	0.98	1.82	1.35	2.78	2.03
RMSE	1.85	1.12	2.12	1.69	3.31	2.24
WAPE	0.41	0.35	0.41	0.31	0.40	0.32

Ablation study for the promotion-aware loss

In this ablation study, the proposed promotion-aware loss is evaluated. As a baseline, the same network structure is trained using the RMSE loss ( $l_{RMSE}$ ), while the proposed model is trained with the promotion-aware loss function ( $l_{promo}$ ). All other experimental settings are identical. Table 7 compares the performance of the two trained networks. For the complete dataset, the prediction errors are smaller when using the promotion-aware loss function, $l_{promo}$ . This indicates that the promotion-aware loss guides the model to be more robust during promotion periods, which are very important for any retail shop because a considerable share of clothes is sold in these periods.

Table 7.

Prediction errors with the two compared loss functions for three styles (all colors, all sizes) in three shops. Bold indicates the lowest error for each error metric

	Shop1		Shop2		Shop3
	Error using loss $l_{RMSE}$	Error using loss $l_{promo}$	Error using loss $l_{RMSE}$	Error using loss $l_{promo}$	Error using loss $l_{RMSE}$	Error using loss $l_{promo}$
Min	1.10	0.61	1.68	0.79	2.22	0.72
Max	4.61	3.05	7.31	6.15	9.55	8.06
Std	1.42	1.22	2.13	1.85	2.63	2.01
Mean	1.92	1.66	2.56	1.90	2.93	2.48
RMSE	1.55	1.12	2.82	2.14	3.31	2.78
WAPE	0.35	0.26	0.34	0.25	0.34	0.25

*The prediction errors were computed using only promotion weeks in this table.
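
The ablation above contrasts a plain RMSE loss with a loss that weights promotion weeks more heavily. The sketch below illustrates that idea only. It is not the paper's exact $l_{promo}$: the weighting scheme, the function names, and the parameter `promo_weight` are assumptions made for illustration.

```python
import math


def rmse_loss(predictions, targets):
    """Plain RMSE over all weeks."""
    n = len(targets)
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(predictions, targets)) / n)


def promotion_weighted_rmse(predictions, targets, is_promo, promo_weight=2.0):
    """Weighted RMSE in which promotion weeks count `promo_weight` times.

    Assumption: `promo_weight` and this normalisation are hypothetical;
    the paper's loss may weight promotion periods differently.
    """
    if not (len(predictions) == len(targets) == len(is_promo)):
        raise ValueError("all inputs must have the same length")
    weights = [promo_weight if flag else 1.0 for flag in is_promo]
    weighted_sq = sum(
        w * (p - y) ** 2 for w, p, y in zip(weights, predictions, targets)
    )
    return math.sqrt(weighted_sq / sum(weights))


# Hypothetical weekly sale units; the third week is a promotion week
# in which the forecast misses the sales peak.
targets = [4.0, 6.0, 12.0, 5.0]
predictions = [4.0, 5.0, 8.0, 5.0]
is_promo = [False, False, True, False]

plain = rmse_loss(predictions, targets)
weighted = promotion_weighted_rmse(predictions, targets, is_promo)
print(plain, weighted)
```

In this toy example, the error on the promotion week dominates the weighted loss, so a model trained with it would be penalized more for missing promotion peaks than one trained with the plain RMSE. Setting `promo_weight` to 1.0 recovers the plain RMSE.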

Discussion

A set of qualitative and quantitative evaluations was conducted, showing that the proposed method forecasts sales more accurately than three statistical methods (AR, MA, and ARIMA). All evaluations were conducted on a real retail dataset obtained from three offline stores of different business scales for three different sportswear items. In the proposed method, the MLP subnetwork extracts features from stationary categorical data (such as shop size, item color, and item style), and the CNN subnetwork gathers information from four time series over the previous 9 weeks (sale units, item price evolution, promotion period, and holiday period). The sales units for the next week are predicted by combining the outputs of the two subnetworks via a final hidden layer. The accuracy and robustness of the proposed hybrid forecasting model were verified on the test dataset. The accuracy of the proposed model reaches 65% while taking only a small amount of incomplete sales data as input (the last 9 weeks), which is about four times the roughly 16% accuracy of AR, MA, and ARIMA. Further, the hybrid approach decreases the prediction error rate by 6 percentage points (from 41% for the standard CNN-only model to 35% for the proposed hybrid model). This indicates that external factors are appropriately incorporated into the proposed hybrid model via the additional sales forecasting meta-information.
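
The data flow described above (categorical features through an MLP branch, four 9-week time series through a CNN branch, and a final hidden layer that merges both) can be sketched as a forward pass in plain Python. This is an illustrative sketch under stated assumptions, not the authors' network: the layer sizes, kernel width, pooling, random untrained weights, and all function names are hypothetical.

```python
import random


def relu(values):
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """Fully connected layer; weights[j] holds the input weights of unit j."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def conv1d(series, kernels, biases):
    """Valid 1-D convolution over time; series[c][t], kernels[f][c][k]."""
    n_weeks, width = len(series[0]), len(kernels[0][0])
    feature_maps = []
    for kernel, bias in zip(kernels, biases):
        row = []
        for start in range(n_weeks - width + 1):
            acc = bias
            for c, channel in enumerate(series):
                for k in range(width):
                    acc += kernel[c][k] * channel[start + k]
            row.append(acc)
        feature_maps.append(row)
    return feature_maps


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def init_params(rng, n_categorical, n_channels=4, kernel_width=3,
                n_filters=2, n_mlp=3, n_hidden=4):
    """Random, untrained weights; every size here is a hypothetical choice."""
    return {
        "mlp_w": random_matrix(rng, n_mlp, n_categorical),
        "mlp_b": [0.0] * n_mlp,
        "conv_k": [random_matrix(rng, n_channels, kernel_width)
                   for _ in range(n_filters)],
        "conv_b": [0.0] * n_filters,
        "hid_w": random_matrix(rng, n_hidden, n_mlp + n_filters),
        "hid_b": [0.0] * n_hidden,
        "out_w": random_matrix(rng, 1, n_hidden),
        "out_b": [0.0],
    }


def forecast_next_week(params, categorical, series):
    # MLP branch: stationary categorical features (shop size, color, style).
    mlp_features = relu(dense(categorical, params["mlp_w"], params["mlp_b"]))
    # CNN branch: convolution over the 9-week window, then average pooling.
    maps = [relu(m) for m in conv1d(series, params["conv_k"], params["conv_b"])]
    cnn_features = [sum(m) / len(m) for m in maps]
    # Fusion: concatenate both branches and pass through a final hidden layer.
    hidden = relu(dense(mlp_features + cnn_features,
                        params["hid_w"], params["hid_b"]))
    return dense(hidden, params["out_w"], params["out_b"])[0]


rng = random.Random(0)
categorical = [1.0, 0.0, 0.0, 1.0, 0.0]  # hypothetical one-hot encoding
series = [
    [0.3, 0.4, 0.2, 0.5, 0.6, 0.4, 0.3, 0.5, 0.4],  # scaled sale units
    [1.0, 1.0, 0.8, 0.8, 1.0, 1.0, 1.0, 1.0, 0.8],  # scaled item price
    [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0],  # promotion flag
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0],  # holiday flag
]
params = init_params(rng, n_categorical=len(categorical))
prediction = forecast_next_week(params, categorical, series)
print(prediction)
```

Because the weights are random, the printed value carries no forecasting meaning. The sketch only shows how the two branches are kept separate until the fusion layer, so that static metadata does not have to be repeated along the time axis of the convolutional input.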

Furthermore, a new loss function, the promotion-aware loss, is proposed to improve training convergence and the robustness of forecasting at different periods of the year. This loss function measures a weighted RMSE that pays more attention to the promotion period. On promotion weeks (Table 7), the promotion-aware loss yields about 75% prediction accuracy, roughly 10 percentage points higher than with the RMSE loss. This supports the contribution of the proposed loss to the overall forecasting accuracy. The considerable performance gain achieved by the proposed model over the compared methods shows promising potential for sportswear retailers, as better forecasts can reduce capital investment and ease inventory management. One limitation of the proposed method is that it requires a large amount of data to train the model, owing to the deep learning subnetworks. This design choice is also explained by the fact that this study focuses on long-term sales clothing. However, if the training data already span a variety of products and characteristics, the current model is expected to be robust and accurate for a new sportswear collection.

In a real business environment, there may be several possible scenarios:

The same retailer introduces a new product to replace sportswear of the same type and brand. For example, a model trained with a dataset including POLO samples can be directly used to forecast the sales of a new POLO item. In this example, the previous POLO dataset can be directly used as input data, so a robustly trained model can be obtained.

A new retailer enters a market where retail data are lacking. The existing model can be used for sales forecasting at the initial stage of retail, as sports clothing usually has a retail cycle of more than 1 year. As retail data accumulate, the robustness of the forecasting model can be gradually enhanced.

Sportswear retailing has a specific sales pattern. For example, football sportswear retailing is highly related to the football Premiership. Therefore, for long-term (more than 1 year) sales types, the proposed model can be extended to any sportswear with a short initial training.

Conclusion and future works

This paper presents an efficient method to forecast retail sales, focusing on the sportswear apparel industry. Because sportswear has a very long life cycle, reliably solving the supply and replenishment problem is essential. To this end, a hybrid model is proposed that combines categorical data via an MLP network with time-series data via a CNN. The robustness and accuracy of the model were verified on datasets of three different team styles obtained from three different stores, in comparison with three established statistical forecasting methods. The proposed method reaches a prediction accuracy of at least 65%, outperforming the compared methods. Further, a new training loss, the promotion-aware loss, was shown to improve the overall forecasting accuracy of the model. These results indicate that the method is valuable for sportswear sales forecasting. In addition, even without a long-term retailing dataset, the proposed model can be adapted to the long-term sales of sports clothing retail and similar categories. Potential industrial users include clothing retailers, brands, and merchandisers in the sports clothing and team-wear field.

Future work will include studying the proposed method for online sales, where a large amount of data can be collected. Furthermore, the sports retail business has recently experienced a very challenging time due to the COVID-19 pandemic, and it would be interesting to feed the model with such new input factors. Macro-economic analysis is also valuable for regional sellers to predict the purchase intention of local consumers. Fast and affordable forecasting software or an app would be practically helpful for small and medium-sized enterprises and sellers.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Damien Lefloch
