Abstract
The end breakage rate (EBR), which is one of the most important quality variables used to determine the yield of a spinning process, depends on various process conditions and fiber/yarn properties. In the current study, historical data consisting of more than 10,000 runs from 55 ring spinning machines recorded under normal operation in YUNSA Worsted and Woolen Company in Turkey were analyzed using exploratory and predictive statistical techniques. Principal Component Analysis (PCA) was used to determine subsets of quantitative variables, which vary collectively, forming clusters for different machine types. Correspondence Analysis (CA) was found to be particularly beneficial to determine the association between machines and nominal variables, which make a significant contribution to product quality in textile industries. The current spinning process requires accurate discrimination between acceptable and faulty yarns, determined via a threshold on the EBR, so logistic regression was utilized for the prediction of faulty yarns. The Receiver Operating Characteristic curves showed that the discriminative capacity of the logistic models was at an acceptable level, almost on a par with that of Artificial Neural Network (ANN) models. For different types of machines, while yarn count, roving count, lot size, twist level and composition were commonly present in logistic models, the magnitude of their partial effects varied significantly. In conclusion, PCA, CA and logistic regression are suggested, along with ANN models, to be used for textile industries in online monitoring, detecting faulty machines, choosing optimum machines for specific operational conditions and determining the range of process variables for which controlled experiments may be required.
Keywords
Spinning, along with weaving and dyeing, is one of the most important processes in the textile industry.1 Ring spinning is the most widely used spinning process for yarn manufacturing, mainly due to its potential for modifications, such as roving stop motions, yarn break indicators, electronic speed, lot building programs, data collection and ring cleaning.2,3 A spinning machine consists of a number of parts operated in series. The roving is guided from the roving bobbin to the drafting system, which plays an important role in yarn uniformity. The thin fiber strand leaving the drafting system is twisted by a high-speed spindle, in order to provide the required strength to the yarn.4 The ring traveler imparts twist to the yarn, and the ring traveler number is proportional to the weight of the traveler. The principal aim of spinning is to manufacture yarns of the required linear density, evenness and tensile characteristics, with the minimum number of faults.5
Elimination of yarn faults during the ring spinning process is of the utmost importance. The end breakage rate (EBR) is one of the most crucial variables that determine the quality of the manufactured yarn, since it directly affects the overall yield of the spinning mill.6 Various mathematical and statistical models of the mechanism of end breakage and of the relationship between the EBR and yarn properties have been developed in the literature, in which irregularity, mean yarn strength and spinning tension are found to be the most significant variables.7,8 While general principles on the relationship between spinning conditions and yarn quality have been established, determining universal models with high predictive power has been elusive, mainly due to nonlinear relations between spinning process variables and the end breakage mechanism,9,10 and variations in raw materials and machines.11,12
In the literature, predictive models between yarn properties and spinning process conditions have been constructed using controlled experiments or historical data from specific processes, mainly via linear regression13–18 and Artificial Neural Network (ANN)14–16,18,19 methods. While point predictions from ANN models have been generally found to be superior to those from linear regression models, lack of an explicit representation of the estimated function, and lack of statistics, such as P-values and confidence intervals (CIs), in ANN models make it difficult to interpret the resulting model, assign significance to the variables and measure the precision of the predictions.
Variability in sector requirements, raw materials, machines, workers and working conditions may necessitate novel solutions to conventional problems and exclusive models to characterize the process variables in a specific process. Although the problem elaborated in the current study is a conventional problem in the textile industry, the suggested solution method, which leans towards multivariate statistical process control (SPC) techniques,20 is designed to serve specific requirements of YUNSA, which is the leading worsted and woolen producer in Turkey, exporting to more than 50 countries. The analyzed data, kindly provided by YUNSA, consist of historical measurements collected between 2012 and 2014 from the ring spinning process, which consists of 55 different spinning machines of three different brands (types). Each measurement comprises a set of process variables on a single machine and the corresponding EBR. Two main problems related to the spinning process were of concern. Firstly, it was necessary to determine significant patterns that may exist between process variables, for example whether yarns ordered in high lot sizes were associated with certain fiber compositions, or whether certain subsets of machines were preferable under specific process conditions. Secondly, a method was developed to identify “faulty” machines, which are likely to manufacture yarns with a higher EBR, and to help in choosing “optimal” machines, which are likely to manufacture yarns with a small EBR under specific process conditions.
The first of these two problems was tackled using exploratory statistical techniques: Principal Component Analysis (PCA) and Correspondence Analysis (CA). While PCA is a popular multivariable SPC technique used in diverse chemical industries,21–23 it can be employed only on quantitative variables. CA is a method extensively used in geology24,25 and marketing,26,27 and it can be employed on data containing nominal variables, such as color and composition in the current dataset. Using PCA and CA, it was possible to visualize and cluster a large number of spinning process variables in lower dimensional subspaces. Discussion of the second problem led to the conclusion that binary predictions of acceptable versus faulty yarns would be preferable to point predictions of the EBR. Hence, instead of using the EBR as a continuous response variable in regression models, failure probability was predicted using logistic regression models. Predictive quality of the logistic regression models was evaluated using Receiver Operating Characteristic (ROC) curves, and compared with that of ANN models. The methodology in the current study is suggested as a tool for predicting failure probabilities of different machines at different operating conditions, and these predictions may be utilized for process improvement and SPC purposes.
Materials and methods
Statistical analysis methods
Principal Component Analysis
PCA is a popular technique used to capture the essential variability in large correlated data.28 PCA is computed by applying eigenvalue decomposition29 to the covariance matrix $C_{m \times m}$ of the scaled data matrix $\varphi_{k \times m}$ (k = number of samples, m = number of variables), yielding an eigenvalues matrix $\Lambda_{m \times m}$, in which each eigenvalue ($\lambda_i$) is located in decreasing order on the diagonal, and an eigenvectors matrix $U_{m \times m}$, which contains orthonormal loading vectors ($u_i$). Loadings are the unit length basis vectors of the principal components (PCs), and the jth element of $u_i$ is the “weight” (dot product, or direction cosine) of the jth variable on PC $i$.
The first A columns of the loadings $[U(A)]_{m \times A}$, usually determined via cross-validation,30 are assumed to span the essential information subspace, while the excluded PCs form the residual subspace. Projecting the original data on the essential PC subspace yields the scores matrix $T_{k \times A}$ comprising scores vectors $[t_i]_{k \times 1}$, which may be used in visualizing and clustering data.31
The Q-residual statistic (scalar) measures the contribution of a sample ($\phi_{m \times 1}$) to the residual PC space, determined by the difference of the identity matrix ($I_{m \times m}$) from the outer product of loadings. Out-of-control samples may be detected using Q-residual control limits.32
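The PCA steps described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the code used in the study: the function name `pca_eig`, the synthetic `demo` data and the choice of two retained components are all hypothetical, and the Q-residual is written in its usual textbook form, the squared length of each scaled sample after projection by $I - U(A)U(A)^T$.

```python
import numpy as np

def pca_eig(data, n_components):
    """PCA via eigendecomposition of the covariance matrix of scaled data."""
    # Scale every variable to zero mean and unit variance.
    phi = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    cov = np.cov(phi, rowvar=False)            # covariance matrix C (m x m)
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # reorder to decreasing eigenvalues
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :n_components]       # U(A), m x A
    scores = phi @ loadings                    # T, k x A
    # Projector onto the residual subspace: identity minus outer product of loadings.
    residual_proj = np.eye(cov.shape[0]) - loadings @ loadings.T
    # One Q-residual per sample: phi_i^T (I - U U^T) phi_i.
    q_resid = np.einsum("ij,jk,ik->i", phi, residual_proj, phi)
    explained = eigvals / eigvals.sum()        # fraction of variance per PC
    return loadings, scores, q_resid, explained

# Hypothetical data: 200 runs, 6 correlated variables driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 6))
demo = latent @ mixing + 0.1 * rng.normal(size=(200, 6))

loadings, scores, q_resid, explained = pca_eig(demo, n_components=2)
```

Samples with unusually large `q_resid` values are the ones that the retained PCs describe poorly; control limits for flagging them would have to be chosen separately.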
Correspondence Analysis
A contingency table $N_{n \times p}$ contains the counts of occurrences (items) for categorical (nominal) variables. Independence between rows and columns in a contingency table may be tested via Pearson's chi-square test statistic $\chi^2$, computed using the observed ($N_{ij}$) and the expected occurrences ($E_{ij} = N_\sigma r_i c_j$, see below for definitions) for table elements (cells) {i,j} with $E_{ij} \geq 5$.
The Pearson's statistic follows a
Applying singular value decomposition (SVD)29 on
Factor scores of rows and columns can be represented on a symmetric plot, but proximity between rows and columns cannot be interpreted directly. The trace of the eigenvalues matrix ($\Lambda = \Sigma^2$) is called the inertia ($\Psi$), and is equal to the weighted sum of squared distances of the rows (or columns) to their center of mass. In the current study, rows (machines) will be analyzed in the column space; hence, the rest of the theory focuses on analyzing row profiles. The inertia of the ith row point can be computed using either row profiles or factor scores
Here $p_{ij}$ (and $f_{il}$) corresponds to the ith row and jth (and lth) column of P (and F). The total inertia can be determined via summing the inertia contribution from each row
Multiplying the last term in equation (9) by $N_\sigma$ and comparing with equation (4) shows that
In the current study, CA has been employed using MATLAB package CAR. 36
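The CA computation can be illustrated with a short NumPy sketch. It follows the common textbook formulation (SVD of the standardized residuals of the correspondence matrix) and is not the CAR package; the function name, the variable names and the 4 × 3 table of counts are hypothetical. The sketch also checks numerically that the total inertia multiplied by the grand total reproduces Pearson's chi-square statistic.

```python
import numpy as np

def correspondence_analysis(table):
    """Simple CA of a two-way contingency table via SVD (textbook formulation)."""
    table = np.asarray(table, dtype=float)
    grand_total = table.sum()
    corr = table / grand_total                  # correspondence matrix
    row_mass = corr.sum(axis=1)                 # r_i
    col_mass = corr.sum(axis=0)                 # c_j
    expected = np.outer(row_mass, col_mass)     # r_i * c_j under independence
    std_resid = (corr - expected) / np.sqrt(expected)
    u, sing, vt = np.linalg.svd(std_resid, full_matrices=False)
    # Principal coordinates (factor scores) of rows and columns.
    row_scores = (u * sing) / np.sqrt(row_mass)[:, None]
    col_scores = (vt.T * sing) / np.sqrt(col_mass)[:, None]
    inertia = np.sum(sing ** 2)                 # trace of the eigenvalue matrix
    # Pearson's chi-square with E_ij = grand_total * r_i * c_j.
    chi2 = grand_total * np.sum((corr - expected) ** 2 / expected)
    return row_scores, col_scores, row_mass, inertia, chi2

# Hypothetical counts: 4 machines (rows) by 3 lot-size classes (columns).
counts = np.array([[30, 10, 5],
                   [20, 25, 10],
                   [5, 15, 30],
                   [10, 10, 10]])

row_scores, col_scores, row_mass, inertia, chi2 = correspondence_analysis(counts)
# Inertia contributed by each row: its mass times its squared distance from the centroid.
row_inertia = row_mass * np.sum(row_scores ** 2, axis=1)
```

Plotting the first two columns of `row_scores` gives the kind of two-dimensional factor map used to compare machines.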
Logistic regression
Logistic regression is used to estimate the relation between k observations of a set of regressor variables $x' = (x_1, x_2, \ldots, x_m)$ and a binary response variable (Y).37 Logistic regression differs from linear regression in two aspects: (i) the conditional expected value of Y on the regressor values, π(x), should lie between zero and unity; (ii) the random error term ɛ in the statistical model
The estimate of the expected response (
A hypothesis test using the G statistic, which follows
Here, $k_i$, $O_i$ and
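As an illustration of the method outlined in this section, the sketch below fits a logistic model by Newton-Raphson iterations and computes a likelihood-ratio statistic against the intercept-only model. It assumes the standard logit form π(x) = 1 / (1 + exp(−x′β)) and the usual definition G = 2(ln L_full − ln L_reduced); the data are synthetic and every name in the code is hypothetical, so it should be read as a sketch rather than as the fitting procedure of the study.

```python
import numpy as np

def sigmoid(eta):
    """Logistic function mapping the linear predictor to a probability."""
    return 1.0 / (1.0 + np.exp(-eta))

def fit_logistic(design, y, n_iter=25):
    """Maximum-likelihood fit by Newton-Raphson; design includes the intercept column."""
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        pi = sigmoid(design @ beta)
        weights = pi * (1.0 - pi)                      # variance of each Bernoulli response
        gradient = design.T @ (y - pi)
        hessian = design.T @ (design * weights[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta

def log_likelihood(design, y, beta):
    """Bernoulli log-likelihood of the fitted model."""
    eta = design @ beta
    return np.sum(y * eta - np.log1p(np.exp(eta)))

# Synthetic data: two standardized regressors and a binary failure indicator.
rng = np.random.default_rng(1)
x = rng.normal(size=(500, 2))
design = np.column_stack([np.ones(500), x])
true_beta = np.array([-0.5, 1.0, -0.8])                # hypothetical coefficients
y = (rng.random(500) < sigmoid(design @ true_beta)).astype(float)

beta_hat = fit_logistic(design, y)
null_beta = fit_logistic(design[:, :1], y)             # intercept-only model
g_stat = 2.0 * (log_likelihood(design, y, beta_hat)
                - log_likelihood(design[:, :1], y, null_beta))
```

A large `g_stat` relative to a chi-square distribution with two degrees of freedom (the number of regressors dropped) would indicate that the regressors jointly improve the fit.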
Description of the historical dataset
Process variables consist of fiber properties, for example composition and color; yarn properties, for example twist level and yarn count; operational routine of the process, for example using specific machines depending on lot size; and machine operational variables, for example spindle speed. Process variables, which consist of both quantitative and nominal variables, had been determined on the basis of orders received by the company in the operational history. Quantitative variables and their operating ranges in the dataset are listed in Table 1. Roving and yarn counts have bimodal distributions with modes at 333 and 500 tex, and 25 and 33 tex, respectively. The ring traveler number takes discrete values with the mode at 24, and the spindle speed has a left skewed distribution. Lot size exhibits an exponential-like distribution with a long tail (Figure 1(a)): ∼60% of the yarn orders had a size smaller than 300 kg, while ∼2% of the orders had a size greater than 2000 kg. In order to prevent the high lot size runs dominating statistical models, logarithmic transformation was employed on the lot sizes. Machine age varied only for Type I machines, so it was not included in the statistical analyses except for the logistic model for Type I machines.
Figure 1. Descriptive statistics of the historical data: (a) percentage histogram (relative frequency × 100) of lot sizes divided into 20 intervals; percentage bar graph of runs performed (b) on each machine; (c) in each yarn color; and (d) in each yarn composition.
Table 1. Quantitative process variables and their ranges in the historical dataset
Table 2. Nominal process variables and their levels in the historical dataset
In the historical dataset, the number of end breakages during the operation of each spindle had been counted and normalized to 1000 spindles and a 1-h period, obtaining the EBR (number of breaks per 1000 spindle hours). Engineers in YUNSA deem yarns of acceptable quality if the EBR is below a certain threshold, determined by the quality demand of the market. The conventional limit, for instance, is 50 end breaks per 1000 spindle hours in worsted ring spinning.5 In the current study, the threshold of the EBR is taken to be half of that adopted by YUNSA for process improvement purposes (note that any convenient EBR threshold may be used with the current methodology), and a dichotomous response variable is defined for each run: it takes a value of unity when the EBR is above the threshold (failure), and a value of zero when the EBR is equal to or below the threshold (acceptable quality).
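The normalization and the dichotomous response can be written out as a short sketch. The function names, the example counts and the threshold value of 25 are hypothetical; the threshold actually adopted by YUNSA is not stated in the text.

```python
def end_breakage_rate(breaks, spindles, hours):
    """End breaks normalized to 1000 spindles and a 1-h period."""
    return breaks / (spindles * hours) * 1000.0

def is_failure(ebr, threshold):
    """Dichotomous response: 1 if the EBR exceeds the threshold, 0 otherwise."""
    return 1 if ebr > threshold else 0

# Hypothetical threshold, for illustration only.
HYPOTHETICAL_THRESHOLD = 25.0

# Hypothetical run: 18 breaks observed on 480 spindles over 2 hours.
ebr = end_breakage_rate(breaks=18, spindles=480, hours=2.0)
response = is_failure(ebr, HYPOTHETICAL_THRESHOLD)
```

A run whose EBR equals the threshold exactly is counted as acceptable, matching the definition above.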
Results and discussion
Exploratory analysis of process variables
In order to determine the patterns in the operational behavior of the process, quantitative process variables and binary nominal variables (twist direction and spinning) were analyzed using PCA, while various pairs of nominal variables were analyzed with the help of CA. Before applying PCA, all variables were normalized to unit variance to give equal weighting to each variable.20 Binary nominal variables were converted to quantitative variables by assigning zero to the S-direction and doubled-yarn spinning, and one to the Z-direction and singled-yarn spinning.
The first two PCs, which explain 57% of the variability (black lines in Figure 2(a)), are deemed to be sufficient to describe the historical operation, due to their higher explanatory power than that obtained from randomly produced data (gray lines),42 also confirmed by cross-validation. Q-residuals (black solid lines in Figure 2(b)) show that the number of samples exceeding the confidence bounds (99.9% and 99.99% limits shown with gray and black horizontal dashed lines, respectively) is small, indicating that the process had been generally stable and in-control. Figure 2(c) shows the percentage of each variable's variance explained by the PCs, and Figure 2(d) shows the loadings of the two PCs. The PC subspace of the process does not explain the variation in lot size, showing that lot size is virtually independent of the quantitative process variables, twist direction and spinning. PC 1 indicates an association of low spindle speed and low yarn count (tex) with high traveler number, Z-direction twist and singled yarn spinning, while PC 2 represents a collective variation of roving count and twist level in opposite directions.
Figure 2. Principal Component Analysis on the quantitative historical dataset: (a) explanation percentages of principal components (PCs); (b) Q-residuals; (c) percent variance of variables explained by PCs 1 and 2; (d) loadings of PCs 1 and 2.
Examination of t-scores on the reduced PC plane reveals two separate clusters, possibly due to the significant contribution of binary variables to PC 1 (Figure 3(a)). Type I and III machines dominate the clusters on the right- (positive t1-scores) and the left- (negative t1-scores) hand sides, and Type II machines contribute to both clusters. Hence, Type I machines were generally operated at low spindle speed, low yarn count, Z-direction twist and singled yarn spinning, while the opposite conditions held for Type III machines (Figure 2(d)).
Figure 3. Analysis of the historical runs in lower dimensional spaces: (a) projection of runs on 2 PC space; principal coordinates on two-dimensional factor spaces obtained from Correspondence Analysis of individual machines with (b) lot sizes; (c) yarn compositions; and (d) yarn colors.
Table 3. ACCs (in %) obtained from the Correspondence Analysis of the contingency table formed by lot sizes (columns) and machines (rows)
CA on the contingency table constructed from 37 machines and 13 compositions shows that Type I and II machines are preferred for WO and 50/50 WO-PES yarns; machines 16, 20, 22 and 24 of Type III are associated with 85/15 WO-filament PA, 75/10/15 WO-PA-EL and 75/10/15 WO-PA-filament PA yarns, while the rest of the Type III machines are mostly associated with 96/4 WO-EL and 43/53/4 WO-PES-EL yarns (Figure 3(c)). CA employed on 37 machines and nine colors shows that the first axis discriminates ecru from navy blue yarns, while the second axis discriminates marango and other from navy blue and brown colored yarns (Figure 3(d)). Type I and II machines are preferred for ecru and marango colored yarns, respectively, but orders for yarn colors, similar to those for lot sizes and compositions, are not homogeneously distributed among Type III machines.
Prediction of faulty yarns using logistic regression
A total of 7000 runs were used for model construction (training) purposes, while the remaining ∼3000 were used for prediction (testing). Quantitative process variables (regressors) were standardized to zero mean and unit standard deviation to facilitate interpretation of the regression coefficient estimates (
Initially, a global model including all three types of machines was formulated, but this model contained a large number of interaction terms with machine types. Since a large number of interactions with a single nominal variable is suggestive of constructing separate regression models for each level of the variable in question,44 a separate model was built for each machine type. Hence, a single logistic model was determined for all Type I machines (929 runs). Averages of the estimated failure probabilities for individual machines do not seem to be consistent with the observed failure proportions (Figure 4(a)). Pearson's chi-square statistic was found to be 29.9, greater than the 95% confidence bound
Figure 4. Logistic model adequacy checks for Type I machines. Observed proportion of failures versus predicted failure probabilities obtained via (a) the preliminary model and (b) the final model after including $M_F$. (c) Deviance residuals of the training set. (d) Predicted versus observed failure probability for deciles in the datasets. (e) Receiver Operating Characteristic curves. (f) Observed and predicted failure probabilities for individual machines.
Table 4. Regression coefficient estimates, their standard errors (SEs) and P-values (P-vals) of the logistic regression models constructed for each type of machine
Table 5. Summary of discrimination statistics for the logistic and Artificial Neural Network (ANN) models
AUC: area under the Receiver Operating Characteristic curve; TPR: true positive rate; TNR: true negative rate.
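For readers unfamiliar with these discrimination statistics, the sketch below computes them from predicted failure probabilities. The AUC is evaluated through its rank interpretation (the probability that a randomly chosen faulty run receives a higher score than a randomly chosen acceptable run, with ties counted as one half). The labels, scores and the 0.5 cutoff are hypothetical and do not come from the study.

```python
import numpy as np

def roc_auc(y_true, scores):
    """AUC as the fraction of (failure, acceptable) pairs ranked correctly."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def tpr_tnr(y_true, scores, cutoff):
    """True positive and true negative rates at a given probability cutoff."""
    y_true = np.asarray(y_true)
    predicted = np.asarray(scores, dtype=float) >= cutoff
    tpr = np.sum(predicted & (y_true == 1)) / np.sum(y_true == 1)
    tnr = np.sum(~predicted & (y_true == 0)) / np.sum(y_true == 0)
    return tpr, tnr

# Hypothetical labels (1 = failure) and predicted failure probabilities.
labels = [0, 0, 1, 1, 0, 1]
probs = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]

auc = roc_auc(labels, probs)
tpr, tnr = tpr_tnr(labels, probs, cutoff=0.5)
```

Sweeping `cutoff` from 0 to 1 and plotting TPR against 1 − TNR traces out the ROC curve itself.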
For Type I machines, parameter $M_F$ was found to be equal to 0.864 (Table 4), showing that odds of failure increased by 2.4 times (∼0.2 increase in probability) when yarns were manufactured using subgroup 2 instead of subgroup 1 machines. The observed proportions of faulty yarns manufactured in Type I machines are shown with black lines in Figure 4(f), while box-and-whisker plots represent the predicted failure probability distributions. The suggested methodology is able to correct for differences in operating conditions, and identify machines with higher failure probability. For instance, the observed proportions of faulty yarns from machines 6 and 8, which are in different subgroups, are almost equal to each other, but this similarity is mainly due to differences in the operating conditions, that is, production qualities of these machines would not be equivalent under identical operating conditions. Using the same methodology, machine number 36 in Type III machines was also found to increase the odds of failure by 2.9 times.
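The conversion from a logistic coefficient to an odds ratio and a probability shift can be checked with a few lines of arithmetic. The coefficient 0.864 is taken from the text; the baseline failure probability of 0.30 is a hypothetical value chosen only to show how an increase of about 0.2 in probability can arise.

```python
import math

def shift_probability(p_baseline, coefficient):
    """Failure probability after multiplying the baseline odds by exp(coefficient)."""
    odds = p_baseline / (1.0 - p_baseline) * math.exp(coefficient)
    return odds / (1.0 + odds)

m_f = 0.864                      # coefficient reported for Type I machines
odds_ratio = math.exp(m_f)       # multiplicative change in the odds of failure

baseline = 0.30                  # hypothetical baseline failure probability
shifted = shift_probability(baseline, m_f)
increase = shifted - baseline
```

Because the logistic function is nonlinear, the same odds ratio produces a smaller probability increase when the baseline probability is close to zero or one.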
Relations of process variables with failure probability
Logistic regression parameter estimates of all three machine types are shown in Table 4. Spindle speed, twist direction and spinning were not found to have significant partial effects (regressor coefficients) on failure probability of any machines. In the literature, dynamic yarn strength was found to be negatively correlated with spindle speed, 47 and the EBR was found to be positively correlated with spindle speed for values higher than 15,000 RPM, 48 while change in spindle speed between 10,000 and 17,500 RPM did not show any effect on the EBR for cotton yarns. 49 This shows that the relationship between spindle speed and EBR may depend on process conditions, and suggests that results in the current study may be locally valid for the investigated process. The partial effect of machine age, although significant, was found to be relatively small, and hence is not included in the final model.
Figure 5 shows failure probability predictions over the range of a single variable, while holding other quantitative variables at their medians, with ecru and WO taken as the reference state of the yarn. It is interesting that failure probability is negatively correlated with lot size in all three types of machines (Figure 5(a)). Failure probability is also negatively correlated with twist level for most of its range (Figure 5(b)), consistent with the positive correlation between twist level and dynamic yarn strength reported in the literature.47 The change in the sign of correlation above ∼800 T/m for Type III machines is not reliable due to the large 95% CI (dotted lines) in this region.
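Curves of this kind, with their confidence bands, can be produced along the following lines. This is a sketch under stated assumptions: the coefficients and their covariance matrix are hypothetical placeholders rather than values from Table 4, and the interval is a Wald-type 95% CI computed on the logit scale and then transformed to probabilities, which is one common choice and not necessarily the one used in the study.

```python
import numpy as np

def expit(eta):
    """Logistic function mapping the linear predictor to a probability."""
    return 1.0 / (1.0 + np.exp(-eta))

def predict_with_ci(design, beta, cov_beta, z=1.96):
    """Predicted probability and Wald CI, built on the logit scale."""
    eta = design @ beta
    # Standard error of each linear predictor: sqrt(x' Cov(beta) x).
    se = np.sqrt(np.einsum("ij,jk,ik->i", design, cov_beta, design))
    return expit(eta), expit(eta - z * se), expit(eta + z * se)

# Hypothetical model: intercept, standardized log lot size, standardized twist level.
beta = np.array([-1.0, -0.6, 0.3])
cov_beta = np.diag([0.010, 0.004, 0.006])

# Vary lot size over its range while holding twist level at its median (0 after scaling).
lot_grid = np.linspace(-2.0, 2.0, 9)
design = np.column_stack([np.ones_like(lot_grid),
                          lot_grid,
                          np.zeros_like(lot_grid)])

prob, lower, upper = predict_with_ci(design, beta, cov_beta)
```

The band widens wherever the regressor vector lies far from the bulk of the training data, which is why sparsely sampled operating regions give unreliable curves.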
Figure 5. Prediction of failure probabilities with respect to (a) lot sizes; (b) twist level; yarn count (YC) at (c) roving count (RC) = 218 tex; (d) RC = 556 tex; RC at (e) YC = 12 tex; (f) YC = 33 tex; traveler number at (g) YC = 12 tex; (h) YC = 33 tex.
Among the partial effects of all process variables, those of yarn count and roving count on failure probability were the largest in magnitude and the most complicated, consisting of quadratic and interaction terms, in agreement with a previous study.13 The effect of each of these two variables is investigated at low and high levels of the other one. Figure 5(c) and (d) show that failure probability is generally negatively correlated with yarn count at low roving count (∼220 tex), but positively correlated at high roving count (∼550 tex). A closer examination shows that negative correlation between yarn count and failure probability is seen over three-quarters of the range, similar to the previous findings.8,13 A complication, however, arises from the concave functions predicted for Type I machines at high roving count and Type III machines at both levels of roving count. Although 95% CIs are relatively large around the maxima, the predicted concave function is likely to reflect the true nature of the complex relation. The effect of roving count on failure probability is similar to but less complicated than that of yarn count (Figure 5(e) and (f)). Failure probability was found to increase with the traveler number for Type III machines (Figure 5(g) and (h)) in accordance with the expectation,50 but different types of correlations and large CIs (represented by dashed lines) seen for Type I and II machines make it difficult to reach a reliable conclusion on the exact nature of the relation.
Due to the large number of colors and compositions, various random groupings were used as different levels of Comp and Col in the models, and those that yield the smallest cross-validation deviance and P-value statistics were retained. Comp was taken to be equal to unity for compositions 5 and 6 in Type I machines, for composition 5 in Type II machines and for compositions 6, 8 and 11 in Type III machines. Overall, odds of failure were found to increase up to three times (∼0.1–0.2 increase in probability) when yarns with 85/15 and 70/30 WO-PA composition were manufactured, showing that fiber composition may be significantly correlated with ring spinning quality. Black, gray marl and marango colors have a relatively small negative correlation with failure probability only for Type III machines.
While the constructed logistic models should not be considered as strict causal relations due to the observational nature of the current study, most of the relations in the models are found to be consistent with the results from the literature. This consistency may be partly due to the open-loop character of the process: operating conditions of the ring spinning process are determined on the basis of orders received by the company without any automatic feedback control mechanism for the EBR, hence eliminating the possibility of reverse causation to a certain degree. However, it is possible that quality variables not considered in the model have confounding effects, that is, they affect failure probability and the current process variables simultaneously. Hence, inclusion of other measured quality variables in the models is important to determine a more complete picture of the causal relations between variables.
Prediction of faulty yarns using ANN models
To evaluate the discriminative power of the linear logistic models comparatively, ANN models with a single hidden layer comprising seven nodes were constructed. Inputs and outputs identical to those used in the logistic models were employed for training and testing, with 20% of the inputs used for validation. As seen in Table 5, the ANN models offer only a marginal improvement in discrimination capacity over the logistic models. Two explanations might be suggested as to why logistic and ANN models had comparable discrimination power, contrary to the higher predictive power of ANN models over linear regression models reported in the literature.51–53 Firstly, a logistic model, unlike a linear regression model, would not be significantly affected by the nonlinear relations in the regions for which the EBR is much lower or higher than the threshold value. Secondly, logistic models in the current study consisted of interaction and quadratic terms, which were not frequently used in the literature, and were built for each machine type separately, enabling characterization of the nonlinear ring spinning process via “locally” valid linear models.
Conclusion
Characterizing ring spinning processes is highly important for minimizing yarn faults. In the current study, historical process data from three types of ring spinning machines in YUNSA were analyzed using exploratory (PCA, CA) and predictive (logistic regression) statistical techniques. PCA and CA were found to be well suited to extract patterns in the operational routine and yarn specifications, and to discriminate the operation conditions under which different types of machines were used. A future extension of the current study may be to use multiple CA (MCA), 35 in which patterns of deviation from independence are examined for more than two nominal variables, on the spinning process dataset. A preliminary application of MCA yielded promising results, in which simultaneous relations between machine numbers, lot sizes, yarn colors and compositions were determined.
A threshold value for the EBR was used to discriminate faulty from acceptable yarns to implement logistic regression on the available dataset. To our knowledge, logistic regression has been scarcely used in the textile industry,54 and the current methodology may particularly be useful for quality control applications in the field. Logistic regression models gave acceptable discrimination with AUC values of 0.65–0.70 for test data, almost on a par with ANN models. Logistic models also helped in the identification of machines more likely to manufacture faulty yarns. For instance, Type I machines were generally found to have a lower failure probability over a wide range of operating conditions. In future studies, logistic models may also be incorporated into attribute control charts55 for continuously monitoring the performance of individual or groups of machines. For all types of machines, the sign of the relations between spinning process variables and failure probability was found to be similar (except the traveler number), but the magnitudes of these relations differed significantly. A rather unexpected relation is the negative partial effect of lot size on failure probability. The existence of a “warm-up” period of the ring spinning machines might be an explanation of this relation: as the spinning machine is expected to run longer for higher lot sizes, the EBR might decrease. However, a more thorough examination, particularly to rule out the possible effects of confounding variables, is definitely required to clarify the mechanism behind this relation.
Relatively large CIs around a predicted response within a certain range of process variables are an indication of the input variables vector being inconsistent with the historical covariance. For instance, the high CI of predicted failure probabilities at low yarn count and low traveler number (Figure 5(g)) is a result of the fact that the process had not been frequently run under these conditions (see the loadings of PC 1 in Figure 2(d)), so controlled experiments may be performed to determine this relation precisely. In the spinning process, the large number of variables and the nonlinearity of the process demand numerous controlled experiments at different levels to construct precise causal models, and this may be impractical for a commercial textile company due to time and money limitations. The methodology advocated in the current study, that is, employing exploratory and predictive statistical analyses on historical data, may be used as a preliminary step for explanatory modeling,56 so that controlled experiments may be employed around operating conditions that have not been covered in the historical process conditions.
The current study shows that classical linear statistical tools, along with ANN models, have much to offer in analyzing textile processes, particularly in visual examination of process variables, and in supplementing the point predictions with CIs. Quality variables, such as fiber fineness, strength and uniformity, although measured, were not available in the received dataset, and inclusion of these variables is likely to increase the predictive capability of the logistic models.
Footnotes
Acknowledgements
We would like to thank Murat Yildirim, Head of R&D Department at YUNSA, for providing us with the process data.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by Bogazici University B.A.P. (Project 8041).
