Variances in knowledge-based interval type 2 Gaussian fuzzy on linear regression models

Abstract

Fuzzy logic is a branch of artificial intelligence that has been used extensively in developing fuzzy systems and models. These systems usually offer artificial intelligence based on the predictive mathematical models they use; in this case, a linear regression model. Interval type 2 Gaussian fuzzy logic uses a Gaussian upper membership function and a Gaussian lower membership function, with a footprint of uncertainty between them. The solutions predicted by these interval type 2 fuzzy systems depend on the training and on the resulting linear regression model, whose training data is usually extracted from the expert knowledge stored in the systems’ knowledge bases. Variances in the expert knowledge stored in these knowledge bases therefore produce variances in the training data, which usually affect the overall accuracy of the linear regression predictive models. This research establishes the extent to which variances in knowledge bases affect the predictive accuracy of these models, with a case study on knowledge bases used to predict learners’ knowledge level abilities. The calculated linear regression predictive models show that every variance in the knowledge base changes the linear regression predictive model, with a change in the intercept commensurate with the variances and their respective weights in the knowledge bases.

Keywords

Interval type 2 Gaussian fuzzy logic, linear regression predictive models, intelligent system models, knowledge bases

1 Introduction

Fuzzy logic is a branch of artificial intelligence that mimics human reasoning by providing an array of reasoning possibilities; hence it is used in developing intelligent systems and models. An interval type 2 Gaussian fuzzy set is a Gaussian membership fuzzy set with an upper membership function and a lower membership function, with a footprint of uncertainty between the two Gaussian membership functions. Any input to an interval type 2 membership function therefore cuts both the upper and the lower membership functions [4, 22].

In order for fuzzy systems and models to offer artificial intelligence solutions, general mathematical predictive models have to be trained and developed to enable these systems to offer intelligent predictions. In this case, linear regression predictive models were used. To train and develop linear regression predictive models, fuzzy systems need a set of inputs and their corresponding outputs. These outputs are estimated based on the expert knowledge stored in the knowledge bases. Thus, different knowledge bases will tend to give different output values for similar input values [12, 28].

References [1, 6] explained that the inputs to the fuzzy systems and their corresponding outputs are used to train and develop linear regression mathematical models. Hence, the interval type 2 Gaussian fuzzy linear regression model referred to in this paper is a linear regression mathematical predictive model trained and developed to make predictions for these Gaussian fuzzy systems.

In this paper we examine how variances in knowledge bases of interval type 2 Gaussian fuzzy logic systems affect these systems’ output values, and their resultant linear regression predictive models when placed under similar conditions.

To conduct this research, three interval type 2 Gaussian fuzzy systems used to predict the knowledge level abilities of students were developed. The three Gaussian fuzzy systems were identical except for the expert knowledge stored in their knowledge bases, which is the source of the variances in knowledge bases. The three knowledge bases corresponded to type 1 test, type 2 test and type 3 test respectively, as suggested by [5, 14].

To predict a student’s knowledge level ability using the fuzzy system, the student first took an assessment test. The test score and the time taken to finish the test were then used as the inputs to the fuzzy system, which predicted the student’s knowledge level ability. The assessment test was scored out of 100%, and the time to complete it was measured out of 20 minutes, while the knowledge level output was rated out of 100%.

Fifty (50) random inputs were run on each of the three interval type 2 Gaussian fuzzy systems, and the resultant knowledge level abilities documented. The collected data was used for training and validating the linear regression predictive model for each test type at a ratio of 70:30 respectively, as advised by [13, 23]. The behavior of the linear regression models was then analyzed, based on the variances in the three knowledge bases used.

2 Methodology

Mamdani interval type 2 Gaussian fuzzy systems for type 1 test, type 2 test, and type 3 test were developed using the Juzzy online fuzzy toolkit. To ensure that only the knowledge bases varied, all the input and output membership functions of the three Gaussian fuzzy systems were identical, with only the expert knowledge stored in the knowledge bases being different, as suggested by [5, 21].

As advised by [18, 28], each interval type 2 Gaussian fuzzy logic system was developed using the following five modules:

2.1 Fuzzification module

In this module, we defined the linguistic variables and their terms, and then set the degrees of membership. Our linguistic variables and terms were as follows:

2.1.1 Input linguistic variables and linguistic terms

Test score = {low, fair, average, good, excellent}

Time = {short, average, long}

2.1.2 Output linguistic variables and linguistic terms

Knowledge level = {none, shallow, deep, expert}

Using Equations (1) and (2), Gaussian membership functions with varying variances were plotted, as illustrated by [9, 25].

$\bar{μ}_{k}^{1}(x_{k}) = N(m_{k}^{1}, ϱ_{k2}^{1}; x_{k})$ (1)

$\underline{μ}_{k}^{1}(x_{k}) = N(m_{k}^{1}, ϱ_{k1}^{1}; x_{k})$ (2)

Using Equations (1) and (2), and taking reasonable estimates of the midpoint and the spread of each membership function, non-singleton Gaussian MFs were plotted for the antecedents and the consequents, each having an upper MF $\bar{μ}_{k}^{1}$ and a lower MF $\underline{μ}_{k}^{1}$, as illustrated in Figs. 1, 2 and 3.
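As a minimal sketch of Equations (1) and (2), the following Python snippet evaluates an upper and a lower Gaussian membership function that share a mean but differ in spread. It assumes that N(m, ϱ; x) denotes exp(-0.5((x - m)/ϱ)^2) and that the larger spread gives the upper MF. The function names and the example mean and spreads are hypothetical; they are not the values used in Figs. 1, 2 and 3.

```python
import math


def gaussian_mf(x, mean, sigma):
    """Gaussian membership grade exp(-0.5 * ((x - mean) / sigma) ** 2)."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)


def it2_gaussian_mf(x, mean, sigma_lower, sigma_upper):
    """Return (lower, upper) membership grades for a crisp input x.

    Assumes sigma_lower <= sigma_upper, so the lower MF lies inside the
    upper MF; the gap between them is the footprint of uncertainty.
    """
    if sigma_lower > sigma_upper:
        raise ValueError("sigma_lower must not exceed sigma_upper")
    return gaussian_mf(x, mean, sigma_lower), gaussian_mf(x, mean, sigma_upper)


# Hypothetical "good" test-score term centred at 75 with spreads 8 and 12.
lower, upper = it2_gaussian_mf(70.0, mean=75.0, sigma_lower=8.0, sigma_upper=12.0)
```

Any crisp input therefore yields a pair of grades, which is what lets a single input cut both the lower and the upper membership function.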

Fig. 1

Assessment test score membership function.

Fig. 2

Time taken on an assessment test membership function.

Fig. 3

Knowledge level membership function.

Figures 1, 2 and 3 show the membership functions of the assessment test score, the time taken to complete the assessment, and the knowledge level. All three test types used the same membership functions for consistency, and to allow only the knowledge bases to vary. Assessment test score and time taken formed the input membership functions, while knowledge level formed the output membership function. Assessment test score ranges from 0 to 100 marks, time taken ranges from 0 to 20 minutes, and knowledge level ranges from 0 to 100%, as guided by [1].

2.2 Knowledge base and rules module

As informed by [16], the fuzzy system has two input linguistic variables (assessment test score and time taken) with 5 and 3 linguistic terms respectively. This gives 15 rules for every type of assessment test. Thus the three types of assessment tests (type 1, type 2, and type 3) have a total of 45 rules, each of the form in Equation (3).

$R^{n}: \text{IF } x_{i} \text{ is } X_{i}^{n} \text{ AND } x_{l} \text{ is } X_{l}^{n}, \text{ THEN } y \text{ is } Y^{n}, \quad n = 1, 2, \dots, N$ (3)

Where $x_{i}$ and $x_{l}$ are inputs, $X_{i}^{n}$ and $X_{l}^{n}$ are antecedent sets, $y$ is the output and $Y^{n}$ are consequent sets.
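The rule count can be checked by enumerating every combination of input linguistic terms. The sketch below is illustrative only; the variable names are hypothetical and the terms are copied from Section 2.1.1.

```python
from itertools import product

TEST_SCORE_TERMS = ["low", "fair", "average", "good", "excellent"]
TIME_TERMS = ["short", "average", "long"]
TEST_TYPES = ["type 1", "type 2", "type 3"]

# One rule antecedent per (time, test score) pair of linguistic terms.
antecedents = list(product(TIME_TERMS, TEST_SCORE_TERMS))
rules_per_test_type = len(antecedents)

# Each test type has its own knowledge base over the same antecedents.
total_rules = rules_per_test_type * len(TEST_TYPES)
```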

In developing the knowledge bases, expert knowledge was used to guide the relationship between the antecedents and the consequents for each rule using Equation (3). The expert knowledge was then summarized as 45 rules in the knowledge bases of the three test types in Table 1.

Table 1

Knowledge base for the various test types

Knowledge bases
Test Type	Time Taken\Test score	Low	Fair	Average	Good	Excellent
Type 1	SHORT	None	Shallow	Shallow	Deep	Expert
	AVERAGE	None	Shallow	Shallow	Shallow	Deep
	LONG	None	None	None	Shallow	Deep
Type 2	SHORT	Shallow	Shallow	Deep	Deep	Expert
	AVERAGE	None	Shallow	Shallow	Deep	Expert
	LONG	None	None	Shallow	Shallow	Deep
Type 3	SHORT	Shallow	Deep	Deep	Expert	Expert
	AVERAGE	Shallow	Shallow	Deep	Deep	Expert
	LONG	None	Shallow	Shallow	Deep	Deep

2.3 Inference engine

Authors [8, 29] explained that the inference engine is where the rules stored in the knowledge base are fired in order to get the desired fuzzy outputs. The rules were fired by joining their antecedents using logical AND, that is the min t-norm, using Equations (4), (5) and (6).

$\underline{F}'(x') = \underline{μ}_{f1}(x'_{1}) * \dots * \underline{μ}_{fp}(x'_{p})$ (4)

$\bar{F}'(x') = \bar{μ}_{f1}(x'_{1}) * \dots * \bar{μ}_{fp}(x'_{p})$ (5)

$\text{Rule}^{n}: R^{n} = [\underline{μ}_{A}(\bar{x}) * \underline{μ}_{B}(\bar{y}), \bar{μ}_{A}(\bar{x}) * \bar{μ}_{B}(\bar{y})]$ (6)

Where * is the min t-norm.
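A minimal sketch of Equations (4) to (6): the firing interval of a rule is obtained by taking the minimum of the lower grades and the minimum of the upper grades of its antecedents. The function name and the example grades are hypothetical.

```python
def firing_interval(memberships):
    """Combine antecedent grades with the min t-norm.

    memberships: list of (lower, upper) grades, one pair per antecedent.
    Returns the rule's firing interval (lower, upper).
    """
    lower = min(grade[0] for grade in memberships)
    upper = min(grade[1] for grade in memberships)
    return lower, upper


# Hypothetical grades for "test score is good" AND "time is short".
example = firing_interval([(0.3, 0.6), (0.5, 0.9)])
```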

2.4 Type reducer module

Following the advice of authors [9, 27], the Karnik–Mendel (KM) type reducer was used to reduce the IT2FS to a T1FS (type 1 fuzzy system). To achieve this, Equations (7) and (8) were used.

$y_{l} = \frac{\bar{f}^{1} \underline{y}^{1} + \underline{f}^{2} \underline{y}^{2} + \underline{f}^{3} \underline{y}^{3} + \underline{f}^{4} \underline{y}^{4}}{\bar{f}^{1} + \underline{f}^{2} + \underline{f}^{3} + \underline{f}^{4}}$ (7)

$y_{r} = \frac{\underline{f}^{1} \bar{y}^{1} + \bar{f}^{2} \bar{y}^{2} + \bar{f}^{3} \bar{y}^{3} + \bar{f}^{4} \bar{y}^{4}}{\underline{f}^{1} + \bar{f}^{2} + \bar{f}^{3} + \bar{f}^{4}}$ (8)

Where

R¹ (rule 1) provides [$\underline{f}^{1}$, $\bar{f}^{1}$] and y¹ (output 1) provides [$\underline{y}^{1}$, $\bar{y}^{1}$]

R² (rule 2) provides [$\underline{f}^{2}$, $\bar{f}^{2}$] and y² (output 2) provides [$\underline{y}^{2}$, $\bar{y}^{2}$]

R³ (rule 3) provides [$\underline{f}^{3}$, $\bar{f}^{3}$] and y³ (output 3) provides [$\underline{y}^{3}$, $\bar{y}^{3}$]

R⁴ (rule 4) provides [$\underline{f}^{4}$, $\bar{f}^{4}$] and y⁴ (output 4) provides [$\underline{y}^{4}$, $\bar{y}^{4}$]

2.5 Defuzzification module

Centroid type reduction was used to perform defuzzification, obtaining a crisp value from the fuzzy values. It involves finding the center between the yl (minimum) and yr (maximum) values. Therefore the average of Equations (7) and (8) was the centroid value.
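The type reduction and defuzzification steps can be sketched as follows. This is not the iterative Karnik–Mendel procedure itself: as a simpler stand-in it tries every switch point and keeps the smallest left end point and the largest right end point, which are the same end points the KM iteration searches for. Equations (7) and (8) correspond to one such switch point (after rule 1). All function names and the example numbers are hypothetical.

```python
def _weighted_average(weights, values):
    total = sum(weights)
    if total == 0:
        return None  # no rule fired for this combination of strengths
    return sum(w * v for w, v in zip(weights, values)) / total


def _end_point(f_first, f_rest, y, pick):
    """Try every switch point k over the rules sorted by y."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    candidates = []
    for k in range(len(y) + 1):
        weights = [f_first[i] if pos < k else f_rest[i]
                   for pos, i in enumerate(order)]
        avg = _weighted_average(weights, [y[i] for i in order])
        if avg is not None:
            candidates.append(avg)
    if not candidates:
        raise ValueError("no rule fired")
    return pick(candidates)


def type_reduce(f_lower, f_upper, y_left, y_right):
    """Return (yl, yr) from firing intervals and consequent intervals."""
    yl = _end_point(f_upper, f_lower, y_left, min)   # upper strengths on the smallest y
    yr = _end_point(f_lower, f_upper, y_right, max)  # upper strengths on the largest y
    return yl, yr


def defuzzify(yl, yr):
    """Crisp output: the midpoint of the type-reduced interval."""
    return (yl + yr) / 2.0


# Hypothetical two-rule example.
yl, yr = type_reduce([0.2, 0.4], [0.6, 0.8], [10.0, 50.0], [20.0, 60.0])
crisp = defuzzify(yl, yr)
```

Exhaustive enumeration costs more than the KM iteration for large rule bases, but with 15 rules per knowledge base the difference is negligible and the logic is easier to follow.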

3 Fuzzy logic experimental results and discussion

As guided by [4, 18], each of the three Gaussian fuzzy systems representing the three knowledge bases was run with 50 random input values, and the resultant outputs were collected and recorded. Out of the 50 records collected for each test type, 35 were used for linear regression model training while the remaining 15 were used for model validation.
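The 70:30 split can be sketched as below, assuming (as Tables 2, 3 and 4 suggest) that the first 35 trials form the training set and trials 36 to 50 form the validation set. The function name is hypothetical.

```python
def split_records(records, n_train=35):
    """Split records in order: the first n_train for training, the rest for validation."""
    return records[:n_train], records[n_train:]


trials = list(range(1, 51))  # trial numbers 1..50
train, validation = split_records(trials)
```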

3.1 Type 1 test knowledge level results

A three-dimensional surface view of knowledge level abilities for type 1 test was extracted as shown in Fig. 4.

Fig. 4

Type 1 test surface view.

Fifty (50) random trials on various acceptable assessment test scores and times taken to complete an assessment test were run on the type 1 test Gaussian fuzzy system, and their respective knowledge level abilities documented as indicated in Table 2.

Table 2

Experimental results for type 1 test

Trials	Test Score (%)	Time (Mins)	Knowledge level (%) (Type 1 test)
1	95	6	69.2
2	92	10	66.03
3	88	10	56.61
4	87	17	53.3
5	82	6	51.48
6	81	14	44.23
7	79	2	69.8
8	74	16	40.22
9	72	20	40.03
10	66	18	38.87
11	64	6	50.74
12	63	7	46.28
13	61	10	40.6
14	59	17	21.67
15	57	5	45.37
16	54	2	41
17	53	20	8.8
18	50	10	40.18
19	49	17	10.15
20	47	1	40.06
21	42	12	39.54
22	39	5	40
23	38	13	38.03
24	37	20	9.3
25	32	10	39.97
26	29	9	39.9
27	26	17	10.15
28	21	4	36.69
29	14	14	15.04
30	13	12	12.35
31	10	17	9
32	8	18	8.06
33	7	13	8.97
34	2	12	7.86
35	1	1	7.14
Validation Data
36	45	6	40.05
37	40	7	40
38	35	7	39.99
39	85	2	71.13
40	80	3	69.23
41	75	3	69.13
42	5	10	7.39
43	55	5	43.15
44	88	15	56.12
45	24	17	11.02
46	60	5	50.04
47	1	11	7.21
48	5	10	7.39
49	10	10	9
50	15	9	17.64

3.2 Type 2 test knowledge level results

A three-dimensional surface view of knowledge level abilities for type 2 test was extracted as shown in Fig. 5.

Fig. 5

Type 2 test surface view.

From the results (surface view), we can deduce that the lower the assessment test score and the longer the time taken to complete an assessment test, the lower the student’s knowledge level. It can also be noted that, for similar test scores and times taken, type 2 test had slightly higher knowledge levels than type 1 test, implying that type 2 test was more technical or advanced than type 1 test, which is consistent with the expert knowledge stored in the knowledge bases.

Fifty (50) random trials on various acceptable assessment test scores and times taken to complete an assessment test were run on the type 2 test Gaussian fuzzy system, and their respective knowledge level abilities documented as indicated in Table 3.

Table 3

Experimental results for type 2 test

Trials	Test Score (%)	Time (Mins)	Knowledge level (%) (Type 2 test)
1	100	2	92.54
2	100	18	69.71
3	95	1	90.7
4	93	10	87.74
5	93	15	69.2
6	90	2	81.21
7	83	3	70.48
8	80	5	70.12
9	75	3	69.9
10	74	15	49.53
11	70	4	69.09
12	65	14	60
13	65	4	66.83
14	64	18	40.35
15	60	16	44.11
16	56	3	69.13
17	53	8	41.53
18	49	19	40
19	38	15	28.04
20	36	1	45.03
21	29	2	40.3
22	28	3	40.07
23	26	10	39.75
24	21	13	36.29
25	19	8	31.79
26	18	7	29.3
27	14	12	14.66
28	9	9	10.15
29	8	5	35.36
30	7	6	26.85
31	6	2	39.97
32	6	12	7.84
33	5	10	7.89
34	4	4	38.94
35	1	11	7.31
Validation data
36	45	6	50.74
37	94	15	67.43
38	83	7	70.55
39	35	7	43.78
40	73	5	69.56
41	44	16	38.93
42	39	19	30.39
43	24	7	39.2
44	55	5	60.71
45	16	15	21.13
46	35	7	43.78
47	30	8	40.48
48	15	9	17.78
49	50	6	50.74
50	40	7	46.14

3.3 Type 3 test knowledge level results

A three-dimensional surface view of knowledge level abilities for type 3 test was extracted as shown in Fig. 6.

Fig. 6

Type 3 test surface view.

From the results (surface view), we can deduce that the lower the assessment test score and the longer the time taken to complete an assessment test, the lower the student’s knowledge level. It can also be noted that, for similar test scores and times taken, type 3 test had slightly higher knowledge levels than type 2 test, and much higher knowledge levels than type 1 test. This implies that type 3 test was more technical or advanced than both type 2 and type 1 tests, which is consistent with the expert knowledge stored in the knowledge bases.

Fifty (50) random trials on various acceptable assessment test scores and times taken to complete an assessment test were run on the type 3 test Gaussian fuzzy system, and their respective knowledge level abilities documented as indicated in Table 4.

Table 4

Experimental results for type 3 test

Trials	Test Score (%)	Time (Mins)	Knowledge level (%) (Type 3 test)
1	100	8	92.18
2	98	12	90.76
3	97	15	71.75
4	95	1	92.83
5	90	2	90.62
6	86	1	90.59
7	85	2	90.62
8	84	13	70.88
9	83	9	70.48
10	80	3	90.92
11	76	7	70.48
12	74	10	69.89
13	72	4	86.68
14	69	13	68.79
15	65	4	84.64
16	64	2	81.84
17	61	7	70.88
18	60	5	72.06
19	59	14	60
20	53	5	69.92
21	51	12	68.83
22	49	4	69.6
23	48	15	49.53
24	46	7	68.67
25	41	5	61.89
26	38	13	50.93
27	36	20	40
28	33	7	44.16
29	29	3	69.13
30	19	8	47.3
31	14	3	41.82
32	11	7	40.29
33	8	9	40.03
34	5	10	39.98
35	3	10	30.98
Validation data
36	94	7	87.42
37	89	20	69.95
38	81	15	70.3
39	78	2	92.6
40	55	5	70.12
41	56	3	70.14
42	42	18	40.28
43	27	10	40.21
44	20	9	42.4
45	40	7	59.02
46	55	5	70.12
47	70	4	86.68
48	15	9	41.53
49	10	10	40.12
50	45	6	66.91

4 Linear regression predictive models

References [3, 29] explained that linear regression predictive models can predict values that are either part of their training data or new values. Because there is more than one independent variable (assessment test score and time taken) and one dependent variable (knowledge level ability), the multiple linear regression Equation (9) was used to calculate the linear regression predictive model for this IT2GF. The linear regression model enables the development of a predictive model in which the dependent variable is predicted from the independent variables.

$Y = a + b_{1} X_{1} + b_{2} X_{2} + ɛ$ (9)

Where:

Y = dependent variable.

X = Independent variable(s).

a = Intercept.

b = Slope.

ɛ= Regression residual.

The two slope coefficients $b_{1}$ and $b_{2}$, as well as the intercept $a$, can be calculated using the formulas in Equations (10) to (12) respectively.

$b_{1} = \frac{(\sum x_{2}^{2}) (\sum x_{1} y) - (\sum x_{1} x_{2}) (\sum x_{2} y)}{(\sum x_{1}^{2}) (\sum x_{2}^{2}) - (\sum x_{1} x_{2})^{2}}$ (10)

$b_{2} = \frac{(\sum x_{1}^{2}) (\sum x_{2} y) - (\sum x_{1} x_{2}) (\sum x_{1} y)}{(\sum x_{1}^{2}) (\sum x_{2}^{2}) - (\sum x_{1} x_{2})^{2}}$ (11)

$a = \bar{Y} - b_{1} \bar{X}_{1} - b_{2} \bar{X}_{2}$ (12)

Where

$\bar{Y}$ = mean of Y.

$\bar{X}$ = mean of X.

Lowercase $x_{1}$, $x_{2}$ and $y$ in Equations (10) and (11) denote deviations from the respective means.
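The coefficient formulas for two predictors can be sketched in plain Python as follows. The sketch assumes that the sums are taken over deviations from the means, which is the form in which these formulas hold for a model with an intercept. The function name and the small synthetic data set are hypothetical; the paper itself used Microsoft Excel.

```python
def fit_two_predictor_regression(x1, x2, y):
    """Return (a, b1, b2) for Y = a + b1*X1 + b2*X2 by least squares."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Deviations from the means.
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(u * u for u in d1)
    s22 = sum(u * u for u in d2)
    s12 = sum(u * v for u, v in zip(d1, d2))
    s1y = sum(u * v for u, v in zip(d1, dy))
    s2y = sum(u * v for u, v in zip(d2, dy))
    denom = s11 * s22 - s12 ** 2
    if denom == 0:
        raise ValueError("predictors are perfectly collinear")
    b1 = (s22 * s1y - s12 * s2y) / denom
    b2 = (s11 * s2y - s12 * s1y) / denom
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2


# Synthetic check: data generated exactly from Y = 2 + 3*X1 - 1*X2.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 1.0, 4.0, 3.0]
y = [2.0 + 3.0 * u - 1.0 * v for u, v in zip(x1, x2)]
a, b1, b2 = fit_two_predictor_regression(x1, x2, y)
```

On noise-free data the function recovers the generating coefficients, which is a quick way to check an implementation before applying it to the training rows of Tables 2, 3 and 4.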

5 Test types linear regression models training

References [2, 10] pointed out that before training a regression model, one set of sample simulation or experimental results is needed for training, and another set for validation. Experimental results from Tables 2, 3 and 4 were used for training and validating the multiple linear regression predictive models for test types 1, 2 and 3 respectively. Multiple linear regression was calculated for each test type using Microsoft Excel, to establish the relationships between the independent and the dependent variables.

5.1 Type 1 test linear regression training and discussions

On performing multiple linear regression on type 1 test experimental results training data in Table 2, the following results and coefficients were obtained as summarized in Table 5.

Table 5
Type 1 test multiple linear regression results

Regression Statistics
Multiple R	0.915253045
R Square	0.837688137
Adjusted R Square	0.827543646
Standard Error	7.968517142
Observations	35

ANOVA
	df	SS	MS	F	Significance F
Regression	2	10486.65798	5243.32899	82.5756661	2.3206E-13
Residual	32	2031.912494	63.49726544
Total	34	12518.57047

	Coefficients	Standard Error	t Stat	P-value	Lower 95%	Upper 95%
Intercept	23.59515727	3.888470476	6.067979022	8.9169E-07	15.6746021	31.51571244
Test score (%)	0.539223122	0.04899893	11.00479389	2.0707E-12	0.43941557	0.639030676
Time (Min)	–1.38510493	0.234776556	–5.89967307	1.4515E-06	–1.86332912	–0.90688073

5.1.1 Regression and coefficient results

The R square and Adjusted R square values are 83.77% and 82.75% respectively. This shows that a very high proportion of the variance in the dependent variable can be predicted from the independent variables in this multiple linear regression model. The regression statistics also show an acceptable standard error of 7.97.

A significance F of 2.3206E-13 is far less than the accepted threshold of 0.05, implying that there is a strong relationship between the independent variables and the dependent variable. On substituting the coefficients in Equation (9), the predictive multiple linear regression Equation (13) for predicting type 1 test knowledge level ability was obtained. Each set of independent and dependent values has its own residual error (ɛ), though these were too small to interfere with the predictions.

$Y = a + b_{1} X_{1} + b_{2} X_{2} + ɛ$ (9)

$\text{Knowledge level}_{(\text{Type 1 test})} = 23.595 + 0.539\,\text{test score} - 1.385\,\text{time} + ɛ$ (13)

The standard errors are minimal, with the intercept having a value of 3.888, test score a value of 0.049, and time a value of 0.235. These give acceptable values of the t statistics (coefficient/standard error).
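The summary statistics in Table 5 are related by standard identities, so they can be cross-checked from one another. The sketch below recomputes the adjusted R square, the F statistic and the t statistics from the other Table 5 entries; the variable names are hypothetical and the numbers are copied from Table 5.

```python
r_square = 0.837688137
n_obs = 35
n_predictors = 2

# Adjusted R square penalises R square for the number of predictors.
adjusted_r_square = 1 - (1 - r_square) * (n_obs - 1) / (n_obs - n_predictors - 1)

# F is the ratio of the regression mean square to the residual mean square.
ms_regression = 5243.32899
ms_residual = 63.49726544
f_statistic = ms_regression / ms_residual

# t statistic = coefficient / standard error.
coefficients = {
    "intercept": (23.59515727, 3.888470476),
    "test_score": (0.539223122, 0.04899893),
    "time": (-1.38510493, 0.234776556),
}
t_stats = {name: coef / se for name, (coef, se) in coefficients.items()}
```

The same three identities apply unchanged to the type 2 and type 3 regression outputs.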

5.2 Type 2 test linear regression training and discussions

On performing multiple linear regression on type 2 test experimental results training data in Table 3, the following results and coefficients were obtained as summarized in Table 6.

Table 6
Type 2 test multiple linear regression results

Regression Statistics
Multiple R	0.957698621
R Square	0.917186649
Adjusted R Square	0.912010815
Standard Error	7.188679499
Observations	35

ANOVA
	df	SS	MS	F	Significance F
Regression	2	18314.94365	9157.472	177.20556	4.8933E-18
Residual	32	1653.667614	51.67711
Total	34	19968.61127

	Coefficients	Standard Error	t Stat	P-value	Lower 95%	Upper 95%
Intercept	28.71274818	2.772626804	10.35579	9.572E-12	23.0650922	34.360404
Marks	0.655858821	0.036870383	17.78823	3.753E-18	0.58075631	0.7309613
Time	–1.44429191	0.221144223	–6.531	2.355E-07	–1.8947479	–0.9938359

5.2.1 Regression and coefficient results

The values of R square and Adjusted R square are 91.72% and 91.20% respectively. This shows that a very high proportion of the variance in the dependent variable can be predicted from the independent variables in this multiple linear regression model. The regression statistics also show an acceptable standard error of 7.19.

A significance F of 4.89E-18 is far less than the accepted threshold of 0.05, implying that there is a strong relationship between the independent variables and the dependent variable. On substituting the coefficients in Equation (9), the multiple linear regression Equation (14) for predicting type 2 test knowledge level ability was obtained. Each set of independent and dependent values has its own residual error (ɛ), though these were too small to interfere with the prediction outcomes.

$\text{Knowledge level}_{(\text{Type 2 test})} = 28.713 + 0.656\,\text{test score} - 1.444\,\text{time} + ɛ$ (14)

The standard errors are minimal, with the intercept having a value of 2.773, test score a value of 0.037, and time a value of 0.221. These give acceptable values of the t statistics (coefficient/standard error).

5.3 Type 3 test linear regression training and discussions

On performing multiple linear regression on type 3 test experimental results training data in Table 4, the following results and coefficients were achieved as summarized in Table 7.

Table 7
Type 3 test multiple linear regression results

Regression Statistics
Multiple R	0.9574627
R Square	0.9167348
Adjusted R Square	0.9115307
Standard Error	5.4816133
Observations	35

ANOVA
	df	SS	MS	F	Significance F
Regression	2	10586.3657	5293.1829	176.15708	5.33848E-18
Residual	32	961.5387	30.048084
Total	34	11547.9044

	Coefficients	Standard Error	t Stat	P-value	Lower 95%	Upper 95%
Intercept	48.02606	2.71948457	17.659986	4.633E-18	42.48665146	53.565469
Test score (%)	0.5337287	0.03269683	16.323559	4.501E-17	0.467127432	0.60032997
Time (Min)	–1.4400459	0.20075922	–7.173	3.826E-08	–1.84897906	–1.0311128

5.3.1 Regression and coefficient results

The values of R square and Adjusted R square are 91.67% and 91.15% respectively. This shows that a very high proportion of the variance in the dependent variable can be predicted from the independent variables in this regression model. The regression statistics also show an acceptable standard error of 5.48.

A significance F of 5.338E-18 is far less than the accepted threshold of 0.05, implying that there is a strong relationship between the independent variables and the dependent variable.

On substituting the coefficients in Equation (9), the multiple linear regression Equation (15) for predicting type 3 test knowledge level ability was obtained. Each set of independent and dependent values has its own residual error (ɛ), though these were too small to interfere with the prediction outcomes.

$\text{Knowledge level}_{(\text{Type 3 test})} = 48.026 + 0.534\,\text{test score} - 1.44\,\text{time} + ɛ$ (15)

The standard errors are minimal, with the intercept having a value of 2.719, test score a value of 0.033, and time a value of 0.201. These give acceptable values of the t statistics (coefficient/standard error). The P values of the intercept and the independent variables are far less than 0.001, implying that all the independent variables play a major role in determining the dependent variable.

6 Linear regression test types model validation

References [11, 17] advised that regression models need to be validated to ascertain whether they make reasonable predictions. The 50 sample experimental results for each test type were divided in a ratio of 70:30, for training and validation respectively. To validate our linear regression models, we used the validation data in Tables 2, 3 and 4.

Type 1 test validation experimental data results in Table 2 were used to validate the knowledge level ability for type 1 test multiple linear regression Equation (13). Type 2 test validation experimental data results in Table 3 were used to validate the knowledge level ability for type 2 test multiple linear regression Equation (14). Type 3 test validation experimental data results in Table 4 were used to validate the knowledge level ability for type 3 test multiple linear regression Equation (15).

6.1 Type 1 test linear regression model validation

Linear regression model Equation (13) was used to predict the knowledge level abilities for type 1 test using the validation data inputs in Table 2.

$\text{Knowledge level}_{(\text{Type 1 test})} = 23.595 + 0.539\,\text{test score} - 1.385\,\text{time} + ɛ$ (13)

The comparison between the predicted knowledge levels and the experimental knowledge levels for the type 1 test Gaussian fuzzy validation data is summarized in Table 8 and plotted in Graph 1. Considering Table 8 and Graph 1, the residuals (experimental values minus predicted values) are within acceptable ranges, making the linear regression model Equation (13) valid. The model validity was also supported by the R square values in the regression and coefficient results in Section 5.1.1.

Table 8

Type 1 test validation data results (Showing comparison between experimental results, predicted results and the residual)

Trials	Test score (%)	Time taken (Min)	Experimental knowledge level (Type 1 test)	Predicted knowledge level (Type 1 test)	Residuals (ɛ)
1	45	6	40.05	39.54	0.51
2	40	7	40	35.46	4.54
3	35	7	39.99	32.765	7.225
4	85	2	71.13	66.64	4.49
5	80	3	69.23	62.56	6.67
6	75	3	69.13	59.865	9.265
7	5	10	7.39	12.44	–5.05
8	55	5	43.15	46.315	–3.165
9	88	15	56.12	50.252	5.868
10	24	17	11.02	12.986	–1.966
11	60	5	50.04	49.01	1.03
12	1	11	7.21	8.899	–1.689
13	5	10	7.39	12.44	–5.05
14	10	10	9	15.135	–6.135
15	15	9	17.64	19.215	–1.575
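The validation step for type 1 test can be reproduced with a short sketch: apply Equation (13) to each validation input and subtract the prediction from the experimental value. The rows are copied from Table 8; the function and variable names are hypothetical, and the mean absolute error is an extra summary that is not reported in the paper.

```python
def predict_type1(test_score, time_taken):
    """Equation (13) without the residual term."""
    return 23.595 + 0.539 * test_score - 1.385 * time_taken


# (test score %, time taken min, experimental knowledge level) from Table 8.
validation_rows = [
    (45, 6, 40.05), (40, 7, 40.0), (35, 7, 39.99), (85, 2, 71.13),
    (80, 3, 69.23), (75, 3, 69.13), (5, 10, 7.39), (55, 5, 43.15),
    (88, 15, 56.12), (24, 17, 11.02), (60, 5, 50.04), (1, 11, 7.21),
    (5, 10, 7.39), (10, 10, 9.0), (15, 9, 17.64),
]

# Residual = experimental value - predicted value.
residuals = [observed - predict_type1(score, minutes)
             for score, minutes, observed in validation_rows]

mean_abs_error = sum(abs(r) for r in residuals) / len(residuals)
```

Swapping in the coefficients of Equations (14) and (15) and the validation rows of Tables 3 and 4 gives the corresponding checks for type 2 and type 3 tests.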

6.2 Type 2 test linear regression model validation

Linear regression model Equation (14) was used to predict the knowledge level abilities for type 2 test using the validation data inputs in Table 3.

$\text{Knowledge level}_{(\text{Type 2 test})} = 28.713 + 0.656\,\text{test score} - 1.444\,\text{time} + ɛ$ (14)

The comparison between the predicted knowledge levels and the experimental knowledge levels for the type 2 test Gaussian fuzzy validation data is summarized in Table 9 and plotted in Graph 2. Considering Table 9 and Graph 2, the residuals (experimental values minus predicted values) are within acceptable ranges, making the linear regression model Equation (14) valid. The model validity was also supported by the R square values in the regression and coefficient results in Section 5.2.1.

Table 9

Type 2 test validation data results (Showing comparison between experimental results, predicted results and the residual)

Trials	Test score (%)	Time taken (Min)	Experimental knowledge level (Type 2 test)	Predicted knowledge level (Type 2 test)	Residuals (ɛ)
1	45	6	50.74	49.569	1.171
2	94	15	67.43	68.717	–1.287
3	83	7	70.55	73.053	–2.503
4	35	7	43.78	41.565	2.215
5	73	5	69.56	69.381	0.179
6	44	16	38.93	34.473	4.457
7	39	19	30.39	26.861	3.529
8	24	7	39.2	34.349	4.851
9	55	5	60.71	57.573	3.137
10	16	15	21.13	17.549	3.581
11	35	7	43.78	41.565	2.215
12	30	8	40.48	36.841	3.639
13	15	9	17.78	25.557	–7.777
14	50	6	50.74	52.849	–2.109
15	40	7	46.14	44.845	1.295

6.3 Type 3 test linear regression model validation

Linear regression model Equation (15) was used to predict the knowledge level abilities for type 3 test using the validation data inputs in Table 4.

$\text{Knowledge level}_{(\text{Type 3 test})} = 48.026 + 0.534\,\text{test score} - 1.44\,\text{time} + ɛ$ (15)

The comparison between the predicted knowledge levels and the experimental knowledge levels for the type 3 test Gaussian fuzzy validation data is summarized in Table 10 and plotted in Graph 3. Considering Table 10 and Graph 3, the residuals (experimental values minus predicted values) are within acceptable ranges, making the linear regression model Equation (15) valid. The model validity was also supported by the R square values in the regression and coefficient results in Section 5.3.1.

Table 10

Type 3 test validation data results (Showing comparison between experimental results, predicted results and the residual)

Trials	Test score (%)	Time taken (Min)	Experimental knowledge level (Type 3 test)	Predicted knowledge level (Type 3 test)	Residuals (ɛ)
1	94	7	87.42	88.142	–0.722
2	89	20	69.95	66.752	3.198
3	81	15	70.3	69.68	0.62
4	78	2	92.6	86.798	5.802
5	55	5	70.12	70.196	–0.076
6	56	3	70.14	73.61	–3.47
7	42	18	40.28	44.534	–4.254
8	27	10	40.21	48.044	–7.834
9	20	9	42.4	45.746	–3.346
10	40	7	59.02	59.306	–0.286
11	55	5	70.12	70.196	–0.076
12	70	4	86.68	79.646	7.034
13	15	9	41.53	43.076	–1.546
14	10	10	40.12	38.966	1.154
15	45	6	66.91	63.416	3.494

Graphs 1, 2 and 3 showed that the predicted knowledge levels were very close to the experimental knowledge levels for all the test types. This implied that the linear regression models were valid, as they gave acceptable knowledge level abilities that could be relied upon.

7 Discussion and conclusion

7.1 Discussion

The knowledge base for type 1 test had the linear regression predictive model Equation (13), with the coefficients of intercept, assessment test score, and time being 23.595, 0.539 and –1.385 respectively. Thus, $\text{Knowledge level}_{(\text{Type 1 test})} = 23.595 + 0.539\,\text{test score} - 1.385\,\text{time} + ɛ$.

The knowledge base for type 2 test had the linear regression predictive model Equation (14), with the coefficients of intercept, assessment test score, and time being 28.713, 0.656 and –1.444 respectively. Thus, $\text{Knowledge level}_{(\text{Type 2 test})} = 28.713 + 0.656\,\text{test score} - 1.444\,\text{time} + ɛ$.

The knowledge base for type 3 test had the linear regression predictive model Equation (15), with the coefficients of intercept, assessment test score, and time being 48.026, 0.534 and –1.44 respectively. Thus, $\text{Knowledge level}_{(\text{Type 3 test})} = 48.026 + 0.534\,\text{test score} - 1.44\,\text{time} + ɛ$.

To compare the variances in the knowledge bases, we compared the following test types: type 1 test versus type 1 test, type 1 test versus type 2 test, and finally type 1 test versus type 3 test. We then captured the change in coefficients against the change in variances (where 0 shows no variance and 1 shows a variance), as illustrated in Table 11.

Table 11
Knowledge base variances for the various test types

From (test type)	To (test type)	Time taken \ Test score	Low	Fair	Average	Good	Excellent	Variances (individual)	Variances (sum)	Variances (% sum)
Type 1	Type 1	SHORT	0	0	0	0	0	0
		AVERAGE	0	0	0	0	0	0	0	0%
		LONG	0	0	0	0	0	0
Type 1	Type 2	SHORT	1	0	1	0	0	2
		AVERAGE	0	0	0	1	1	2	5	33%
		LONG	0	0	1	0	0	1
Type 1	Type 3	SHORT	1	1	1	1	0	4
		AVERAGE	1	0	1	1	1	4	11	73%
		LONG	0	1	1	1	0	3
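The row sums, totals and percentages in Table 11 can be reproduced with a short Python sketch. The 0/1 indicators are copied from the table; the data structure and function name are assumptions introduced for this illustration.

```python
# 0/1 variance indicators from Table 11. Rows are time taken
# (SHORT, AVERAGE, LONG); columns are test score
# (Low, Fair, Average, Good, Excellent).
VARIANCES = {
    ("Type 1", "Type 1"): [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]],
    ("Type 1", "Type 2"): [[1, 0, 1, 0, 0], [0, 0, 0, 1, 1], [0, 0, 1, 0, 0]],
    ("Type 1", "Type 3"): [[1, 1, 1, 1, 0], [1, 0, 1, 1, 1], [0, 1, 1, 1, 0]],
}


def summarise(matrix):
    """Return (per-row variances, total variances, % of rules that varied)."""
    row_sums = [sum(row) for row in matrix]
    total = sum(row_sums)
    n_rules = sum(len(row) for row in matrix)  # 3 x 5 = 15 rules
    return row_sums, total, 100.0 * total / n_rules


if __name__ == "__main__":
    for pair, matrix in VARIANCES.items():
        rows, total, pct = summarise(matrix)
        print(pair, rows, total, f"{pct:.0f}%")
```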

Comparing the knowledge base of the type 1 test with itself gave no variance, and therefore the same linear regression model, Knowledge level_{(Type 1 test)} = 23.595 + 0.539 test score – 1.385 time + ɛ, Equation (13). This implies that when there is no variance in the knowledge base, the Gaussian fuzzy outputs and the linear regression model do not change from the expected values. Thus, linear regression Equation (13) for the type 1 test became our base for comparisons.

When we compared the knowledge base of the type 1 test with the knowledge base of the type 2 test, there were variances in 5 rules, meaning that 33% of the 15 rules in the knowledge base varied. The variance changed the values of the Gaussian fuzzy output and hence the linear regression model from Equation (13) to Knowledge level_{(Type 2 test)} = 28.713 + 0.656 test score – 1.444 time + ɛ, Equation (14).

The 5 rules that varied adjusted the linear regression model with an increase in intercept of 5.118 (from 23.595 to 28.713), which is about a 21.7% increase in the intercept. There was also a small increase in the test score coefficient. On rearranging the formula so that the time and test score coefficients were held at their Equation (13) values (held constant), the intercept would have increased by an average of about 6.12 (implying a 25.09% change in intercept). This implied that when the knowledge base varies towards higher values, the intercept increases by a ratio of almost the same magnitude as the variance.

Lastly, when we compared the knowledge base of the type 1 test with the knowledge base of the type 3 test, there was a variance in 11 rules, meaning that 73% of the 15 rules in the knowledge base varied. The variance changed the linear regression model to Knowledge level_{(Type 3 test)} = 48.026 + 0.534 test score – 1.44 time + ɛ, Equation (15). The variance in 11 rules adjusted the linear regression model with an increase in intercept of 24.431, which is about a 103.54% increase in the intercept. In this model, the time and test score coefficients stayed close to those of the base Equation (13), so the change in intercept was the value of concern.
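The raw intercept changes relative to the base Equation (13) can be checked with the following Python sketch. It uses only the rounded intercepts quoted in Equations (13) to (15); the function and variable names are assumptions for this example, and the sketch does not attempt the adjustment that holds the slope coefficients constant.

```python
# Intercepts quoted in Equations (13), (14) and (15).
BASE_INTERCEPT = 23.595
INTERCEPTS = {"type2": 28.713, "type3": 48.026}


def intercept_change(base, new):
    """Return (absolute change, percentage change relative to base)."""
    diff = new - base
    return diff, 100.0 * diff / base


if __name__ == "__main__":
    for name, value in INTERCEPTS.items():
        diff, pct = intercept_change(BASE_INTERCEPT, value)
        print(f"{name}: change = {diff:.3f}, relative change = {pct:.2f}%")
```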

7.2 Conclusion

Expert knowledge stored in the knowledge bases of Gaussian fuzzy systems plays an important role in the calculation of the consequent intelligent output of those systems. From the research, it was evident that the variances in the knowledge base greatly affected the output values of the fuzzy system, as indicated in Tables 2, 3 and 4. These output values were the ones used to train the linear regression model. Due to these changes, it was observed that when 33% of the rules in the knowledge base varied, the intercept of the linear regression predictive model changed by 25.09%. On the other hand, when 73% of the rules varied, the intercept changed by 103.54%. This implies that in Gaussian fuzzy systems the intercepts of linear regression predictive models change exponentially rather than linearly. Hence, the higher the variance in the expert knowledge stored in the knowledge base, the disproportionately higher the deviation from the acceptable prediction values. Thus, the accuracy of the model depends on keeping the variance in the knowledge base low.
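A rough way to see that the intercept change grows faster than the share of varied rules is to divide each reported intercept change by the corresponding percentage of varied rules. The sketch below does this for the two comparisons reported above; the names are assumptions for illustration, and with only two comparison points it illustrates the more-than-proportional growth rather than fitting any particular curve.

```python
# (fraction of the 15 rules that varied, reported intercept change in %)
OBSERVATIONS = {
    "type 1 vs type 2": (5 / 15, 25.09),
    "type 1 vs type 3": (11 / 15, 103.54),
}


def change_per_variance(rule_fraction, intercept_change_pct):
    """Intercept change (%) per 1% of rules varied."""
    return intercept_change_pct / (100.0 * rule_fraction)


if __name__ == "__main__":
    for label, (fraction, change) in OBSERVATIONS.items():
        ratio = change_per_variance(fraction, change)
        print(f"{label}: {ratio:.2f}% intercept change per 1% of rules varied")
```

Under a strictly linear relationship through the origin the two ratios would be equal; here the second is roughly twice the first.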

References

1. Sadollah, Introductory Chapter: Which Membership Function is Appropriate in Fuzzy System? In: Fuzzy Logic Based in Optimization Methods and Control Systems and Its Applications (2018), 4–6.

2. Gelman and Hill, Data Analysis Using Regression and Multilevel/Hierarchical Models 1st Edition: Analytical methods for social research, John Fox Publishers (2007).

3. Montgomery C.D., Peck A.E. and Vining G.G., Introduction to Linear Regression Analysis 5th Edition, John Wiley and Sons Publications (2012), 12–406.

4. Wagner, Juzzy – A Java based Toolkit for Type-2 Fuzzy Logic, 2013 IEEE Symposium Series on Computational Intelligence, Singapore (2013), 234–240.

5. Abadi D.M. and Khooban H.M., Design of optimal Mamdani-type fuzzy controller for non holonomic wheeled mobile robots, Journal of King Saud University – Engineering Sciences Elsevier 27 (2015), 92–100.

6. A Brief Tutorial on Interval Type-2 Fuzzy Sets and Systems, IEEE Computational Intelligence Society (2015), 110–115.

7. Schulz, Speekenbrink and Krause, A tutorial on Gaussian process regression: Modelling, exploring, and exploiting functions, Journal of Mathematical Psychology 85 (2018), 1–16.

8. Baader, et al., Decidability and Complexity of Fuzzy Description Logics, Künstl Intell, Springer-Verlag Berlin Heidelberg, 31 (2017), 85–90.

9. Liu and Mendel J.M., Aggregation Using Fuzzy Weighted Average, as Computed by the Karnik-Mendel algorithm, IEEE Trans. on Fuzzy Systems 16(1) (2008), 1–12.

10. Cortese, Scheike T.H. and Martinussen, Flexible survival regression modelling, PubMed Sage Journals 19(1) (2010), 5–28.

11. Frost, Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models Paperback, Jim publishing publishers (2020).

12. Gogo K.O., Nderu and Mwang R.W., Fuzzy Logic Based Context Aware Recommender for Smart E-learning Content Delivery, 2018 5th International Conference on Soft Computing & Machine Intelligence (ISCMI), Nairobi, Kenya (2018), 114–118.

13. Kothari C.R., Research methodology: methods and techniques (2nd Eds), New age international (p) limited, New Delhi, India (2004).

14. Dambrosio, Data-based Fuzzy Logic Control Technique Applied to a Wind System, in: 72nd Conference of the Italian Thermal Machines Engineering Association, ATI 2017, 6–8 September 2017, Lecce, Italy, Elsevier, Energy Procedia 126 (2017), 690–697.

15. Meloun, Militký, Hill and Brereton G.R., Crucial problems in regression modelling and their solutions, Analyst, Tutorial Review 127 (2002), 433–450.

16. Novak and Oreski, Fuzzy Knowledge-Based System for Calculating Course Difficulty Based on Student Perception, Wiley Periodicals, Inc. (2015), 225–233.

17. Weisberg, Simulation and Similarity: Using Models to Understand the World, Oxford University Press, New York (2013), 7–42.

18. Pasinato, Mello E.C., Aufaure and Zimbrão, Generating Synthetic Data for Context-Aware Recommender Systems, in: 2013 BRICS Congress on Computational Intelligence and 11th Brazilian Congress on Computational Intelligence, Ipojuca, 8–11 Sept. (2013), 563–567.

19. Aliyeva and Ismayilov, The analysis of how the choice of membership functions influences the quality of recognition system, 2012 International Symposium on Innovations in Intelligent Systems and Applications, Trabzon (2012), 1–3.

20. Talpur, Saleh M.N. and Husain, An investigation of membership functions on performance of ANFIS for solving classification problems, International Research and Innovation Summit (IRIS 2017), IOP Conf. Series: Materials Science and Engineering 226 (2017), 1–7.

21. Castillo and Melin, Recent Advances in Interval Type-2 Fuzzy Systems, Springer Briefs in Computational Intelligence (2012), 7–12.

22. Ali, Ali and Sumait, Comparison between the Effects of Different Types of Membership Functions on Fuzzy Logic Controller Performance, International Journal of Emerging Engineering Research and Technology 3(3) (2015), 76–83.

23. Romero P.F., et al., A Fuzzy-based Recommender Approach for Learning Objects Management Systems, in: 11th International Conference on Intelligent Systems Design and Applications, IEEE (2011), 984–989.

24. Liang and Mendel J.M., Interval type-2 fuzzy logic systems: Theory and design, IEEE Trans. on Fuzzy Systems 8(5) (2000), 535–550.

25. Al-Otaibi and Ykhlef, Hybrid immunizing solution for job recommender system, Front Comput Sci, Higher Education Press and Springer-Verlag Berlin Heidelberg 11(3) (2017), 511–527.

26. Jing, Luo and Zhang, A Fuzzy Dynamic Belief Logic System, Wiley Periodicals, Inc, International Journal of Intelligent Systems 29 (2014), 687–711.

27. and Mendel J.M., Aggregation Using the Fuzzy Weighted Average, as Computed by the KM algorithm, IEEE Trans. on Fuzzy Systems 15(6) (2007), 1145–1161.

28. Ullah, Ullah, Ahmad and Akgül, On solutions of fuzzy fractional order complex population dynamical model, Numerical Methods for Partial Differential Equations (2020), 1–21.

29. Akgül and Akgül E.K., A Novel Method for Solutions of Fourth-Order Fractional Boundary Value Problems, Fractal and Fractional 3(2) (2019), 33.

Regression Statistics
Multiple R	0.915253045
R Square	0.837688137
Adjusted R Square	0.827543646
Standard Error	7.968517142
Observations	35
ANOVA
df	SS	MS	F	Significance F
Regression	2	10486.65798	5243.32899	82.5756661	2.3206E-13
Residual	32	2031.912494	63.49726544
Total	34	12518.57047
	Coefficients	Standard Error	t Stat	P-value	Lower 95%	Upper 95%
Intercept	23.59515727	3.888470476	6.067979022	8.9169E-07	15.6746021	31.51571244
Test score (%)	0.539223122	0.04899893	11.00479389	2.0707E-12	0.43941557	0.639030676
Time (Min)	–1.38510493	0.234776556	–5.89967307	1.4515E-06	–1.86332912	–0.90688073

Regression Statistics
Multiple R	0.957698621
R Square	0.917186649
Adjusted R Square	0.912010815
Standard Error	7.188679499
Observations	35
ANOVA
df	SS	MS	F	Significance F
Regression	2	18314.94365	9157.472	177.20556	4.8933E-18
Residual	32	1653.667614	51.67711
Total	34	19968.61127
	Coefficients	Standard Error	t Stat	P-value	Lower 95%	Upper 95%
Intercept	28.71274818	2.772626804	10.35579	9.572E-12	23.0650922	34.360404
Marks	0.655858821	0.036870383	17.78823	3.753E-18	0.58075631	0.7309613
Time	–1.44429191	0.221144223	–6.531	2.355E-07	–1.8947479	–0.9938359

Regression Statistics
Multiple R	0.9574627
R Square	0.9167348
Adjusted R Square	0.9115307
Standard Error	5.4816133
Observations	35
ANOVA
df	SS	MS	F	Significance F
Regression	2	10586.3657	5293.1829	176.15708	5.33848E-18
Residual	32	961.5387	30.048084
Total	34	11547.9044
	Coefficients	Standard Error	t Stat	P-value	Lower 95%	Upper 95%
Intercept	48.02606	2.71948457	17.659986	4.633E-18	42.48665146	53.565469
Test score (%)	0.5337287	0.03269683	16.323559	4.501E-17	0.467127432	0.60032997
Time (Min)	–1.4400459	0.20075922	–7.173	3.826E-08	–1.84897906	–1.0311128
